7 Phase 4: Project definition

If you have followed the steps outlined in Phases 1 to 3, you will now have determined that the project is viable. You will have confirmed that, in principle, the project makes contextual sense and is likely to bring value to your stakeholders, and you are now ready to think about the specifics of how you will deliver the work. Congratulations! This is a major crossroads for any project, and you should be proud of having had the discipline to work through Phases 1 to 3 carefully. Now you are ready to start your project in earnest!

But hold on! Before you start writing code and delving into the work, you must plan the project: that is what Phase 4 is about. This is the design phase, where you lay out in detail the approach you will take, the steps that will be included and the resources that will be required. We often hear people say things such as, “Data science is research… how can I say how long it will take if I don’t know what I’m going to find?” This is a valid concern: in an ideal world, you would have all the time and money you need. Sadly, this is where the ideal world and reality part ways. If you are working with a client or a manager, you will need to tell that person what the work will require in terms of time and money. Can you be absolutely certain that your estimates are sufficient? No. But you can make reasonable estimates with the information you have, and this is where the art of project design comes in.

For the remainder of this chapter, we will refer to the main stakeholder as the client; if you are working within a company, this will usually be your manager.

Designing a project plan and timeline accomplishes a few important things. It helps you to organise your thoughts and think in concrete terms about how you can attack the problem at hand. It gives all of your project’s stakeholders a picture of how the project will evolve and progress, and shows them where the important milestones and decision points lie. It also helps the practitioners break the larger body of work into smaller, more manageable tasks. In short, it helps everyone understand the deliverables and the course of action. Your plan does not need to be set in stone: a plan can always be changed, but operating without one will invariably be slower than operating with one.

This phase uses a different type of thinking from that normally found in scientific research, so it can be especially difficult for data scientists who come from research backgrounds. While research generally uses deductive reasoning (the reasoning of logic) and inductive reasoning (forming general rules from observations), the synthesis required for design thinking uses intuition or, more accurately, abductive reasoning: the reasoning of developing one of several possible solutions to a problem. More specifically, it seeks the simplest plausible solution without any formal validation. For a more detailed explanation of how designers think, we recommend the book Design Thinking by Nigel Cross. For our purposes, what matters is that design thinking is the process of identifying a set of required functionalities or purposes and crafting a solution that satisfies them.

What is abductive reasoning?

Abductive reasoning starts with an observation or set of observations and then seeks to find the simplest and most likely conclusion from the observations. This process, unlike deductive reasoning, yields a plausible conclusion but does not positively verify it. Abductive conclusions are thus qualified as having a remnant of uncertainty or doubt, which is expressed in terms such as “best available” or “most likely”.

For many, project design is difficult, ambiguous, stressful and downright unpleasant. Indeed, this phase requires you to commit to delivering a given bit of work in exchange for a certain budget, and you want to get it right – who wouldn’t find that responsibility stressful? You inherently can’t know exactly what it will take to deliver a successful outcome because that insight only really comes from the work itself. But if you have done a good job of scoping the project in Phases 2 and 3, then you should have a fairly good idea of the complexity of the task in front of you.

7.1 Developing a project plan

We suggest you begin by thinking at a high level. Ask yourself questions such as: Where are you starting from? Where do you want to be when the project is completed? What are the logical steps you need to take to get there? Which functionalities are required, and which would be nice to have but are not critical? What novel concepts and approaches will you have to develop? Are there logical intermediate steps along the way? Does one task depend on another being completed first?

In most cases, we break a project down into stages with aims and milestones. We find it useful to clearly define what work will be carried out in each stage and what will be delivered at the end of it. Sometimes a “deliverable” is merely a short report, a small presentation or a conversation with your client. In other cases, it can be a piece of software or a tangible functionality that can be demonstrated. Exactly what the deliverable at the end of each milestone looks like is project-dependent.

In the best cases, each milestone is in itself a valuable step forward for the stakeholder, which de-risks the entire project. For example, a project may be aimed at building an interactive data analytics dashboard with four milestones along the way. Milestone 1 may be an in-depth exploratory analysis that yields important insights about the data; if the project were to end there, the work would still have brought value. While this is not always possible, we suggest you keep it in mind as something to aim for when crafting your project timeline and stages/milestones.

Throughout this process, it can be useful to consider pivot points or alternative outcomes. For example, many projects have milestones that are inherent points for decision-making. If that is the case for your project, be sure to communicate that clearly in your project plan. When making these decisions, try to balance flexibility and open-mindedness with a clear view of business value.

In the earlier scoping phases, you will have identified the project requirements in terms of outcomes and deliverables. If these include data products that require engineering or deployment (as opposed to projects primarily focused on the generation of insights and models), you will need to include corresponding milestones for these steps as well. Data cleaning, code refactoring, optimisation and productisation can be time-consuming, as can deployment of your product and the development of sound ETL pipelines – be sure to budget for these in your project plan and think carefully about what a sensible development plan looks like.

7.2 Skills/expertise required

As you plan your project, the required technical and non-technical skills should start to become clear. Many of these will be general data science skills, such as coding ability, familiarity with machine learning algorithms or knowledge of how to make compelling and informative data visualisations. Others are more specialised, such as natural language processing or graph theory. Still others are more niche and therefore harder to come by, such as domain-specific experience. For example, Jon recently worked on a project that required an in-depth knowledge of how the three-dimensional geometry of a material affects the shape of electrical waveforms; most data scientists (including Jon) will not have this particular experience! Make sure either that you have the industry-specific experience to deliver the project, or that you have willing sponsors within the company who will set aside time to help you understand any industry quirks and nuances.

Your approach to meeting the skills required for a project is, of course, dependent upon the composition of your team. If you are designing a project that you will work on individually, then it is up to you to make sure you have the required skills in your repertoire. If you don’t, you may need to consider bringing in outside help or budgeting for the time it will take to close the gap. Often projects are designed for teams of data scientists, in which case you will want to make sure that the team collectively covers the skills required. Data science, like software development, is often a team sport.

“If you want to go fast, go alone. If you want to go far, go together.” – African Proverb

7.3 Determining the cost

You will almost certainly have to put a price tag on your project. For us, this is essentially a calculation based on how long we think the project will take and the billing rate for the staff. Data scientists often bill on a per-day basis according to their experience: “junior”, “mid-level” and “senior” data scientists each have different rates. In our approach, we focus on the project duration and the skills required, and then calculate the cost of the project as a whole. For instance, we may decide that a project requires a considerable amount of data wrangling and NLP, and that a team consisting of a junior data scientist (for the data wrangling) and a senior NLP expert, working for 5 weeks, is likely to be a good formula. We would then calculate the project cost based on the day rates of these two individuals. We prefer this method to one in which we try to fit the project duration or staff rates into a predetermined project budget: it keeps us more honest and shields us from subconscious biases that could lead us to build the delivery around the price rather than the other way round.
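As a minimal sketch of the day-rate calculation described above, the snippet below costs the example team of a junior data scientist and a senior NLP expert working for five weeks. The day rates, the five-day working week and the `StaffAllocation` and `project_cost` names are hypothetical illustrations, not real market rates or our actual tooling.

```python
from dataclasses import dataclass


@dataclass
class StaffAllocation:
    role: str
    day_rate: float  # hypothetical rate in your billing currency
    days: float      # number of days this person is allocated to the project


def project_cost(team: list[StaffAllocation]) -> float:
    """Total staff cost: the sum of day rate x allocated days for each team member."""
    return sum(member.day_rate * member.days for member in team)


WORKING_DAYS_PER_WEEK = 5  # assumption: a standard five-day week
weeks = 5

team = [
    StaffAllocation("junior data scientist (data wrangling)", day_rate=400,
                    days=weeks * WORKING_DAYS_PER_WEEK),
    StaffAllocation("senior NLP expert", day_rate=900,
                    days=weeks * WORKING_DAYS_PER_WEEK),
]

print(project_cost(team))
```

Because the duration and the team are decided first and the price falls out of the calculation, changing the plan (for example, allocating the senior expert for only part of the project) changes the cost transparently rather than the other way round.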

As discussed above, getting the project cost right can be difficult. If your estimate is too low, you may not have the time needed to complete the work; if it’s too high, you risk losing the contract. The latter is especially relevant if you are competing against other providers, for example when writing a proposal in response to an RFP (request for proposals). For other projects, cost may matter less: internal projects within a company often overrun without any negative consequences. Where cost does matter, we don’t have a magic formula that tells you how to strike the right balance. What we can do is highlight some considerations that we find helpful in the process and reassure you that this judgment gets easier as you become more experienced.

Breaking the project into milestones as described above can help: it’s easier to estimate the time required for smaller tasks than for larger ones. Understanding the supporting data will also help, which is a major reason for the scoping work described in Phase 3. This is also a place where having a solid network to turn to for advice might be very useful. At Pivigo, for example, our data team discusses each project plan that is being written to get as many different points of view as possible. If you have access to a network of peers, going through such a sense-check can be a very valuable process. Experience is key, so it may be useful to look at different case studies or to ask colleagues about similar projects and how long they took.

It can also help to anticipate where the work is at a higher risk of delays. For example, you may have made assumptions about the data structure based on your scoping work, only to find out that some of these assumptions are not correct. The incoming data may have changed in structure or location, or unanticipated factors may have introduced more missing values than you expected. You can’t anticipate every possible roadblock, but if you can identify the places where your progress is most vulnerable, you can build extra time into your estimate for those stages.
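One way to be conservative exactly where it matters is to give each milestone a contingency buffer proportional to its perceived risk. The sketch below assumes hypothetical buffer percentages (10%, 25% and 50% for low, medium and high risk) and a hypothetical four-milestone plan; the numbers are illustrative only and should come from your own scoping work.

```python
from dataclasses import dataclass

# Hypothetical contingency buffers, as percentages, by perceived risk level.
RISK_BUFFER_PCT = {"low": 10, "medium": 25, "high": 50}


@dataclass
class Milestone:
    name: str
    base_days: float  # best-guess effort if nothing goes wrong
    risk: str         # "low", "medium" or "high"

    def padded_days(self) -> float:
        """Base estimate plus a contingency buffer scaled to this milestone's risk."""
        return self.base_days * (100 + RISK_BUFFER_PCT[self.risk]) / 100


def conservative_estimate(milestones: list[Milestone]) -> float:
    """Total effort across milestones, including each milestone's buffer."""
    return sum(m.padded_days() for m in milestones)


plan = [
    Milestone("Exploratory analysis", 5, "low"),
    Milestone("Data pipeline (data-structure assumptions untested)", 10, "high"),
    Milestone("Model development", 8, "medium"),
    Milestone("Dashboard", 6, "low"),
]

for m in plan:
    print(f"{m.name}: {m.base_days} -> {m.padded_days():.1f} days")
print(f"Total: {conservative_estimate(plan):.1f} days")
```

Keeping the buffer per milestone, rather than adding a single blanket percentage at the end, makes it easier to explain to the client where the extra time is going and why.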

We also encourage you to consider the state of your relationship with your client when budgeting for a project. Have you worked with this company or person before? If you have, and if the project was successful, you probably have earned a degree of trust that can go a long way in convincing them that your proposal is reasonable. In contrast, if the business is new to data science, or new to you, you have yet to earn their trust and may want to be more cautious in how you cost your project. Similarly, if you feel that your client has the potential for being demanding and hard to please, it may be a good idea to err on the side of caution by budgeting a bit generously. In these cases, the costs of underestimating the required time are high.

On the other hand, if the project is exploratory, if your client is just looking to see the “art of the possible” or if your project is restricted to creating insights, you may feel a bit braver and choose a slightly lower budget: the costs of underestimating the time required are lower. Or, to think of it in another way, even if you only accomplish 95% of what you would have liked, that is still very valuable for the business.

The table below highlights some of the benefits and risks of over-budgeting or under-budgeting your project.

Pricing	Potential benefits	Risks
Under-budgeting	Your proposal may be more attractive to the stakeholders; you can show your client good value-for-money; you get a “foot in the door” and an opportunity to prove yourself	Your client may be sceptical that you can deliver; you may not be able to deliver; you are vulnerable to unexpected problems; you may set expectations too low for follow-on work (assuming you get it)
Over-budgeting	Extra time padding can give you a greater chance of success; greater chance of overdelivering and delighting the client; greater chance of more work, thanks to a happy client	Your price may be too high, and you may lose the work

It can also be useful to think about aspects of the project that are “must-haves” versus those that are “nice-to-haves”. At the very least, your budget should give you enough time to safely deliver the must-haves. Taking a slightly more risky approach to the nice-to-haves may be more aligned with your client’s or manager’s appetite. This is especially true if your client is on a tight budget. In such cases, an approach we often take is to write a proposal with several costs: a cost for the essential (must-have) work and additional add-on costs for the nice-to-have bits. We find this resonates with clients who are nervous about spending a lot of money on a project that has yet to show good ROI (return-on-investment). We also recommend treating your proposal as a step in a back-and-forth conversation with the client: invite the client to comment on the content and the plan, and be open to the possibility of changing the plan if the client is not comfortable with the initial version. While you may not be willing to make sacrifices in your rate (and generally you should not be), you can adjust the scope of the project to better align with your client’s needs, wishes, concerns and budget. This will not only help you to find the right balance for your client, but it will also help to build trust by showing that you are trying to work with them to produce something that is useful and has good value-for-money.
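The multi-price proposal described above amounts to simple arithmetic: a fixed price for the must-have core plus optional add-ons. The sketch below lists the price of every combination so the client can compare options side by side; the `proposal_options` helper, the add-on names and all prices are hypothetical.

```python
from itertools import combinations


def proposal_options(core_cost: float, add_ons: dict[str, float]) -> dict[str, float]:
    """Price the must-have core on its own and with every combination of add-ons."""
    options = {}
    names = list(add_ons)
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            label = " + ".join(("Core",) + combo)
            options[label] = core_cost + sum(add_ons[name] for name in combo)
    return options


# Hypothetical prices: must-have work plus two nice-to-have extras.
options = proposal_options(
    20000,
    {"Interactive dashboard": 6000, "Automated retraining": 4000},
)

for label, price in options.items():
    print(f"{label}: {price:,}")
```

With more than a handful of add-ons, the number of combinations grows quickly, so in practice you may prefer to quote the core and each add-on separately and let the client choose.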

We also encourage you to mention any other costs that the client may incur. For instance, if your work requires the use of a virtual machine or the creation of a database that will be hosted on the cloud, these are costs that the client will want to know about. Hosting your solution can also bring with it security concerns and other maintenance costs or complications. Be sure to be as thorough as possible when breaking down the costs to your client. This will help them to understand the total expense of the project. It will also help in your efforts to build up a trusting relationship with them. Or, to put it another way, surprising your client with unexpected expenses can undermine the trust that you are striving to build. In short, our advice is to view your work with your project stakeholders as a relationship that has to be built and nurtured, and do your best to approach it with empathy for all the people who are involved.

7.4 Managing the project

At this point, you have a project plan and you have accounted for the technical skills that will be needed to bring it to fruition. However, a final consideration is still outstanding: how will you manage the project?

As above, if you are working on a project independently, then this is probably a fairly easy question to answer. If you are designing the project for a team, then planning for the project’s management will be critically important. Either way, we suggest that you take some time to consider exactly how you will work and how you will interact with your client.

Project management is a large field, and we cannot do it justice in this small section. When done well, a project flows smoothly and has a clear road-map and an efficient system for sharing the workload. Everyone is happy with the work and feels that they are contributing towards a common goal. When done poorly or not at all, a project can stagnate and become aimless, deadlines are frequently missed and the project outcomes may not align with the project goals. Good project management cannot turn every project into a masterpiece, but it can go a long way in keeping a project on track, focused and successful.

For some teams, Scrum can be useful, although you should bear in mind that it was designed for software development, and there are significant differences between that field and data science. In our work, we tend to adopt an agile methodology (Scrum or Kanban), typically working in discrete sprints and organising our tasks as concrete issues. Other approaches also exist, such as the more traditional waterfall model. Tools such as Kanban boards and Gantt charts can be very helpful for planning your project and breaking its major phases down into tangible, bite-sized pieces. Even if you are a team of one, we have found that the formality of organising work in this way helps keep the focus on project priorities.

In addition to planning out how you will work and communicate with your team, you should also make a plan for how you will interact with your client. You should consider how you will communicate on a day-to-day basis (we have found Slack to be very useful) as well as how and when you will give progress reports and project updates.

We also encourage you to think about how you intend to structure your codebase. We generally use Git and GitHub for version control and code sharing, and we encourage you to set up your repository’s directory structure before any code is added. We highly recommend the “cookiecutter” family of project templates and have found the Cookiecutter Data Science template to be a good fit for most of our Python-based projects. It includes guidance on best practices for directory structure and naming conventions, and it comes with a Makefile that can be useful for managing the steps required to build the datasets and functions your project needs. For R-based projects, one option is the ProjectTemplate package, although others also exist. At a minimum, we encourage you to work within the framework of an R Project.
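If you would rather create a minimal skeleton yourself, or simply want to see what such a template sets up, the sketch below builds a directory tree with Python’s standard `pathlib` module. The folder names are a hypothetical layout loosely modelled on the Cookiecutter Data Science template, not its exact structure, and `create_skeleton` is an illustrative helper rather than part of any library.

```python
from pathlib import Path

# Hypothetical layout, loosely modelled on the Cookiecutter Data Science template;
# adjust the folder names to your own conventions.
SKELETON_DIRS = [
    "data/raw",        # original, immutable data
    "data/interim",    # intermediate transformed data
    "data/processed",  # final datasets used for modelling
    "notebooks",       # exploratory notebooks
    "reports/figures", # generated figures for reports
    "src",             # project source code
]


def create_skeleton(root: Path) -> list[Path]:
    """Create the project's directory tree (safe to re-run) before any code is written."""
    created = []
    for rel in SKELETON_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories, so add a conventional placeholder file.
        (path / ".gitkeep").touch()
        created.append(path)
    return created
```

Calling `create_skeleton(Path("my-project"))` before the first commit gives every collaborator the same structure from day one; because `exist_ok=True` is used, re-running it on an existing repository does no harm.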

7.5 Evaluate the plan

If you have followed the steps above, you will now have defined your project. You will have a project plan that includes milestones, a timeline, potential pivot points and clearly-defined deliverables. You will have determined the composition of the team needed to execute the project and have thought about how the team will work together and liaise with the client. You will have also created a budget for the project based on the time required, cost of staff and any other costs required. You probably can’t wait to send your proposed plan off to the client and get to work!

But before you do, we encourage you to look back at the four levels of project evaluation outlined previously. Think about the business case and the larger context of the project in relation to the business strategy, and ask yourself if what you have planned will be valuable to the client. If you are not convinced that it will be, you may need to reconsider your plan.

As a parting piece of advice, we suggest that you take the time to make a short list of ways that the project is likely to bring value and to add a sentence or two to your proposal highlighting this. While it may seem obvious to you, and you may feel that it should be obvious to the client, it can help to remind them about why your project will have a positive impact for their business and reassure them that this proposal is a worthwhile investment.

Bear in mind that if your client is a company, the person you have been liaising with may not be the final decision-maker or the one who controls the money. The proposal you write may be passed on to executives in the company whom you have never met and who have no understanding of the project. Explicitly explaining why your proposed project is likely to bring value on both the business and contextual levels will make it easier for those decision-makers to say “yes”.

7.6 Closing thoughts

Designing a project plan can be hard work, but we strongly encourage you to take the time to do it well. Many early-career data scientists struggle with how loosely defined the process can be, and indeed it is hard for anyone: sitting down in front of a blank sheet of virtual paper and figuring out what to say is not easy. To help you get started, we have included an example project proposal in the next chapter, showing one way we might structure our proposals and project plans. We hope it helps!