10 Phase 6: Handover
Finding a new owner
Sometimes this phase is simple: if, for instance, the project aim is insight or the deliverable is a report, then nothing further needs to be done. For more complex projects, however, this can be the most difficult phase.
If you are creating a system, remember to deploy the solution according to the plan. Handing over source code alone is often not enough, because crucial details frequently live in the deployment. Does your solution scale? Make sure to stress test your application so that it doesn’t fail unexpectedly under load.
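As a rough illustration of what a basic stress test can look like, the sketch below sends many concurrent calls to a handler and summarises failures and latency. The function handle_request is a hypothetical stand-in for a call to your deployed solution, and the request and worker counts are arbitrary assumptions; a dedicated load-testing tool is usually a better fit for a real deployment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload):
    """Hypothetical stand-in for a call to the deployed solution."""
    time.sleep(0.001)  # simulate a small amount of work
    return {"echo": payload}


def stress_test(handler, n_requests=200, n_workers=20):
    """Call `handler` concurrently and summarise failures and latency."""

    def timed_call(i):
        start = time.perf_counter()
        try:
            handler(i)
            ok = True
        except Exception:
            # Any exception counts as a failed request.
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(timed_call, range(n_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": n_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "median_latency_s": latencies[len(latencies) // 2],
        "max_latency_s": latencies[-1],
    }


report = stress_test(handle_request)
print(report)
```

Increasing n_requests and n_workers step by step, and watching when failures appear or latency climbs, gives a first indication of how the solution behaves under load.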
For the long-term success of any project, there must be a long-term strategy in place. The software will always need to be maintained. Depending on what you developed, models might also ingest new data, and their results will slowly start to drift over time. How will the organisation deal with this? Is there a plan in place to maintain these models over time, and who will carry it out?
The concept ‘drift’ explained
Predictive models are trained to recognise patterns and relationships in data and, in turn, to make predictions based on these patterns. Yet the world is not static: data will change over time as new observations come in, and patterns that may exist today may no longer exist in the same way in the future. If a model is not allowed to learn from this evolution in data relationships, model performance will decline over time. In short, your model will be making predictions based on an outdated understanding of the world. This deterioration of model performance is what we mean by model drift.
To combat model drift, models should be monitored for performance changes and periodically re-trained on fresh data. Exactly how often retraining should be performed depends on the dataset in question and is best tested empirically.
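A minimal sketch of such monitoring is shown below, assuming a classification model whose quality is measured by accuracy. The function names, the baseline value and the tolerance of 0.05 are illustrative assumptions, not recommendations; the right metric and threshold depend on your project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def needs_retraining(baseline_accuracy, y_true, y_pred, tolerance=0.05):
    """Flag the model when accuracy on fresh data falls too far below baseline.

    `baseline_accuracy` is the accuracy recorded at handover (an assumed value
    here) and `tolerance` is the largest drop that is still acceptable.
    """
    current = accuracy(y_true, y_pred)
    return (baseline_accuracy - current) > tolerance, current


# Freshly labelled observations and the model's predictions for them
# (made-up values, purely for illustration).
fresh_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
fresh_predictions = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]

retrain, current_accuracy = needs_retraining(0.90, fresh_labels, fresh_predictions)
print(retrain, current_accuracy)
```

Running a check like this on a schedule, and recording the result each time, also produces the empirical evidence needed to decide how often retraining is worthwhile.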
Often, those who develop a system are not those who need to maintain it long term. To make sure this goes as smoothly as possible, be sure to write clear, concise code, follow recognised coding standards, write detailed documentation and include thorough test coverage. Regarding documentation, it is best practice to write your comments and documentation alongside the actual code. Doing this makes your documentation more likely to stay up to date. If documentation lives in a different location from the code, a developer making changes also needs to remember to update the documentation separately.
Many great tools can generate good-looking documentation from comments added in your code. For Python, we can highly recommend Sphinx, as it is widely used and offers many themes to choose from. Regarding proper coding style, we recommend you follow PEP 8 for Python and, for R, the R style guide. We recommend you install a linter into your text editor so that your code is constantly checked against these coding standards. Coding standards make your code far more readable, so it is recommended to adhere to them as much as possible. We also find it useful to use a formatter; for Python we use Black. This will automatically format your Python code in a way that is consistent with the PEP 8 guidelines. You will rarely need to worry about code formatting again, as a formatter automatically applies the correct spacing, line breaks and other layout conventions. It is, however, important to note that once you use a formatter, you can’t realistically override the formatting and will need to accept some style choices as they are. Any manual formatting you apply will be undone the next time the formatter runs, so there is no point sticking to your preferred format.
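To illustrate documentation written alongside the code, here is a small sketch of a function with a docstring in the reStructuredText field style that Sphinx can render. The function rescale is a made-up example, not part of any project, and the layout is only one of several docstring conventions in use.

```python
def rescale(values, lower=0.0, upper=1.0):
    """Linearly rescale numbers to the range ``[lower, upper]``.

    :param values: Numbers to rescale, with at least two distinct values.
    :param lower: Smallest value of the output range.
    :param upper: Largest value of the output range.
    :returns: A new list with the rescaled values, in the original order.
    :raises ValueError: If all values are identical, so no scale is defined.
    """
    smallest, largest = min(values), max(values)
    if smallest == largest:
        raise ValueError("values must contain at least two distinct numbers")
    span = largest - smallest
    return [lower + (v - smallest) * (upper - lower) / span for v in values]


print(rescale([2, 4, 6]))
```

Because the description of each parameter sits directly above the code that uses it, a developer changing the function is much more likely to notice when the documentation needs updating too.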
When you reach the end of your project, find the person responsible for maintaining the system and run through how it works. Do a complete handover running through both the code and the documentation, adding comments and notes to clarify points that are not clear to the person you are handing over to.
If handover to an actual person is not possible for whatever reason, for example because a permanent data scientist has not yet been hired, then make your documentation as clear as possible. Do not leave anything to chance: write it as if you are writing for a very junior person. More documentation than usual is recommended in these cases.
Make sure the test cases are understood and set up so that they run regularly. There are plenty of continuous integration tools that do this for you. Running the tests on every change helps catch changes that alter behaviour or performance before they reach production. To avoid the negative consequences of model drift, recalculate the model performance on new data regularly.
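As a sketch of what such a regularly run test might look like, the example below checks that a model still reaches a minimum accuracy on a labelled hold-out set. The functions predict and load_holdout are hypothetical stand-ins for your own model and data loading, and the threshold of 0.75 is an assumed value. The test is written as a plain function with an assert, a style that test runners such as pytest can collect.

```python
MINIMUM_ACCURACY = 0.75  # assumed threshold, to be agreed with the new owner


def predict(x):
    """Hypothetical stand-in for the handed-over model."""
    return 1 if x >= 0.5 else 0


def load_holdout():
    """Hypothetical stand-in for loading a labelled hold-out set."""
    return [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.55, 0)]


def test_model_accuracy_on_holdout():
    """Fail if accuracy on the hold-out set drops below the threshold."""
    data = load_holdout()
    correct = sum(predict(x) == label for x, label in data)
    accuracy = correct / len(data)
    assert accuracy >= MINIMUM_ACCURACY, f"accuracy dropped to {accuracy:.2f}"


if __name__ == "__main__":
    test_model_accuracy_on_holdout()
    print("hold-out accuracy check passed")
```

Refreshing the hold-out set with newly labelled data from time to time lets the same test double as a simple check for model drift.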