How to Deliver Successful Data Science Consulting Projects | by Hans Christian Ekne | Jul, 2024

Given the above similarities and differences between data science consulting and other classes of consulting, it is natural to ask how we might adapt our approach to ensure the long-term success and viability of our projects. Apart from the obvious elements such as quality deliverables, timely project delivery and strong stakeholder management, what are the other components that need to be in place to succeed?

Ensuring Robust Data Products

While management consulting typically focuses on immediate organizational changes and one-off deliverables, data science consulting requires a long-term perspective on robustness and sustainability. This has a couple of consequences: you can and will be judged on the continued performance of your work, so you should take steps to ensure you deliver good results not just at the moment of handover, but potentially for years to come. (This is similar to IT consulting, where ongoing performance and maintenance are essential.)

For instance, I’ve built data products that have been in production for over 6 years! I have seen the direct effects of having data pipelines that are not robust enough, leading to system crashes and erroneous model results. I have also seen model variables and labels drift significantly over time, leading to degradation of system performance and in some cases completely wrong insights.

Image by the author using DALL-E

I know that this is obviously not the sexiest topic, and in a project with tight budgets and short timelines it can be hard to make the argument to spend extra time and resources on robust data pipelines and monitoring of variable drift. However, I strongly urge you to spend time with your client on these topics, integrating them directly into your project timeline.

– Focus on long-term sustainability.
– Implement robust data pipelines.
– Monitor for model and variable drift continuously.
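To make the drift point above concrete, here is a minimal sketch of how a single feature could be monitored with the Population Stability Index (PSI). The feature names, thresholds and data are hypothetical and would need to be adapted to the client's setup.

```python
import numpy as np
import pandas as pd

def population_stability_index(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    """Compare the distribution of a feature at training time vs. in production."""
    # Bin edges are derived from the training (expected) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero and log(0).
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Hypothetical usage: an 'income' feature logged at training time and at scoring time.
training = pd.Series(np.random.normal(50_000, 10_000, 5_000))
scoring = pd.Series(np.random.normal(55_000, 12_000, 5_000))
psi = population_stability_index(training, scoring)
if psi > 0.2:  # A common rule of thumb for significant drift; tune for your use case.
    print(f"Warning: feature drift detected (PSI = {psi:.2f})")
```

Wired into a scheduled job, a check like this can flag drift long before it shows up as degraded model performance.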

I have written about one aspect of data pipelines (one-hot encoding of variables) in a previous article that aims to illustrate the topic and provide solutions in Python and R.
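The article goes into more depth, but the core pitfall can be sketched in a few lines (this is an illustration, not the article's code): if the category levels are not fixed at training time, a scoring batch with missing or unseen levels silently changes the column layout that the model expects.

```python
import pandas as pd

# Fix the full set of categories at training time so that scoring data
# always produces the same one-hot columns, even if a level is absent.
TRAINED_CATEGORIES = ["red", "green", "blue"]  # hypothetical levels

def one_hot(df: pd.DataFrame, column: str) -> pd.DataFrame:
    df = df.copy()
    df[column] = pd.Categorical(df[column], categories=TRAINED_CATEGORIES)
    return pd.get_dummies(df, columns=[column])

scoring_batch = pd.DataFrame({"colour": ["red", "red"]})  # 'green' and 'blue' absent
print(one_hot(scoring_batch, "colour").columns.tolist())
# ['colour_red', 'colour_green', 'colour_blue']  # stable layout regardless of the batch
```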

Documentation and Knowledge Transfer

Proper documentation and knowledge transfer are critical in data science consulting. Unlike analytics consulting, which might involve less complex models, data science projects require thorough documentation to ensure continuity. Clients often face personnel changes, and well-documented processes help mitigate the loss of information. I have on multiple occasions been contacted by previous clients and asked to explain various aspects of the models and systems we built. This is not always easy, especially when you haven't seen the codebase for years, and it can be very handy to have properly documented Jupyter Notebooks or Markdown documents describing the decision process and analysis. This ensures that any questions about decisions or initial results can easily be traced back and resolved.

– Ensure thorough documentation.
– Use Jupyter Notebooks, Markdown documents or similar.
– Facilitate knowledge transfer to mitigate personnel changes.
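As a small, hypothetical illustration of the kind of documentation that pays off years later, capturing the decision right next to the code is far more reliable than memory:

```python
import pandas as pd

def cap_outliers(claims: pd.Series) -> pd.Series:
    """Cap claim amounts at the 99th percentile.

    Decision log (hypothetical example):
    - Agreed with the client's actuarial team to cap rather than drop outliers,
      since large claims are legitimate but destabilised the model.
    - Alternative considered: log-transform; rejected because the pricing team
      needs results on the original scale.
    """
    return claims.clip(upper=claims.quantile(0.99))
```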

Building End-to-End Solutions

Building end-to-end solutions is another key consideration in data science consulting. Unlike analytics consulting, which might focus on delivering insights and reports, data science consulting needs to ensure the deployability and operationalization of models. This is similar to IT consulting, where integration into existing CI/CD pipelines is crucial.

I’ve seen companies lose years between the development of a model and its production deployment due to personnel changes and unfinished integration tasks. If we had insisted on seeing the project through to fully production-ready status, the client would have had the full benefit of the model much earlier than they eventually did. That matters when project costs can run into millions of euros.

– Build deployable models.
– Ensure operationalization.
– Integrate into existing CI/CD pipelines.
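What "deployable" looks like depends on the client, but as one common pattern (a sketch, not a prescription), a trained model can be wrapped in a small web service that the client's existing CI/CD pipeline can build, test and ship like any other application. The endpoint, model artifact and feature names below are hypothetical:

```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.joblib")  # artifact produced by the training pipeline

class Features(BaseModel):
    # Hypothetical feature schema; in practice this mirrors the training data contract.
    age: float
    income: float

@app.post("/predict")
def predict(features: Features) -> dict:
    # Assumes a scikit-learn-style model; adapt to the client's framework.
    score = model.predict([[features.age, features.income]])[0]
    return {"score": float(score)}
```

Because the service is just another container or application, the client's DevOps team can own it after handover without needing to understand the model internals.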

Visual Artifacts

Including visual artifacts, such as dashboards or widgets, helps demonstrate the value created by the project. While management consulting deliverables include strategic plans and assessments, usually in the form of a one-off PowerPoint deck, data science consulting benefits from visual tools that provide ongoing insights into the impact and benefits of the solution. These artifacts serve as reminders of the project’s value and help in measuring success, similar to the role of visualizations in analytics consulting.

One of my most successful projects was a pricing solution we built for a client, who started using the dashboard component directly in their monthly pricing committee meetings. Even though the dashboard was only a small fraction of the project, it was the only thing that management and the executives in the company could interact with, and it thus provided a powerful reminder of our work.

– Create visual artifacts like dashboards.
– Demonstrate project value visually.
– Use artifacts to measure success and stay relevant to the client.
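Any BI tool can play this role; as a minimal sketch, even a small Streamlit app on top of the model's output gives a pricing committee something tangible to interact with. The data source and field names here are hypothetical:

```python
import pandas as pd
import streamlit as st

st.title("Monthly pricing overview")  # hypothetical dashboard for a pricing committee

# In practice this would read from the client's data warehouse; a CSV stands in here.
prices = pd.read_csv("monthly_prices.csv", parse_dates=["month"])

segment = st.selectbox("Customer segment", sorted(prices["segment"].unique()))
selected = prices[prices["segment"] == segment]

st.metric("Average recommended price", f"{selected['recommended_price'].mean():.2f}")
st.line_chart(selected.set_index("month")[["recommended_price", "actual_price"]])
```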

Evaluating Organizational Maturity

Evaluating organizational maturity before building the project is essential to avoid over-engineering the solution. Tailoring the complexity of the solutions to the client’s maturity level ensures better adoption and usability. Always remember that when you are finished with the project, ownership usually shifts to internal data scientists and data engineers. If the client has a team of 20 data scientists and a modern data infrastructure ready to integrate your models directly into their existing DevOps, that’s amazing, but it is frequently not the case. Consider instead the scenario where you are developing a tool for a company with 20 employees, one freshly hired data scientist and an overworked data engineer. How would you adapt your strategy?

– Assess organizational and analytical maturity.
– Avoid over-engineering solutions.
– Tailor complexity to client readiness.

Following Best Practices in IT Development

Following best practices in IT development is becoming increasingly important and often required in data science consulting. Unlike analytics consulting, which might not involve extensive coding, data science consulting should stay true to software development practices to ensure scalability and maintainability. This is similar to modern IT consulting, where writing modular, well-documented code and including sample data for testing are essential practices.

This also ties back to the previous point around documentation and knowledge transfer. Properly documented and structured code, packaged into easy-to-install software packages and libraries, is much easier to maintain and manage than thousands of lines of spaghetti code. When personnel changes occur, you will be in a much better spot if the code has been properly developed.

– Follow IT development best practices.
– Write modular and well-documented code.
– Include sample data for testing.
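As a hypothetical illustration of the last point, shipping a small anonymised sample dataset alongside the code lets the client, and their CI pipeline, verify the core logic without needing access to production data. The package name, function and sample file below are assumptions:

```python
# tests/test_pipeline.py -- hypothetical layout; assumes the delivered package is
# called `pricing_model` and exposes a `prepare_features` function.
import pandas as pd
from pricing_model import prepare_features

def test_prepare_features_on_sample_data():
    # A small, anonymised sample shipped with the repository.
    sample = pd.read_csv("tests/data/sample_policies.csv")
    features = prepare_features(sample)

    # The feature contract the model was trained on should never change silently.
    assert list(features.columns) == ["age", "income", "region_code"]
    assert features.notna().all().all()
```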
