Maximizing ROI in data science

We had the pleasure of presenting a joint talk at the Data Innovation Summit in Stockholm about a month ago, where we discussed the opportunities and challenges of maximizing ROI in data science initiatives. It was an incredible experience to present in front of a packed audience at one of Northern Europe's top data conferences. Continuing the conversation, here's our attempt to delve a bit deeper.

Lasse and Fernanda at DIS 2023

The Woe

This is where disappointment in AI/ML comes from

Amid the excitement over generative models, it is easy to forget that the bread and butter of ML/AI is still the analysis of tabular business data. While advanced analytics has been applied across businesses for quite some time, there is no guarantee that such projects have been or will be successful, let alone profitable.

Unfortunately, the true success rate of data science projects is unclear. In 2019, Gartner predicted that through 2022, only 20% of analytic insights would deliver business outcomes. In contrast, the Data and Analytics Leadership Annual Executive Survey 2023 (by NewVantage Partners) states that in 2022 over 90% of data leaders claimed their companies had delivered measurable business value. Whatever the truth may be, the relevant question is: what are the most common factors that keep AI/ML projects from being cost-effective?

In general, successful projects rely on good design as well as project management. If a project is not well scoped, with clear and attainable goals, it is much harder to steer the project in the desired direction. Data science always involves some degree of exploration, and findings can lead to a change of scope and direction, which calls for constant expectation management.

Miscommunication between users, developers, and other project stakeholders can waste weeks, if not months, of development effort. Failing to include users early and often in the development process can lead to unclear business needs, lack of prioritization, erroneous assumptions, and overly large deliverables. As a result, the risk of project failure is high.

Ambitious team members may struggle to focus on creating smaller, testable solutions instead of complex, over-customized code. Working closely with users can sometimes feel less efficient, since not every minute of the workday is spent developing, but it minimizes the overall risk of failure.

Below, we address in more detail what can be done to improve the cost-effectiveness and quality of AI/ML projects while minimizing their risk.

The Way

This is how to improve efficiency

DRY becomes exciting

Successful ML projects are built on a solid foundation provided by a data platform. When starting from scratch, roughly 80% of the time is spent wrangling data. If a data platform can serve the relevant data in a standardized format, this wrangling time can essentially be factored out of project costs. In addition, data quality errors may already have been identified and fixed at the source, which translates into better model accuracy.

The data platform can contain a dedicated analytics data store, or a feature store built on top of the data platform. This data store should support publishing data sets for model training and for scoring in production (either batch or online). What goes into the published data via feature engineering is of course subject to iteration, so it is unlikely that all data-related work can be eliminated.
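To make the idea of "publishing" concrete, here is a minimal sketch of such a data store, assuming a simple in-memory design with versioned data sets (the class and method names are our own illustration, not any particular product's API). The key property is that training and production scoring read the same frozen, versioned features:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureStore:
    """Minimal in-memory sketch of a feature store that publishes
    versioned data sets for model training and production scoring."""
    _sets: dict = field(default_factory=dict)

    def publish(self, name: str, version: int, rows: list) -> None:
        # Publishing freezes a data set under (name, version), so
        # training and scoring are guaranteed to see identical features.
        self._sets[(name, version)] = list(rows)

    def get(self, name: str, version: int) -> list:
        # Both the training job and the scoring job call this with the
        # same version, eliminating train/serve data drift.
        return self._sets[(name, version)]

store = FeatureStore()
store.publish("churn_features", 1, [{"tenure": 12, "spend": 99.0}])
training_data = store.get("churn_features", 1)
```

A real implementation would persist the data and handle access control, but the versioned publish/get contract is the part that removes wrangling time from each new project.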

In data science, it is not possible to fail fast without easy access to relevant data in a usable form. However, we need more than that to show value early.

Data manipulation, model training, evaluation, deployment, and scoring, steps needed in virtually all ML projects, are typically handled with code. While basically every project has some peculiarities needing custom logic, most of the code should be standardized and reusable. This not only saves time but also facilitates collaboration and division of labor, and improves the quality of the solution (with minimal room for error). There is a good reason why DRY is a fundamental principle in software development.

There are numerous ML platforms and several open-source libraries available for setting up an MLOps framework (see e.g. https://mymlops.com). Industry- or use-case-specific logic is most likely needed, but there is really no excuse for not getting reusable ML pipelines in place. It is mostly a matter of making decisions about the approaches and technologies involved.
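The reusable-pipeline idea can be sketched in a few lines, assuming a plain function-composition design (real frameworks such as those listed on mymlops.com offer richer versions of the same pattern). The skeleton is shared; only the project-specific steps change:

```python
from typing import Callable

def make_pipeline(*steps: Callable) -> Callable:
    """Compose reusable steps (e.g. clean -> featurize -> score) into a
    single callable, so every project runs on the same skeleton."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run

# Project-specific logic plugs into the standard skeleton:
def clean(rows):
    # Drop missing records; this step is reusable across projects.
    return [r for r in rows if r is not None]

def featurize(rows):
    # Custom feature engineering lives here and nowhere else.
    return [{"x": r, "x_sq": r * r} for r in rows]

pipeline = make_pipeline(clean, featurize)
features = pipeline([1, None, 3])
```

The payoff is that evaluation, deployment, and scoring only ever see one pipeline interface, which is what makes the code reusable across projects.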

Eventually, this standardization has to be done in order to make large-scale automation possible. With a good framework (involving MLOps), the time from idea to value is much shorter. This is the only way to ensure that even projects with small business cases yield a reasonable ROI.

We need to KISS more

Fair enough, there is nothing new about the KISS principle (it dates back to the sixties). Still, it is very hard to keep in mind when conducting a cool data science project. Fancier and more advanced methods are constantly pouring in, and it can be difficult to resist the temptation to try them all. Sometimes innovative approaches are needed, but in most cases they are not worth the time and effort.

Once the requirements set for a solution/end product are met, the focus should be on keeping the means as simple as possible. Increasing complexity hinders maintainability, increases the cost of operation, and reduces transparency. Transparency, in particular, should not be overlooked, as it is an important element in frameworks that assess the trustworthiness of AI/ML solutions.
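One practical way to apply KISS is to always establish the simplest possible baseline first, and require any more complex model to beat it by a margin that justifies its extra cost. A minimal sketch, assuming a toy forecasting setting with made-up numbers:

```python
def mean_baseline(train_y):
    """Simplest possible forecaster: always predict the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda _x: mean

def mae(preds, actuals):
    # Mean absolute error: the yardstick any fancier model must beat.
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(actuals)

train_y = [10.0, 12.0, 11.0, 13.0]
test_x, test_y = [None, None], [11.5, 12.5]

model = mean_baseline(train_y)
baseline_error = mae([model(x) for x in test_x], test_y)
# A more complex model now has to improve on baseline_error by enough
# to justify its added cost of operation, maintenance, and opacity.
```

If the advanced method only shaves a sliver off the baseline error, KISS says to ship the baseline.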

User-centric design is not only for consumer products

As with other software products, the user needs to be kept at the center of AI/ML solutions when defining the requirements, deciding how to meet them, and selecting how to present the results. If users do not understand how and why the solution works, they are unlikely to trust it and use it.

In a recent forecasting analytics project, the users wanted a specific accuracy measure that did not reflect how the project was progressing. However, using other measures would have meant less feedback, as the users would not have understood them. It was important to find common ground so we could keep communication channels open to discuss new discoveries and set the next direction.

Given the likelihood of failure, this communication channel was all the more important, as we could constantly test our assumptions. This helped us fail quickly, and in other cases avoid moving in the wrong direction at all. When certain predictions of the model were way off, business users could quickly point out potential reasons, giving us a shortlist of areas to explore. This would never have happened had we focused on building something cool instead of something users could quickly adopt.

The Wisdom

Summary & take-home messages

To get measurable business value from data science projects, failing fast is instrumental. A streamlined framework for testing the viability of new ideas and taking them into production not only accelerates time to market but also makes less profitable use cases worth pursuing. At best, everything is designed around the fail-fast principle.

Investing in the right resources pays off. Everyone has probably seen a Venn diagram illustrating that good, fast, and cheap cannot coexist. However, we claim that with good support, this trilemma can be resolved. In addition to a solid framework and a proper data platform, the right know-how and skilled people are also needed (probably the hardest to come by).

User centricity is key to success: an amazing solution that no one uses is just a passion project (and probably not that amazing). Efficiency does not come from building large chunks of complex solutions, but from multiple smaller releases that users can quickly adopt, test, and give feedback on.

In a nutshell, our message can be crystallized in the following points:

  1. Design for early ROI, by promoting quick prototyping (fail fast)
  2. Manage expectations with communication and collaboration with stakeholders/end-users
  3. Define clear fit-for-purpose and attainable goals to guide project execution
  4. Invest in the right resources so that skilled people can concentrate on their strengths
  5. Adopt/Develop a standard framework for AI/ML to avoid repetition and promote quick prototyping