A glimpse into our internal Data Science Roundtables – case demand forecasting

A company with as many data science professionals as ours needs professional development support to match. For our advanced analytics enthusiasts, we offer monthly data science roundtables as well as quarterly open meetups. The aim is to provide a relaxed and comfortable atmosphere where employees can spar, share thoughts and ideas, learn, and grow together.

Our monthly data science roundtables are casual discussions on Friday afternoons with drinks in our hands. Every enthusiastic employee gets a chance to come forward with a topic from their area of interest and act as the discussion moderator. So far, we have seen walk-throughs of best practices, challenges, lessons learned from previous projects, and demos of data science solutions in tools such as Dataiku and Snowpark. When it comes to methodologies, this is an opportunity to discuss intriguing algorithms, solutions for common problems, notable novelties, or the characteristics of a superb solution.

Moreover, the quarterly open meetups feature external presenters. This spring, technical professionals from Databricks will tell us more about their tools. And a week ago, we had a similar event with Snowflake about Snowpark.

The discussion topics for these events stem from our people and their shared areas of interest. It is awesome to see co-workers proactively starting conversations on relevant topics and bringing them to the roundtables. This gives everyone a chance to hear new and valuable insights. So far, I would say we have met our goals with the roundtable discussions. We can see a steady increase in people contributing their topic ideas and bringing interesting discussions into the spotlight to be explored in depth with interested co-workers.

It is great that our Kaitonians have been active with these events. I feel that part of the reason is that we tried to make it as appealing as possible for people to take responsibility for a particular month's roundtable discussion. We set it up so that no well-polished presentation is needed: hosting should be a low-effort conversation, as people are busy enough with customer work as it is. I believe this lowers the threshold for people to come forward with their favorite topics. The appeal possibly also comes from the social side of the event. In the era of Teams meetings and remote working, laid-back in-person encounters with like-minded people can feel quite refreshing and encourage on-site working.

Demand Forecasting

Recently, my co-worker Jonas and I also got a chance to bring our topic up for discussion. We discussed the challenges of time series data and some aspects of creating forecasting algorithms for it. This included data granularity and how to deal with data that becomes intermittent at a high level of granularity, as this trait causes some popular forecasting algorithms to perform poorly.
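To make the point about granularity and intermittency concrete, here is a minimal sketch that compares the share of zero-demand periods in the same series at daily and weekly resolution. The demand figures and the helper names `aggregate` and `zero_share` are made up for illustration and are not from the roundtable material.

```python
def aggregate(series, bucket_size):
    """Sum consecutive observations into buckets of `bucket_size` periods."""
    return [
        sum(series[i:i + bucket_size])
        for i in range(0, len(series), bucket_size)
    ]


def zero_share(series):
    """Fraction of periods with no demand at all."""
    return sum(1 for value in series if value == 0) / len(series)


# Hypothetical daily demand for one product over four weeks.
daily_demand = [
    0, 0, 3, 0, 0, 0, 1,
    0, 0, 0, 0, 2, 0, 0,
    0, 4, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 0, 2,
]

weekly_demand = aggregate(daily_demand, 7)

print("daily zero share: ", round(zero_share(daily_demand), 2))
print("weekly zero share:", round(zero_share(weekly_demand), 2))
```

In this toy series most days have zero demand, while every week has some, so the choice of granularity alone decides whether the forecasting algorithm faces an intermittent series.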

For example, when creating forecasts on a monthly versus a weekly level, one should take a deeper look into the mathematics of the forecasting algorithm. On a similar note, outlier handling and missing value interpolation become vital issues with time-structured data. It's essential to understand to what extent the chosen algorithms can handle these by themselves, and when the input data can and should be modified before forecasting to draw the desired conclusions from the output. There may also be a trade-off between the desired level of resolution and the acceptable level of uncertainty in the forecasts. For example, we took up the topic of mitigating the accuracy drop on longer-horizon forecasts with windowing functions or group-level forecasts, the latter being one of the many ways of using hierarchical structures of time series.
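As an illustration of modifying the input data before forecasting, the sketch below fills interior gaps in a series by linear interpolation. It is only one of many possible choices, and it assumes that a gap (`None`) really means "not recorded" rather than "zero demand". The function name and the sales figures are hypothetical.

```python
def interpolate_linear(series):
    """Fill interior None gaps linearly between the nearest known neighbours.

    Leading and trailing gaps are left as None, since there is no
    neighbour on both sides to interpolate between.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Hypothetical weekly sales with missing observations.
weekly_sales = [10.0, None, 14.0, 15.0, None, None, None, 11.0]

print(interpolate_linear(weekly_sales))
```

Whether such a step is appropriate depends on the algorithm: some handle missing values natively, in which case filling the gaps beforehand may do more harm than good.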

Other interesting problems that we touched on include creating time series forecasts with little or no temporal data, which can, for example, be solved with various related deep learning algorithms. This is another aspect that highlights the importance of method selection and the differences between using univariate regression models with external features and multivariate time series algorithms.

These univariate temporal algorithms rarely have a built-in way to apply the model to predicting the very beginning of a series. On the other hand, their strength is their simplicity, which often leads to explainability and maintainability, as the algorithms don't have such complex structures. But with a univariate input and output, they might lack scalability when there are multiple time series to forecast.

Another thing we discussed that demands extra consideration with temporal data is validation. Time series cross-validation is structured very differently from cross-validation on non-temporal data, at least for many of the univariate regression models, because data points cannot be assigned to the validation set at random. This leads to a more limited form of cross-validation than would be available without the temporal structure. So, to what extent does time series cross-validation reveal possible overfitting, given that the accuracy of the folds varies with the amount and temporal location of the training data? Also, considering that temporal data can contain many zero values, a validation metric that works well at a low level of granularity might be lousy at a higher one.
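One common way to respect the temporal order in validation is an expanding-window split, where each fold trains on everything before its test window. The sketch below is a minimal illustration of that idea, not the setup we used in any particular project. The function names, the toy demand series, and the naive "repeat the last value" forecast are all assumptions made for the example.

```python
def expanding_window_splits(n_obs, n_folds, horizon):
    """Yield (train_indices, test_indices) with training strictly before test."""
    first_test_start = n_obs - n_folds * horizon
    if first_test_start < 1:
        raise ValueError("not enough observations for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * horizon
        train = list(range(test_start))
        test = list(range(test_start, test_start + horizon))
        yield train, test


def mean_absolute_error(actual, predicted):
    """Average absolute deviation; stays defined when actual values are zero."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical demand series with several zero periods.
demand = [5, 0, 3, 0, 0, 4, 2, 0, 6, 0, 1, 3]

fold_errors = []
for train, test in expanding_window_splits(len(demand), n_folds=3, horizon=2):
    # Naive placeholder forecast: repeat the last training value.
    last_value = demand[train[-1]]
    predictions = [last_value] * len(test)
    actuals = [demand[i] for i in test]
    fold_errors.append(mean_absolute_error(actuals, predictions))

print(fold_errors)
```

The per-fold errors differ even with the same naive model, which is the effect described above: each fold sees a different amount and a different temporal slice of training data. The mean absolute error is used here because percentage-based metrics break down when the actual value is zero.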

Related to the extent of validation, we also considered how the results of data science solutions are communicated to end users. On this topic, another co-worker of ours, Lasse, wrote an insightful blog post you should definitely check out. In it, he discusses how the quality of a data science solution can be measured.

So ultimately, I found this demand forecasting roundtable to be a success. For data scientists and analytics professionals, meaningful and inspiring discussions between peers are crucial for our development, and we encourage everyone to attend and make time for them. Learning from each other makes us more effective, and getting into exciting discussions with your peers is inspiring and fun. The people we are surrounded by in our work are tremendously important. As employees from all talent areas join the meetups, we have had great discussions across role boundaries and across all levels of seniority. When an issue arises, knowing whose brains to pick is a great way to learn.