Is Data Mesh the answer to the problem of scale and speed in the Analytics & AI arena?

We @Kaito run internal learning events called brown bag lunches. This time I took the initiative to talk about a concept I have been thinking about and researching for a while – Data Mesh. I wrote this blog post to highlight some of the topics discussed in our internal session.

So, let’s demystify some foundational aspects of Data Mesh! 

Data Mesh’s objective is to solve the problem of scale and speed that the world of analytics and AI is facing, and to solve it empathetically, so that the organization ultimately becomes data driven and can compete and thrive on data. It is an evolving and developing concept. It is widely recognized that there are fundamental problems in scaling data use across complex organizations. Companies are investing heavily in data and focusing their strategy on it, but the results seem to hit a plateau. The Data Mesh concept tackles these problems in a fundamentally different and ambitious way. The term was coined by Zhamak Dehghani, who presents the approach in her book Data Mesh – Delivering Data-Driven Value at Scale.

It is important to acknowledge that the concept is meant to solve problems that data initiatives face in complex and large-scale environments. The approach proposes fundamental changes at the organizational level and might be an excessive step for many smaller organizations. Still, Data Mesh contains many interesting concepts that are worth exploring for everybody.

[Figure: Data Mesh demystified]

Data Mesh is addressed differently in different contexts, at times as a socio-technical approach, an enterprise data strategy, an analytical architecture, or an operating model. And all these are correct!

Data Mesh - A nudge

Data Mesh is essentially a nudge to think of data differently, treat it differently, share it differently, and question the existing paradigm and the current nomenclature. This puts us on a different trajectory.

Data Mesh questions some of the existing nomenclature in the data world. Take, for example, the water analogies currently associated with data: data lake, lakehouse, data swamp, pipeline, data flow, and so on. We love to use these analogies to simplify things and make them softer and more familiar. There is, however, hidden power in names and analogies, so it is sometimes good to question them.

Let’s look at some nomenclature shifts from other disciplines. A few years ago, more and more organizations in Finland started to use the word “esihenkilö” (manager/supervisor who is a person) rather than “esimies” (manager/supervisor who is a man). Similarly, disciplined agile delivery focuses on the term “consumable product” rather than “shippable product”. The difference between “consumable” and “shippable” can sound small, but in practice it can mean everything for how successful the implementation is. In some situations, the only difference between a shippable and a consumable product is a user guide, and yet without a user guide even the best product can be unusable.

Now, coming back to the water analogy: because we use it, we are primed to think that for data to provide insights it must travel from point A to point B. The Data Mesh approach questions this. It suggests, as an alternative, that insights could be made possible at the point of origin where the data is produced, because that is where the data is best understood.

It also questions the existing model of two separate planes, operational and analytical, connected via pipelines that become fragile as they serve varied sources and use cases.

Data Mesh - A Socio-technical approach

As this nudge takes us to a new trajectory, a decentralized socio-technical approach emerges. Decentralization is discussed shortly, but why should we call it a socio-technical approach when it solves a problem of the technical world?

Because it solves that problem empathetically, which results in an elevated user experience at each touch point, while also solving the challenges of the analytics and AI world. The approach offers a solution to the problems that each role faces. Currently, HR managers find it hard to hire and retain expensive data engineers. Data scientists spend approximately 25% of their time cleaning and preparing data sets. Data developers have generalist skills and the capability to contribute to business transformations, but the long and ever-changing list of technologies in the stack makes it hard for them to get into the market. Data leaders struggle with a data culture that remains elusive despite huge investments. The Data Mesh approach addresses these challenges that people are facing. Hence it is a socio-technical approach.

Data Mesh – An enterprise data strategy, an operating model 

Data Mesh can be used as an element of an enterprise data strategy to articulate the target architecture and operating model. Hence it can be addressed as a strategy, an architecture, and an operating model as well!

There are common failure symptoms in large and complex organizations that appear when the centralized approach to scaling data initiatives starts to crumble. There are also specific point solutions and ideas to address many of these, but the argument behind Data Mesh is that a paradigm shift is needed.

[Figure: Failure cases]

Data Mesh – A socio-technical paradigm shift is needed.

[Figure: Socio-technical]

Organizational shift from central ownership to domain ownership:  

Ownership and accountability shouldn’t rest with the central data team that processes the data. They should shift to where the data originates and is known best: to the domain that has the most influence on it. The domain can be a source-aligned, aggregate, or fit-for-purpose domain.

Architectural shift from a monolithic platform towards a distributed mesh:

To support the organizational shift, an architectural shift is needed: away from collecting data in a monolithic platform and towards connecting data through a distributed mesh of data products that use standardized protocols, so that combining them yields higher-level insights.
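To make the idea of connecting rather than collecting a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the names `OutputPort`, `InMemoryProduct`, `Mesh` and the addresses `sales/orders` and `crm/customers` are hypothetical and do not come from the book or from any specific product.

```python
from typing import Iterable, Protocol


class OutputPort(Protocol):
    """Hypothetical standard interface that every data product exposes."""

    address: str

    def read(self) -> Iterable[dict]: ...


class InMemoryProduct:
    """Toy data product; the rows stay with the domain that owns them."""

    def __init__(self, address: str, rows: list[dict]) -> None:
        self.address = address
        self._rows = rows

    def read(self) -> Iterable[dict]:
        return iter(self._rows)


class Mesh:
    """Registry that lets consumers find products by address."""

    def __init__(self) -> None:
        self._products: dict[str, OutputPort] = {}

    def register(self, product: OutputPort) -> None:
        self._products[product.address] = product

    def get(self, address: str) -> OutputPort:
        return self._products[address]


def spend_by_customer(mesh: Mesh) -> dict[str, float]:
    """Combine two domains' products through the shared interface."""
    names = {c["customer_id"]: c["name"] for c in mesh.get("crm/customers").read()}
    totals: dict[str, float] = {}
    for order in mesh.get("sales/orders").read():
        name = names[order["customer_id"]]
        totals[name] = totals.get(name, 0.0) + order["total"]
    return totals


mesh = Mesh()
mesh.register(InMemoryProduct("sales/orders", [
    {"customer_id": 1, "total": 40.0},
    {"customer_id": 1, "total": 2.5},
]))
mesh.register(InMemoryProduct("crm/customers", [{"customer_id": 1, "name": "Ada"}]))

print(spend_by_customer(mesh))
```

The consumer never copies the source data into a central store. It only needs the address of each product and the shared `read()` contract, which is the point of the architectural shift.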

Technical shift from data as a byproduct of code to data and code as one autonomous unit:

Future solutions should treat data and code as one autonomous unit. This technical shift not only supports the intended organizational shift but also helps build towards the mesh experience plane that the Data Mesh approach envisions.
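One possible way to picture data and code as one unit is a small Python sketch in which a data product carries its own transformation code and its own output contract. Everything here is an assumption made for illustration: the `DataProduct` class, its fields, and the `to_order_totals` function are hypothetical names, not something the book prescribes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical unit that ships data logic and its contract together."""

    name: str
    owner_domain: str
    output_schema: dict[str, type]
    transform: Callable[[list[dict]], list[dict]]

    def serve(self, source_rows: list[dict]) -> list[dict]:
        """Run the product's own code and check the result against its schema."""
        rows = self.transform(source_rows)
        for row in rows:
            for column, expected in self.output_schema.items():
                if not isinstance(row.get(column), expected):
                    raise TypeError(f"{self.name}: column {column!r} is not {expected.__name__}")
        return rows


def to_order_totals(rows: list[dict]) -> list[dict]:
    """Domain logic owned by the sales domain (illustrative only)."""
    return [
        {"order_id": r["order_id"], "total": r["quantity"] * r["unit_price"]}
        for r in rows
    ]


order_totals = DataProduct(
    name="order-totals",
    owner_domain="sales",
    output_schema={"order_id": int, "total": float},
    transform=to_order_totals,
)

served = order_totals.serve([{"order_id": 7, "quantity": 3, "unit_price": 1.5}])
print(served)
```

Because the transformation and the schema travel with the product, the owning domain can change both together without waiting for a central pipeline team.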

Operational shift from manual governance to a federated computational model:

Governance should shift from the current top-down, centralized, manual approach towards a federated computational one. The result is not only automation of the process but also dynamic and timely updating of the governing rules at execution time, with the policies decided by a qualified and appropriate set of people.
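As a rough sketch of what computational governance could look like, the Python example below expresses a policy as code, evaluates it at access time, and lets the rule be revised while the system is running. The `PolicyEngine` and `AccessRequest` classes and the `no-pii-export` rule are hypothetical names I made up for this illustration; real platforms will do this differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical description of who wants to read what."""

    consumer_domain: str
    product: str
    columns: tuple[str, ...]


Policy = Callable[[AccessRequest], bool]


class PolicyEngine:
    """Holds global policies agreed by the federated governance group."""

    def __init__(self) -> None:
        self._policies: dict[str, Policy] = {}

    def set_policy(self, name: str, policy: Policy) -> None:
        # Adding or replacing a rule takes effect on the next evaluation.
        self._policies[name] = policy

    def violations(self, request: AccessRequest) -> list[str]:
        """Return the names of all policies the request fails."""
        return [name for name, policy in self._policies.items() if not policy(request)]


engine = PolicyEngine()
engine.set_policy("no-pii-export", lambda r: "email" not in r.columns)

marketing = AccessRequest("marketing", "crm/customers", ("customer_id", "email"))
crm = AccessRequest("crm", "crm/customers", ("customer_id", "email"))

print(engine.violations(marketing))

# The governance group revises the rule: the owning domain may read email.
engine.set_policy(
    "no-pii-export",
    lambda r: "email" not in r.columns or r.consumer_domain == "crm",
)

print(engine.violations(crm))
```

The people decide the rule, and the platform enforces it on every request, which is how the approach combines federated decision making with automation.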

Principal shift from data as an asset towards data as a product:  

One of the most fundamental shifts behind the Data Mesh concept is the recognition that we should stop thinking of data as an asset and start thinking of it as a product that can be shared and consumed for user delight. Here again there is a problem with nomenclature: assets are collected and kept, but products are shared and enhanced to solve real-world problems.

Infrastructural shift from separated analytics and operational worlds towards an integrated approach:

To bridge the gap between the operational plane and the analytical plane, an integrated infrastructure platform is pivotal. The infrastructure also takes on the domain-agnostic cognitive load and automates services so that domains can operate autonomously without handholding.

The paradigm shift needs to be supported by strong principles

[Figure: The four principles of Data Mesh]

Data Mesh proposes a major shift from the traditional way of organizing data: domain ownership. Shifting ownership to the specific domains that understand the data and the business processes behind it sounds logical, but it leads to challenges that must be solved. Decentralized approaches are always more complex, and that is why they need to be supported by sound principles to make them work.

The four principles are

  • Domain ownership
  • Data as a product
  • Self-serve data platform
  • Federated computational governance

Data Mesh is based on four founding principles, but it is essentially one principle, and the others are there to support it. In terms of the paradigm shift, the shift in principle is from data as an asset to data as a product. If data is to be thought of as a product, then it must be treated as a product. The domain-driven design approach to software development, which focuses on the domain that the product serves, has greatly benefited product development and has successfully tackled the issue of complexity.

Hence Data Mesh has adapted and generalized this approach to the context of analytics and AI, which resulted in the first principle, “Domain Ownership”. As the number and complexity of domains increase, which is natural in complex real-world scenarios, concerns about accessibility and usability arise. The principle of data as a product comes to the rescue here. To qualify as a data product, data must be discoverable, addressable, understandable, trustworthy, natively accessible, interoperable, valuable on its own, and secure.
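The eight characteristics listed above can be read as a simple checklist. The Python sketch below shows one way a team might track them for a candidate data product. The `UsabilityChecklist` class and its methods are hypothetical, and how each box actually gets ticked (for example, what counts as "trustworthy") is left open here on purpose.

```python
from dataclasses import dataclass, fields


@dataclass
class UsabilityChecklist:
    """One flag per characteristic a data product is expected to have."""

    discoverable: bool = False
    addressable: bool = False
    understandable: bool = False
    trustworthy: bool = False
    natively_accessible: bool = False
    interoperable: bool = False
    valuable_on_its_own: bool = False
    secure: bool = False

    def missing(self) -> list[str]:
        """Names of the characteristics that are not yet satisfied."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def qualifies(self) -> bool:
        """True only when every characteristic is satisfied."""
        return not self.missing()


candidate = UsabilityChecklist(
    discoverable=True,
    addressable=True,
    understandable=True,
    trustworthy=True,
    natively_accessible=True,
    interoperable=True,
    valuable_on_its_own=True,
)

print(candidate.qualifies(), candidate.missing())
```

In this example the candidate does not yet qualify, because the security characteristic is still open.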

A decentralized approach with clear goals and boundaries sounds great, but it is at the same time really challenging. To make this ambitious approach possible, we need a lot of automation and technological support.

Governance and security are important aspects of any successful data platform. Including the principle of federated computational governance gives this distributed domain architecture an operating model in which the domains have autonomy and agility while the mesh retains global interoperability.

I would call the approach Data Mesh has taken towards the problems of the current data world a perfect example of a leader’s approach. The difference between leadership and management is that leadership is doing the right thing and management is doing things the right way.

The Data Mesh approach is a paradigm shift. It expects the organization to move towards a distributed domain architecture, lays down upfront a list of characteristics that a data product must have to be on the mesh, brings the right team together for governance upfront, and approaches the change empathetically for all users at each touch point by reducing the cognitive load.

This will elevate the user experience, enhance ROI, and reduce friction points. The approach also bravely points out the right things that should be in place, even though enabling them needs futuristic tools and platforms.

Credits: Data Mesh – Delivering Data-Driven Value at Scale by Zhamak Dehghani