Data Platform Automation - from art to industry

Data is now widely used in almost every function and in every industry, but many decision makers are not confident in the data they use, nor that they can derive actionable information from it.

The value of a data platform is in transforming data into information that the business can use.
Trust in the information supporting enterprise-wide decisions is essential.

We guide our customers to achieve trust in their information by transforming raw data into information in a rigorous, auditable way, while staying agile and capitalising on the analytical skills that already exist in the company.

A 2020 study by Trifacta [1] found that 75% of C-level executives are not confident in the quality of their data.

This is concerning, given the already wide and ever-increasing use of data.
That the problem persists despite huge investments in data platforms shows that throwing money and time at it is not enough: a radical shift is needed.

We propose to shift data platform development from art to industry, much as general software development has done over the last 10 years with the DevOps movement.
You can call it DataOps if you like buzzwords.

The hallmark of modern software development is the adoption of automation and best practices that put the development team at the center, enabling its members to focus on delivering the solution and guaranteeing its robustness while minimising menial work.

Our approach to data platform automation, based on open tools and cloud databases such as Snowflake, incorporates the following enablers for an industrial-level result:

  • Methodologies and best practices
    • Sound software engineering process
      (Git, PRs & code reviews, Continuous Integration)
    • Data Vault 2.0, an open and widely adopted standard,
      proven at multiple scales from government to corporations
    • Web-based and/or local development,
      to accommodate different types of users
  • Process Automation
    • From development process (branch & PR)
    • to continuous integration (PR merge)
    • and production deployment (one click / API)
  • End-to-end testing to ensure data quality
    • Declarative testing (schema and Data Quality)
    • Promote business rule testing
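
To make "declarative testing" concrete, here is a minimal sketch in Python. It is not the tooling we use in our projects: the column names, the rule names (not_null, unique, accepted_values) and the run_tests helper are all assumptions made for illustration. The point is that the tests are declared as data next to the model, and a generic runner checks them on every build.

```python
# Hypothetical declarative test specification: the columns and rules below
# are illustrative assumptions, not taken from a real project.
TESTS = {
    "customer_id": ["not_null", "unique"],
    "country_code": ["not_null", {"accepted_values": ["IT", "FI", "SE"]}],
}


def run_tests(rows, tests):
    """Check every declared rule and return a list of failure messages.

    An empty list means that all declared tests passed.
    """
    failures = []
    for column, rules in tests.items():
        values = [row.get(column) for row in rows]
        for rule in rules:
            if rule == "not_null":
                if any(v is None for v in values):
                    failures.append(f"{column}: not_null failed")
            elif rule == "unique":
                non_null = [v for v in values if v is not None]
                if len(non_null) != len(set(non_null)):
                    failures.append(f"{column}: unique failed")
            elif isinstance(rule, dict) and "accepted_values" in rule:
                allowed = set(rule["accepted_values"])
                if any(v is not None and v not in allowed for v in values):
                    failures.append(f"{column}: accepted_values failed")
            else:
                raise ValueError(f"Unknown rule: {rule!r}")
    return failures


if __name__ == "__main__":
    sample = [
        {"customer_id": 1, "country_code": "IT"},
        {"customer_id": 1, "country_code": "XX"},
    ]
    for failure in run_tests(sample, TESTS):
        print(failure)
```

Because the rules are data rather than code, adding a test for a new column is a one-line change that can be reviewed in a PR and executed automatically by continuous integration.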

The advantages for our customers fall in these categories:

  • Development Agility
    • Embrace change, deliver on business needs
    • Easily add new sources and new outputs
    • Deliver quickly, with no structural re-work to implement changes
    • Continuous quality assurance, thanks to test coverage
  • Cost efficiency
    • Deliver in days or weeks, not months
    • Reduce scope of changes
    • No high upfront investments
    • Costs scale with usage
  • Enterprise-scale performance and usability
    • Proven patterns (process and technical)
    • Petabyte scale (MPP, Insert only…)
    • Live documentation & lineage
  • Deployment in hybrid clouds
    • Cloud agnostic (AWS, Azure, Google)
    • Move data seamlessly across clouds
    • Consumed as SaaS
  • Full auditability
    • Source data ingested “as is”
    • No updates (insert only)
    • Row level data lineage
    • Secure storage + flexible delivery
  • Freedom from vendor lock-in
    • Open standards and tools
    • No platform lock-in
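
The auditability points above (source data ingested "as is", insert only, row-level lineage) can be sketched in a few lines of Python. This is a simplified illustration and not our production loader: the function names, the metadata column names (hash_key, hash_diff, load_ts, record_source) and the choice of SHA-256 are assumptions. It also skips any row version it has already stored, whereas a real Data Vault load would compare each row against the latest stored version only.

```python
import hashlib
import json
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Hash the normalized business key(s). SHA-256 is an assumption here."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hash_diff(row):
    """Hash the whole source row, to detect new versions of the same key."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_insert_only(target, source_rows, key_columns, record_source, load_ts=None):
    """Append unseen row versions to `target`; never update or delete.

    The source row is stored unchanged under "payload"; the lineage
    metadata is added alongside it. Returns the number of rows inserted.
    """
    load_ts = load_ts or datetime.now(timezone.utc)
    seen = {(r["hash_key"], r["hash_diff"]) for r in target}
    inserted = 0
    for row in source_rows:
        hk = hash_key(*(row[c] for c in key_columns))
        hd = hash_diff(row)
        if (hk, hd) in seen:
            continue  # this exact version is already stored
        target.append({
            "hash_key": hk,
            "hash_diff": hd,
            "load_ts": load_ts,              # when the row arrived
            "record_source": record_source,  # where the row came from
            "payload": dict(row),            # source data "as is"
        })
        seen.add((hk, hd))
        inserted += 1
    return inserted
```

Running the same load twice inserts nothing the second time, while a changed row is appended as a new version next to the old one, so the full history stays available for audits.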

This is our proposed road to information-driven decision making:
agile and business-centric, rooted in strong software engineering best practices and automation.

If you are interested in the topic, please watch the live workshop we presented at the Data Innovation Summit in 2020, or ask us for a demo.

Video content:

  • 1-8   min Case introduction
  • 8-15  min Architecture presentation
  • 15-60 min Live demo
  • 60-67 min Recap and Closing
  • 67-87 min Q&A - written questions and live answers

Roberto Zagni
Principal Consultant @ Kaito Insights

[1] Trifacta, "Obstacles to AI & Analytics Adoption in the Cloud", 2020