Hello, data Shokunin-deshi!
How do you prevent broken data pipelines and scale data platforms that serve the business? In this pilot episode of Data Breakthroughs, I talk with Ilya Vladimirskiy — a fractional data leader who’s implemented Data Mesh and led data strategy at companies like Zalando, Ada Health, and Carfax Europe.
Ilya is a Fractional Data Leader with over 15 years of experience tackling complex data challenges. He has led Data Mesh implementations and built data platforms that support decision-making at companies like Zalando, Ada Health, and Carfax Europe. His hands-on experience scaling data teams and working across different tech stacks (GCP, AWS, etc.) makes him a great guest for discussing practical ways to prevent data pipeline breaks. If you're looking for real-world strategies from a data leader who's been in the trenches, you won't want to miss this episode!
We dove deep into the problem, sharing experiences and starting to brainstorm potential solutions. It was an insightful discussion, and it also helped me refine the podcast format to bring you even more value in future episodes.
What to expect in Data Breakthroughs going forward:
Problem-First Focus: We'll tackle challenges directly from data consumers, ensuring our discussions address real-world business needs.
Categorized Challenges: Episodes will be organized by domain (analytics, infrastructure, strategy) to align with guest expertise.
Actionable Solutions: While exploring the problem is key, our main goal is to deliver practical takeaways you can implement.
Balanced Discussions: As your host, I'll aim to balance technical depth with business relevance within our 45-minute format.
Diverse Perspectives: We'll embrace the many ways to solve data challenges – there's no one-size-fits-all!
This pilot episode is crucial in shaping Data Breakthroughs into a valuable resource for our community. And that's where you come in!
Core Data Concepts
Data Pipeline: A set of processes that move data from one or more sources to a destination, often involving extraction, transformation, and loading (ETL).
Schema: The structure of data, defining the types of data (e.g., text, numbers, dates) and their relationships within a dataset.
Data Warehouse: A central repository for storing and analyzing large amounts of data, often used for business intelligence and reporting.
Data Lake: A storage repository that holds a vast amount of raw data in its native format until it is needed.
ETL (Extract, Transform, Load): A process used to move data from various sources into a data warehouse or other destination.
Extract: Reading data from various sources.
Transform: Converting data into a usable format.
Load: Writing the transformed data into the destination.
Data Governance: The overall management of the availability, usability, integrity, and security of data in an enterprise.
Data Contract: An agreement between data producers and consumers that defines the format, structure, and quality of data being exchanged.
Data Mesh: A decentralized approach to data management that emphasizes domain ownership and self-service data infrastructure.
API (Application Programming Interface): A set of rules and specifications that software programs can follow to communicate with each other.
SLA (Service Level Agreement): A commitment between a service provider and a client that defines the level of service expected.
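To make the Extract, Transform, and Load steps defined above more concrete, here is a minimal sketch in Python. It is only an illustration, not something shown in the episode: the sample CSV, the `orders` table, and the function names are all made up for this example, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: in a real pipeline this would come from an API, a file, or a database.
RAW_CSV = """order_id,amount,currency
1,19.99,eur
2,5.00,usd
"""

def extract(raw_text):
    """Extract: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: cast strings to proper types and normalize currency codes."""
    return [
        (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        for row in rows
    ]

def load(records, conn):
    """Load: write the transformed records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

def run_pipeline(raw_text, conn):
    """Run the three steps in order: extract, then transform, then load."""
    load(transform(extract(raw_text)), conn)

# An in-memory SQLite database stands in for the warehouse (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, conn)
rows = conn.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
```

Notice that `transform` assumes the source always has the columns `order_id`, `amount`, and `currency`. If a producer renames or drops one of them, the pipeline breaks, which is exactly the kind of failure a data contract is meant to prevent.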
Roles and Teams
Data Producer: An individual or team responsible for creating or generating data (e.g., software engineers, application developers).
Data Consumer: An individual or team that uses data for analysis, reporting, or decision-making (e.g., data analysts, business users).
Data Engineer: A professional who designs, builds, and maintains data pipelines and data infrastructure.
Data Analyst: A professional who analyzes data to identify trends, patterns, and insights to support business decisions.
Systems and Technologies
ERP (Enterprise Resource Planning): Business management software that integrates various functions like finance, HR, and supply chain management.
BI (Business Intelligence): Technologies and strategies used to analyze business data and provide actionable insights.
AWS (Amazon Web Services): A cloud computing platform offering various services, including data storage and processing.
GCP (Google Cloud Platform): A suite of cloud computing services offered by Google.
SQL (Structured Query Language): A query language used to define, query, and manipulate data in relational databases.
Key Concepts from the Discussion
Data-driven decision-making: Using data to inform business strategies and actions.
Data-informed decision-making: Using data as one factor among others (e.g., experience, intuition) in making decisions.
Observability: The ability to monitor and understand the internal state of a system based on its outputs.
Schema validation: The process of ensuring that data conforms to a predefined schema or structure.
Cross-functional teams: Teams comprising members from different functional areas (e.g., engineering, data, business).
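As a rough illustration of schema validation, the sketch below checks a record against a very simple contract before it enters a pipeline. Everything here is hypothetical: `ORDERS_CONTRACT`, the field names, and `validate_record` are invented for this example, and real data contract tools cover much more (quality rules, SLAs, ownership).

```python
from datetime import date

# Hypothetical contract for an "orders" dataset: field name -> expected Python type.
ORDERS_CONTRACT = {"order_id": int, "amount": float, "order_date": date}

def validate_record(record, contract):
    """Return a list of violations; an empty list means the record conforms."""
    problems = []
    # Check every field the contract requires.
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Flag fields the contract does not know about (sorted for stable output).
    for field in sorted(record.keys() - contract.keys()):
        problems.append(f"unexpected field: {field}")
    return problems

good = {"order_id": 1, "amount": 9.5, "order_date": date(2025, 1, 1)}
bad = {"order_id": "1", "amount": 9.5}  # wrong type, and order_date is missing

good_problems = validate_record(good, ORDERS_CONTRACT)
bad_problems = validate_record(bad, ORDERS_CONTRACT)
```

Running a check like this on the producer side, for example in a test suite, is one way to surface a breaking change before it reaches data consumers instead of after a dashboard breaks.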
I need your help to make Data Breakthroughs even better:
For Data Consumers: Are you struggling with a data challenge in your organization? Submit your problem to be featured on an upcoming episode of Data Breakthroughs! We're looking for real-world data challenges across:
Data Analysis & Reporting
Data Engineering & Infrastructure
Data Governance & Quality
Machine Learning & AI Implementation
Organizational Data Strategy
Business Intelligence & Dashboarding
Real-time Data Processing
Data Integration & ETL
🔗 Submit Your Data Challenge Here
For Data Professionals: Are you a data practitioner who enjoys solving complex problems? Join us as a guest on Data Breakthroughs to collaborate on solving real business challenges with data. We're looking for professionals with experience in:
Data Leadership
Data Engineering
Data Science
Analytics Engineering
Data Architecture
Data Product Management
Why is this podcast so important today? In our increasingly data-driven world, the gap between those who produce data and those who consume it can lead to misunderstandings, inefficiencies, and ultimately, a lack of trust in the data itself. Data Breakthroughs aims to bridge this gap by fostering open conversations on practical solutions and real-world impact.
Thank you again to Ilya for being an amazing first guest!
Stay tuned for more updates, and mark your calendars: the first official episode of Data Breakthroughs drops on May 2nd!
Best,
Lior
🎯 Why Listen:
If you work in data engineering, analytics, or product, this episode gives you field-tested solutions for common data infrastructure problems.
🔗 Links Mentioned:
Ilya's LinkedIn: https://www.linkedin.com/in/bkmy43/
Bruin: https://getbruin.com/
Data Mesh Manager: https://www.datamesh-manager.com/
Open standards for data contracts
--------
Whiteboard diagram from our episode