If It's Not in Code, It Doesn't Exist: Building Data Platforms the Right Way
The Problem: Technical Debt in the Data Layer
Have you ever inherited a datastore that's grown organically over years? Where creating a new instance or copy is only possible via full dump and restore? Where nobody knows how the schema evolved or why certain decisions were made? Where you can't tell which columns, tables, and fields are actually in use versus leftovers from long-forgotten migrations?
Or perhaps you've inherited 500+ data models with blurred relationships, no lineage, no logic, no history, and no tracking?
Sound familiar?
This is the reality of modern data engineering. Data professionals often lack software development experience and don't embrace version control - not because they're careless, but because historically all they were given was access to databases. There was rarely any meaningful definition of "development" for data models and transformation code. While organizations focused on speeding up development practices and introducing DevOps, data teams were neglected. They were fewer in number, siloed, sitting away from the critical path on data warehouses and offline reporting.
But things are changing. Data tooling has developed rapidly over the past five years, and that pace will only accelerate with AI adoption.
The complexity of modern data platforms, the velocity of change, and the scale at which we operate have made ad-hoc approaches obsolete. Software development learned these lessons decades ago: version control, code review, automated testing, and CI/CD aren't nice-to-haves; they're survival mechanisms.
A Brief History of "Everything as Code"
Let me take you back. I saw my first real Infrastructure as Code implementation at Yahoo in 2010. Before that, I was managing fewer than 20 servers - I can still remember their public IP addresses. Back then, creating a CVS repository out of the /etc folder seemed like cutting-edge innovation. Otherwise it was expect scripts for managing Cisco switches, or the classic pattern of named.conf, named.conf_backup, named.conf_0517. We've come a long way since then.
At Yahoo, all infrastructure changes were packaged into RPM-like packages and distributed via a network of repositories using rsync over SSH. To speed things up, you could fire off hundreds of SSH sessions simultaneously to trigger configuration daemons on remote Linux and BSD machines - just to update your default shell. Configuration was stored in SVN - a big step forward from CVS.
Fast forward through iterations of Puppet, Chef, and Ansible, and then the cloud-vendor-specific tooling like CloudFormation, ARM templates, and Boto3. Finally, we have cross-platform solutions like Terraform and container orchestration standards like Kubernetes manifests, plus everything in between, from raw Python to opinionated DSL frameworks.
Why GUIs Are Overrated (and Dangerous)
GUIs work fine when you're configuring simple things: choose a region, pick a size, click deploy. But modern data infrastructure isn't simple anymore.
Consider setting up a Snowflake warehouse: compute size, auto-suspend settings, auto-resume, scaling policy, resource monitors, statement timeout, query acceleration, multi-cluster settings. That's a dozen parameters minimum. Or configuring an S3 bucket: versioning, lifecycle policies, encryption settings, access policies, CORS rules, replication, logging, object lock, inventory configurations. Easily 20+ decisions.
You spend an hour clicking through the console, finally get it right. Then what? You can't version it. You can't review it. You can't replicate it to another environment. You can't see what changed between your dev and prod configurations. When something breaks, you're clicking through tabs trying to remember what you set.
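For contrast, here's roughly what that same warehouse looks like once it's declared as code. This is a sketch using the Snowflake-Labs/snowflake Terraform provider - attribute names shift between provider versions, and the values are illustrative:

```hcl
# Sketch only: attribute names follow the Snowflake-Labs/snowflake Terraform
# provider and can change between provider versions; values are illustrative.
# Provider configuration (account, credentials) is omitted for brevity.
resource "snowflake_warehouse" "reporting" {
  name           = "REPORTING_WH"
  comment        = "BI and reporting workloads"
  warehouse_size = "XSMALL"

  auto_suspend      = 60       # suspend after 60 seconds of inactivity
  auto_resume       = true
  min_cluster_count = 1
  max_cluster_count = 3
  scaling_policy    = "ECONOMY"

  statement_timeout_in_seconds = 3600
}
```

Every one of those decisions is now visible in a pull request, diffable between environments, and reproducible on demand.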
Modern tools without Infrastructure as Code support have no future.
When I encounter new technology, I check the Terraform provider documentation before I even open the Cloud Console. If I can't define it as code, I question whether I should be using it at all. This has saved me countless times from adopting tools that would become maintenance nightmares.
This is especially true with Microsoft's stack - it's too easy to mess up when everything relies on clicking through interfaces. Linux taught us an important lesson: everything is a file. In the cloud era, that translates to: everything is code.
The "We'll Automate Later" Fallacy
Here's the line I hear constantly: "We'll implement it manually first, then automate later."
But what does "automate" actually mean to these teams? Writing it as code. This is perhaps the biggest misunderstanding in platform engineering.
The idea that GUI implementation is faster is an illusion. Think about it this way: you don't hire Java developers who tell you that writing Java is hard and slow, so they'll first build something in a visual tool and port it to Java later. That would be absurd. Why would we accept this in data engineering?
Yes, there's a place for prototyping. Even Java developers use UI tools to mock up something visual. But that prototype never leaves their laptop - it's not deployed to production. This is not the case with infrastructure, pipelines, or cloud services. Once clickops infrastructure is deployed, it's "done," and the organization is at the mercy of backlog prioritization to port the clickops rubbish into proper code.
Here's the uncomfortable truth: that backlog item never gets prioritized. Technical debt is easy to accumulate and hard to pay down. The manual implementation becomes the permanent implementation. New features always take priority over "make that thing we built properly codified." Meanwhile, your data platform becomes increasingly fragile, undocumented, and impossible to replicate.
The Real Question
Why do we think it's acceptable for data engineers not to know their primary development languages - whether that's Terraform, SQL, or Python? We don't accept this anywhere else in software engineering. A backend developer must know their language. A frontend developer must know JavaScript. Why are data and cloud engineers allowed to say "I'll click through the console and figure out the code part later"?
They are developers. Code is their primary interface. The GUI is the fallback, not the default.
When you start with code:
You're forced to understand what you're building
You can test it before deploying
You can version and review it
You can replicate it across environments
You build muscle memory in the right tools
When you start with clicks, you're building technical debt from day one.
Building Data Platforms with Code-First Principles
At MetaOps, we've embraced a simple philosophy: If it's not in code, it doesn't exist.
Here's what that looks like in practice:
1. Version Control is the Foundation
Everything starts from a Git repository:
Schema definitions and migrations (Liquibase, Flyway)
dbt models and transformations
Pipeline configurations
Infrastructure definitions
Documentation
No exceptions.
2. Schema Changes Must Be Immutable
Schema changes are applied only via version-controlled pipelines. We use tools like Liquibase and Flyway to ensure:
Every change is tracked
Changes can't be modified after deployment
Rollback procedures are well-defined
Multiple environments stay in sync
No more "quick fixes" directly in the production database.
3. Idempotency: Run It a Thousand Times, Get the Same Result
Here's a fundamental principle of infrastructure as code: idempotency. Apply your configuration once or a hundred times - you get the same result. No side effects. No accumulated drift. No "oops, I ran that twice and now production is broken."
This is where code-based approaches fundamentally differ from clickops:
With manual clicks:
Run it twice → you might create duplicate resources
Apply in wrong order → system is in inconsistent state
Re-run after failure → unpredictable results
Someone else runs it → different interpretation, different outcome
With idempotent code (Terraform, Liquibase, Flyway):
Run it 1,000 times → same result every time
Infrastructure converges to desired state automatically
Failed deployments can safely retry
Everyone gets identical results
Terraform checks current state and applies only what's needed. Liquibase tracks which migrations ran and skips duplicates. Flyway's checksum validation ensures migrations haven't changed. This isn't just convenient - it's foundational to reliable infrastructure.
The practical benefit? You can re-run deployments without fear. CI/CD pipeline failed halfway through? Just run it again. Not sure if the last deployment completed? Run it again. This transforms deployment from a high-stakes, sweaty-palms event into a routine, boring operation.
Clickops can never be idempotent because humans aren't idempotent. We make mistakes. We forget steps. We interpret instructions differently. Code, on the other hand, executes exactly the same way every single time.
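To make that concrete, here's a minimal sketch of the S3 bucket from earlier, written against the hashicorp/aws provider - the bucket name is hypothetical and the provider configuration is omitted:

```hcl
# Declares desired state; the bucket name is hypothetical and provider/backend
# configuration is omitted for brevity.
resource "aws_s3_bucket" "raw_events" {
  bucket = "example-raw-events"
}

# Versioning and encryption are separate resources in AWS provider v4+.
resource "aws_s3_bucket_versioning" "raw_events" {
  bucket = aws_s3_bucket.raw_events.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "raw_events" {
  bucket = aws_s3_bucket.raw_events.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
```

Apply it once and the bucket is created with versioning and encryption. Apply it again and, with nothing changed, Terraform reports nothing to do - that's the property no sequence of console clicks can give you.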
4. Data Transformations as Code
All dbt models and Python code live in version control and are deployed via CI/CD pipelines with:
Automated testing
Code review requirements
Environment-specific configurations
Clear deployment history
5. Cloud Configuration in Terraform
We use Terraform extensively for data platform configuration:
Airbyte connections and configurations
Fivetran syncs and transformations
Snowflake resources
BigQuery datasets and tables
Databricks workspaces and clusters
The temptation to use the UI is strong - don't give in. Every click in a UI is technical debt you're accumulating.
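As an illustration, here's what one of those resources looks like as code - a sketch of a BigQuery dataset using the hashicorp/google provider, with hypothetical names, labels, and retention period:

```hcl
# Sketch using the hashicorp/google provider; dataset name, labels, and the
# retention period are hypothetical, and provider configuration is omitted.
resource "google_bigquery_dataset" "analytics_staging" {
  dataset_id  = "analytics_staging"
  location    = "EU"
  description = "Staging layer populated by the ELT pipelines"

  default_table_expiration_ms = 7776000000 # 90 days

  labels = {
    environment = "staging"
    managed_by  = "terraform"
  }
}
```

The same pattern applies to Snowflake databases, Airbyte connections, or Fivetran connectors through their respective providers: declare the resource once, review it, and let the pipeline apply it to every environment.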
6. Application Code in Containers
Python projects are:
Versioned in Git
Packaged in Docker containers with pinned library versions
Deployed via CI/CD pipelines
Tested automatically
Tagged with explicit release versions
7. GitOps for Kubernetes Deployments
Every tool we deploy on Kubernetes is managed through Flux - see the sketch after this list. This means:
Git is the single source of truth
Changes are automatically synced
Rollbacks are simple
Audit trails are automatic
Documentation is clear
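For a feel of what Flux actually reconciles, here's the shape of a Flux (v2) Kustomization. In a GitOps setup it normally lives as plain YAML in the repository Flux watches; it's written below as HCL through the hashicorp/kubernetes provider's kubernetes_manifest resource purely to keep these examples in one language, and the names and path are hypothetical:

```hcl
# Shape of a Flux v2 Kustomization; normally stored as YAML in the Git
# repository Flux reconciles, expressed here via kubernetes_manifest.
resource "kubernetes_manifest" "data_platform" {
  manifest = {
    apiVersion = "kustomize.toolkit.fluxcd.io/v1"
    kind       = "Kustomization"
    metadata = {
      name      = "data-platform"
      namespace = "flux-system"
    }
    spec = {
      interval = "10m"                           # how often to reconcile
      path     = "./clusters/prod/data-platform"
      prune    = true                            # remove resources deleted from Git
      sourceRef = {
        kind = "GitRepository"
        name = "flux-system"
      }
    }
  }
}
```

Flux keeps the cluster converged on whatever is merged into the tracked branch, and pruning removes anything that disappears from the repository.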
The Mindset Shift
The hardest part isn't the tools - it's the mindset. Data engineers need to adopt software development practices:
Don't get attached to instances, databases, or storage. Only data is valuable. Everything else should be rebuildable from code.
Embrace immutability. Once deployed, configurations shouldn't change. Deploy new versions instead.
Test everything. If you're not testing your dbt models, schema migrations, and pipeline configurations, you're flying blind.
Review everything. Code review isn't just for application developers. Your data transformations and infrastructure changes need review too.
The Results: What We've Achieved
After implementing these practices on a project, the results speak for themselves:
80% Reduction in Manual BAU Work
The most immediate impact: we've reduced manual business-as-usual work by 80% compared to traditional approaches. No more clicking through consoles. No more documenting manual steps. No more "tribal knowledge" gatekeeping.
Consistency Across Client Infrastructure
The biggest gain isn't just efficiency - it's the consistency we bring to client infrastructure. There's no more puzzling over how to reproduce problems. Replaying scenarios becomes trivially easy. You messed up your test dataset or just want to start from a clean slate? No problem - it's just a pipeline run away.
Stability and Natural Barriers
Nothing happens at random anymore. When you write things as code, the room for mistakes is quite limited. It provides a natural barrier for someone who doesn't understand the system. You can't accidentally click the wrong button or fat-finger a production change. The code either works or it doesn't - and if it doesn't, the CI/CD pipeline catches it before it reaches production.
Systems Are Super Easy to Adopt
The old "copy and change" approach works exceptionally well. Turns out, you keep reusing the same or similar patterns. Copying a file and modifying it is much easier than "clickops". New team members can:
Copy an existing Terraform module for a similar service
Adapt a dbt model from another project
Reuse pipeline configurations across environments
See exactly what changed between versions with Git diff
Additional Benefits
When everything is code:
New team members understand systems by reading repositories, not by asking veterans
Disaster recovery becomes: clone repos, run deployment scripts
Environment parity is guaranteed - dev, staging, and prod are configured identically
Compliance and auditing are built-in through Git history
Experimentation is safe - branch, test, merge, or discard
Knowledge doesn't leave with people - it's captured in repositories
Moving Forward
If you're still clicking through cloud consoles to configure your data platform, or applying ad-hoc schema changes by hand, you're building on sand. The next migration, the next infrastructure change, the next team member who doesn't know about your undocumented trigger - that's when things fall apart.
Start small:
Pick one manual process and codify it
Put your next dbt model in version control with tests
Define your next cloud resource in Terraform instead of the console (a minimal starting point is sketched after this list)
Write a migration script instead of running ALTER TABLE statements manually
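If the first thing you codify is a cloud resource, the starting point can be this small - a complete, minimal Terraform file. The provider, region, and log group name below are illustrative:

```hcl
# A minimal first Terraform file: one provider, one resource.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "eu-west-1"
}

# One resource moved out of the console is enough to start building the habit.
resource "aws_cloudwatch_log_group" "pipeline_logs" {
  name              = "/data-platform/ingestion"
  retention_in_days = 30
}
```

From there, every new resource is another block in the repository rather than another undocumented click.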
The goal isn't perfection from day one. The goal is to move in the right direction: toward systems that are reproducible, auditable, and resilient.
Because in the end, if it's not in code, it doesn't exist. And if it doesn't exist in code, it will eventually stop existing altogether.