Auraa is an agentic data platform built natively on Databricks. Its foundational architectural principle is written down as our zeroth Architecture Decision Record (ADR-000): Metadata Is Data.
ADR-000 - Metadata Is Data. All platform metadata and configuration - user roles, tool registries, agent definitions, data quality rules, project settings, execution logs, lineage - flows through the same bronze → silver → gold medallion as business data, governed by Unity Catalog, isolated by tenant path, with Delta Lake as the single write authority. Operational state gets the same rigor as customer records.
This isn’t a slogan. It’s an invariant. Every persistent state change - a new role assignment, a tool registration, a configuration update - is written to Delta. Nothing writes operational state to a side database, a local file, or an in-memory-only store. That gives us one infrastructure to back up, one access-control model (Unity Catalog), one audit trail, and one quality regime applied uniformly to business data and to the metadata that governs the platform.
It also gives us history for free. Every governance write lands in Delta, so the medallion architecture already captures the full event stream in bronze and lets us time-travel across silver and gold. “What tools were registered at 2pm yesterday?” is a SELECT ... TIMESTAMP AS OF query, not a log grep - and we never had to design, populate, or maintain a separate audit table to make it work. The lakehouse is the audit table.
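As a minimal sketch of that kind of lookup, the helper below builds a Delta time-travel query string. The table name governance.silver.tool_registry and the timestamp are hypothetical placeholders, not Auraa's actual schema; only the TIMESTAMP AS OF clause is standard Delta SQL.

```python
from datetime import datetime


def registry_as_of(table: str, as_of: datetime) -> str:
    """Build a Delta time-travel query for the state of `table` at `as_of`."""
    # Delta accepts a timestamp string literal after TIMESTAMP AS OF.
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    # Sketch only: `table` is interpolated directly, so pass trusted names.
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"


# Hypothetical table name and point in time, for illustration.
query = registry_as_of(
    "governance.silver.tool_registry",
    datetime(2025, 1, 14, 14, 0, 0),
)
print(query)
```

On a cluster, the resulting string would typically be handed to spark.sql(query). How far back the query can reach depends on the table's retention settings.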
Then we tried to serve APIs from it.
Databricks Apps is a powerful capability - it lets you ship APIs that run next to the data, inside the same governance perimeter as everything else. But Delta-via-Spark is not a sub-second point-read engine. JVM startup, query planning, no connection pooling. Every API request that needed an auth check, a tool definition, or a project setting was paying SQL Warehouse cold-start latency - roughly 2–6 seconds on Serverless, longer on Pro or Classic - before it could do real work. For a web app rendering interactive views, that’s the difference between something that feels alive and something that requires patience. For an MCP server fielding agent tool calls, the cost is real but more forgiving - LLMs already take a beat to respond, so the warehouse delay tends to hide inside latency we’d already accepted.
The obvious fix - just put a Postgres next to it - would have broken ADR-000. Two write authorities, schema drift, audit gap, governance fracture. Not an option. The whole point of treating metadata as data is to not have a second master.
Databricks ships Synced Tables: managed declarative pipelines (Lakeflow, formerly Delta Live Tables) that mirror Delta into Lakebase, the platform’s managed Postgres serving layer. One-directional sync (Delta → Lakebase), so the write-authority invariant held. On paper, perfect. We adopted it.
In practice it didn’t hold up. Continuous mode meant a streaming Lakeflow pipeline running 24/7 per table, and we had 35+ governance tables - compute stayed lit, and billed, even when nothing was being written. Snapshot and triggered modes were misleadingly named: they still needed an external trigger, and Lakebase went stale between syncs. There was no read-your-own-write story for API callers, so we ended up adding explicit waits or falling back to Delta for the next read - both ugly. And the sync was a blind mirror with no hook for validation, deduplication, or enrichment at propagation time. Too much infrastructure, not enough control.
We retired Synced Tables and moved propagation into the application layer. A single Governance Writer fans out every repository write to three destinations as one unit of work:
- Silver (Delta) - INSERT INTO … REPLACE WHERE, fatal on failure
- Lakebase (Postgres) - INSERT … ON CONFLICT DO UPDATE, non-fatal, runs in parallel with silver
- Bronze (Delta) - append-only audit event, non-fatal, fires after silver
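The fan-out above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Governance Writer: the class shape, the Sink callables, and the returned warnings list are all hypothetical, and the real sinks would issue the Delta and Postgres statements listed above rather than append to in-memory lists.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]
Sink = Callable[[Record], None]  # hypothetical: one callable per destination


class GovernanceWriter:
    """Fans one repository write out to silver, Lakebase, and bronze."""

    def __init__(self, silver: Sink, lakebase: Sink, bronze: Sink) -> None:
        self._silver = silver
        self._lakebase = lakebase
        self._bronze = bronze

    def write(self, record: Record) -> List[str]:
        """Return warnings from non-fatal sinks; raise if silver fails."""
        warnings: List[str] = []
        with ThreadPoolExecutor(max_workers=2) as pool:
            # Silver and Lakebase run in parallel.
            silver_future = pool.submit(self._silver, record)
            lakebase_future = pool.submit(self._lakebase, record)
            try:
                lakebase_future.result()
            except Exception as exc:  # non-fatal: record it and carry on
                warnings.append(f"lakebase: {exc}")
            # Fatal: re-raises here if the silver write failed.
            silver_future.result()
        # Bronze fires only after silver has succeeded, and is non-fatal.
        try:
            self._bronze(record)
        except Exception as exc:
            warnings.append(f"bronze: {exc}")
        return warnings


# Usage with in-memory stand-ins for the three destinations.
silver_rows: List[Record] = []
bronze_events: List[Record] = []


def failing_lakebase(record: Record) -> None:
    raise ConnectionError("postgres unavailable")


writer = GovernanceWriter(silver_rows.append, failing_lakebase, bronze_events.append)
warnings = writer.write({"tool_id": "t-1", "name": "search"})
print(warnings)
```

In this sketch a Lakebase failure surfaces as a warning while the silver and bronze writes still land. How a missed Lakebase write gets reconciled afterwards is outside what the sketch covers.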
Delta is still the contract. Lakebase is downstream, populated at write time - not via a pipeline. The ADR-000 write-authority invariant still holds, because nothing writes to Lakebase except this one application-managed path, and the metadata-is-data principle still holds end-to-end: the state the API reads is the same state held in the governed Delta tables that flow through the medallion, just served through a faster surface.
API callers get read-your-own-write, connection pooling, and sub-second reads from the same Unity-Catalog-governed state. Auth checks, tool registry lookups, project settings - all served from Lakebase with no warehouse cold-start in the hot path. The 35+ Lakeflow sync pipelines are turned off. Silver Delta is still there for what it’s good at: batch joins, time travel, cross-table analytics, dashboards. Same logical state, two physical surfaces, and the consumer picks based on access pattern.
Worth saying out loud why this pattern works for us: governance writes are rare. A tenant onboards, a tool gets registered, a role gets granted - we see hundreds to low thousands of these per day. The reads against the same state happen on every API request, orders of magnitude more often. Spending a few extra milliseconds on a rare write to make every read fast is a trade that compounds. Run this same pattern against a high-throughput ingestion workload (billions of rows a day, every one carrying real-time analytical weight) and the math flips. Right tool for the job - not every job.
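A back-of-the-envelope model makes the trade concrete. Every number below is a hypothetical input rather than a measurement from Auraa: the write count is taken from the "low thousands per day" range above, and the read count and per-operation costs are placeholders to swap for your own figures.

```python
def net_saving_ms(
    writes_per_day: int,
    reads_per_day: int,
    extra_write_ms: int,
    saved_read_ms: int,
) -> int:
    """Daily latency saved on reads minus daily latency added to writes."""
    return reads_per_day * saved_read_ms - writes_per_day * extra_write_ms


# Governance-style workload: rare writes, frequent reads (hypothetical values).
governance = net_saving_ms(
    writes_per_day=1_000,
    reads_per_day=1_000_000,
    extra_write_ms=50,
    saved_read_ms=100,
)

# Ingestion-style workload: writes dominate, reads are rare (hypothetical values).
ingestion = net_saving_ms(
    writes_per_day=1_000_000_000,
    reads_per_day=1_000,
    extra_write_ms=50,
    saved_read_ms=100,
)

print(governance > 0, ingestion > 0)
```

Under this simple model the pattern pays off whenever reads per write exceed extra_write_ms divided by saved_read_ms. A write-dominated workload lands on the wrong side of that ratio.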
One honest caveat
Schema discipline is now our problem instead of the platform’s. Every additive Delta column has to land in Postgres DDL too, with matching types and nullability. We pay that cost in code review and migration scripts, where Synced Tables would have absorbed it for us. It’s the trade we’d make again, because latency in the read path of every API call is far more expensive than discipline at the schema boundary.
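One way to make that discipline checkable in CI is a drift test that compares the two schemas column by column. The sketch below is illustrative: the Column tuple, the type mapping, and the sample schemas are assumptions, not Auraa's actual tooling, and a real check would read the columns from the Delta table metadata and the Postgres catalog instead of hard-coded dictionaries.

```python
from typing import Dict, List, NamedTuple


class Column(NamedTuple):
    dtype: str
    nullable: bool


# Assumed Delta -> Postgres type mapping; extend it for the types you use.
DELTA_TO_POSTGRES = {
    "string": "text",
    "int": "integer",
    "bigint": "bigint",
    "boolean": "boolean",
    "double": "double precision",
    "timestamp": "timestamp with time zone",
}


def schema_drift(delta: Dict[str, Column], postgres: Dict[str, Column]) -> List[str]:
    """List every Delta column whose Postgres counterpart is missing or mismatched."""
    problems: List[str] = []
    for name, col in delta.items():
        target = postgres.get(name)
        if target is None:
            problems.append(f"{name}: missing in Postgres")
            continue
        expected = DELTA_TO_POSTGRES.get(col.dtype)
        if expected is None:
            problems.append(f"{name}: no mapping for Delta type {col.dtype}")
        elif expected != target.dtype:
            problems.append(f"{name}: type {target.dtype}, expected {expected}")
        if col.nullable != target.nullable:
            problems.append(f"{name}: nullability differs")
    return problems


# Hypothetical schemas: a new Delta column has not reached Postgres yet.
delta_schema = {
    "tool_id": Column("string", False),
    "enabled": Column("boolean", False),
    "updated_at": Column("timestamp", True),
    "owner": Column("string", True),
}
postgres_schema = {
    "tool_id": Column("text", False),
    "enabled": Column("boolean", True),
    "updated_at": Column("timestamp with time zone", True),
}

drift = schema_drift(delta_schema, postgres_schema)
print(drift)
```

Failing the build on a non-empty result moves the cost from code review to an automated gate, though it still leaves writing the migration itself to a person.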