You Already Have a Message Bus: Why We Stopped Using Kafka

Written by Alan Dennis | Jun 12, 2026 7:31:52 AM

Every data platform eventually needs to coordinate asynchronous events. A tenant finishes provisioning — the identity service needs to be notified. An ingestion job completes — the data quality check should start. A policy violation is detected — an alert needs to fire without blocking the detection path.

The conventional answer is a message bus: Apache Kafka, AWS SQS, Azure Service Bus, Google Cloud Pub/Sub. You stand one up, configure your topics, write producers and consumers, manage offsets, tune retention policies, monitor your brokers, and hope you hired someone who knows what they're doing when things go wrong.

We did this. Then we asked a question we probably should have asked earlier: why are we running a separate message bus when we already have Delta Lake?

The Insight: Delta Is Already a Log

Apache Kafka's core innovation — the one that made it the dominant distributed log — is treating a message queue as an append-only log with consumer-controlled offset tracking. Producers write records. Consumers read from an offset, advance that offset on success, and retry from the last committed offset on failure. This gives you at-least-once delivery with consumer-controlled progress.

Delta Lake has all of the same properties:

  • Append semantics. Delta writes are ACID-transactional. Every append commits as a single transaction: all of its rows become visible, or none do.
  • Change tracking. Delta's Change Data Feed records every row insert as a versioned change event. A consumer reads CDF from version N forward and gets exactly the events published since its last checkpoint.
  • Consumer-controlled progress. A consumer stores its last-processed Delta version in a checkpoint table. It reads from version + 1 on the next cycle. If it crashes, it resumes from the last committed checkpoint on restart.
  • Multi-subscriber support. Each consumer has its own checkpoint, advancing independently. No consumer group coordination.
  • Permanent, queryable history. Messages are rows. You can SQL-query the full event history at any time.
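The properties above can be modeled in a few lines. What follows is a minimal in-memory sketch, not Delta itself: `VersionedLog` and `Checkpoints` are hypothetical names standing in for a CDF-enabled table and a checkpoint table, and version numbering is simplified so that the first append is version 0.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedLog:
    """In-memory stand-in for a CDF-enabled Delta table (illustrative only)."""
    commits: list = field(default_factory=list)  # one list of rows per version

    def append(self, rows: list) -> int:
        """Append a batch as one commit and return its version number."""
        self.commits.append(list(rows))
        return len(self.commits) - 1

    def changes_since(self, version: int) -> list:
        """Return (version, row) pairs for every commit after `version`."""
        return [
            (v, row)
            for v in range(version + 1, len(self.commits))
            for row in self.commits[v]
        ]


@dataclass
class Checkpoints:
    """Per-consumer progress, keyed by a stable consumer ID."""
    versions: dict = field(default_factory=dict)

    def get(self, consumer_id: str) -> int:
        # -1 means "nothing processed yet", so reading starts at version 0.
        return self.versions.get(consumer_id, -1)

    def commit(self, consumer_id: str, version: int) -> None:
        self.versions[consumer_id] = version


log = VersionedLog()
log.append([{"topic": "tenant.provisioned", "tenant": "a"}])
log.append([{"topic": "ingest.completed", "job": 7}])

checkpoints = Checkpoints()
checkpoints.commit("alerts", 0)  # "alerts" has processed version 0

# Each consumer sees only what lies past its own checkpoint.
alerts_pending = log.changes_since(checkpoints.get("alerts"))
audit_pending = log.changes_since(checkpoints.get("audit"))
```

Here the `alerts` consumer has one pending event and the `audit` consumer, which has never committed, has two. Neither needs to know the other exists, which is the multi-subscriber property in miniature.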

The gap between "has the right properties" and "works as a production event bus" is latency and publishing ergonomics. That's what DeltaBus adds.

How DeltaBus Works

DeltaBus is a publish-subscribe event bus implemented entirely on Delta Lake, with checkpoint-based Change Data Feed consumption. Its central data structure is a messages table: a CDF-enabled Delta table where each row is an event.
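For readers who have not used Change Data Feed, here is a rough sketch of what "CDF-enabled" and "reading from a version" look like in PySpark. The `delta.enableChangeDataFeed` table property, the `readChangeFeed` and `startingVersion` reader options, and the `_change_type` column are standard Delta Lake features. The function names and the table name are our assumptions for illustration, not DeltaBus's actual API.

```python
def enable_cdf(spark, table: str) -> None:
    """Turn on Change Data Feed for an existing Delta table."""
    spark.sql(
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def read_new_events(spark, table: str, last_version: int):
    """Return a DataFrame of rows inserted after `last_version`.

    `spark` is an active SparkSession; `last_version` is the consumer's
    last committed checkpoint.
    """
    changes = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", last_version + 1)
        .table(table)
    )
    # CDF tags every row with a change type; an append-only bus
    # only cares about inserts.
    return changes.filter("_change_type = 'insert'")
```

A call such as `read_new_events(spark, "main.bus.messages", 41)` (hypothetical table name) would return the events committed from version 42 onward. A real consumer would also need to handle the case where no newer version exists yet.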

Publishing is dual-mode and transparent to the caller. When a Databricks ZeroBus endpoint is configured, the ZeroBus SDK provides gRPC streaming ingest with ~5-second acknowledged delivery. When it's not (or if the endpoint goes down), an in-memory buffer flushes to Delta every 5 seconds via a single batch write. Both modes produce identical outcomes: a row in the messages table, visible to consumers.
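The dual-mode behavior can be sketched as follows. This is an illustration under stated assumptions, not the DeltaBus source: `DualModePublisher`, `stream_write`, and `batch_write` are hypothetical names, the streaming path is represented by a plain callable rather than the ZeroBus SDK, and the flush is checked on publish instead of by a background timer.

```python
import time
from typing import Callable, Optional

Sink = Callable[[list], None]


class DualModePublisher:
    """Try the streaming path first; otherwise buffer and flush in batches."""

    def __init__(
        self,
        batch_write: Sink,
        stream_write: Optional[Sink] = None,
        flush_interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._batch_write = batch_write      # e.g. one Delta append per flush
        self._stream_write = stream_write    # hypothetical streaming ingest
        self._flush_interval = flush_interval
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def publish(self, topic: str, payload: dict) -> None:
        row = {"topic": topic, "payload": payload}
        if self._stream_write is not None:
            try:
                self._stream_write([row])
                return
            except ConnectionError:
                pass  # endpoint is down: fall back to the buffered path
        self._buffer.append(row)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        elapsed = self._clock() - self._last_flush
        if self._buffer and elapsed >= self._flush_interval:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch = list(self._buffer)
        self._batch_write(batch)             # if this raises, nothing is dropped
        del self._buffer[: len(batch)]
        self._last_flush = self._clock()
```

The caller only ever calls `publish`, which is what makes the mode switch transparent. The sketch is single-threaded; a production version would need a lock around the buffer and a timer so that a quiet publisher still flushes.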

Consuming works through batch CDF polling. A consumer defines a topic filter, a handler function, and a stable consumer ID. On each poll cycle, it reads CDF changes from the last committed checkpoint forward, dispatches messages to the handler, and advances the checkpoint only after successful processing. Crash mid-processing? The consumer resumes from the last committed version on restart. Messages that still fail after the maximum number of retries are dead-lettered: they are preserved with full error context in a dedicated table for inspection and replay.
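One poll cycle might look like the sketch below. The function name, the message shape, and the choice to advance past a dead-lettered message are our assumptions for illustration. The caller is expected to persist the returned version as the new checkpoint, so a crash before that point leaves the old checkpoint in place and the batch is retried (at-least-once delivery).

```python
from typing import Callable, Iterable


def poll_once(
    changes: Iterable[tuple],          # (version, message) pairs from CDF
    checkpoint: int,                   # last committed version
    topic_filter: Callable[[str], bool],
    handler: Callable[[dict], None],
    dead_letters: list,                # stand-in for the dead_letters table
    max_retries: int = 3,
) -> int:
    """Process one batch and return the version to commit as the checkpoint."""
    latest = checkpoint
    for version, msg in sorted(changes, key=lambda change: change[0]):
        if version <= checkpoint:
            continue  # already processed in an earlier cycle
        if topic_filter(msg["topic"]):
            error = None
            for _attempt in range(max_retries):
                try:
                    handler(msg)
                    error = None
                    break
                except Exception as exc:  # keep the error for the dead letter
                    error = exc
            if error is not None:
                # Preserve the message and its failure context for replay.
                dead_letters.append({
                    "message": msg,
                    "version": version,
                    "error": repr(error),
                    "attempts": max_retries,
                })
        latest = max(latest, version)
    return latest
```

Because the checkpoint is returned rather than written inside the loop, a handler may see the same message twice after a crash, so handlers should be idempotent.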

The entire system uses three Delta tables: messages (the event store), checkpoints (consumer progress), and dead_letters (failed messages). That's it.

The Economics Are Hard to Ignore

Here's a comparison at 10 million events per day — a moderate platform operations volume:

Solution                                   Estimated Monthly Cost
Self-managed Kafka                         $2,800–8,000
Managed queue service (SQS, Service Bus)   $650–3,000
DeltaBus                                   ~$50

DeltaBus costs approximately $50/month in Delta storage, with no per-message ingestion fee and no idle compute. Kafka requires broker clusters, coordination services, and continuous monitoring. Managed queue services eliminate the operational burden but charge per message and apply TTL-based retention policies that delete event history, which makes compliance reporting and forensics significantly harder.
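To put the 10-million-events-per-day figure in context, the arithmetic below converts it into a per-second rate and a per-flush batch size. It assumes, purely for illustration, a single publisher on the buffered path with the 5-second flush interval and perfectly even traffic. It makes no claim about storage pricing.

```python
EVENTS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
FLUSH_INTERVAL_S = 5                  # buffered-path flush interval

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY       # about 115.7
events_per_flush = events_per_second * FLUSH_INTERVAL_S    # about 579
flushes_per_day = SECONDS_PER_DAY // FLUSH_INTERVAL_S      # 17,280
events_per_30_days = EVENTS_PER_DAY * 30                   # 300,000,000
```

At roughly 116 events per second, each batch write carries a few hundred rows, which is why a single write per flush is enough at this volume.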

DeltaBus retains events permanently. That 10-million-event-per-day history is queryable by SQL at any time. "Which components published the most events last week?" is a five-line SQL query against the messages table, not a pipeline into an external observability tool.
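As a sketch of that query, the example below runs it against an in-memory SQLite table so it is runnable anywhere. On Databricks the same SQL would target the Delta messages table. The column names (`publisher`, `topic`, `published_at`) and the sample rows are hypothetical, since the actual DeltaBus schema is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the messages table
conn.execute(
    "CREATE TABLE messages (publisher TEXT, topic TEXT, published_at TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("ingestion", "ingest.completed", "2026-06-08T10:00:00"),
        ("ingestion", "ingest.started", "2026-06-09T10:00:00"),
        ("provisioner", "tenant.provisioned", "2026-06-10T09:00:00"),
        ("ingestion", "ingest.completed", "2026-05-01T10:00:00"),  # too old
    ],
)

# "Which components published the most events last week?"
TOP_PUBLISHERS = """
SELECT publisher, COUNT(*) AS events
FROM messages
WHERE published_at >= :since
GROUP BY publisher
ORDER BY events DESC
"""

rows = conn.execute(TOP_PUBLISHERS, {"since": "2026-06-05"}).fetchall()
```

With the sample data, `rows` is `[("ingestion", 2), ("provisioner", 1)]`: the May event falls outside the window and is excluded.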

What DeltaBus Is — and Isn't — For

DeltaBus is the right choice for platform operations events: tenant provisioning, workflow coordination, ingestion lifecycle, audit logging, compliance tracking. For these workloads, 5–10 second end-to-end latency is entirely acceptable, permanent event history is a feature (not a liability), and Unity Catalog governance over the event stream is essential.

DeltaBus is not the right choice for sub-second latency requirements, billions of events per second, cross-workspace federation, or request-reply patterns. These workloads call for dedicated streaming infrastructure. DeltaBus doesn't try to compete with Kafka at its own game — it eliminates Kafka from the 80% of the workload where Kafka is operational overhead without corresponding benefit.

Governance That Extends to the Event Stream

One of the less-discussed costs of an external message bus is the governance gap. Events published to Kafka or SQS exist outside your Unity Catalog governance boundary. They can't be governed by table-level ACLs, included in data lineage, or queried by your SQL tools.

DeltaBus events live inside your Databricks workspace as rows in a Delta table. Unity Catalog ACLs apply to the messages table exactly like any other data asset. The event stream participates in the same governance model as your Bronze, Silver, and Gold tables. For regulated industries where every data access needs to be auditable, this matters.

Production Reality

Auraa's Covasant platform runs DeltaBus in production across all platform operations workflows. Tenant provisioning commands, ingestion lifecycle events, data quality notifications, and audit records all flow through DeltaBus. The messaging infrastructure has consumed effectively zero engineering time after initial setup — because there is nothing to manage. The Delta tables are governed by Unity Catalog. The SQL Warehouse is managed by the workspace. The operational surface area is the messages table, the checkpoints table, and the dead_letters table.

Before adopting DeltaBus, the honest question to ask is: does my workload have requirements that justify a dedicated message bus? For platform operations at lakehouse scale, the answer is almost always no.

Frequently Asked Questions

Can Delta Lake replace Kafka as a message bus?

Delta Lake can replace Kafka for platform operations events because it already has the core properties of a distributed log: ACID append semantics, Change Data Feed for versioned change tracking, consumer-controlled progress via checkpoints, independent multi-subscriber support, and permanent SQL-queryable history. The gap between having those properties and being a production event bus is latency and publishing ergonomics, which is what DeltaBus adds on top.

How does DeltaBus work?

DeltaBus is a publish-subscribe event bus built on Delta Lake with checkpoint-based Change Data Feed consumption. Publishing is dual-mode: a ZeroBus gRPC endpoint gives roughly 5-second acknowledged delivery, and an in-memory buffer flushing to Delta every 5 seconds serves as the fallback. Consuming is batch CDF polling, where each consumer reads from its last committed checkpoint, dispatches to a handler, and advances the checkpoint only after success. The whole system uses three Delta tables: messages, checkpoints, and dead_letters.

How much cheaper is DeltaBus than Kafka or a managed queue?

At 10 million events per day, DeltaBus costs about $50 per month in Delta storage with no per-message fee and no idle compute. Self-managed Kafka runs roughly $2,800 to $8,000 per month, and managed queue services like SQS or Service Bus run about $650 to $3,000. DeltaBus also retains event history permanently rather than deleting it on a TTL, so compliance reporting and forensics stay queryable by SQL.

When should you still use Kafka instead of DeltaBus?

DeltaBus is not the right choice for sub-second latency, billions of events per second, cross-workspace federation, or request-reply patterns; those call for dedicated streaming infrastructure. It is the right choice for platform operations events such as tenant provisioning, workflow coordination, ingestion lifecycle, audit logging, and compliance tracking, where 5 to 10 second latency is acceptable and permanent governed history is a feature.

How does DeltaBus close the governance gap of an external message bus?

Events on an external bus like Kafka or SQS sit outside the Unity Catalog governance boundary, so they can't be governed by table-level ACLs, included in lineage, or queried by SQL tools. DeltaBus events live inside the Databricks workspace as rows in a Delta table, so Unity Catalog ACLs apply to the messages table exactly like any other data asset and the event stream participates in the same governance model as Bronze, Silver, and Gold tables.