Managed Kafka

Assistance-operated Apache Kafka for event streaming, CDC, integrations, and real-time data pipelines


Managed Kafka is for teams building event-driven services, integration pipelines, CDC flows, analytics ingestion, or replayable event logs. Assistance operates the Kafka platform while your teams own event contracts, producers, consumers, and business semantics.

Best-fit use cases

Use case | Why Kafka fits
Event-driven microservices | Durable topics decouple producers and consumers while preserving event history
Change data capture | Stream database changes into analytics, search, caches, or downstream systems
Integration bus | Standardize how events move between internal services and external systems
Real-time analytics | Feed clickstream, activity, telemetry, or operational data into processing systems
Log and audit pipelines | Retain replayable event records, ordered within each partition, for downstream analysis

What Assistance operates

Area | Included managed service responsibility
Provisioning | Cluster sizing, broker setup, storage configuration, network placement, secure defaults, and bootstrap details
Reliability | Replication settings, broker health, controller health, backup/retention strategy where applicable, and runbooks
Capacity | Partition, storage, throughput, consumer lag, and broker utilization monitoring
Maintenance | Kafka version lifecycle guidance, patch planning, maintenance windows, rolling upgrades, and rollback planning
Security | TLS, SASL, ACL model, service accounts, credential rotation support, and audit-friendly access practices
Governance | Topic naming, retention defaults, partition guidance, schema registry practices, and onboarding workflow
Support | Severity-based platform support and escalation for covered production clusters
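Consumer lag, one of the capacity signals listed above, is usually computed per partition as the gap between the partition's log-end offset and the consumer group's committed offset. The sketch below assumes those offsets have already been collected (for example by an admin client or a metrics exporter) into plain dictionaries; the function names and the alert threshold are hypothetical, not part of any Kafka client API.

```python
from typing import Dict, List, Tuple

# (topic, partition) -> offset; assumed to be collected elsewhere, e.g. by a metrics exporter
Offsets = Dict[Tuple[str, int], int]


def partition_lag(log_end: Offsets, committed: Offsets) -> Dict[Tuple[str, int], int]:
    """Lag per partition = log-end offset minus the group's committed offset.

    A partition with no committed offset is treated as fully lagging
    (committed offset 0), a conservative choice made for this sketch.
    """
    lag = {}
    for tp, end in log_end.items():
        done = committed.get(tp, 0)
        lag[tp] = max(end - done, 0)  # clamp in case the end offset snapshot is stale
    return lag


def lagging_partitions(
    lag: Dict[Tuple[str, int], int], threshold: int
) -> List[Tuple[Tuple[str, int], int]]:
    """Partitions whose lag exceeds a hypothetical alert threshold, worst first."""
    over = [(tp, n) for tp, n in lag.items() if n > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    log_end = {("orders", 0): 1500, ("orders", 1): 1200, ("orders", 2): 900}
    committed = {("orders", 0): 1450, ("orders", 1): 200}
    lag = partition_lag(log_end, committed)
    print(lag)  # {('orders', 0): 50, ('orders', 1): 1000, ('orders', 2): 900}
    print(sum(lag.values()))  # 1950
    print(lagging_partitions(lag, threshold=500))
```

Alerting on the worst partition rather than only the group total catches a single stuck partition that a healthy aggregate would hide.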

Ownership boundary

Responsibility | Assistance owns | Customer owns
Kafka runtime | Broker operations, upgrades, monitoring, capacity, and platform incident triage | Producer and consumer application behavior
Topics | Guardrails, creation workflow, partition/retention recommendations | Topic purpose, event ownership, business retention requirements
Schemas | Registry operation and compatibility policy setup where included | Schema design, evolution approval, producer/consumer compatibility
Connectors | Platform operation when Kafka Connect is included | Source/sink credentials, data mapping, connector business behavior
Incidents | Broker/platform failures and service status | Bad events, poison messages, consumer bugs, duplicate handling
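Duplicate handling and poison messages sit on the customer side of this boundary. A common pattern is an idempotent handler that skips event IDs it has already applied and routes records that keep failing to a dead-letter destination instead of blocking the partition. The sketch below is library-agnostic: the `Record` type, the in-memory ID set, the dead-letter list, and the retry limit are hypothetical stand-ins for a real consumer record, a durable deduplication store, a dead-letter topic, and your own retry policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Record:
    # Hypothetical minimal record; real clients also expose key, offset, headers, etc.
    event_id: str
    payload: dict


@dataclass
class IdempotentProcessor:
    handler: Callable[[dict], None]
    max_attempts: int = 3  # hypothetical retry budget before dead-lettering
    seen: Set[str] = field(default_factory=set)  # stand-in for a durable dedup store
    dead_letters: List[Record] = field(default_factory=list)  # stand-in for a DLQ topic

    def process(self, record: Record) -> str:
        if record.event_id in self.seen:
            return "duplicate"  # redelivered event: already applied, skip side effects
        for _ in range(self.max_attempts):
            try:
                self.handler(record.payload)
            except Exception:
                continue  # retry up to max_attempts
            self.seen.add(record.event_id)  # remember the ID only after success
            return "processed"
        self.dead_letters.append(record)  # park the poison message so the partition keeps moving
        return "dead-lettered"


if __name__ == "__main__":
    applied = []

    def handle(payload: dict) -> None:
        if payload.get("amount", 0) < 0:
            raise ValueError("invalid amount")  # simulated poison message
        applied.append(payload["amount"])

    proc = IdempotentProcessor(handler=handle)
    for rec in [Record("e1", {"amount": 10}), Record("e1", {"amount": 10}), Record("e2", {"amount": -5})]:
        print(rec.event_id, proc.process(rec))  # processed, duplicate, dead-lettered
    print(applied)  # [10]
```

Recording the ID only after the handler succeeds means a crash between the two steps can still apply an event twice; a production version would commit the side effect and the deduplication record atomically, for example in one database transaction.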

Deployment options

Option | When to choose it
Assistance physical servers | Development, integration testing, lower-cost internal event platforms, and CI environments
Customer cloud account | Production platforms that must run close to cloud-native applications and data services
Cloud-managed Kafka operations | Assistance operates Amazon MSK, Confluent, Azure Event Hubs (through its Kafka-compatible endpoint), or similar services where preferred
Hybrid | Development Kafka on Assistance infrastructure, production Kafka in the cloud

Reliability and support model

Area | Managed Kafka approach
Availability | Multi-broker design, with target availability scoped by topology, provider, and support tier
Durability | Replication factor, minimum in-sync replicas (min.insync.replicas), retention, and compaction policies designed around data criticality
Recovery | Documented recovery expectations for broker loss, topic misconfiguration, and retention-related scenarios
Performance | Throughput, latency, partitions, storage, and consumer lag monitored continuously for covered services
Response | P1 response targets scoped in the support agreement; 24/7 critical response available for covered production clusters
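The durability row depends on how the replication factor (RF) relates to `min.insync.replicas` for producers that use `acks=all`. The sketch below encodes two rules of thumb under simplifying assumptions: each replica lives on a distinct broker, all replicas are in sync before the failure, and unclean leader election is disabled. The class and method names are illustrative, not a Kafka API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DurabilityPolicy:
    replication_factor: int
    min_insync_replicas: int

    def __post_init__(self):
        if not 1 <= self.min_insync_replicas <= self.replication_factor:
            raise ValueError("min.insync.replicas must be between 1 and the replication factor")

    def broker_losses_before_writes_fail(self) -> int:
        # With acks=all, produce requests are rejected once the in-sync
        # replica set shrinks below min.insync.replicas.
        return self.replication_factor - self.min_insync_replicas

    def broker_losses_without_losing_acked_data(self) -> int:
        # Each acknowledged write exists on at least min.insync.replicas
        # replicas, so permanently losing one fewer broker still leaves a copy.
        return self.min_insync_replicas - 1


if __name__ == "__main__":
    for rf, isr in [(3, 2), (3, 1), (3, 3)]:
        p = DurabilityPolicy(rf, isr)
        print(f"RF={rf} min.insync.replicas={isr}: "
              f"writes survive {p.broker_losses_before_writes_fail()} loss(es), "
              f"acked data survives {p.broker_losses_without_losing_acked_data()} loss(es)")
```

RF 3 with `min.insync.replicas` 2 is a widely used balance: one broker can be offline, for example during a rolling upgrade, without rejecting `acks=all` writes or losing acknowledged data.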

Onboarding

1. Streaming assessment

We review event sources, consumers, throughput, retention, ordering needs, data sensitivity, replay requirements, expected growth, and integration targets.
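Throughput, retention, and replication from the assessment combine into a first-pass disk estimate: raw storage is roughly average ingest rate × retention period × replication factor, plus headroom. The sketch below assumes a steady average rate, even spread across brokers, and an illustrative 30% headroom factor; compression, compaction, traffic peaks, and expected growth still have to be layered on for a real sizing.

```python
SECONDS_PER_DAY = 86_400


def estimate_storage_bytes(
    ingest_bytes_per_sec: float,
    retention_days: float,
    replication_factor: int,
    headroom: float = 0.30,  # illustrative safety margin, not a fixed rule
) -> float:
    """First-pass cluster-wide disk estimate for time-based retention."""
    retained = ingest_bytes_per_sec * retention_days * SECONDS_PER_DAY
    replicated = retained * replication_factor  # every replica stores a full copy
    return replicated * (1 + headroom)


def per_broker_bytes(total_bytes: float, brokers: int) -> float:
    # Assumes partitions, and therefore bytes, are spread evenly across brokers.
    return total_bytes / brokers


if __name__ == "__main__":
    MB = 1_000_000
    total = estimate_storage_bytes(5 * MB, retention_days=7, replication_factor=3)
    print(f"cluster: {total / 1e12:.2f} TB")  # cluster: 11.79 TB
    print(f"per broker (6 brokers): {per_broker_bytes(total, 6) / 1e12:.2f} TB")  # 1.97 TB
```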

2. Platform design

Assistance proposes broker count, storage, replication, networking, ACLs, topic standards, schema registry approach, monitoring, and support model.
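Partition count is one of the harder design choices to change later, because adding partitions changes which partition a given key hashes to and therefore affects per-key ordering for existing keys. A commonly cited rule of thumb sizes partitions from target throughput divided by measured per-partition producer and consumer throughput, and never below the planned number of consumers in a group. The sketch below applies that heuristic; the per-partition throughput figures must come from benchmarks on your own hardware, and the growth factor is an assumption.

```python
import math


def suggested_partitions(
    target_mb_s: float,
    producer_mb_s_per_partition: float,
    consumer_mb_s_per_partition: float,
    max_consumers_in_group: int,
    growth_factor: float = 1.0,  # hypothetical multiplier for expected growth
) -> int:
    """Rule-of-thumb partition count: max(t/p, t/c, consumers), scaled for growth."""
    by_producer = target_mb_s / producer_mb_s_per_partition
    by_consumer = target_mb_s / consumer_mb_s_per_partition
    # Within one consumer group each partition is read by at most one consumer,
    # so fewer partitions than consumers leaves some consumers idle.
    needed = max(by_producer, by_consumer, max_consumers_in_group)
    return math.ceil(needed * growth_factor)


if __name__ == "__main__":
    # 60 MB/s target; assumed benchmarks: 10 MB/s produce, 5 MB/s consume per partition
    print(suggested_partitions(60, 10, 5, max_consumers_in_group=8))  # 12
    print(suggested_partitions(60, 10, 5, 8, growth_factor=2.0))  # 24
```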

3. Producer and consumer onboarding

We document connection details, topic request workflow, ACLs, schema rules, consumer lag dashboards, and runbook expectations for new services.
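Connection details handed over at onboarding usually reduce to a small client configuration. The sketch below builds one for a librdkafka-based client (such as confluent-kafka-python) using TLS with SASL/SCRAM authentication. The bootstrap hostnames, service-account name, consumer group, CA path, and environment-variable name are hypothetical placeholders, and the actual SASL mechanism depends on how your cluster is configured.

```python
import os


def consumer_config(service_account: str, group_id: str) -> dict:
    """Build a librdkafka-style consumer config; all concrete values are placeholders."""
    return {
        # Hypothetical bootstrap brokers supplied during onboarding
        "bootstrap.servers": "kafka-1.example.internal:9093,kafka-2.example.internal:9093",
        "security.protocol": "SASL_SSL",  # TLS transport plus SASL authentication
        "sasl.mechanism": "SCRAM-SHA-512",  # assumption: your cluster may use another mechanism
        "sasl.username": service_account,
        # Read the secret from the environment instead of hard-coding it
        "sasl.password": os.environ.get("KAFKA_SASL_PASSWORD", ""),
        "ssl.ca.location": "/etc/kafka/ca.pem",  # hypothetical CA bundle path
        "group.id": group_id,
        "enable.auto.commit": False,  # commit offsets only after processing succeeds
        "auto.offset.reset": "earliest",
    }


if __name__ == "__main__":
    cfg = consumer_config("svc-orders-reader", "orders-projection")
    print({k: v for k, v in cfg.items() if k != "sasl.password"})
    # With confluent-kafka installed: Consumer(cfg).subscribe(["orders"])
```

Each service account then needs ACLs that match its role, for example read on its topics and on its consumer group.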

4. Operate and govern

After go-live, we monitor broker health, lag, storage, throughput, and topic growth. Governance rules prevent unbounded retention, partition sprawl, and undocumented data ownership.
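Governance rules like these are often enforced as checks on each topic request before anything is created. The sketch below validates a hypothetical request format against illustrative guardrails: a required owner, a naming pattern, bounded retention, and a partition cap. The naming convention and limits are assumptions chosen for the example, not Assistance defaults; `retention.ms` and `cleanup.policy` are real Kafka topic settings, and `retention.ms = -1` means unlimited retention.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative guardrails only; real limits come from the agreed topic standards.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(\.[a-z0-9-]+){2,}$")  # hypothetical team.domain.event convention
MAX_PARTITIONS = 48
MAX_RETENTION_MS = 30 * 24 * 60 * 60 * 1000  # 30 days
VALID_POLICIES = {"delete", "compact", "compact,delete", "delete,compact"}


@dataclass
class TopicRequest:
    name: str
    owner: str
    partitions: int
    retention_ms: int  # maps to Kafka's retention.ms; -1 means unlimited
    cleanup_policy: str = "delete"  # maps to Kafka's cleanup.policy


def validate(req: TopicRequest) -> List[str]:
    """Return guardrail violations; an empty list means the request can proceed."""
    problems = []
    if not req.owner.strip():
        problems.append("missing owner")
    if not NAME_PATTERN.match(req.name):
        problems.append(f"name {req.name!r} does not follow team.domain.event")
    if not 1 <= req.partitions <= MAX_PARTITIONS:
        problems.append(f"partitions must be between 1 and {MAX_PARTITIONS}")
    if req.cleanup_policy not in VALID_POLICIES:
        problems.append(f"unknown cleanup.policy {req.cleanup_policy!r}")
    # Time-based retention applies when "delete" is part of the policy; a pure
    # compact topic keeps the latest value per key, so it is not bounded here.
    if "delete" in req.cleanup_policy.split(","):
        if req.retention_ms < 0 or req.retention_ms > MAX_RETENTION_MS:
            problems.append("unbounded or excessive retention.ms for a delete-policy topic")
    return problems


if __name__ == "__main__":
    ok = TopicRequest("payments.ledger.entry-posted", "payments-team", 12, 7 * 86_400_000)
    bad = TopicRequest("Orders", "", 200, -1)
    print(validate(ok))  # []
    for problem in validate(bad):
        print("-", problem)
```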

Supported capabilities

  • Apache Kafka broker clusters and KRaft/ZooKeeper lifecycle planning depending on version and environment
  • Topic and partition governance
  • Schema Registry with Avro, JSON Schema, or Protobuf where included
  • Kafka Connect operations for scoped source/sink connectors
  • Mirror or replication patterns for migration and disaster recovery where appropriate
  • Metrics and alerting for brokers, topics, partitions, and consumers

Not included by default

  • Designing every event contract or business data model
  • Rewriting producers or consumers for idempotency and compatibility
  • Guaranteeing delivery semantics for application code outside Kafka
  • Unlimited retention, topics, partitions, connectors, or throughput outside the plan
  • Owning downstream data correctness after consumers process events

Frequently asked questions

Is Kafka the right choice for simple background jobs? Not always. Redis queues, a database-backed queue, or a managed cloud queue can be simpler. Kafka is best when you need durable replayable streams, multiple consumers, and event history.

Who creates topics? We define a topic request workflow. Assistance can create topics and enforce defaults, while your team identifies ownership, purpose, retention, schema, and consumers.

Can you operate Amazon MSK or Confluent instead of self-hosted Kafka? Yes. We can operate cloud-managed Kafka services in your account or tenancy when that is the better fit.

How do you handle schema changes? Schema governance is part of onboarding when Schema Registry is included. Your teams own schema design and compatibility decisions; Assistance operates the registry and policy mechanism.
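Compatibility policies decide which schema changes the registry accepts. In Confluent Schema Registry's `BACKWARD` mode, consumers using the new schema must be able to read data written with the previous one, which in practice allows deleting fields and adding fields that have defaults. The sketch below checks a simplified version of Avro's reader/writer field rules on flat record schemas. It ignores nested types, unions, aliases, and type promotions (such as `int` to `long`), so it rejects some changes Avro would accept and is a teaching aid rather than a substitute for the registry's own check.

```python
from typing import Dict, List

Schema = Dict  # Avro-style record: {"type": "record", "name": ..., "fields": [...]}


def _fields(schema: Schema) -> Dict[str, dict]:
    return {f["name"]: f for f in schema["fields"]}


def can_read(reader: Schema, writer: Schema) -> List[str]:
    """Simplified Avro resolution: reasons `reader` cannot decode data from `writer`."""
    problems = []
    r_fields, w_fields = _fields(reader), _fields(writer)
    for name, fld in r_fields.items():
        if name not in w_fields:
            # A field absent from the written data must come from the reader's default.
            if "default" not in fld:
                problems.append(f"field {name!r} added without a default")
        elif fld["type"] != w_fields[name]["type"]:
            problems.append(f"field {name!r} changed type")  # promotions ignored here
    # Writer-only fields are skipped by the reader, so deleting a field is allowed.
    return problems


def backward_compatible(old: Schema, new: Schema) -> bool:
    return not can_read(reader=new, writer=old)


def forward_compatible(old: Schema, new: Schema) -> bool:
    return not can_read(reader=old, writer=new)


if __name__ == "__main__":
    v1 = {"type": "record", "name": "Order", "fields": [
        {"name": "id", "type": "string"},
        {"name": "amount", "type": "double"},
    ]}
    v2 = {"type": "record", "name": "Order", "fields": [
        {"name": "id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string", "default": "EUR"},
    ]}
    v3 = {"type": "record", "name": "Order", "fields": [
        {"name": "id", "type": "string"},
        {"name": "channel", "type": "string"},  # new required field, no default
    ]}
    print(backward_compatible(v1, v2), forward_compatible(v1, v2))  # True True
    print(backward_compatible(v1, v3))  # False
```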

What SLA applies to Kafka? Availability and response targets are scoped by cluster topology, provider dependencies, and support tier. We define these before production onboarding.
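When availability targets are agreed, it helps to translate a percentage into a concrete downtime budget; for instance, 99.9% over a 30-day month allows about 43 minutes of downtime. The sketch below does that arithmetic; the percentages in the example are illustrative, not committed Assistance targets.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Allowed downtime in minutes for an availability target over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - availability_pct) / 100


if __name__ == "__main__":
    for target in (99.5, 99.9, 99.95, 99.99):
        print(f"{target}% -> {downtime_budget_minutes(target):.1f} min per 30 days")
```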