Ship a streaming pipeline before lunch
Run it locally against a print sink, then point the same YAML at Postgres in production.
A streaming runtime built in Rust on Arrow and DataFusion. Write plugins that run on the same zero-copy data plane as the built-ins. Define pipelines in YAML and ship in seconds.
sources:
raw.transactions:
type: kafka
topic: raw.event.transaction
transforms:
large_transactions:
type: sql
primary_key: id
sql: |
SELECT *
FROM raw.transactions
WHERE amount > 1000
sinks:
pg.large_transactions:
from: large_transactions
type: postgres
schema: public
table: large_transactions
primary_key: idExtending a typical engine means maintaining a custom build, or running your logic out of process and paying a serialization round trip on every message. Streamling loads your Rust as a native DataFusion operator. It reads the same Arrow batches as the built-ins, and the runtime wraps it in checkpointing and backpressure.
A plugin you build for one pipeline (a Postgres enrichment lookup, a partner-API sink) drops into any other by id. Teams build a library of domain operators, and the runtime treats them like natives.
WebAssembly-sandboxed JS/TS transforms run a process(input) function per record. Or use full DataFusion SQL with hundreds of built-in functions.
Most app data lives behind APIs. Streamling uses HTTP for enrichment transforms and webhook sinks, not just Kafka.
Postgres, ClickHouse, and Kafka are core sinks. Schema and tables are created automatically, and upsert semantics are tracked through the engine.
No coordinator or ZooKeeper. Tens of thousands of messages per second on 0.5 of a CPU core.
Streamling makes opinionated trade-offs so application teams don't have to babysit infrastructure. Less to learn. Less to operate.
Every operator, built-in or plugin, is a pull-based DataFusion ExecutionPlan passing Arrow RecordBatches over Tokio streams. One representation end to end is why SQL, TypeScript, HTTP handlers, and native plugins compose freely, and why you get tens of thousands of messages per second on half a CPU core.
No distributed shuffle or coordinator. Scale out horizontally with Kafka consumer groups. Each instance picks up its own partitions, naturally.
Operators are stateless by default. A pluggable state backend stores small bits of metadata, like Kafka offsets, in SQLite or Postgres.
Markers flow source → transforms → sinks. Sinks flush before sources commit. Upsert-aware sinks make redelivery harmless.
Need to filter against a list of IDs? Point a SQL transform at a Postgres-backed dynamic table. Update the lookup data without restarting the pipeline.
Live inspection of in-flight data. Print and blackhole sinks for debugging. Instant startup. OpenTelemetry built in. validate mode for CI and agentic tools.
Kafka, Postgres, ClickHouse, HTTP, SQL, TypeScript. These are just the built-in connectors, more are available as plugins.
sources:
ethereum.raw_blocks:
type: kafka
topic: mainnet.raw.blocks
# Offsets committed only after sinks flush.process(input) function.U256/I256 support. Upserts.ReplacingMergeTree upserts. Gzip compression.sources:
ethereum.raw_blocks:
type: kafka
topic: mainnet.raw.blocks
# Offsets committed only after sinks flush.Write a plugin for what's specific to you: read and enrich from Postgres, poll a partner API, push to your warehouse. Reuse it across every pipeline by id. Sources, transforms, sinks, UDFs, and topology preprocessors all share one trait.
Plugins load from a shared library at startup and run on the zero-copy data plane, not in a subprocess behind a serialization hop. The runtime hands your code the same Arrow batches and the same guarantees as the built-ins: backpressure, checkpoint markers, commit ordering, upsert propagation. The S3, MySQL, and SQS sinks above are plugins.
abi_stableRecordBatch over FFIuse streamling_plugin::{register_plugin_sink, SinkPlugin};
use streamling_core::{Result, PluginError, CheckpointEpoch};
use arrow::record_batch::RecordBatch;
pub struct SqsSink { /* ... */ }
#[async_trait]
impl SinkPlugin for SqsSink {
async fn initialize(&self) -> Result<(), PluginError> {
// open connection, register schema, etc.
Ok(())
}
async fn process_batch(&self, data: RecordBatch) -> Result<(), PluginError> {
// your sink logic — react to the incoming RecordBatch
self.client.send_batch(data).await?;
Ok(())
}
async fn process_checkpoint_marker(&self, epoch: CheckpointEpoch)
-> Result<(), PluginError> { /* prepare */ Ok(()) }
async fn process_checkpoint_finalizer(&self, epoch: CheckpointEpoch)
-> Result<(), PluginError> { /* commit */ Ok(()) }
}
register_plugin_sink!("sqs", SqsSink);Streamling has run in production for months as the engine for Goldsky's flagship product. It powers thousands of pipelines moving blockchain data for some of the largest teams in crypto.
Streamling powers Turbo, our newest product, which in turn powers products at Polymarket, Stripe, Phantom, and many more. Since launch, teams have built thousands of pipelines on it.
Run it locally against a print sink, then point the same YAML at Postgres in production.