Open source, by Goldsky

Columnar streaming with native-speed plugins

A streaming runtime built in Rust on Arrow and DataFusion. Write plugins that run on the same zero-copy data plane as the built-ins. Define pipelines in YAML and ship in seconds.

pipeline.yaml
sources:
  raw.transactions:
    type: kafka
    topic: raw.event.transaction

transforms:
  large_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM raw.transactions
      WHERE amount > 1000

sinks:
  pg.large_transactions:
    from: large_transactions
    type: postgres
    schema: public
    table: large_transactions
    primary_key: id
Why Streamling

Extend without forking

Extending a typical engine means maintaining a custom build, or running your logic out of process and paying a serialization round trip on every message. Streamling loads your Rust as a native DataFusion operator. It reads the same Arrow batches as the built-ins, and the runtime wraps it in checkpointing and backpressure.

Write once, run in every pipeline

A plugin you build for one pipeline (a Postgres enrichment lookup, a partner-API sink) drops into any other by id. Teams build a library of domain operators, and the runtime treats them like natives.

Write transforms in TypeScript or SQL

WebAssembly-sandboxed JS/TS transforms run a process(input) function per record. Or use full DataFusion SQL with hundreds of built-in functions.

HTTP endpoints are first-class

Most app data lives behind APIs. Streamling speaks HTTP for enrichment transforms and webhook sinks, in addition to Kafka.

Write to the databases you already use

Postgres, ClickHouse, and Kafka are core sinks. Schema and tables are created automatically, and upsert semantics are tracked through the engine.

Single binary

No coordinator or ZooKeeper. Tens of thousands of messages per second on half a CPU core.

Architecture

Pragmatic by design

Streamling makes opinionated trade-offs so application teams don't have to babysit infrastructure. Less to learn. Less to operate.

COLUMNAR, ZERO-COPY ARROW

One data plane

Every operator, built-in or plugin, is a pull-based DataFusion ExecutionPlan passing Arrow RecordBatches over Tokio streams. One representation end to end is why SQL, TypeScript, HTTP handlers, and native plugins compose freely, and why you get tens of thousands of messages per second on half a CPU core.
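To make the pull-based, columnar idea concrete, here is a minimal sketch in plain Rust. It is not Streamling or DataFusion code: a std `Iterator` stands in for an ExecutionPlan stream, and the hypothetical `Batch` and `FilterLarge` types stand in for a RecordBatch and an operator. It mirrors the `amount > 1000` filter from the pipeline at the top of the page.

```rust
// Illustrative only: plain std iterators stand in for DataFusion's
// ExecutionPlan / RecordBatch streams. None of these types are Streamling APIs.

/// A tiny columnar batch: one Vec per column, all the same length.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub id: Vec<u64>,
    pub amount: Vec<i64>,
}

/// A pull-based filter operator: it asks its upstream for the next batch only
/// when its own consumer asks for one.
pub struct FilterLarge<I: Iterator<Item = Batch>> {
    upstream: I,
    threshold: i64,
}

impl<I: Iterator<Item = Batch>> FilterLarge<I> {
    pub fn new(upstream: I, threshold: i64) -> Self {
        Self { upstream, threshold }
    }
}

impl<I: Iterator<Item = Batch>> Iterator for FilterLarge<I> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // Pull one batch from upstream; None propagates end-of-stream.
        let batch = self.upstream.next()?;
        // Build a row mask from the amount column, then apply it to every column.
        let keep: Vec<bool> = batch.amount.iter().map(|a| *a > self.threshold).collect();
        let id = batch.id.iter().zip(&keep).filter(|(_, k)| **k).map(|(v, _)| *v).collect();
        let amount = batch.amount.iter().zip(&keep).filter(|(_, k)| **k).map(|(v, _)| *v).collect();
        Some(Batch { id, amount })
    }
}
```

Because every stage consumes and produces the same batch type, operators chain by wrapping one iterator in another, and nothing runs until the final consumer pulls.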

BUILT ON DATAFUSION

Single-node execution

No distributed shuffle or coordinator. Scale out horizontally with Kafka consumer groups. Each instance picks up its own partitions, naturally.
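As a rough picture of how partitions spread across instances, the sketch below uses a simple round-robin rule. This is an assumption for illustration: Kafka's real assignment strategy is configurable and handled by the consumer group protocol, and the function name here is hypothetical.

```rust
// Illustrative only: a round-robin assignment, not Kafka's actual assignor
// implementation. Instance indexes and counts are hypothetical.

/// Returns the partitions that `instance` (0-based) would own when
/// `partitions` partitions are spread across `instances` consumers.
pub fn assigned_partitions(partitions: u32, instances: u32, instance: u32) -> Vec<u32> {
    assert!(instances > 0 && instance < instances);
    (0..partitions).filter(|p| p % instances == instance).collect()
}
```

Each partition lands on exactly one instance, so adding an instance shrinks every other instance's share without any shared state between them.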

STATE BACKEND OPTIONAL

Stateless

Operators are stateless by default. A pluggable state backend stores small bits of metadata, like Kafka offsets, in SQLite or Postgres.

CHANDY–LAMPORT CHECKPOINTS

At-least-once delivery

Markers flow source → transforms → sinks. Sinks flush before sources commit. Upsert-aware sinks make redelivery harmless.
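The ordering rule, and why upserts make redelivery harmless, can be sketched with an in-memory model. Everything here is hypothetical (`Source`, `UpsertSink`, `run_epoch`); it does not model markers flowing through transforms, only the "flush, then commit" step and a crash between the two.

```rust
use std::collections::BTreeMap;

// Illustrative only: an in-memory model of "sinks flush before sources commit".
// The types and function names here are hypothetical, not Streamling's API.

/// A source that replays from its last committed offset after a restart.
pub struct Source {
    pub records: Vec<(u64, i64)>, // (primary key, amount)
    pub committed: usize,         // next offset to read after a restart
}

/// An upsert-aware sink: buffered rows are keyed by primary key when flushed.
#[derive(Default)]
pub struct UpsertSink {
    buffer: Vec<(u64, i64)>,
    pub table: BTreeMap<u64, i64>,
}

impl UpsertSink {
    pub fn write(&mut self, row: (u64, i64)) {
        self.buffer.push(row);
    }

    /// Flush makes buffered rows durable; a repeated key overwrites its row.
    pub fn flush(&mut self) {
        for (id, amount) in self.buffer.drain(..) {
            self.table.insert(id, amount);
        }
    }
}

/// Reads up to `n` records from the committed offset and flushes the sink.
/// The source offset is committed only if `crash_before_commit` is false.
pub fn run_epoch(source: &mut Source, sink: &mut UpsertSink, n: usize, crash_before_commit: bool) {
    let end = (source.committed + n).min(source.records.len());
    for row in &source.records[source.committed..end] {
        sink.write(*row);
    }
    sink.flush(); // 1. the sink flushes first
    if !crash_before_commit {
        source.committed = end; // 2. only then does the source commit
    }
}
```

If the process dies after the flush but before the commit, the next epoch re-reads the same records. Because the sink writes by primary key, the replayed rows overwrite themselves instead of duplicating.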

LIVE LOOKUPS, NO STATE

Dynamic tables, not joins

Need to filter against a list of IDs? Point a SQL transform at a Postgres-backed dynamic table. Update the lookup data without restarting the pipeline.
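One way to picture a dynamic table is a shared lookup set that can be swapped while the pipeline keeps running. The sketch below assumes an in-memory set in place of Postgres, and `DynamicTable` and `filter_ids` are hypothetical names, not Streamling's API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

// Illustrative only: an in-memory set stands in for a Postgres-backed dynamic
// table. `DynamicTable` and its methods are hypothetical names.

#[derive(Clone, Default)]
pub struct DynamicTable {
    ids: Arc<RwLock<HashSet<u64>>>,
}

impl DynamicTable {
    /// Replaces the lookup data; readers see the new set on their next check.
    pub fn replace(&self, ids: impl IntoIterator<Item = u64>) {
        *self.ids.write().unwrap() = ids.into_iter().collect();
    }

    pub fn contains(&self, id: u64) -> bool {
        self.ids.read().unwrap().contains(&id)
    }
}

/// Keeps only the ids present in the table at the time of the call.
pub fn filter_ids(table: &DynamicTable, batch: &[u64]) -> Vec<u64> {
    batch.iter().copied().filter(|id| table.contains(*id)).collect()
}
```

The filter holds no state of its own: it consults the table on every batch, so an update made through any clone of the handle takes effect on the next batch without a restart.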

BUILD → VALIDATE → RUN

Delightful DX

Live inspection of in-flight data. Print and blackhole sinks for debugging. Instant startup. OpenTelemetry built in. A validate mode for CI and agentic tools.

Connectors

Snap together the things you already use

Kafka, Postgres, ClickHouse, HTTP, SQL, TypeScript. These are just the built-in connectors; more are available as plugins.

Sources
Kafka
Avro + Schema Registry. Consumer groups. Tracked offsets.
kafka.yaml
sources:
  ethereum.raw_blocks:
    type: kafka
    topic: mainnet.raw.blocks
    # Offsets committed only after sinks flush.
Built-ins cover the common cases. Everything else is a plugin, and plugins run at native speed with the same guarantees.
Plugin system

Your code is a first-class operator

Write a plugin for what's specific to you: read and enrich from Postgres, poll a partner API, push to your warehouse. Reuse it across every pipeline by id. Sources, transforms, sinks, UDFs, and topology preprocessors all share one trait.

Plugins load from a shared library at startup and run on the zero-copy data plane, not in a subprocess behind a serialization hop. The runtime hands your code the same Arrow batches and the same guarantees as the built-ins: backpressure, checkpoint markers, commit ordering, upsert propagation. The S3, MySQL, and SQS sinks above are plugins.

Stable ABI via abi_stable
Arrow RecordBatch over FFI
Backpressure handled for you
OpenTelemetry metrics built in
Async runtime provided
State backend access
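For readers new to the term, backpressure means a slow consumer can stop a fast producer from buffering without limit. The sketch below shows the idea with a bounded channel from the Rust standard library. It is an assumption-laden stand-in, not how the Streamling runtime is implemented, and `accepted_before_full` is a hypothetical helper.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Illustrative only: a bounded std channel stands in for the runtime's stream
// plumbing. The function name and capacity are hypothetical.

/// Tries to push `n` batches into a channel of the given capacity without a
/// consumer draining it, and returns how many were accepted before the
/// channel reported that it was full.
pub fn accepted_before_full(capacity: usize, n: usize) -> usize {
    let (tx, _rx) = sync_channel::<usize>(capacity);
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => continue,
            // A full buffer is the backpressure signal: the producer must wait.
            Err(TrySendError::Full(_)) => return i,
            Err(TrySendError::Disconnected(_)) => return i,
        }
    }
    n
}
```

A real producer would typically call the blocking `send` instead of `try_send`, so it simply pauses until the consumer catches up.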
sqs_sink.rs
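// Assumes the async-trait crate is a direct dependency; it provides #[async_trait].
use async_trait::async_trait;
use streamling_plugin::{register_plugin_sink, SinkPlugin};
use streamling_core::{Result, PluginError, CheckpointEpoch};
use arrow::record_batch::RecordBatch;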

pub struct SqsSink { /* ... */ }

#[async_trait]
impl SinkPlugin for SqsSink {
    async fn initialize(&self) -> Result<(), PluginError> {
        // open connection, register schema, etc.
        Ok(())
    }

    async fn process_batch(&self, data: RecordBatch) -> Result<(), PluginError> {
        // your sink logic — react to the incoming RecordBatch
        self.client.send_batch(data).await?;
        Ok(())
    }

    async fn process_checkpoint_marker(&self, epoch: CheckpointEpoch)
        -> Result<(), PluginError> { /* prepare */ Ok(()) }

    async fn process_checkpoint_finalizer(&self, epoch: CheckpointEpoch)
        -> Result<(), PluginError> { /* commit  */ Ok(()) }
}

register_plugin_sink!("sqs", SqsSink);
In production

The engine behind Turbo Pipelines at Goldsky

Streamling has run in production for months as the engine for Goldsky's flagship product. It powers thousands of pipelines moving blockchain data for some of the largest teams in crypto.

1000s of production pipelines
150+ chains streamed in real time

Streamling powers Turbo, our newest product, which in turn powers products at Polymarket, Stripe, Phantom, and many more. Since launch, teams have built thousands of pipelines on it.

Jeff Ling
Co-founder & CTO, Goldsky
GOLDSKY

Ship a streaming pipeline before lunch

Run it locally against a print sink, then point the same YAML at Postgres in production.

Read on GitHub