Technology

Apache Spark Structured Streaming

A scalable, fault-tolerant stream processing engine built on the Spark SQL engine, offering end-to-end exactly-once guarantees when used with replayable sources and idempotent sinks.

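Structured Streaming treats a live data stream as an unbounded table to which new rows are continuously appended, so developers can run SQL queries or DataFrame operations with the same logic they use for batch processing. The engine uses checkpointing and write-ahead logs to recover from node failures; combined with replayable sources and idempotent sinks, this yields end-to-end exactly-once semantics. It ships with a Kafka source and sink, and connectors for Kinesis and Delta Lake are available separately. The default micro-batch model can reach end-to-end latencies as low as roughly 100 milliseconds, while the experimental Continuous Processing mode targets latencies around 1 millisecond with at-least-once guarantees. Queries are planned by the Catalyst optimizer, and the engine maintains the state needed for event-time windowing. Once the developer declares a watermark, it also decides when late data is dropped and old window state is discarded, so no hand-written state management is required.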
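
The event-time windowing and watermarking described above can be illustrated with a small plain-Python model. This is only a conceptual sketch, not Spark code and not Spark's actual implementation: the names `process_micro_batches`, `WINDOW_SECONDS`, and `ALLOWED_LATENESS` are hypothetical, event times are bare integers (seconds), and the lateness rule is simplified. In Structured Streaming itself, the same intent is expressed declaratively with `DataFrame.withWatermark` and the `window` function in a `groupBy`.

```python
from collections import defaultdict

WINDOW_SECONDS = 10    # tumbling window length (illustrative value)
ALLOWED_LATENESS = 5   # how far the watermark trails the newest event (illustrative value)


def window_start(event_time):
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def process_micro_batches(batches):
    """Count events per window across micro-batches, dropping events that arrive too late.

    Simplified rule (an assumption of this sketch): after each batch the
    watermark becomes (max event time seen so far - ALLOWED_LATENESS), and an
    event in a later batch is dropped if its window ended at or before that
    watermark.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    watermark = None  # no watermark until the first batch has been processed

    for batch in batches:
        for event_time in batch:
            window_end = window_start(event_time) + WINDOW_SECONDS
            if watermark is not None and window_end <= watermark:
                dropped.append(event_time)  # window already finalized
                continue
            counts[window_start(event_time)] += 1
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
        # Advance the watermark only between batches.
        if max_event_time is not None:
            watermark = max_event_time - ALLOWED_LATENESS

    return dict(counts), dropped


if __name__ == "__main__":
    # Event 4 arrives late but is still counted; event 8 arrives after the
    # watermark has passed the end of its window and is dropped.
    counts, dropped = process_micro_batches([[1, 3, 12], [4, 14], [27], [8, 21]])
    print(counts)   # {0: 3, 10: 2, 20: 2}
    print(dropped)  # [8]
```

The point of the model is the trade-off the watermark encodes: a larger allowed lateness accepts more out-of-order data, but window state has to be kept longer before it can be discarded.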

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
1 project · 1 city
