AWS Kinesis Data Firehose  sandbox 

What is Kinesis Data Firehose?

  • Streaming ETL or streaming data pipeline. Not this [[data-pipeline|Data Pipeline]] though!
  • Load streaming data into OpenSearch, Redshift, S3, or 3rd party HTTP endpoints.
  • Batch, compress, buffer, and encrypt data before loading to minimize storage needs at the destination.
  • Read streaming data from [[kinesis-data-streams|Kinesis data streams]]
  • Scales elastically with demand
  • Replicates data across AZs for high availability and durability.
  • Install Kinesis Agent on servers to stream data to Firehose. Agents are available for both Linux and Windows.
  • Built-in data format conversion into Parquet or ORC.
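
As an illustration of loading data into Firehose from code, here is a minimal sketch that groups payloads into batches for the `PutRecordBatch` API, which accepts at most 500 records or 4 MiB per call. The helper names `chunk_records` and `send_all` and the stream name `my-delivery-stream` are made up for this note; the client is expected to be a boto3 Firehose client (`boto3.client("firehose")`).

```python
from typing import Any, Dict, Iterable, Iterator, List

MAX_BATCH_RECORDS = 500            # PutRecordBatch: max records per call
MAX_BATCH_BYTES = 4 * 1024 * 1024  # PutRecordBatch: max 4 MiB per call


def chunk_records(payloads: Iterable[bytes]) -> Iterator[List[Dict[str, bytes]]]:
    """Group raw payloads into batches that respect both per-call limits."""
    batch: List[Dict[str, bytes]] = []
    size = 0
    for data in payloads:
        too_many = len(batch) >= MAX_BATCH_RECORDS
        too_big = size + len(data) > MAX_BATCH_BYTES
        if batch and (too_many or too_big):
            yield batch
            batch, size = [], 0
        batch.append({"Data": data})
        size += len(data)
    if batch:
        yield batch


def send_all(client: Any, stream_name: str, payloads: Iterable[bytes]) -> int:
    """Send every batch and return the total number of rejected records.

    `client` is assumed to expose put_record_batch like a boto3 Firehose
    client. Retrying failed records is left out of this sketch.
    """
    failed = 0
    for batch in chunk_records(payloads):
        resp = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += resp.get("FailedPutCount", 0)
    return failed


# Usage (hypothetical stream name, needs AWS credentials):
#   import boto3
#   firehose = boto3.client("firehose")
#   send_all(firehose, "my-delivery-stream", [b'{"id": 1}\n', b'{"id": 2}\n'])
```

A non-zero return value means some records were rejected; a real producer would inspect `RequestResponses` in each response and resend only those records.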

Use-cases

  1. Say you want to ingest lots of data from S3 into an OpenSearch cluster. The incoming data rate is not steady, so there can be spikes. When ingesting via Lambda, OpenSearch can time out or return errors if a large amount of data is pushed in a short window. Kinesis Data Firehose can smooth out these spikes by buffering records before delivery.
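
To make the smoothing idea concrete, here is a toy simulation of size-or-interval buffering. It is only a sketch of the general idea, not Firehose's actual implementation: the function `simulate_buffer` and its parameters are invented for this note, and the real service has its own buffering hints and limits.

```python
from typing import List, Optional, Tuple


def simulate_buffer(
    events: List[Tuple[float, int]],  # (arrival time in seconds, size in bytes)
    max_bytes: int,                   # flush once this much data is buffered
    max_seconds: float,               # or once the buffer has been open this long
) -> List[Tuple[float, int]]:
    """Return (flush time, bytes delivered) for each simulated delivery."""
    flushes: List[Tuple[float, int]] = []
    buffered = 0
    opened_at: Optional[float] = None
    for ts, nbytes in events:
        # Interval-based flush: the open buffer expired before this record.
        if opened_at is not None and ts - opened_at >= max_seconds:
            flushes.append((opened_at + max_seconds, buffered))
            buffered, opened_at = 0, None
        if opened_at is None:
            opened_at = ts
        buffered += nbytes
        # Size-based flush: the buffer is full.
        if buffered >= max_bytes:
            flushes.append((ts, buffered))
            buffered, opened_at = 0, None
    # Whatever is left goes out when its interval expires.
    if opened_at is not None:
        flushes.append((opened_at + max_seconds, buffered))
    return flushes


# A burst of 1000 small records in one second becomes a handful of writes.
burst = [(i / 1000, 1024) for i in range(1000)]
deliveries = simulate_buffer(burst, max_bytes=256 * 1024, max_seconds=60)
print(len(burst), "records ->", len(deliveries), "deliveries")
```

The point for the OpenSearch case is that the destination sees a few larger, bounded writes instead of one request per incoming record.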