A data pipeline runs a scheduled MQL aggregation query against your captured data and stores the results as precomputed summary documents. Instead of querying 86,000 raw sensor readings to compute an hourly average, you query a single summary document that the pipeline already computed.
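As a concrete sketch, an hourly-average pipeline might use an MQL query like the following, written here as Python dicts in pymongo style. The collection layout and field names (`time_received`, `data.readings.temperature`) are illustrative assumptions, not Viam's documented schema:

```python
from datetime import datetime, timezone

# The pipeline scopes each run to one time window (here, one hour).
window_start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
window_end = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)

# Field names below are illustrative, not Viam's actual schema.
hourly_average = [
    # Keep only readings captured inside this run's time window.
    {"$match": {"time_received": {"$gte": window_start, "$lt": window_end}}},
    # Collapse tens of thousands of raw readings into one summary per hour.
    {"$group": {
        "_id": {"$dateTrunc": {"date": "$time_received", "unit": "hour"}},
        "avg_temperature": {"$avg": "$data.readings.temperature"},
        "count": {"$sum": 1},
    }},
    # Shape the summary document that gets stored for later querying.
    {"$project": {"_id": 0, "hour": "$_id", "avg_temperature": 1, "count": 1}},
]
```

A dashboard can then read one summary document per hour instead of scanning the raw readings.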
Pipelines are useful when:

- You run the same aggregation repeatedly, for example to power a dashboard.
- Querying raw captured data is slow because the underlying collection is large.

Pipelines are not necessary when:

- You query the data only occasionally or ad hoc.
- The raw data is small enough to aggregate quickly at query time.
Pipelines run in the cloud against data that has already been synced. They reduce query time, not bandwidth or storage volume. To reduce what gets sent from the machine, see Filter at the edge.
A pipeline has four parts:
- **An MQL aggregation query.** A sequence of MongoDB aggregation stages (`$match`, `$group`, `$project`, and others) that transforms raw documents into summary documents. You write the query; the pipeline runs it automatically.
- **A cron schedule.** Determines how often the pipeline runs. The schedule also determines the query time window: an hourly schedule (`0 * * * *`) scopes each run to the previous hour of data, and a 15-minute schedule (`*/15 * * * *`) scopes each run to the previous 15 minutes. Schedules are in UTC.
- **A data source.** The collection the query runs against: `standard` (the raw readings collection containing all historical data), `hotstorage` (the hot data store containing a rolling window of recent data), or the sink of another pipeline (`pipeline_sink`).
- **A pipeline sink.** The destination collection where results are stored. Each pipeline has its own sink. You query pipeline results by specifying the `pipeline_sink` data source type and the pipeline's ID.
When a pipeline's cron schedule triggers, Viam runs your aggregation query against the configured data source, scoped to the time window that just ended, and writes the resulting summary documents to the pipeline sink. Each run processes exactly one time window, with no gaps and no overlaps between consecutive runs.
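The no-gaps, no-overlaps guarantee can be illustrated with a small sketch (not Viam code) that derives the time window for an hourly schedule:

```python
from datetime import datetime, timedelta, timezone

def hourly_window(trigger_time: datetime) -> tuple[datetime, datetime]:
    """Window for an hourly schedule (0 * * * *): the previous full hour."""
    end = trigger_time.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end

# The 10:00 UTC run covers [09:00, 10:00); the 11:00 run covers [10:00, 11:00).
start, end = hourly_window(datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc))
```

Consecutive windows share a boundary, so no reading is summarized twice and none is skipped.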
When you enable backfill on a pipeline, Viam processes historical time windows that the pipeline missed. This is useful in two scenarios:

- You create or enable a pipeline after data capture has already been running, so earlier time windows were never processed.
- A machine syncs its data late (for example, after being offline), so readings arrive after their time window has already run.
When backfill is disabled, each time window is processed exactly once. Late-arriving data is not incorporated into past summaries.
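A minimal model of these semantics (a sketch of the behavior described above, not Viam's implementation):

```python
def windows_to_process(all_windows, processed, backfill):
    """Decide which time windows a pipeline run should handle.

    Without backfill, only the latest window runs, and each window is
    processed exactly once. With backfill, missed historical windows
    are processed as well.
    """
    if backfill:
        return [w for w in all_windows if w not in processed]
    latest = all_windows[-1]
    return [] if latest in processed else [latest]

windows = ["09:00", "10:00", "11:00"]
# Backfill disabled: the missed 10:00 window stays unprocessed forever.
skip = windows_to_process(windows, processed={"09:00"}, backfill=False)
# Backfill enabled: the missed 10:00 window is picked up alongside 11:00.
catch_up = windows_to_process(windows, processed={"09:00"}, backfill=True)
```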
The data source determines what your aggregation query (and its `$match` filters) runs against:

| Source type | What it queries | When to use |
|---|---|---|
| `standard` | The raw readings collection containing all historical tabular data | Default. Use for aggregations over any time range. |
| `hotstorage` | The hot data store containing a rolling window of recent data | Use when your pipeline only needs recent data and you want lower query latency. |
| `pipeline_sink` | The output of another pipeline | Use when chaining pipelines: one pipeline produces summaries, another aggregates those summaries further. Requires the source pipeline's ID. |
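As a sketch of chaining, a downstream pipeline reading from a `pipeline_sink` source could roll hourly summaries up into daily ones. The summary field names (`hour`, `avg_temperature`, `count`) are illustrative assumptions about what the upstream pipeline writes to its sink:

```python
# MQL stages (pymongo-style Python dicts) for a downstream pipeline whose
# data source is the sink of a hypothetical hourly-summary pipeline.
daily_rollup = [
    {"$group": {
        "_id": {"$dateTrunc": {"date": "$hour", "unit": "day"}},
        # Weight each hourly average by its reading count so the daily
        # mean stays correct when hours have different reading counts.
        "weighted_sum": {"$sum": {"$multiply": ["$avg_temperature", "$count"]}},
        "count": {"$sum": "$count"},
    }},
    {"$project": {
        "_id": 0,
        "day": "$_id",
        "avg_temperature": {"$divide": ["$weighted_sum", "$count"]},
        "count": 1,
    }},
]
```

Each stage of the chain reads precomputed summaries rather than raw readings, so the daily query stays cheap even over long time ranges.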