Configuration fields, execution behavior, and limits for data pipelines. For an overview of how pipelines work, see Data pipelines overview.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Pipeline name. Must be unique within the organization. |
| organization_id | string | Yes | Organization UUID. |
| schedule | string | Yes | Cron expression in UTC. Determines both when the pipeline runs and the query time window. See Cron schedule. |
| mql_binary | array | Yes | MQL aggregation pipeline as an array of stage objects. See Supported MQL operators. |
| enable_backfill | bool | Yes | Whether to process historical time windows. See Backfill behavior. |
| data_source_type | enum | No | Data source to query. Default: standard. See Data source types. |
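To make the fields concrete, here is a minimal sketch of a pipeline configuration as a Python dictionary. All names, IDs, and MQL stages are hypothetical placeholders, not values from a real organization:

```python
# Hypothetical pipeline configuration covering the fields in the table above.
# The organization ID, pipeline name, and MQL stages are illustrative only.
pipeline_config = {
    "name": "hourly-avg-temperature",  # must be unique within the organization
    "organization_id": "11111111-2222-3333-4444-555555555555",
    "schedule": "0 * * * *",           # hourly, interpreted in UTC
    "mql_binary": [                    # MQL aggregation stages, in order
        {"$match": {"component_name": "temp-sensor"}},
        {"$group": {"_id": "$location",
                    "avg_temp": {"$avg": "$data.readings.temp"}}},
    ],
    "enable_backfill": False,
    "data_source_type": "standard",    # optional; "standard" is the default
}

# All required fields from the table are present.
required = {"name", "organization_id", "schedule", "mql_binary", "enable_backfill"}
assert required <= pipeline_config.keys()
```
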
The schedule field uses standard five-field cron syntax: minute hour day-of-month month day-of-week. All times are UTC.
The schedule determines both when the pipeline runs and the time range it queries. Each run processes the time window between the previous two schedule ticks.
| Schedule | Frequency | Query time range per run |
|---|---|---|
| `0 * * * *` | Hourly | Previous hour |
| `0 0 * * *` | Daily | Previous day |
| `*/15 * * * *` | Every 15 minutes | Previous 15 minutes |
| `*/5 * * * *` | Every 5 minutes | Previous 5 minutes |
For example, a pipeline with schedule `0 * * * *` that triggers at 15:00 UTC processes data from 14:00 to 15:00 UTC. The time window is [start, end) (start inclusive, end exclusive).
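The window arithmetic for the hourly example can be sketched in plain Python. This computes the previous tick directly for an hourly schedule rather than parsing arbitrary cron expressions:

```python
from datetime import datetime, timedelta, timezone

def hourly_window(trigger: datetime) -> tuple[datetime, datetime]:
    """Query window for a pipeline on schedule '0 * * * *':
    the [start, end) interval between the previous two schedule ticks."""
    end = trigger.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=1)
    return start, end

# A run triggered at 15:00 UTC covers 14:00 (inclusive) to 15:00 (exclusive).
start, end = hourly_window(datetime(2025, 3, 15, 15, 0, tzinfo=timezone.utc))
```
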
Choose a schedule that matches how frequently you need updated summaries. Shorter intervals produce more granular summaries but create more pipeline sink documents.
| Type | CLI flag | Python SDK | Go SDK | Description |
|---|---|---|---|---|
| Standard | standard | TabularDataSourceType.TABULAR_DATA_SOURCE_TYPE_STANDARD | app.TabularDataSourceTypeStandard | Queries the raw readings collection. Contains all historical tabular data. |
| Hot storage | hotstorage | TabularDataSourceType.TABULAR_DATA_SOURCE_TYPE_HOT_STORAGE | app.TabularDataSourceTypeHotStorage | Queries the hot data store. Rolling window of recent data. |
| Pipeline sink | (query only) | TabularDataSourceType.TABULAR_DATA_SOURCE_TYPE_PIPELINE_SINK | app.TabularDataSourceTypePipelineSink | Queries the output of another pipeline. Requires a pipeline_id. |
| Status | Value | Description |
|---|---|---|
| UNSPECIFIED | 0 | Unknown or not set. |
| SCHEDULED | 1 | Run is queued. Execution begins after a 2-minute delay. |
| STARTED | 2 | MQL query is executing against the data source. |
| COMPLETED | 3 | Run finished successfully. Results are in the pipeline sink. |
| FAILED | 4 | Run encountered an error. Check the error_message field on the run. |
If a run stays in STARTED for more than 10 minutes, it is automatically marked as FAILED and a new run is created for that time window.
Each pipeline run record contains:
| Field | Type | Description |
|---|---|---|
| id | string | Run identifier. |
| status | enum | Current status. See Run statuses. |
| start_time | timestamp | When the run started executing. |
| end_time | timestamp | When the run completed or failed. |
| data_start_time | timestamp | Start of the data time window this run processed (inclusive). |
| data_end_time | timestamp | End of the data time window this run processed (exclusive). |
| error_message | string | Error details if the run failed. Empty on success. |
When enable_backfill is true, the pipeline processes historical time windows in addition to newly scheduled ones. With the standard data source, backfill may provision an Atlas Data Federation instance for faster historical queries.

When enable_backfill is false, the pipeline processes only the time windows scheduled after it is created.
Backfill does not apply to windows missed while a pipeline was disabled. If you disable a pipeline for 3 hours and re-enable it, those 3 hours are not backfilled.
Each pipeline stores its output in a dedicated sink collection named `sink-<pipeline-id>`. Each result document includes metadata:

```json
{
  "_viam_pipeline_run": {
    "id": "run-id",
    "interval": {
      "start": "2025-03-15T14:00:00.000Z",
      "end": "2025-03-15T15:00:00.000Z"
    },
    "organization_id": "org-id"
  },
  "location": "warehouse-a",
  "avg_temp": 23.5,
  "count": 3600
}
```
The `_viam_pipeline_run` field is added automatically. Your pipeline's `$project` output fields appear alongside it.
To query the sink, use data source type pipeline_sink with the pipeline’s ID. See Query pipeline results.
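For example, a query against the sink could filter on the automatically added run metadata. This is a hedged sketch of an MQL pipeline expressed as Python dictionaries; the field values are illustrative:

```python
# Hypothetical MQL query against a pipeline sink: select one location's
# results from a given day onward, ordered by window start time.
# Field paths follow the sink document shape documented above.
sink_query = [
    {"$match": {
        "location": "warehouse-a",
        "_viam_pipeline_run.interval.start": {"$gte": "2025-03-15T00:00:00.000Z"},
    }},
    {"$sort": {"_viam_pipeline_run.interval.start": 1}},
]
```
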
Deleting a pipeline deletes the sink collection and all its data. Export results before deleting if you need to preserve them.
| Limit | Value |
|---|---|
| Maximum output documents per run | 10,000 |
| MQL execution timeout | 5 minutes |
| Execution start delay | 2 minutes after scheduled time |
| Hung run detection | 2x execution timeout (currently 10 minutes) in STARTED state |
| Backfill batch size | 10 concurrent time windows |
| Backfill throttle | 2-minute delay between batches |
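The batch size and throttle limits above give a lower bound on how long a backfill takes. The helper below is a sketch of that arithmetic (the function name is ours, not part of any API):

```python
import math

def min_backfill_throttle_minutes(num_windows: int,
                                  batch_size: int = 10,
                                  throttle_min: int = 2) -> int:
    """Lower bound on backfill throttle time: windows are processed in
    batches of `batch_size`, with a `throttle_min`-minute delay between
    consecutive batches (per the limits table)."""
    batches = math.ceil(num_windows / batch_size)
    return max(batches - 1, 0) * throttle_min

# Backfilling one week of hourly windows (168 windows -> 17 batches)
# incurs at least 32 minutes of throttle delay, before query time.
print(min_backfill_throttle_minutes(168))
```
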
Only organization owners can create, modify, and delete data pipelines. Query access to pipeline results follows the same permissions as other data queries. See Permissions.