Your vision service is configured and running, but you still need to do something useful with the results. This how-to shows you how to retrieve detections programmatically, filter them to reduce noise, and extract the information you need to build real applications: counting objects, triggering actions when something appears, or feeding positions into a control loop.
A detection is a single recognized object in an image. Each detection includes:
| Field | Type | Description |
|---|---|---|
| `class_name` | string | The label assigned by the model (for example, "person", "dog", "stop-sign") |
| `confidence` | float (0.0-1.0) | How confident the model is in this detection |
| `x_min` | int | Left edge of the bounding box in pixels |
| `y_min` | int | Top edge of the bounding box in pixels |
| `x_max` | int | Right edge of the bounding box in pixels |
| `y_max` | int | Bottom edge of the bounding box in pixels |
The bounding box coordinates use the image coordinate system: (0,0) is the top-left corner, x increases to the right, y increases downward. The bounding box is axis-aligned (not rotated).
Detections also include normalized coordinates (x_min_normalized, y_min_normalized, x_max_normalized, y_max_normalized) as floats between 0.0 and 1.0, representing positions relative to image dimensions. These are useful when you need resolution-independent coordinates.
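Converting between the two coordinate systems, or deriving a box center to feed into a control loop, is simple arithmetic. A minimal sketch with plain helper functions (these are not part of the SDK):

```python
def normalized_to_pixels(x_min_n, y_min_n, x_max_n, y_max_n, width, height):
    """Convert normalized [0.0, 1.0] box coordinates to pixel coordinates."""
    return (
        int(x_min_n * width),
        int(y_min_n * height),
        int(x_max_n * width),
        int(y_max_n * height),
    )


def box_center(x_min, y_min, x_max, y_max):
    """Return the center of an axis-aligned bounding box in pixels."""
    return ((x_min + x_max) // 2, (y_min + y_max) // 2)


# A detection at normalized (0.25, 0.25)-(0.75, 0.5) on a 640x480 image:
box = normalized_to_pixels(0.25, 0.25, 0.75, 0.5, 640, 480)
print(box)               # (160, 120, 480, 240)
print(box_center(*box))  # (320, 180)
```

Because the normalized form is resolution-independent, the same values produce correct pixel coordinates no matter what resolution the camera delivers.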
A single call to the detection API can return zero, one, or many detections. The number depends on how many objects the model finds in the frame.
The vision service provides two methods for getting detections: `GetDetectionsFromCamera`, which captures an image from a camera and runs the model in one call, and `GetDetections`, which runs the model on an image you supply. Both are covered below.
Every detection has a confidence score between 0.0 and 1.0. A score of 0.95 means the model is very confident; a score of 0.3 means it is guessing. Models often produce many low-confidence detections that are noise rather than real objects.
Choosing the right threshold depends on your application: a higher threshold rejects more of that noise but can miss real objects, while a lower threshold catches more objects at the cost of more false positives.
Not every detection task requires a trained ML model. Viam includes a built-in color_detector vision service model that detects regions of a specified color. This is useful for simple tasks like finding a red ball or detecting a blue marker. It requires no model training, no GPU, and minimal configuration.
The simplest way to get detections is to let the vision service capture an image and run the model in one call.
```python
import asyncio

from viam.robot.client import RobotClient
from viam.services.vision import VisionClient


async def main():
    opts = RobotClient.Options.with_api_key(
        api_key="YOUR-API-KEY",
        api_key_id="YOUR-API-KEY-ID",
    )
    robot = await RobotClient.at_address("YOUR-MACHINE-ADDRESS", opts)
    detector = VisionClient.from_robot(robot, "my-detector")

    # Get detections directly from the camera
    detections = await detector.get_detections_from_camera("my-camera")
    for d in detections:
        print(f"{d.class_name}: {d.confidence:.2f} "
              f"at ({d.x_min},{d.y_min})-({d.x_max},{d.y_max})")

    await robot.close()


if __name__ == "__main__":
    asyncio.run(main())
```
```go
package main

import (
	"context"
	"fmt"

	"go.viam.com/rdk/logging"
	"go.viam.com/rdk/robot/client"
	"go.viam.com/rdk/services/vision"
	"go.viam.com/utils/rpc"
)

func main() {
	ctx := context.Background()
	logger := logging.NewLogger("detect")

	machine, err := client.New(ctx, "YOUR-MACHINE-ADDRESS", logger,
		client.WithDialOptions(rpc.WithEntityCredentials(
			"YOUR-API-KEY-ID",
			rpc.Credentials{
				Type:    rpc.CredentialsTypeAPIKey,
				Payload: "YOUR-API-KEY",
			})),
	)
	if err != nil {
		logger.Fatal(err)
	}
	defer machine.Close(ctx)

	detector, err := vision.FromProvider(machine, "my-detector")
	if err != nil {
		logger.Fatal(err)
	}

	detections, err := detector.DetectionsFromCamera(ctx, "my-camera", nil)
	if err != nil {
		logger.Fatal(err)
	}
	for _, d := range detections {
		fmt.Printf("%s: %.2f at (%d,%d)-(%d,%d)\n",
			d.Label(), d.Score(),
			d.BoundingBox().Min.X, d.BoundingBox().Min.Y,
			d.BoundingBox().Max.X, d.BoundingBox().Max.Y)
	}
}
```
If you already have an image, whether from a file, a previous capture, or a different camera, you can run detection on it directly.
```python
from viam.components.camera import Camera
from viam.services.vision import VisionClient

camera = Camera.from_robot(robot, "my-camera")
detector = VisionClient.from_robot(robot, "my-detector")

# Capture images from the camera
images, _ = await camera.get_images()

# Run detection on the first image
detections = await detector.get_detections(images[0])
for d in detections:
    print(f"{d.class_name}: {d.confidence:.2f}")
```
```go
cam, err := camera.FromProvider(machine, "my-camera")
if err != nil {
	logger.Fatal(err)
}
detector, err := vision.FromProvider(machine, "my-detector")
if err != nil {
	logger.Fatal(err)
}

// Capture an image first
images, _, err := cam.Images(ctx, nil, nil)
if err != nil {
	logger.Fatal(err)
}
img, err := images[0].Image(ctx)
if err != nil {
	logger.Fatal(err)
}

// Run detection on the captured image
detections, err := detector.Detections(ctx, img, nil)
if err != nil {
	logger.Fatal(err)
}
for _, d := range detections {
	fmt.Printf("%s: %.2f\n", d.Label(), d.Score())
}
```
In practice, you almost always want to filter out low-confidence detections. Apply a threshold before processing results.
```python
CONFIDENCE_THRESHOLD = 0.7

detections = await detector.get_detections_from_camera("my-camera")

# Filter to only high-confidence detections
confident_detections = [
    d for d in detections
    if d.confidence >= CONFIDENCE_THRESHOLD
]

print(f"{len(confident_detections)} of {len(detections)} "
      f"detections above {CONFIDENCE_THRESHOLD} threshold")
for d in confident_detections:
    print(f"  {d.class_name}: {d.confidence:.2f}")
```
```go
// objectdetection comes from "go.viam.com/rdk/vision/objectdetection"
confidenceThreshold := 0.7

detections, err := detector.DetectionsFromCamera(ctx, "my-camera", nil)
if err != nil {
	logger.Fatal(err)
}

total := len(detections)
var confident []objectdetection.Detection
for _, d := range detections {
	if d.Score() >= confidenceThreshold {
		confident = append(confident, d)
	}
}

fmt.Printf("%d of %d detections above %.1f threshold\n",
	len(confident), total, confidenceThreshold)
for _, d := range confident {
	fmt.Printf("  %s: %.2f\n", d.Label(), d.Score())
}
```
When your model detects multiple object types, you may only care about specific classes.
```python
TARGET_CLASSES = {"person", "dog"}

detections = await detector.get_detections_from_camera("my-camera")

targets = [
    d for d in detections
    if d.class_name in TARGET_CLASSES and d.confidence >= 0.6
]
for d in targets:
    width = d.x_max - d.x_min
    height = d.y_max - d.y_min
    print(f"{d.class_name}: {d.confidence:.2f}, "
          f"size {width}x{height} pixels")
```
```go
targetClasses := map[string]bool{"person": true, "dog": true}

detections, err := detector.DetectionsFromCamera(ctx, "my-camera", nil)
if err != nil {
	logger.Fatal(err)
}
for _, d := range detections {
	if targetClasses[d.Label()] && d.Score() >= 0.6 {
		bb := d.BoundingBox()
		width := bb.Max.X - bb.Min.X
		height := bb.Max.Y - bb.Min.Y
		fmt.Printf("%s: %.2f, size %dx%d pixels\n",
			d.Label(), d.Score(), width, height)
	}
}
```
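Filtered detections feed directly into counting, one of the use cases mentioned at the start. A sketch using only the standard library, with a stand-in `Detection` dataclass in place of the SDK type:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """Stand-in for the SDK detection type (class_name, confidence)."""
    class_name: str
    confidence: float


def count_by_class(detections, threshold=0.6):
    """Count confident detections per class label."""
    return Counter(
        d.class_name for d in detections if d.confidence >= threshold
    )


dets = [
    Detection("person", 0.91),
    Detection("person", 0.55),  # below threshold, ignored
    Detection("dog", 0.72),
]
print(count_by_class(dets))  # Counter({'person': 1, 'dog': 1})
```

The same function works unchanged on real SDK detections, since it only reads the `class_name` and `confidence` attributes.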
Most real applications need continuous detection, not a single snapshot. Run detections in a loop with a short delay to avoid overwhelming the system.
```python
import asyncio
import time

CONFIDENCE_THRESHOLD = 0.7

detector = VisionClient.from_robot(robot, "my-detector")

while True:
    start = time.time()
    detections = await detector.get_detections_from_camera("my-camera")
    confident = [d for d in detections if d.confidence >= CONFIDENCE_THRESHOLD]
    elapsed = time.time() - start

    if confident:
        names = [f"{d.class_name}({d.confidence:.2f})" for d in confident]
        print(f"[{elapsed:.2f}s] Detected: {', '.join(names)}")
    else:
        print(f"[{elapsed:.2f}s] No detections")

    await asyncio.sleep(0.1)
```
```go
confidenceThreshold := 0.7

detector, err := vision.FromProvider(machine, "my-detector")
if err != nil {
	logger.Fatal(err)
}

for {
	start := time.Now()
	detections, err := detector.DetectionsFromCamera(ctx, "my-camera", nil)
	if err != nil {
		logger.Error(err)
		time.Sleep(time.Second)
		continue
	}
	elapsed := time.Since(start)

	var confident []objectdetection.Detection
	for _, d := range detections {
		if d.Score() >= confidenceThreshold {
			confident = append(confident, d)
		}
	}

	if len(confident) > 0 {
		for _, d := range confident {
			fmt.Printf("[%v] %s: %.2f\n", elapsed, d.Label(), d.Score())
		}
	} else {
		fmt.Printf("[%v] No detections\n", elapsed)
	}

	time.Sleep(100 * time.Millisecond)
}
```
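A loop like this is also the natural place to trigger actions when something appears. Firing on every frame chatters when a detection flickers near the threshold, so it helps to require several consecutive frames before acting. A small debouncing helper, illustrative only and not part of any SDK:

```python
class PresenceTrigger:
    """Fire once when a target is seen in N consecutive frames,
    and re-arm after M consecutive empty frames (debouncing)."""

    def __init__(self, frames_to_fire=3, frames_to_reset=5):
        self.frames_to_fire = frames_to_fire
        self.frames_to_reset = frames_to_reset
        self.seen_streak = 0
        self.empty_streak = 0
        self.active = False

    def update(self, present: bool) -> bool:
        """Feed one frame's result; return True only on the rising edge."""
        if present:
            self.seen_streak += 1
            self.empty_streak = 0
            if not self.active and self.seen_streak >= self.frames_to_fire:
                self.active = True
                return True
        else:
            self.empty_streak += 1
            self.seen_streak = 0
            if self.active and self.empty_streak >= self.frames_to_reset:
                self.active = False
        return False


trigger = PresenceTrigger(frames_to_fire=3)
# One flicker (frame 3) resets the streak, so the trigger fires on frame 6
frames = [True, True, False, True, True, True, True]
fired = [trigger.update(f) for f in frames]
print(fired)  # [False, False, False, False, False, True, False]
```

Inside the detection loop you would compute something like `present = any(d.confidence >= CONFIDENCE_THRESHOLD for d in detections)` each frame and run your action only when `update` returns True.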
For simple color-based detection, configure a vision service with the color_detector model instead of mlmodel. This requires no trained model.
```json
{
  "name": "red-detector",
  "api": "rdk:service:vision",
  "model": "color_detector",
  "attributes": {
    "detect_color": "#FF0000",
    "hue_tolerance_pct": 0.1,
    "segment_size_px": 200
  }
}
```
| Attribute | Type | Required | Description |
|---|---|---|---|
| `detect_color` | string | Required | Target color in hex format (for example, "#FF0000"). Cannot be black, white, or grayscale. |
| `hue_tolerance_pct` | float | Required | How much the hue can vary from the target (must be > 0.0 and <= 1.0). A value of 0.1 means 10% tolerance. |
| `segment_size_px` | int | Required | Minimum number of pixels a detected region must contain to count as a detection. |
| `saturation_cutoff_pct` | float | Optional | Minimum saturation for a pixel to be considered a match. Default: 0.2. Increase to reject washed-out colors. |
| `value_cutoff_pct` | float | Optional | Minimum brightness value for a pixel to be considered a match. Default: 0.3. Increase to reject dark regions. |
| `label` | string | Optional | The label to assign to detections. If omitted, detections have no class name. |
| `camera_name` | string | Optional | Default camera to use with `GetDetectionsFromCamera` if no camera name is specified in the request. |
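To build intuition for these attributes, here is a rough approximation of the per-pixel matching rule in plain Python. This is illustrative only, not Viam's actual implementation; in particular, treating `hue_tolerance_pct` as the full width of the hue window (half on each side of the target) is an assumption made here:

```python
import colorsys


def hex_to_hue(hex_color: str) -> float:
    """Convert '#RRGGBB' to an HSV hue in [0.0, 1.0)."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255.0 for i in (1, 3, 5))
    return colorsys.rgb_to_hsv(r, g, b)[0]


def pixel_matches(pixel_rgb, target_hex, hue_tolerance_pct=0.1,
                  saturation_cutoff_pct=0.2, value_cutoff_pct=0.3):
    """Sketch of the matching rule implied by the attributes above."""
    r, g, b = (c / 255.0 for c in pixel_rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s < saturation_cutoff_pct or v < value_cutoff_pct:
        return False  # too washed out or too dark to judge the hue
    target = hex_to_hue(target_hex)
    # Hue is circular: 0.99 and 0.01 are close neighbors
    diff = min(abs(h - target), 1.0 - abs(h - target))
    return diff <= hue_tolerance_pct / 2


print(pixel_matches((230, 30, 40), "#FF0000"))  # True: near-red pixel
print(pixel_matches((30, 200, 40), "#FF0000"))  # False: green pixel
```

This also shows why `detect_color` cannot be black, white, or grayscale: those colors have no meaningful hue, so there is nothing for the tolerance window to match against.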
The detection API works identically whether you use mlmodel or color_detector. Your code does not need to change.
If you need the image and its detections (and optionally classifications) together, use CaptureAllFromCamera instead of separate calls. This is more efficient when you need multiple result types and ensures the detections correspond exactly to the returned image.
```python
from viam.services.vision import VisionClient

detector = VisionClient.from_robot(robot, "my-detector")

result = await detector.capture_all_from_camera(
    "my-camera",
    return_image=True,
    return_detections=True,
    return_classifications=True,
)

image = result.image                      # The captured image
detections = result.detections            # Detections for that exact image
classifications = result.classifications  # Classifications too
```
```go
captOpts := viscapture.CaptureOptions{
	ReturnImage:           true,
	ReturnDetections:      true,
	ReturnClassifications: true,
}
result, err := detector.CaptureAllFromCamera(
	context.Background(), "my-camera", captOpts, nil)
if err != nil {
	logger.Fatal(err)
}

img := result.Image
detections := result.Detections
classifications := result.Classifications
```
This is particularly useful for data logging, visualization, or any case where you need to correlate results with the exact frame that produced them.
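For the data-logging case, one simple approach is to append each frame's results to a JSON-lines file alongside a reference to the saved image. A sketch using only the standard library; the file names and record layout are illustrative, not a Viam convention:

```python
import json
import time


def log_capture(path, detections, image_ref):
    """Append one frame's detections to a JSON-lines log file.

    detections: iterable of (class_name, confidence, (x0, y0, x1, y1)) tuples
    image_ref:  reference to the saved frame, e.g. its file path
    """
    record = {
        "timestamp": time.time(),
        "image": image_ref,
        "detections": [
            {"class": c, "confidence": conf, "box": [x0, y0, x1, y1]}
            for (c, conf, (x0, y0, x1, y1)) in detections
        ],
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


log_capture("detections.jsonl",
            [("person", 0.92, (40, 30, 160, 200))],
            "frames/0001.png")
```

Because each line is an independent JSON record keyed to one image, you can later replay the log and overlay every detection on the exact frame that produced it.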
To try it out, configure a vision service with the color_detector model and point the camera at a brightly colored object.