Open-vocabulary detection and segmentation

The open-vocabulary AI runs on cloud GPUs (SAM2 + GroundingDINO + CLIP) and answers free-form prompts such as "find every traffic sign", "segment the road surface", or "highlight cracks larger than 10 cm". It is the cross-vertical edge-case handler: anything the trained models do not specifically cover can usually be addressed by a well-phrased prompt here.

Tier: Live for detection and segmentation on outdoor RGB imagery.

Run from the product

1. Open a survey, raster, or single image.
2. Click the AI icon.
3. Pick Open-vocabulary detection or Open-vocabulary segmentation.
4. Type a prompt. Examples:
   - traffic sign (detection)
   - road surface defect (detection or segmentation)
   - vegetation overhanging the carriageway (segmentation)
   - manhole cover (detection)
   - corroded section of pipe (segmentation)
5. The estimated credit cost appears (1 credit per inference on a single image). Confirm.

Results return in 3 to 8 seconds depending on image size. Detections appear as bounding boxes; segmentations appear as coloured masks.

Run the public demo (no account required)

The public /try page at stratumly.com/try lets you upload one image and run an open-vocabulary inference without signing up. It is useful for quick demos or for showing a prospect what the AI does.

- 5 requests per IP per day.
- 20 MB max upload size.
- EXIF metadata is stripped before processing.

The /try demo runs a single inference per request, with no feedback, no persistence, and no per-tenant fine-tuning. For production use, run the same models from inside a Stratumly project.
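The upload limit above can be checked locally before you send an image to /try. The sketch below is illustrative only: the function names are hypothetical, and it assumes "20 MB" means 20,000,000 bytes, which is the stricter reading, so a file that passes here also passes if the limit is really 20 MiB.

```python
from pathlib import Path

# Assumption: "20 MB" is read as 20 * 1000 * 1000 bytes (the conservative reading).
MAX_UPLOAD_BYTES = 20 * 1000 * 1000


def within_try_upload_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if a non-empty file of size_bytes fits under the upload cap."""
    return 0 < size_bytes <= limit


def image_fits(path: str) -> bool:
    """Check a file on disk against the cap (hypothetical helper)."""
    return within_try_upload_limit(Path(path).stat().st_size)
```

An image that fails the check can be downscaled or re-encoded before upload. The daily request limit is enforced per IP on the server side and cannot be checked this way.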

Prompting tips

The models work best on prompts that name a visible object or surface:

Good: "manhole cover", "broken bollard", "rusted railing".
Less good: "anything unsafe", "interesting features" (too abstract).

For segmentation tasks, prompt the surface or region you want masked, not the abstract concept:

Good: "road surface", "water in a flooded area", "snow cover".
Less good: "areas that need repair" (no visual concept to ground on).

If a prompt returns no detections, try:

- A simpler phrasing.
- Lowering the confidence threshold in the result viewer.
- A photo of the same scene from a different vantage point.
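To illustrate why lowering the confidence threshold can surface detections that were previously hidden, here is a minimal sketch. The `Detection` record, its field names, and the scores are assumptions made for the example. They are not the product's result format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection record; field names are illustrative."""
    label: str
    score: float  # model confidence in [0, 1]
    box: Tuple[int, int, int, int]  # x_min, y_min, x_max, y_max in pixels


def filter_by_confidence(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose score is at or above the threshold."""
    return [d for d in detections if d.score >= threshold]


# Made-up candidates for the prompt "manhole cover".
candidates = [
    Detection("manhole cover", 0.72, (120, 340, 210, 420)),
    Detection("manhole cover", 0.41, (610, 355, 690, 430)),
    Detection("manhole cover", 0.18, (900, 100, 940, 130)),
]

strict = filter_by_confidence(candidates, 0.8)   # nothing passes
relaxed = filter_by_confidence(candidates, 0.4)  # two candidates pass
```

A lower threshold shows more candidates but also more false positives, so review the additional results before accepting them.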

Combine with trained models

The trained defect-detection and segmentation models handle the head of your distribution (the common asset classes in your vertical). The open-vocabulary model handles the long tail. Common pattern:

1. Run defect detection across the survey.
2. Review the results.
3. For anything the trained model missed, run an open-vocabulary prompt against the specific photo.

This pattern keeps credit usage low (the trained model is fast and cheap) while still catching unusual cases.
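The pattern above can be sketched as a small routine. Everything here is hypothetical: `trained_detect`, `open_vocab_detect`, and `needs_fallback` are placeholder callables standing in for the trained model, the open-vocabulary model, and the reviewer's decision. They are not Stratumly API calls.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_with_fallback(
    photos: Iterable[str],
    trained_detect: Callable[[str], List[str]],
    open_vocab_detect: Callable[[str, str], List[str]],
    needs_fallback: Callable[[str, List[str]], bool],
    prompt: str,
) -> Tuple[Dict[str, List[str]], int]:
    """Run the trained model on every photo and the open-vocabulary model
    only on photos flagged during review. Returns the findings per photo
    and the number of open-vocabulary calls made."""
    results: Dict[str, List[str]] = {}
    open_vocab_calls = 0
    for photo in photos:
        findings = list(trained_detect(photo))
        if needs_fallback(photo, findings):
            findings += open_vocab_detect(photo, prompt)
            open_vocab_calls += 1
        results[photo] = findings
    return results, open_vocab_calls


# Stand-in callables for illustration only.
results, calls = run_with_fallback(
    photos=["a.jpg", "b.jpg", "c.jpg"],
    trained_detect=lambda p: [] if p == "b.jpg" else ["crack"],
    open_vocab_detect=lambda p, prompt: [prompt],
    needs_fallback=lambda p, found: not found,  # reviewer flags photos with no findings
    prompt="manhole cover",
)
```

In this example only one of the three photos reaches the open-vocabulary path, which is where the credit saving comes from.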

Provide feedback

Each open-vocabulary detection or segmentation has Accept / Reject controls. Accept promotes the result to a feature in your asset register or to an inspection finding. Reject (with an optional reason) marks it as a false positive.

Accept / Reject feedback on a recurring prompt is how an open-vocabulary inference graduates into a trained model for your tenant. After enough feedback on the same prompt class, we can train a per-tenant model that is faster and cheaper than the open-vocabulary path.

Limitations

- Cloud GPU inference adds 3 to 8 seconds of latency per call, so it is not suitable for live video.
- The model has a limited context window per inference, so very wide aerial imagery may need to be tiled. The worker handles tiling automatically, but it costs more credits.
- Indoor, thermal, and night-time imagery perform worse than daylight outdoor RGB.
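To give a feel for how tiling multiplies the number of inferences on wide imagery, here is a minimal sketch of an overlapping tile grid. The tile size (1024 px) and overlap (128 px) are assumptions chosen for illustration. The worker's actual tiling parameters and per-tile billing are not documented here.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles along one axis, with the last tile flush to the edge."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("tile must be positive and overlap must be in [0, tile)")
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # final tile ends exactly at the image edge
    return origins


def tile_boxes(
    width: int, height: int, tile: int = 1024, overlap: int = 128
) -> List[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) boxes that cover the whole image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


small = tile_boxes(800, 600)    # fits in one tile
wide = tile_boxes(4000, 3000)   # 5 columns x 4 rows = 20 tiles
```

Under these assumed parameters a 4000 x 3000 px image needs 20 tiles where a small photo needs one, which is why wide aerial imagery costs more than a single-image inference.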

What next?

AI overview: the full AI catalogue.
Defect detection: the trained model that handles common defect classes faster and cheaper.
