Open-vocabulary detection and segmentation
The open-vocabulary AI runs on cloud GPUs (SAM2 + GroundingDINO + CLIP) and answers free-form prompts such as "find every traffic sign", "segment the road surface", or "highlight cracks larger than 10 cm". It is the cross-vertical edge-case handler: anything the trained models do not specifically cover can usually be addressed with a well-phrased prompt here.
Tier: Live for detection and segmentation on outdoor RGB imagery.
Run from the product
- Open a survey, raster, or single image.
- Click the AI icon.
- Pick Open-vocabulary detection or Open-vocabulary segmentation.
- Type a prompt. Examples:
  - traffic sign (detection)
  - road surface defect (detection or segmentation)
  - vegetation overhanging the carriageway (segmentation)
  - manhole cover (detection)
  - corroded section of pipe (segmentation)
- The estimated credit cost appears (1 credit per inference on a single image). Confirm.
Results return in 3 to 8 seconds depending on image size. Detections appear as bounding boxes; segmentations appear as coloured masks.
Run the public demo (no account required)
The public /try page at stratumly.com/try lets you upload one image and run an open-vocabulary inference without signing up. Useful for quick demos or for showing a prospect what the AI does.
- 5 requests per IP per day.
- 20 MB max upload size.
- EXIF metadata is stripped before processing.
The /try demo runs a single inference per request, with no feedback, no persistence, and no per-tenant fine-tuning. For production use, run the same models from inside a Stratumly project.
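If you script around the demo (for example, to pick a suitable image before a prospect call), the limits above can be checked locally first. The following is a minimal sketch, not part of the product: the function name is hypothetical, and it assumes "20 MB" means 20 × 1024 × 1024 bytes and that you track your own request count for the day.

```python
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB
MAX_REQUESTS_PER_DAY = 5             # per IP, from the limits listed above


def check_try_upload(size_bytes: int, requests_today: int) -> list[str]:
    """Return a list of problems; an empty list means the upload is within the limits."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    if requests_today >= MAX_REQUESTS_PER_DAY:
        problems.append("daily request limit reached for this IP")
    return problems
```

The check is advisory only; the /try page enforces the real limits.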
Prompting tips
The models work best on prompts that name a visible object or surface:
- Good: "manhole cover", "broken bollard", "rusted railing".
- Less good: "anything unsafe", "interesting features" (too abstract).
For segmentation tasks, prompt the surface or region you want masked, not the abstract concept:
- Good: "road surface", "water in a flooded area", "snow cover".
- Less good: "areas that need repair" (no visual concept to ground on).
If a prompt returns no detections, try:
- A simpler phrasing.
- Lowering the confidence threshold in the result viewer.
- A different vantage point on the same scene.
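To illustrate why lowering the confidence threshold can surface results that were previously hidden, here is a small sketch of threshold filtering. The `Detection` class and its field names are assumptions for illustration; they are not the product's result format.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float  # confidence in [0, 1]; hypothetical field name


def visible_detections(detections, threshold):
    """Keep detections whose score meets the threshold, highest score first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


results = [Detection("manhole cover", 0.62), Detection("manhole cover", 0.31)]
strict = visible_detections(results, 0.5)    # only the 0.62 detection passes
relaxed = visible_detections(results, 0.25)  # both detections pass
```

A lower threshold shows more candidates but also more false positives, so review the extra results before accepting them.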
Combine with trained models
The trained defect-detection and segmentation models handle the head of your distribution (the common asset classes in your vertical). The open-vocabulary model handles the long tail. Common pattern:
- Run defect detection across the survey.
- Review the results.
- For anything the trained model missed, run an open-vocabulary prompt against the specific photo.
This pattern keeps credits low (trained model is fast and cheap) while still catching unusual cases.
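The pattern above can be sketched as a small planning step: after review, collect the photos where the trained model missed something, pair each with the prompts you want to try, and estimate the open-vocabulary credits. This is an illustrative sketch only. The function name and photo IDs are hypothetical, and the estimate assumes 1 credit per inference on a single, untiled image.

```python
OPEN_VOCAB_CREDITS_PER_IMAGE = 1  # 1 credit per inference on a single image


def plan_open_vocab_runs(missed: dict[str, list[str]]) -> tuple[list[tuple[str, str]], int]:
    """missed maps a photo ID to the prompts a reviewer wants to try on it.

    Returns the (photo, prompt) jobs and the estimated credit total.
    """
    jobs = [(photo, prompt) for photo, prompts in missed.items() for prompt in prompts]
    return jobs, len(jobs) * OPEN_VOCAB_CREDITS_PER_IMAGE


jobs, credits = plan_open_vocab_runs({
    "IMG_0042": ["broken bollard"],                   # hypothetical photo IDs
    "IMG_0107": ["rusted railing", "manhole cover"],
})
```

Each prompt on each photo counts as one inference, so targeting only the photos flagged during review is what keeps the total small.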
Provide feedback
Each open-vocabulary detection or segmentation has Accept / Reject controls. Accept promotes the result to a feature in your asset register or to an inspection finding. Reject (with an optional reason) marks it as a false positive.
Accept / reject feedback on a recurring prompt is how an open-vocabulary inference graduates into a trained model for your tenant. After enough feedback on the same prompt class, we can train a per-tenant model that is faster and cheaper than the open-vocabulary path.
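If you keep your own log of review decisions, a tally per prompt shows which prompts are accumulating feedback. The sketch below is an assumption-laden illustration: the event format and the `min_feedback` parameter are hypothetical, and the amount of feedback that counts as "enough" is not specified here.

```python
from collections import Counter


def feedback_summary(events, min_feedback):
    """events: iterable of (prompt, accepted) pairs.

    Returns per-prompt stats for prompts with at least min_feedback reviews.
    """
    totals, accepts = Counter(), Counter()
    for prompt, accepted in events:
        key = prompt.strip().lower()  # treat case/whitespace variants as one prompt
        totals[key] += 1
        accepts[key] += int(accepted)
    return {
        p: {"reviewed": n, "accept_rate": accepts[p] / n}
        for p, n in totals.items()
        if n >= min_feedback
    }
```

Normalising the prompt text matters because "Manhole cover" and "manhole cover" should count toward the same prompt class.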
Limitations
- Cloud GPU inference adds 3 to 8 seconds of latency per call. Not suitable for live video.
- The model has a fixed context window per inference, so very wide aerial imagery may need to be tiled. The worker handles tiling automatically, but tiled runs cost more credits.
- Indoor imagery, thermal imagery, and night-time imagery work worse than daylight outdoor RGB.
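To get a feel for how tiling affects the cost of wide aerial imagery, the sketch below counts the tiles needed to cover an image. The tile size and overlap are hypothetical parameters; the worker's actual tiling scheme and per-tile pricing are not documented here, so treat the result as a rough guide only.

```python
import math


def tile_count(width: int, height: int, tile: int, overlap: int = 0) -> int:
    """Number of square tiles needed to cover a width x height image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def along(extent: int) -> int:
        # The first tile covers up to `tile` pixels; each further tile adds `stride`.
        return 1 if extent <= tile else 1 + math.ceil((extent - tile) / stride)

    return along(width) * along(height)
```

For example, under these assumptions an 8000 × 6000 image with 1024-pixel tiles and a 128-pixel overlap needs 63 tiles, while an image that fits in a single tile needs one.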
What next?
- AI overview: the full AI catalogue.
- Defect detection: the trained model that handles common defect classes faster and cheaper.