
Embeddings

The OlmoEarth platform can compute and export embedding vectors for any region and time period. These vectors support a range of downstream analyses without heavy compute resources.

To learn more about how embeddings are generated, or to compute OlmoEarth embeddings on your own hardware, see Further reading.

Configuration

Computing embeddings follows the same workflow as any other prediction. Configure your model, draw or upload an area of interest, and run it. Several parameters tailor the output:

Parameter | Options
Foundation model | OlmoEarth Nano (128-dim, ~1.4M params), Tiny (192-dim, ~6.2M params), or Base (768-dim, ~90M params)
Spatial resolution | 10 meter, 20 meter, 40 meter, or 80 meter
Months of temporal context | 1-12. Use 12 for an annual composite, or fewer for seasonal or monthly snapshots
Imagery sources | Sentinel-2 (optical), Sentinel-1 (radar), or both

Output format

The output is a Cloud-Optimized GeoTIFF (COG) with one band per embedding dimension (e.g., 192 bands for Tiny). Vectors are stored as signed 8-bit integers (int8):

  • Valid values range from -127 to +127.
  • -128 is reserved for nodata (pixels with insufficient imagery coverage).

See embedding_transforms.py for quantization and dequantization code. For many tasks, the quantized int8 values perform well as-is, without dequantization.
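
Because -128 marks nodata, most analyses start by masking those pixels out. The following is a minimal sketch using NumPy on a synthetic array rather than a real export. The function name `valid_pixel_mask`, the (bands, height, width) array layout, and the rule that a pixel counts as nodata if any band holds -128 are assumptions made for illustration, not part of the platform's API.

```python
import numpy as np

NODATA = -128  # reserved nodata value for the int8 embedding bands


def valid_pixel_mask(embeddings: np.ndarray) -> np.ndarray:
    """Return a (height, width) boolean mask, True where a pixel has data.

    `embeddings` is assumed to be an int8 array shaped (bands, height, width).
    Assumption: a pixel is nodata if any of its bands equals NODATA.
    """
    if embeddings.dtype != np.int8:
        raise TypeError("expected int8 embeddings")
    return ~(embeddings == NODATA).any(axis=0)


# Synthetic stand-in for a 4-band, 2x3 embedding raster (values in -127..127).
rng = np.random.default_rng(0)
demo = rng.integers(-127, 128, size=(4, 2, 3), dtype=np.int8)
demo[:, 0, 1] = NODATA  # mark one pixel as nodata in every band

mask = valid_pixel_mask(demo)
valid_vectors = demo[:, mask].T  # shape: (valid pixels, bands)
```

With a real export, the array would come from reading the GeoTIFF bands with a raster library of your choice; the masking step stays the same.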

Example use cases

  • Segmentation - Train a lightweight classifier (e.g., logistic regression) on a small number of labeled pixels and predict across the full region.
  • Similarity search - Compute cosine similarity between a query location and every other pixel to find areas with similar surface characteristics.
  • Change detection - Compare embeddings from different time periods (e.g., monthly snapshots before and after an event) to identify where surface conditions have shifted.
  • Unsupervised exploration - Apply PCA or clustering to visualize spatial structure without any labels, a useful first step when exploring a new region.
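
As an illustration of the similarity search case, the sketch below computes cosine similarity between one query pixel and every other pixel. It assumes a NumPy int8 array shaped (bands, height, width) and works directly on the quantized values. The function name `cosine_similarity_map` is hypothetical, and this is not necessarily how the tutorial implements it.

```python
import numpy as np

NODATA = -128  # reserved nodata value for the int8 embedding bands


def cosine_similarity_map(embeddings: np.ndarray, row: int, col: int) -> np.ndarray:
    """Cosine similarity between the pixel at (row, col) and every pixel.

    `embeddings` is assumed to be int8 with shape (bands, height, width).
    Returns a float32 (height, width) array with NaN at nodata pixels.
    """
    bands, height, width = embeddings.shape
    valid = ~(embeddings == NODATA).any(axis=0)
    if not valid[row, col]:
        raise ValueError("query pixel is nodata")

    # Cast to float before any arithmetic: int8 products would overflow.
    vectors = embeddings.reshape(bands, -1).astype(np.float32)
    query = vectors[:, row * width + col]

    norms = np.linalg.norm(vectors, axis=0) * np.linalg.norm(query)
    with np.errstate(divide="ignore", invalid="ignore"):
        sims = (query @ vectors) / norms  # all-zero vectors yield NaN

    sims = sims.reshape(height, width)
    sims[~valid] = np.nan
    return sims


# Tiny synthetic raster: 2 bands, 2x3 pixels, one nodata pixel at (0, 2).
demo = np.array(
    [[[100, 50, -128], [-100, 0, 100]],
     [[0, 50, -128], [0, 100, 0]]],
    dtype=np.int8,
)
sims = cosine_similarity_map(demo, row=0, col=0)
```

Thresholding or ranking `sims` then gives the pixels most similar to the query location.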

For working code covering all four use cases, see the embeddings tutorial linked below.
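
For a quick sense of the change detection case, one simple option is the per-pixel cosine distance between two exports of the same area. The sketch below assumes both rasters are NumPy int8 arrays on the same grid, shaped (bands, height, width). Cosine distance is one common choice of change score picked here for illustration, and `cosine_change_map` is a hypothetical helper name.

```python
import numpy as np

NODATA = -128  # reserved nodata value for the int8 embedding bands


def cosine_change_map(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Per-pixel cosine distance (1 - cosine similarity) between two rasters.

    0 means the embedding direction is unchanged; larger values mean more change.
    Pixels that are nodata in either raster are returned as NaN.
    """
    if before.shape != after.shape:
        raise ValueError("rasters must share the same shape and grid")

    valid = ~((before == NODATA).any(axis=0) | (after == NODATA).any(axis=0))

    # Cast to float before multiplying: int8 products would overflow.
    a = before.astype(np.float32)
    b = after.astype(np.float32)

    dot = (a * b).sum(axis=0)
    norms = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        change = 1.0 - dot / norms

    change[~valid] = np.nan
    return change


# Synthetic 2-band, 1x3 rasters: unchanged, orthogonal, and nodata pixels.
before = np.array([[[100, 0, 50]], [[0, 100, 50]]], dtype=np.int8)
after = np.array([[[100, 100, -128]], [[0, 0, -128]]], dtype=np.int8)
change = cosine_change_map(before, after)
```

What counts as meaningful change depends on the region and time periods, so any threshold on `change` should be chosen by inspecting known examples.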

Choosing a configuration

  • Foundation model - Start with Tiny for most tasks. It is lightweight but still highly performant. Upgrade to Base for tasks that require finer distinctions (e.g., many land-cover classes or crop-type differentiation) at the cost of higher compute and storage.
  • Spatial resolution - Match the resolution to your task. 40 meter is a good default for regional land-cover mapping. Use 10 meter when small features matter (e.g., individual fields, urban footprints).
  • Temporal context - Use 12 months (annual composite) for stable features like land cover. Use a shorter window (minimum 1 month) to focus on transient conditions or for change detection.
  • Imagery sources - Start with Sentinel-2 (optical, multispectral) only. Adding Sentinel-1 (radar) can help in areas with persistent cloud cover but may introduce noise for other tasks.
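
To make the storage trade-off between models and resolutions concrete, the sketch below estimates the raw pixel data size of an export: one int8 byte per band per pixel, with band counts taken from the model dimensions listed under Configuration. It is a rough estimate only. It ignores COG compression and overviews, so actual file sizes will differ, and `uncompressed_size_bytes` is a hypothetical helper rather than a platform function.

```python
import math

# Embedding dimensions (bands) per foundation model, from the Configuration section.
EMBEDDING_DIMS = {"nano": 128, "tiny": 192, "base": 768}


def uncompressed_size_bytes(area_km2: float, resolution_m: float, model: str) -> int:
    """Estimate raw int8 pixel data size for an embedding export.

    Assumes square pixels of `resolution_m` meters and 1 byte per band per pixel.
    """
    pixels = math.ceil(area_km2 * 1_000_000 / resolution_m**2)
    return pixels * EMBEDDING_DIMS[model]


# A 100 km^2 area of interest under two configurations.
tiny_10m = uncompressed_size_bytes(100, 10, "tiny")  # 1,000,000 pixels x 192 bands
base_40m = uncompressed_size_bytes(100, 40, "base")  # 62,500 pixels x 768 bands

print(f"Tiny @ 10 m: {tiny_10m / 1e6:.0f} MB")  # prints "Tiny @ 10 m: 192 MB"
print(f"Base @ 40 m: {base_40m / 1e6:.0f} MB")  # prints "Base @ 40 m: 48 MB"
```

In this example, Base at 40 meter needs about a quarter of the raw storage of Tiny at 10 meter, because pixel count falls with the square of the resolution.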

Further reading

For applications that require higher accuracy than frozen embeddings can provide, the platform also supports supervised fine-tuning.