Glossary
Machine Learning Concepts
Foundation Model
A foundation model is a machine learning model that has been pre-trained on a large amount of data. The OlmoEarth foundation model is multi-modal, meaning that it has been pre-trained on satellite data at varying resolutions and across different wavelengths of the electromagnetic spectrum (sources include Sentinel-1, Sentinel-2, Landsat, etc.).
Rather than being trained for a single specific purpose, foundation models are broadly adaptable because of the quantity and diversity of the data they have been trained on.
For the model to be useful for a particular geospatial intelligence purpose, it must be further trained on data that is specific to that purpose. This final training, also known as fine-tuning, results in a model that relies on a broad understanding of geospatial patterns and features in order to perform a given task.
Because a foundation model has already been pre-trained, making it effective for a task can take significantly less time and fewer training samples than training a purpose-built model from scratch.
Fine-Tuned Model
Also sometimes referred to simply as a Model. A fine-tuned model is a foundation model that has been additionally trained on annotated data. The OlmoEarth platform provides tools to easily upload or create annotated data, the infrastructure to fine-tune the model, and the model itself. Once a model has been fine-tuned, it can be run on a given Area of Interest to produce a result, also referred to as a Prediction.
OlmoEarth Concepts
Model Run
This represents the process of running a fine-tuned model on the region and time range defined by an Area of Interest. For example, a model may be run on a given region to determine the types of crops growing there. When the run completes, it produces a Prediction.
Because of the amount of data and computation involved, a model run can take hours to complete. Using the OlmoEarth Studio and API, you can monitor the progress of the model run while it is in progress.
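Monitoring a long-running job usually comes down to polling its status until it reaches a final state. The sketch below is not the OlmoEarth API: it assumes a hypothetical caller-supplied `get_status` function and hypothetical status strings `"completed"` and `"failed"`, and only shows the shape of such a polling loop.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real OlmoEarth API may use different names.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_model_run(get_status: Callable[[], str],
                       poll_seconds: float = 60.0,
                       timeout_seconds: float = 6 * 3600) -> str:
    """Poll a model run's status until it reaches a terminal state or times out.

    get_status is a hypothetical function that returns the run's current status.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model run still '{status}' after {timeout_seconds} s")
        # Wait before asking again so the status endpoint is not hammered.
        time.sleep(poll_seconds)


# Usage with a fake status source standing in for a real API call.
fake_statuses = iter(["queued", "running", "running", "completed"])
print(wait_for_model_run(lambda: next(fake_statuses), poll_seconds=0))
```

A generous polling interval is a reasonable default here, since runs are measured in hours rather than seconds.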
Prediction
A Prediction is the output of a Model Run in the OlmoEarth Studio.
Project
A project encompasses all of the datasets, annotations, labels, fine-tuned models, predictions, and prediction results for a given geospatial intelligence task.
Area of Interest
An Area of Interest defines a specific geospatial region (specified as a GeoJSON Polygon or MultiPolygon) together with a start time and an end time. For example, the region might cover the greater Seattle area from December 21, 2014 to January 21, 2015.
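To make the definition concrete, here is a minimal sketch of an Area of Interest as a GeoJSON Polygon plus a date range. The `AreaOfInterest` class and its field names are hypothetical, not an OlmoEarth type, and the coordinates are a rough, illustrative box around Seattle. Note that GeoJSON positions are written as [longitude, latitude] and that a polygon ring must end on the same position it starts on.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AreaOfInterest:
    """Hypothetical in-memory representation of an Area of Interest."""
    geometry: dict  # GeoJSON Polygon or MultiPolygon geometry object
    start: date
    end: date

    def __post_init__(self) -> None:
        if self.geometry.get("type") not in ("Polygon", "MultiPolygon"):
            raise ValueError("geometry must be a GeoJSON Polygon or MultiPolygon")
        if self.start >= self.end:
            raise ValueError("start must be before end")


# Illustrative box around Seattle: [longitude, latitude] pairs, ring closed.
seattle = AreaOfInterest(
    geometry={
        "type": "Polygon",
        "coordinates": [[
            [-122.46, 47.48], [-122.22, 47.48], [-122.22, 47.74],
            [-122.46, 47.74], [-122.46, 47.48],
        ]],
    },
    start=date(2014, 12, 21),
    end=date(2015, 1, 21),
)
print((seattle.end - seattle.start).days)  # 31
```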
Annotation
An annotation is a unit of ground truth that is used to fine-tune a model. In OlmoEarth, it can be a point, polygon, or line and is associated with a collection of Metadata that provides the model with further detail. For example, if a model is being fine-tuned to identify trees, an annotation might mark an individual tree with a point on the map, along with its species, height, age, and any other relevant data.
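One natural way to picture an annotation is as a GeoJSON Feature: the geometry holds the point, line (a GeoJSON `LineString`), or polygon, and the `properties` object holds its metadata. The sketch below assumes that shape for illustration only; the property names and the `check_annotation` helper are hypothetical and do not describe OlmoEarth's internal schema.

```python
import json

# Hypothetical tree annotation expressed as a GeoJSON Feature.
# The property names are illustrative, not an OlmoEarth schema.
tree_annotation = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-122.335, 47.608]},
    "properties": {
        "species": "Douglas fir",
        "height_m": 38.5,
        "age_years": 120,
        "is_healthy": True,
    },
}

# Points, lines, and polygons map to these GeoJSON geometry types.
ALLOWED_GEOMETRIES = {"Point", "LineString", "Polygon"}


def check_annotation(feature: dict) -> None:
    """Raise ValueError unless the feature is a point, line, or polygon annotation."""
    if feature.get("type") != "Feature":
        raise ValueError("annotation must be a GeoJSON Feature")
    geometry_type = feature["geometry"]["type"]
    if geometry_type not in ALLOWED_GEOMETRIES:
        raise ValueError(f"unsupported geometry type: {geometry_type}")


check_annotation(tree_annotation)
print(json.dumps(tree_annotation["properties"]))
```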
Metadata
The data associated with the point, line, or polygon in an annotation. Metadata can be a Tag, a string, a boolean, or a numerical value.
Tag
A type of textual metadata that can be added to an annotation. Tags are defined in advance, and annotators choose which tags to add to each annotation. For example, when creating an annotation that identifies a tree, the annotator might be offered a list of tags naming tree species.
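The difference between a Tag and free-form metadata is that a Tag value must come from a vocabulary defined in advance. The sketch below checks an annotation's metadata against a hypothetical per-project schema; `TAG_VOCABULARIES`, `FIELD_TYPES`, and `validate_metadata` are illustrative names, not part of OlmoEarth.

```python
from typing import Union

MetadataValue = Union[str, bool, int, float]

# Hypothetical project schema: "species" is a tag with a predefined vocabulary,
# the other fields accept free-form strings, booleans, or numbers.
TAG_VOCABULARIES = {"species": {"Douglas fir", "Western red cedar", "Bigleaf maple"}}
FIELD_TYPES = {"species": str, "notes": str, "is_healthy": bool, "height_m": (int, float)}


def validate_metadata(metadata: dict[str, MetadataValue]) -> list[str]:
    """Return a list of problems found in an annotation's metadata (empty if valid)."""
    problems = []
    for field_name, value in metadata.items():
        expected = FIELD_TYPES.get(field_name)
        if expected is None:
            problems.append(f"unknown field: {field_name}")
        # bool is a subclass of int in Python, so reject booleans for non-boolean fields.
        elif not isinstance(value, expected) or (isinstance(value, bool) and expected is not bool):
            problems.append(f"{field_name}: wrong type {type(value).__name__}")
        elif field_name in TAG_VOCABULARIES and value not in TAG_VOCABULARIES[field_name]:
            problems.append(f"{field_name}: '{value}' is not a predefined tag")
    return problems


print(validate_metadata({"species": "Oak", "height_m": 38.5}))
# ["species: 'Oak' is not a predefined tag"]
```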
Task
In the OlmoEarth platform, a task is a collection of annotations to be completed, typically covering a rectangular region. To complete the task, a user creates one or more annotations within the boundaries of that region.
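The rule that annotations must fall inside the task's rectangle can be sketched as a simple bounding-box check. This is a hypothetical model, not OlmoEarth code: the `Task` class only tracks point annotations as (longitude, latitude) pairs, and it assumes a task counts as complete once it holds at least one annotation.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task: a rectangular region plus the point annotations made in it."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    points: list[tuple[float, float]] = field(default_factory=list)  # (lon, lat)

    def contains(self, lon: float, lat: float) -> bool:
        """True if the position lies within the task's rectangle (edges included)."""
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat

    def add_point(self, lon: float, lat: float) -> None:
        """Record a point annotation, rejecting positions outside the task region."""
        if not self.contains(lon, lat):
            raise ValueError("annotation lies outside the task region")
        self.points.append((lon, lat))

    def is_complete(self) -> bool:
        # Assumption: one or more annotations inside the region completes the task.
        return len(self.points) >= 1


task = Task(-122.40, 47.55, -122.30, 47.65)
task.add_point(-122.335, 47.608)
print(task.is_complete())  # True
```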
Dataset
In the OlmoEarth platform, a Dataset is a collection of Tasks. The Dataset can be created by defining a geospatial region which will then be divided into a grid of sub-regions, one for each Task.
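Dividing a region into a grid of sub-regions can be sketched as splitting a bounding box into equal rows and columns, one cell per Task. The `grid_tasks` function and its `rows`/`cols` parameters are hypothetical; how OlmoEarth actually sizes its grid is not specified here. Cells are equal in degrees, which is not the same as equal in ground area away from the equator.

```python
def grid_tasks(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
               rows: int, cols: int) -> list[tuple[float, float, float, float]]:
    """Split a bounding box into rows x cols cells, one per task.

    Each cell is returned as (min_lon, min_lat, max_lon, max_lat).
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    # Compute shared edges once so neighbouring cells line up exactly,
    # and pin the final edge to the box boundary to avoid rounding drift.
    lon_edges = [min_lon + (max_lon - min_lon) * c / cols for c in range(cols)] + [max_lon]
    lat_edges = [min_lat + (max_lat - min_lat) * r / rows for r in range(rows)] + [max_lat]
    cells = []
    for r in range(rows):
        for c in range(cols):
            cells.append((lon_edges[c], lat_edges[r], lon_edges[c + 1], lat_edges[r + 1]))
    return cells


for cell in grid_tasks(-122.46, 47.48, -122.22, 47.74, rows=2, cols=2):
    print(cell)
```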
If you already have annotation data and would like to upload it directly rather than entering it manually in the OlmoEarth platform, you can structure it as a CSV or GeoJSON file and upload it to automatically create a dataset with pre-populated annotations.
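If your existing labels live in a spreadsheet, converting them to GeoJSON is straightforward with the standard library. This is only a sketch: the `lon` and `lat` column names and the `csv_points_to_geojson` helper are assumptions, and the exact columns and properties the OlmoEarth upload expects may differ, so check the platform's own import requirements before uploading.

```python
import csv
import io
import json


def csv_points_to_geojson(csv_text: str) -> dict:
    """Convert CSV rows with hypothetical lon/lat columns into a GeoJSON FeatureCollection.

    Every column other than lon and lat is kept as a metadata property (as a string).
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop("lon"))
        lat = float(row.pop("lat"))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,  # remaining columns become metadata
        })
    return {"type": "FeatureCollection", "features": features}


sample = """lon,lat,species,height_m
-122.335,47.608,Douglas fir,38.5
-122.301,47.655,Bigleaf maple,21.0
"""
collection = csv_points_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

CSV values arrive as strings, so numeric metadata such as `height_m` would need converting if the target schema expects numbers.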