Glossary

Annotation – A point or polygon used for training that has an associated label. Also referred to as "ground truth," especially when collected from the field.

Annotating – The process of assigning labels to points or polygons, often by marking features on imagery. Also referred to as labeling.

Area of Interest – A specific geospatial region, defined by a GeoJSON Polygon or MultiPolygon, together with a start time and an end time.
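
As a rough illustration, an Area of Interest could be stored as a GeoJSON geometry plus a time range and checked for basic validity. The field names "geometry", "start_time", and "end_time" and the function validate_aoi are hypothetical, not a documented schema. Only the Polygon/MultiPolygon layout and the closed-ring rule follow the GeoJSON specification.

```python
from datetime import datetime

# Hypothetical AOI record: the keys "geometry", "start_time", and "end_time"
# are illustrative only, not a documented schema.
aoi = {
    "geometry": {
        "type": "Polygon",
        # GeoJSON positions are [longitude, latitude]; a linear ring must be
        # closed (first position == last position).
        "coordinates": [[
            [-122.50, 37.70],
            [-122.35, 37.70],
            [-122.35, 37.82],
            [-122.50, 37.82],
            [-122.50, 37.70],
        ]],
    },
    "start_time": "2023-01-01T00:00:00Z",
    "end_time": "2023-12-31T23:59:59Z",
}


def validate_aoi(aoi: dict) -> None:
    """Raise ValueError if the AOI geometry or time range is malformed."""
    geom = aoi["geometry"]
    if geom["type"] == "Polygon":
        polygons = [geom["coordinates"]]
    elif geom["type"] == "MultiPolygon":
        polygons = geom["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geom['type']}")
    for polygon in polygons:
        for ring in polygon:
            if len(ring) < 4 or ring[0] != ring[-1]:
                raise ValueError("each ring needs >= 4 positions and must be closed")
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    start = datetime.fromisoformat(aoi["start_time"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(aoi["end_time"].replace("Z", "+00:00"))
    if start >= end:
        raise ValueError("start_time must be earlier than end_time")


validate_aoi(aoi)  # passes silently for a well-formed AOI
```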

Category – A named group used in classification tasks (e.g., water, forest, buildings). Also referred to as a class.

Classification – A machine learning task that assigns a discrete class label to a pixel or window (e.g., land cover type, presence of a feature). Pixel-based classification is also referred to as segmentation.

Class – A category the model predicts in a classification task. Also referred to as a category.

Class distribution – The proportion of training examples belonging to each class in a dataset.
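
A minimal sketch of computing a class distribution from a list of labels; the function name class_distribution and the sample labels are illustrative only.

```python
from collections import Counter


def class_distribution(labels: list[str]) -> dict[str, float]:
    """Return the fraction of examples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


labels = ["water", "forest", "forest", "buildings", "forest", "water"]
print(class_distribution(labels))  # forest makes up half of these examples
```

A strongly skewed distribution is one reason to collect more examples of the rarer classes.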

Dataset – A collection of annotations.

F1 – A measure of a model's accuracy that combines precision and recall into a single number by taking their harmonic mean.
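
For reference, with precision P = TP / (TP + FP) and recall R = TP / (TP + FN), F1 = 2PR / (P + R). The sketch below computes it from true-positive, false-positive, and false-negative counts; the function name and counts are illustrative.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# precision = 0.8, recall ≈ 0.667, so F1 ≈ 0.727
print(f1_score(tp=80, fp=20, fn=40))
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high.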

Fine-tuning – The process of adapting a pretrained model using labeled data for a specific task.

Foundation model – A machine learning model trained on large and diverse datasets that can be adapted to many different tasks.

Ground truth – Verified labeled data used to train or evaluate a model.

Inference – The process of applying a trained model to new data to generate predictions. Also referred to as running inference.

Label – The value assigned to an annotation that the model learns to predict (e.g., crop type or biomass).

Metadata – Data associated with the point or polygon in an annotation. Metadata can be freeform or categorical text (string), boolean, or numerical data.
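
To make the relationship between an annotation, its label, and its metadata concrete, here is a hypothetical record type. The class name, field names, and example values are assumptions for illustration; only the allowed metadata value types (text, boolean, numeric) come from the definition above.

```python
from dataclasses import dataclass, field
from typing import Union

# Allowed metadata value types: text, boolean, or numeric.
MetadataValue = Union[str, bool, int, float]


@dataclass
class Annotation:
    """Hypothetical annotation record; field names are illustrative only."""
    geometry: dict                # GeoJSON Point or Polygon
    label: Union[str, float]      # class name (classification) or number (regression)
    metadata: dict[str, MetadataValue] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.geometry.get("type") not in ("Point", "Polygon"):
            raise ValueError("annotation geometry must be a Point or Polygon")
        for key, value in self.metadata.items():
            if not isinstance(value, (str, bool, int, float)):
                raise TypeError(f"unsupported metadata type for {key!r}")


ann = Annotation(
    geometry={"type": "Point", "coordinates": [36.82, -1.29]},
    label="maize",
    metadata={"collector": "field team A", "irrigated": False, "plot_area_ha": 1.5},
)
```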

Model – A machine learning system that learns patterns from data to make predictions.

Model run – The process of running a fine-tuned model on an Area of Interest for a specific time period.

Modality – A type of input data used by the model, such as radar or optical satellite imagery.

Negative examples – Training samples representing features the model should learn not to classify as the target class.

Overfitting – When a model learns patterns specific to the training data that do not generalize well to new data.

Per-pixel prediction – A prediction generated for every pixel in an image. Also referred to as pixel-level prediction.

Prediction – The output produced by a model when analyzing input data.

Project – A project encompasses all of the datasets, annotations, labels, fine-tuned models, predictions, and prediction results for a given geospatial intelligence task.

Regression – A machine learning task that predicts a numeric value (e.g., biomass or soil moisture).

Revisit rate – How often a satellite collects imagery for the same location. Also referred to as temporal resolution.

Segmentation – A machine learning task that assigns a class label to every pixel in an image, producing a continuous map of features (e.g., land cover boundaries, crop type extents). Also referred to as pixel-based classification.

Spatial context – The surrounding area of imagery provided to the model to help interpret the prediction window. Also referred to as the context window.

Spatial distribution – The geographic spread of labeled data across an area of interest.

Spatial resolution – The size of the ground area represented by a pixel in an image.
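
The arithmetic behind spatial resolution is simple but easy to get wrong when comparing sensors. This sketch assumes square pixels in a projected coordinate system measured in meters; the function names are illustrative.

```python
import math


def pixel_area_m2(resolution_m: float) -> float:
    """Ground area covered by one square pixel, in square meters."""
    return resolution_m ** 2


def pixels_across(extent_m: float, resolution_m: float) -> int:
    """Pixels needed to span a ground distance, rounding up partial pixels."""
    return math.ceil(extent_m / resolution_m)


print(pixel_area_m2(10.0))         # 100.0 (a 10 m pixel covers 100 m²)
print(pixels_across(5000, 10.0))   # 500
print(pixels_across(5000, 3.0))    # 1667
```

Finer resolution means more pixels per area, and therefore more data to process for the same Area of Interest.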

Task – A collection of annotations to be labeled or reviewed.

Task type – The type of prediction the model is trained to perform (e.g., classification, regression, segmentation). This is different from an annotation "Task" (see the "Task" definition).

Temporal context – The time range of data the model uses when learning or making predictions.

Training – The process of teaching a model using labeled data.

Training data – The labeled data used to train a model.

Training split – The portion of a dataset used to train the model.

Validation split – The portion of a dataset used to monitor model performance and guide training.

Test split – The portion of a dataset used to evaluate the final model after training.
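
One simple way to produce the three splits is a seeded random shuffle followed by slicing. The 70/15/15 proportions and the function name split_dataset are assumptions for this sketch, not recommended values.

```python
import random


def split_dataset(items, train=0.7, val=0.15, seed=0):
    """Shuffle items and partition them into train, validation, and test splits.

    Whatever remains after the train and validation portions becomes the test split.
    """
    if not 0 < train + val < 1:
        raise ValueError("train + val must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

For geospatial data, nearby samples are often correlated, so a purely random split can make test scores look better than they will be on new areas. Splitting by region is a common alternative.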

Window – A fixed area of imagery used as the unit of prediction. Also referred to as a scene or patch.

Window-level prediction – A prediction made for an entire window rather than individual pixels.
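
The relationship between windows and spatial context can be sketched as tiling an image into fixed-size windows, each expanded by a context margin. The function tile_windows and its parameters are hypothetical; real pipelines may overlap or pad windows instead. A per-pixel model would return one value for every pixel inside each window's bounds, while a window-level model returns a single value per window.

```python
def tile_windows(height, width, window, context=0):
    """Yield (window_bounds, context_bounds) for non-overlapping square windows.

    Bounds are (row_start, row_end, col_start, col_end) in pixel coordinates,
    with exclusive ends. Context bounds extend each window by `context` pixels
    on every side, clipped to the image edges. Edge windows are truncated when
    the image size is not a multiple of `window`.
    """
    for r in range(0, height, window):
        for c in range(0, width, window):
            r_end = min(r + window, height)
            c_end = min(c + window, width)
            ctx = (
                max(r - context, 0),
                min(r_end + context, height),
                max(c - context, 0),
                min(c_end + context, width),
            )
            yield (r, r_end, c, c_end), ctx


for win, ctx in tile_windows(256, 256, window=128, context=32):
    print(win, ctx)
```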