Upload a CSV Dataset

tip

OlmoEarth API Endpoints

OlmoEarth API endpoints are documented in the Interactive OlmoEarth API Browser.

You can use it to understand request and response schemas, make requests directly to the API, and generate client code in many common languages and libraries.

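The following OlmoEarth API endpoints are used in this guide:

POST /api/v1/datasets/upload-samples - Create a dataset from an uploaded CSV file
GET /api/v1/datasets/{dataset_id} - Check the status of a dataset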

This guide explains how to create a dataset by uploading a CSV file containing point locations and associated metadata. This is useful when you already have georeferenced data and want to create annotation tasks around those locations.

Prerequisites

API Token

To get your API token, see Authentication.

Project ID

Datasets must be uploaded to a Project. To get a Project ID, view Your Projects, select your project, and copy the ID from the URL.

For example, if your project URL is:

https://olmoearth.allenai.org/projects/7e160260-5a5a-4120-ab33-8ce15998b982/tasks

Then your Project ID is 7e160260-5a5a-4120-ab33-8ce15998b982
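If you script against the API, you can extract the Project ID from the URL instead of copying it by hand. Below is a minimal standard-library sketch. The helper name project_id_from_url is hypothetical. It assumes the ID is always the path segment right after projects, as in the example URL above.

```python
from urllib.parse import urlparse


def project_id_from_url(url: str) -> str:
    """Return the path segment that follows "projects" in a project URL."""
    # e.g. "/projects/<id>/tasks" -> ["projects", "<id>", "tasks"]
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "projects" in segments:
        i = segments.index("projects")
        if i + 1 < len(segments):
            return segments[i + 1]
    raise ValueError(f"no project ID found in {url!r}")


url = "https://olmoearth.allenai.org/projects/7e160260-5a5a-4120-ab33-8ce15998b982/tasks"
print(project_id_from_url(url))  # 7e160260-5a5a-4120-ab33-8ce15998b982
```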

CSV File Format

Your CSV file must include the following required columns:

latitude - Latitude coordinate in decimal degrees
longitude - Longitude coordinate in decimal degrees
task_name - Name for grouping samples into tasks

note

Only point geometries are supported for CSV uploads. For other geometry types (polygons, lines), use GeoJSON format.

Optional Columns

start_time - Start of the time range in ISO 8601 format (e.g., 2023-01-15T00:00:00Z)
end_time - End of the time range in ISO 8601 format
Any additional metadata columns you want associated with each sample
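Checking the file locally before uploading can catch malformed rows early. The sketch below is a hypothetical client-side check, not the server's own validation. It confirms three things:

- the required columns exist;
- coordinates fall within valid decimal-degree ranges;
- any start_time/end_time values parse as ISO 8601 (timestamps without an offset are assumed to be UTC).

```python
import csv
import io
from datetime import datetime, timezone

REQUIRED_COLUMNS = {"latitude", "longitude", "task_name"}
TIME_COLUMNS = ("start_time", "end_time")


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2023-01-15T00:00:00Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit UTC offset for older interpreters.
    dt = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are treated as UTC.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def validate_csv(text: str) -> list[str]:
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing required column(s): {', '.join(sorted(missing))}"]

    problems = []
    # Line 1 is the header; this numbering assumes no multi-line quoted fields.
    for line_no, row in enumerate(reader, start=2):
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: latitude/longitude must be numbers")
            continue
        if not -90 <= lat <= 90:
            problems.append(f"line {line_no}: latitude {lat} is outside [-90, 90]")
        if not -180 <= lon <= 180:
            problems.append(f"line {line_no}: longitude {lon} is outside [-180, 180]")
        if not (row["task_name"] or "").strip():
            problems.append(f"line {line_no}: task_name is empty")

        # Optional time range: each value must parse, and start must not follow end.
        times = {}
        for col in TIME_COLUMNS:
            if row.get(col):
                try:
                    times[col] = parse_iso8601(row[col])
                except ValueError:
                    problems.append(f"line {line_no}: {col} is not ISO 8601")
        if len(times) == 2 and times["start_time"] > times["end_time"]:
            problems.append(f"line {line_no}: start_time is after end_time")
    return problems
```

To check a file, open it with newline="" (for example, forest_health_survey.csv) and pass its contents to validate_csv. An empty list means no problems were found.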

Example CSV File

Here's an example CSV file for monitoring forest health in the Pacific Northwest:

latitude,longitude,task_name,start_time,end_time,tree_species,health_status
7519,121.7453,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Douglas Fir,Healthy
7523,121.7467,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Western Hemlock,Stressed
7531,121.7489,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Western Red Cedar,Healthy
7510,121.4388,Snoqualmie_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Douglas Fir,Diseased
7522,121.4401,Snoqualmie_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Big Leaf Maple,Healthy

In this example:

Each row represents a tree location
Samples are grouped into two tasks: North_Cascades_Site_1 and Snoqualmie_Site_1
Additional metadata (tree_species, health_status) is included for each sample
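To preview how rows become tasks before uploading, you can count samples per task_name and list the columns that will be carried along as metadata. Some caveats on this sketch:

- The function name summarize is hypothetical.
- The coordinates in SAMPLE are placeholders.
- It assumes that every column other than the required and optional ones listed above is metadata.

```python
import csv
import io
from collections import Counter

REQUIRED = ("latitude", "longitude", "task_name")
OPTIONAL = ("start_time", "end_time")


def summarize(text: str) -> tuple[dict[str, int], list[str]]:
    """Count samples per task_name and list the extra metadata columns."""
    reader = csv.DictReader(io.StringIO(text))
    # Accessing fieldnames reads the header row.
    columns = reader.fieldnames or []
    metadata = [c for c in columns if c not in REQUIRED + OPTIONAL]
    # Rows that share a task_name end up in the same task.
    counts = Counter(row["task_name"] for row in reader)
    return dict(counts), metadata


# Placeholder coordinates; only task_name and the metadata columns matter here.
SAMPLE = """\
latitude,longitude,task_name,start_time,end_time,tree_species,health_status
48.0,-121.7,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Douglas Fir,Healthy
48.0,-121.7,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Western Hemlock,Stressed
48.0,-121.7,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Western Red Cedar,Healthy
47.5,-121.4,Snoqualmie_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Douglas Fir,Diseased
47.5,-121.4,Snoqualmie_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Big Leaf Maple,Healthy
"""

counts, metadata = summarize(SAMPLE)
print(counts)    # {'North_Cascades_Site_1': 3, 'Snoqualmie_Site_1': 2}
print(metadata)  # ['tree_species', 'health_status']
```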

Components of the Request

1. Input File

The CSV file containing your georeferenced data, sent as the input_file form field.

2. Sources and Time Ranges

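Define the satellite imagery sources to include for each task. Each entry in source_time_ranges names a source (such as sentinel2 or landsat), a start_time and end_time, and a count. See Creating a Gridded Dataset for details on available sources.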

3. Buffer Size

Specify the buffer size in meters around each point with the buffer_size field. This determines the size of the imagery window acquired around each location. The default is 500 meters.
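To estimate how large each window will be, you can relate buffer_size to the resolution field sent in the example request. This guide does not confirm how either field is interpreted. The sketch assumes that:

- the buffer extends buffer_size meters on every side of the point, giving a square 2 × buffer_size meters wide;
- resolution is meters per pixel.

The helper name window_size is hypothetical.

```python
def window_size(buffer_size_m: float, resolution_m: float) -> tuple[float, int]:
    """Approximate side length (meters) and pixels per side of one window.

    Assumes the buffer extends buffer_size_m on every side of the point and
    that resolution_m is the ground distance covered by one pixel.
    """
    side_m = 2.0 * buffer_size_m
    return side_m, round(side_m / resolution_m)


print(window_size(500, 10.0))   # (1000.0, 100)  default 500 m buffer
print(window_size(1000, 10.0))  # (2000.0, 200)  values from the example request
```

Under these assumptions, doubling buffer_size doubles the pixels per side and quadruples the pixel count per window. That is one reason larger buffers can lengthen dataset creation.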

Example Request

Below is a complete example using curl to upload a CSV dataset. Visit the API Endpoint Documentation for a complete schema and sample requests in other languages or libraries.

curl -X POST "https://olmoearth.allenai.org/api/v1/datasets/upload-samples" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "input_file=@forest_health_survey.csv" \
  -F 'source_time_ranges=[
    {
      "source": "sentinel2",
      "start_time": "2023-06-01T00:00:00Z",
      "end_time": "2023-08-31T00:00:00Z",
      "count": 5
    },
    {
      "source": "landsat",
      "start_time": "2023-06-01T00:00:00Z",
      "end_time": "2023-08-31T00:00:00Z",
      "count": 3
    }
  ]' \
  -F "name=Forest Health Survey - Summer 2023" \
  -F "project_id=7e160260-5a5a-4120-ab33-8ce15998b982" \
  -F "buffer_size=1000" \
  -F "resolution=10.0"

note

This endpoint uses multipart/form-data encoding since it includes file uploads. The source_time_ranges parameter must be provided as a JSON string.
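The same request can be made from Python. This is a sketch, not official client code (the API browser can generate that). It makes these assumptions:

- it uses the third-party requests library;
- the endpoint returns a JSON body;
- build_upload_form and upload_csv are hypothetical helper names.

```python
import json
import os

UPLOAD_URL = "https://olmoearth.allenai.org/api/v1/datasets/upload-samples"


def build_upload_form(name, project_id, source_time_ranges,
                      buffer_size=None, resolution=None):
    """Build the non-file multipart fields used in the curl example."""
    fields = {
        "name": name,
        "project_id": project_id,
        # Form fields are plain strings, so the list of source entries is
        # serialized to a JSON string, as the note above requires.
        "source_time_ranges": json.dumps(source_time_ranges),
    }
    # Omitted fields are left to the server (buffer_size defaults to 500 m).
    if buffer_size is not None:
        fields["buffer_size"] = str(buffer_size)
    if resolution is not None:
        fields["resolution"] = str(resolution)
    return fields


def upload_csv(csv_path, token, **form_args):
    """POST the CSV plus form fields; returns the decoded JSON response."""
    import requests  # third-party (pip install requests); only needed to send

    with open(csv_path, "rb") as f:
        resp = requests.post(
            UPLOAD_URL,
            headers={"Authorization": f"Bearer {token}"},
            data=build_upload_form(**form_args),
            # requests encodes data + files together as multipart/form-data.
            files={"input_file": (os.path.basename(csv_path), f, "text/csv")},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()
```

To mirror the curl command, call upload_csv with these arguments:

- "forest_health_survey.csv" and your API token;
- name, project_id and the same source_time_ranges list;
- buffer_size=1000 and resolution=10.0.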

Checking Dataset Status

The dataset will progress through several stages as it builds:

pending - Dataset creation has been queued
acquiring - Satellite imagery is being acquired
ingesting - Data is being processed and ingested
completed - Dataset is ready for use

You can monitor the dataset progress using GET /api/v1/datasets/{dataset_id}.
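A simple polling loop can wait for the completed stage. This sketch assumes the response is JSON with a top-level status field holding the stage names listed above. Check the actual schema in the API browser. The names fetch_dataset and wait_for_dataset are hypothetical. The poll interval and overall timeout are arbitrary choices.

```python
import json
import time
import urllib.request

API_BASE = "https://olmoearth.allenai.org/api/v1"


def fetch_dataset(dataset_id: str, token: str) -> dict:
    """GET /api/v1/datasets/{dataset_id} and decode the JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_dataset(dataset_id, token, fetch=fetch_dataset,
                     interval_s=60.0, timeout_s=12 * 3600, sleep=time.sleep):
    """Poll until the (assumed) "status" field reads "completed".

    Returns the distinct statuses observed, in order.
    """
    deadline = time.monotonic() + timeout_s
    seen = []
    while True:
        status = fetch(dataset_id, token).get("status")
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "completed":
            return seen
        # No failure status is documented here, so the deadline bounds the wait.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset {dataset_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing fetch and sleep as parameters lets you exercise the loop without network access or real delays.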

Notes

Dataset creation is an asynchronous process that may take hours depending on the number of samples and images requested
The task_name column groups samples into annotation tasks. All samples with the same task_name are grouped together
The combined geometry of all samples in a task cannot span more than 6 degrees (the width of one UTM zone)
If start_time and end_time are provided in the CSV, they will be used for the annotation and task times
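To catch tasks that break the 6-degree limit before uploading, you can compute each task's bounding box from its points. This is a conservative sketch with these assumptions:

- it applies the limit to both the latitude and longitude span of the points;
- it ignores the buffer around each point;
- it does not handle tasks that cross the antimeridian.

The helpers task_spans and oversized_tasks are hypothetical.

```python
import csv
import io

MAX_SPAN_DEG = 6.0  # one UTM zone is 6 degrees of longitude wide


def task_spans(text: str) -> dict[str, tuple[float, float]]:
    """Return (latitude span, longitude span) in degrees for each task_name."""
    bounds: dict[str, list[float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        lat, lon = float(row["latitude"]), float(row["longitude"])
        # bounds holds [min_lat, max_lat, min_lon, max_lon] per task.
        b = bounds.setdefault(row["task_name"], [lat, lat, lon, lon])
        b[0], b[1] = min(b[0], lat), max(b[1], lat)
        b[2], b[3] = min(b[2], lon), max(b[3], lon)
    return {task: (b[1] - b[0], b[3] - b[2]) for task, b in bounds.items()}


def oversized_tasks(text: str, limit: float = MAX_SPAN_DEG) -> list[str]:
    """List task names whose points span more than `limit` degrees either way."""
    return sorted(
        task for task, (dlat, dlon) in task_spans(text).items()
        if max(dlat, dlon) > limit
    )
```

If a task is flagged, splitting its rows across two or more task_name values keeps each group within a single zone-sized extent.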
