Upload a CSV Dataset

Tip: OlmoEarth API Endpoints

OlmoEarth API endpoints are documented in the Interactive OlmoEarth API Browser.

You can use it to understand request and response schemas, make requests directly to the API, and generate client code in many common languages and libraries.

The following OlmoEarth API endpoints are used in this guide:

  • POST /api/v1/datasets/upload-samples
  • GET /api/v1/datasets/{dataset_id}

This guide explains how to create a dataset by uploading a CSV file containing point locations and associated metadata. This is useful when you already have georeferenced data and want to create annotation tasks around those locations.

Prerequisites

API Token

To get your API Token, see Authentication.

Project ID

Datasets must be uploaded to a Project. To get a Project ID, view Your Projects, select your project, and copy the ID from the URL.

For example, if your project URL is:

https://olmoearth.allenai.org/projects/7e160260-5a5a-4120-ab33-8ce15998b982/tasks

Then your Project ID is 7e160260-5a5a-4120-ab33-8ce15998b982
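If you handle many project URLs, the copy step above can be automated. The following is a minimal sketch, not part of the OlmoEarth API: `project_id_from_url` is a hypothetical helper, and it assumes Project IDs are always UUIDs that appear directly after the `projects` path segment, as in the example URL.

```python
import uuid
from urllib.parse import urlparse


def project_id_from_url(url: str) -> str:
    """Return the path segment that follows 'projects' in a project URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        candidate = segments[segments.index("projects") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no project ID found in {url!r}") from None
    # Assumption: Project IDs are UUIDs. uuid.UUID raises ValueError otherwise.
    return str(uuid.UUID(candidate))


print(project_id_from_url(
    "https://olmoearth.allenai.org/projects/"
    "7e160260-5a5a-4120-ab33-8ce15998b982/tasks"
))
```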

CSV File Format

Your CSV file must include the following required columns:

  • latitude - Latitude coordinate in decimal degrees
  • longitude - Longitude coordinate in decimal degrees
  • task_name - Name for grouping samples into tasks

Note: Only point geometries are supported for CSV uploads. For other geometry types (polygons, lines), use the GeoJSON format.

Optional Columns

  • start_time - Start of the time range in ISO 8601 format (e.g., 2023-01-15T00:00:00Z)
  • end_time - End of the time range in ISO 8601 format
  • Any additional metadata columns you want associated with each sample
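Checking a file locally before uploading can catch formatting mistakes early. The sketch below tests only the rules stated above (required columns, numeric coordinates in valid ranges, ISO 8601 timestamps). `validate_csv` and its messages are hypothetical, and the server may apply additional checks that this guide does not describe.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ("latitude", "longitude", "task_name")


def parse_iso8601(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_csv(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]

    problems = []
    for row_no, row in enumerate(reader, start=2):  # the header is row 1
        try:
            lat, lon = float(row["latitude"]), float(row["longitude"])
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                problems.append(f"row {row_no}: coordinates out of range")
        except (TypeError, ValueError):
            problems.append(f"row {row_no}: latitude/longitude not numeric")
        if not (row["task_name"] or "").strip():
            problems.append(f"row {row_no}: empty task_name")
        for column in ("start_time", "end_time"):  # optional columns
            value = row.get(column)
            if value:
                try:
                    parse_iso8601(value)
                except ValueError:
                    problems.append(f"row {row_no}: {column} is not ISO 8601")
    return problems
```

To check a file on disk, read it as text and pass the contents, for example `validate_csv(open("forest_health_survey.csv", newline="").read())`.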

Example CSV File

Here's an example CSV file for monitoring forest health in the Pacific Northwest:

latitude,longitude,task_name,start_time,end_time,tree_species,health_status
48.7519,-121.7453,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Douglas Fir,Healthy
48.7523,-121.7467,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Western Hemlock,Stressed
48.7531,-121.7489,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Western Red Cedar,Healthy
47.7510,-121.4388,Snoqualmie_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Douglas Fir,Diseased
47.7522,-121.4401,Snoqualmie_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Big Leaf Maple,Healthy

In this example:

  • Each row represents a tree location
  • Samples are grouped into two tasks: North_Cascades_Site_1 and Snoqualmie_Site_1
  • Additional metadata (tree_species, health_status) is included for each sample
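To preview how rows will be grouped before uploading, you can count samples per task_name locally. This is an illustrative sketch: `samples_per_task` is a hypothetical helper, and the rows below carry only the task_name values from the example file.

```python
from collections import Counter


def samples_per_task(rows):
    """Count how many rows share each task_name."""
    return Counter(row["task_name"] for row in rows)


rows = [
    {"task_name": "North_Cascades_Site_1"},
    {"task_name": "North_Cascades_Site_1"},
    {"task_name": "North_Cascades_Site_1"},
    {"task_name": "Snoqualmie_Site_1"},
    {"task_name": "Snoqualmie_Site_1"},
]
print(dict(samples_per_task(rows)))
# {'North_Cascades_Site_1': 3, 'Snoqualmie_Site_1': 2}
```

The function accepts any iterable of dict-like rows, so a `csv.DictReader` opened on your own file works as input too.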

Components of the Request

1. Input File

The CSV file containing your georeferenced data.

2. Sources and Time Ranges

Define the satellite imagery sources to include for each task. See Creating a Gridded Dataset for details on available sources.

3. Buffer Size

Specify the buffer size in meters around each point. This determines the size of the imagery window acquired around each location. The default is 500 meters.

Example Request

Below is a complete example using curl to upload a CSV dataset. Visit the API Endpoint Documentation for a complete schema and sample requests in other languages or libraries.

curl -X POST "https://olmoearth.allenai.org/api/v1/datasets/upload-samples" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "input_file=@forest_health_survey.csv" \
  -F 'source_time_ranges=[
    {
      "source": "sentinel2",
      "start_time": "2023-06-01T00:00:00Z",
      "end_time": "2023-08-31T00:00:00Z",
      "count": 5
    },
    {
      "source": "landsat",
      "start_time": "2023-06-01T00:00:00Z",
      "end_time": "2023-08-31T00:00:00Z",
      "count": 3
    }
  ]' \
  -F "name=Forest Health Survey - Summer 2023" \
  -F "project_id=7e160260-5a5a-4120-ab33-8ce15998b982" \
  -F "buffer_size=1000" \
  -F "resolution=10.0"

Note: This endpoint uses multipart/form-data encoding because it includes a file upload. The source_time_ranges parameter must be provided as a JSON string.
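When building the request from a script, the main thing to get right is serializing source_time_ranges to a JSON string before placing it in the form. The sketch below shows one way to assemble the non-file form fields in Python. `build_form_fields` is a hypothetical helper; the field names are copied from the curl example, and which fields the API treats as optional is an assumption.

```python
import json


def build_form_fields(name, project_id, source_time_ranges,
                      buffer_size=None, resolution=None):
    """Build the text fields of the multipart form (the file is sent separately)."""
    fields = {
        "name": name,
        "project_id": project_id,
        # The API expects this field as a JSON string, not as nested form data.
        "source_time_ranges": json.dumps(source_time_ranges),
    }
    # Assumption: these two fields may be omitted to use server defaults.
    if buffer_size is not None:
        fields["buffer_size"] = str(buffer_size)
    if resolution is not None:
        fields["resolution"] = str(resolution)
    return fields


fields = build_form_fields(
    name="Forest Health Survey - Summer 2023",
    project_id="7e160260-5a5a-4120-ab33-8ce15998b982",
    source_time_ranges=[
        {"source": "sentinel2", "start_time": "2023-06-01T00:00:00Z",
         "end_time": "2023-08-31T00:00:00Z", "count": 5},
    ],
    buffer_size=1000,
)

# With the third-party `requests` library, the upload could then look like:
# with open("forest_health_survey.csv", "rb") as f:
#     requests.post(url, headers={"Authorization": f"Bearer {token}"},
#                   data=fields, files={"input_file": f})
```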

Checking Dataset Status

The dataset will progress through several stages as it builds:

  • pending - Dataset creation has been queued
  • acquiring - Satellite imagery is being acquired
  • ingesting - Data is being processed and ingested
  • completed - Dataset is ready for use

You can monitor the dataset progress using GET /api/v1/datasets/{dataset_id}.
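Because the build can run for a long time, a script will usually poll this endpoint at a relaxed interval. The following is a minimal sketch using only the Python standard library. The `status` field name in the response body is an assumption (check the API Browser for the actual schema), the function names are hypothetical, and any status not listed above is treated as an error.

```python
import json
import time
import urllib.request

BASE_URL = "https://olmoearth.allenai.org/api/v1"
IN_PROGRESS = {"pending", "acquiring", "ingesting"}


def fetch_status(dataset_id, token):
    """GET the dataset and return its status string."""
    request = urllib.request.Request(
        f"{BASE_URL}/datasets/{dataset_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["status"]  # assumed field name, not confirmed by this guide


def wait_until_completed(get_status, poll_seconds=60.0,
                         timeout_seconds=6 * 3600, sleep=time.sleep):
    """Call get_status() repeatedly until it returns 'completed'."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "completed":
            return status
        if status not in IN_PROGRESS:
            raise RuntimeError(f"unexpected dataset status: {status!r}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status!r} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

A call might look like `wait_until_completed(lambda: fetch_status(dataset_id, token))`. Passing the status getter and the sleep function as arguments keeps the loop easy to test without network access.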

Notes

  • Dataset creation is an asynchronous process that may take hours depending on the number of samples and images requested
  • The task_name column groups samples into annotation tasks: all samples with the same task_name belong to the same task
  • The combined geometry of all samples in a task cannot span more than 6 degrees (the width of a UTM zone)
  • If start_time and end_time are provided in the CSV, they will be used for the annotation and task times
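The 6-degree limit can be checked locally before uploading. The sketch below computes the latitude and longitude span of each task's points and reports tasks whose larger span exceeds the limit. How the server measures the extent is not specified here, so treating it as a bounding-box span in either direction is an assumption, and the check ignores points that straddle the antimeridian. `oversized_tasks` is a hypothetical helper.

```python
from collections import defaultdict

MAX_EXTENT_DEGREES = 6.0


def oversized_tasks(rows, limit=MAX_EXTENT_DEGREES):
    """Map task_name -> (lat_span, lon_span) for tasks exceeding the limit."""
    lats, lons = defaultdict(list), defaultdict(list)
    for row in rows:
        lats[row["task_name"]].append(float(row["latitude"]))
        lons[row["task_name"]].append(float(row["longitude"]))

    too_large = {}
    for task in lats:
        lat_span = max(lats[task]) - min(lats[task])
        lon_span = max(lons[task]) - min(lons[task])
        # Assumption: the limit applies to the larger of the two spans.
        if max(lat_span, lon_span) > limit:
            too_large[task] = (lat_span, lon_span)
    return too_large
```

Rows can come from `csv.DictReader`, since the function converts the coordinate strings to floats itself. Tasks that are reported can be split by assigning more specific task_name values.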