Upload a CSV Dataset
OlmoEarth API Endpoints
OlmoEarth API endpoints are documented in the Interactive OlmoEarth API Browser.
You can use it to understand request and response schemas, make requests directly to the API, and generate client code in many common languages and libraries.
The following OlmoEarth API endpoints are used in this guide:
- POST /api/v1/datasets/upload-samples
- GET /api/v1/datasets/{dataset_id}
This guide explains how to create a dataset by uploading a CSV file containing point locations and associated metadata. This is useful when you already have georeferenced data and want to create annotation tasks around those locations.
Prerequisites
API Token
To get your API Token, see Authentication.
Project ID
Datasets must be uploaded to a Project. To get a Project ID, view Your Projects, select your project, and copy the ID from the URL.
For example, if your project URL is:
https://olmoearth.allenai.org/projects/7e160260-5a5a-4120-ab33-8ce15998b982/tasks
Then your Project ID is 7e160260-5a5a-4120-ab33-8ce15998b982
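If you handle many project URLs, the copy step above can be scripted. The following is a minimal sketch: the helper name `project_id_from_url` is hypothetical, and the UUID check assumes that Project IDs are always UUIDs, as in the example URL.

```python
import uuid
from urllib.parse import urlparse


def project_id_from_url(url: str) -> str:
    """Return the path segment that follows 'projects' in a project URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "projects" not in parts:
        raise ValueError("URL has no /projects/ segment")
    index = parts.index("projects") + 1
    if index >= len(parts):
        raise ValueError("URL has no segment after /projects/")
    candidate = parts[index]
    # Assumption: Project IDs are UUIDs. uuid.UUID raises ValueError otherwise.
    uuid.UUID(candidate)
    return candidate


if __name__ == "__main__":
    url = (
        "https://olmoearth.allenai.org/projects/"
        "7e160260-5a5a-4120-ab33-8ce15998b982/tasks"
    )
    print(project_id_from_url(url))
```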
CSV File Format
Your CSV file must include the following required columns:
- latitude - Latitude coordinate in decimal degrees
- longitude - Longitude coordinate in decimal degrees
- task_name - Name for grouping samples into tasks
Only point geometries are supported for CSV uploads. For other geometry types (polygons, lines), use GeoJSON format.
Optional Columns
- start_time - Start of the time range in ISO 8601 format (e.g., 2023-01-15T00:00:00Z)
- end_time - End of the time range in ISO 8601 format
- Any additional metadata columns you want associated with each sample
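Checking a file locally before uploading can catch column and coordinate problems early. The sketch below is not the API's own validation. It is a hypothetical `validate_csv` helper that checks only the rules stated above: required columns, numeric coordinates within valid decimal-degree ranges, a non-empty task_name, and parseable ISO 8601 timestamps when present.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ("latitude", "longitude", "task_name")


def _parse_iso(value):
    # Older versions of fromisoformat() reject a trailing "Z", so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_csv(text):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: coordinates are not numeric")
            continue
        if not -90 <= lat <= 90:
            problems.append(f"line {line_no}: latitude {lat} out of range")
        if not -180 <= lon <= 180:
            problems.append(f"line {line_no}: longitude {lon} out of range")
        if not (row["task_name"] or "").strip():
            problems.append(f"line {line_no}: task_name is empty")
        for column in ("start_time", "end_time"):
            value = row.get(column)
            if value:
                try:
                    _parse_iso(value)
                except ValueError:
                    problems.append(f"line {line_no}: {column} is not ISO 8601")
    return problems
```

An empty list means the file passed these local checks only; the server may apply further rules.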
Example CSV File
Here's an example CSV file for monitoring forest health in the Pacific Northwest:
latitude,longitude,task_name,start_time,end_time,tree_species,health_status
48.7519,-121.7453,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Douglas Fir,Healthy
48.7523,-121.7467,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Western Hemlock,Stressed
48.7531,-121.7489,North_Cascades_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Western Red Cedar,Healthy
47.7510,-121.4388,Snoqualmie_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Douglas Fir,Diseased
47.7522,-121.4401,Snoqualmie_Site_1,2023-06-01T00:00:00Z,2023-08-31T00:00:00Z,Big Leaf Maple,Healthy
In this example:
- Each row represents a tree location
- Samples are grouped into two tasks: North_Cascades_Site_1 and Snoqualmie_Site_1
- Additional metadata (tree_species, health_status) is included for each sample
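To preview how rows will be grouped before uploading, you can count samples per task_name locally. This is a small sketch with a hypothetical `samples_per_task` helper. It mirrors the grouping described above and does not call the API.

```python
import csv
import io
from collections import Counter


def samples_per_task(csv_text):
    """Count how many rows share each task_name value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["task_name"] for row in reader)


if __name__ == "__main__":
    sample = (
        "latitude,longitude,task_name\n"
        "48.7519,-121.7453,North_Cascades_Site_1\n"
        "48.7523,-121.7467,North_Cascades_Site_1\n"
        "47.7510,-121.4388,Snoqualmie_Site_1\n"
    )
    for task, count in samples_per_task(sample).items():
        print(task, count)
```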
Components of the Request
1. Input File
The CSV file containing your georeferenced data.
2. Sources and Time Ranges
Define the satellite imagery sources to include for each task. See Creating a Gridded Dataset for details on available sources.
3. Buffer Size
Specify the buffer size in meters around each point. This determines the size of the imagery window acquired around each location. The default is 500 meters.
Example Request
Below is a complete example using curl to upload a CSV dataset. Visit the API Endpoint Documentation for a complete schema and sample requests in other languages or libraries.
curl -X POST "https://olmoearth.allenai.org/api/v1/datasets/upload-samples" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "input_file=@forest_health_survey.csv" \
  -F 'source_time_ranges=[
    {
      "source": "sentinel2",
      "start_time": "2023-06-01T00:00:00Z",
      "end_time": "2023-08-31T00:00:00Z",
      "count": 5
    },
    {
      "source": "landsat",
      "start_time": "2023-06-01T00:00:00Z",
      "end_time": "2023-08-31T00:00:00Z",
      "count": 3
    }
  ]' \
  -F "name=Forest Health Survey - Summer 2023" \
  -F "project_id=7e160260-5a5a-4120-ab33-8ce15998b982" \
  -F "buffer_size=1000" \
  -F "resolution=10.0"
This endpoint uses multipart/form-data encoding since it includes file uploads. The source_time_ranges parameter must be provided as a JSON string.
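For readers who prefer Python, the same request can be sketched with the third-party `requests` library. The field names are taken from the curl example. The helper names `build_form_fields` and `upload_csv` are hypothetical, and the sketch assumes the endpoint returns a JSON body on success, which this guide does not confirm.

```python
import json

UPLOAD_URL = "https://olmoearth.allenai.org/api/v1/datasets/upload-samples"


def build_form_fields(name, project_id, source_time_ranges,
                      buffer_size=None, resolution=None):
    """Build the non-file form fields; source_time_ranges becomes a JSON string."""
    fields = {
        "name": name,
        "project_id": project_id,
        "source_time_ranges": json.dumps(source_time_ranges),
    }
    if buffer_size is not None:
        fields["buffer_size"] = str(buffer_size)
    if resolution is not None:
        fields["resolution"] = str(resolution)
    return fields


def upload_csv(csv_path, api_token, fields):
    """POST the CSV and form fields as multipart/form-data."""
    import requests  # third-party; imported here so build_form_fields works without it

    with open(csv_path, "rb") as handle:
        response = requests.post(
            UPLOAD_URL,
            headers={"Authorization": f"Bearer {api_token}"},
            data=fields,                    # plain form fields
            files={"input_file": handle},   # forces multipart encoding
            timeout=60,
        )
    response.raise_for_status()
    return response.json()  # assumption: the response body is JSON
```

Passing both `data` and `files` makes `requests` encode the body as multipart/form-data, matching the `-F` flags in the curl example.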
Checking Dataset Status
The dataset will progress through several stages as it builds:
- pending - Dataset creation has been queued
- acquiring - Satellite imagery is being acquired
- ingesting - Data is being processed and ingested
- completed - Dataset is ready for use
You can monitor the dataset progress using GET /api/v1/datasets/{dataset_id}.
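Because building can take a long time, polling at a generous interval is usually enough. The sketch below makes two assumptions that are not confirmed by this guide: that the response is JSON, and that the stage is exposed in a field named `status`. Check the API Endpoint Documentation for the actual schema. The function names are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "https://olmoearth.allenai.org"
IN_PROGRESS = ("pending", "acquiring", "ingesting")


def fetch_dataset(dataset_id, api_token):
    """GET the dataset record and decode it as JSON."""
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/datasets/{dataset_id}",
        headers={"Authorization": f"Bearer {api_token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_until_completed(fetch, interval_seconds=300, max_polls=100,
                         sleep=time.sleep):
    """Poll fetch() until the status is 'completed'; return the stages seen."""
    seen = []
    for _ in range(max_polls):
        status = fetch().get("status")  # assumption: field is named "status"
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "completed":
            return seen
        if status not in IN_PROGRESS:
            raise RuntimeError(f"unexpected status: {status!r}")
        sleep(interval_seconds)
    raise TimeoutError("dataset did not complete within the polling budget")
```

A typical call would be `wait_until_completed(lambda: fetch_dataset(dataset_id, token))`. Taking `fetch` and `sleep` as parameters keeps the loop easy to exercise without network access.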
Notes
- Dataset creation is an asynchronous process that may take hours depending on the number of samples and images requested
- The task_name column groups samples into annotation tasks. All samples with the same task_name are grouped together
- The combined geometry of all samples in a task cannot exceed 6 degrees (the UTM zone interval)
- If start_time and end_time are provided in the CSV, they will be used for the annotation and task times
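The 6-degree limit can be checked locally before uploading. The sketch below assumes the limit applies to the latitude and longitude extent of each task's bounding box, which is one reading of the note above and not a confirmed rule. It does not handle tasks that cross the antimeridian. The helper name `oversized_tasks` is hypothetical.

```python
import csv
import io

MAX_SPAN_DEGREES = 6.0


def oversized_tasks(csv_text, max_span=MAX_SPAN_DEGREES):
    """Return the sorted task names whose bounding box exceeds max_span degrees."""
    boxes = {}  # task_name -> [min_lat, max_lat, min_lon, max_lon]
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row["latitude"])
        lon = float(row["longitude"])
        box = boxes.setdefault(row["task_name"], [lat, lat, lon, lon])
        box[0] = min(box[0], lat)
        box[1] = max(box[1], lat)
        box[2] = min(box[2], lon)
        box[3] = max(box[3], lon)
    return sorted(
        name
        for name, (south, north, west, east) in boxes.items()
        if north - south > max_span or east - west > max_span
    )
```

Any task this returns is a candidate for splitting into several task_name values before upload.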