Testing: DataStore API

A guide to accessing, filtering, and integrating HDX tabular data through the Datastore API.

Note: This material is for alpha testing of a new HDX API endpoint built on CKAN's DataStore. It is not yet official documentation.

Introduction

The HDX DataStore API provides programmatic access to a number of tabular datasets hosted on HDX. It supports features like customizable queries and dataset joins to enable seamless integration into external tools and systems. By allowing real-time querying of tabular data, HDX DataStore API makes it easier for developers, analysts, and humanitarian responders to automatically integrate HDX data into their workflows.

The HDX DataStore API is powered by the CKAN DataStore extension.

The following documentation is intended for analysts and technical audiences to configure the API, troubleshoot, integrate with tools, and learn from working examples.

Building a HDX DataStore API Query

With your API token and the resource_id (see HDX Core Concepts section for more information), you have the key components needed to configure an HDX DataStore API query. The diagram below shows how these elements fit together; the different endpoints are explained in the following sections.

This diagram shows the basic elements of an HDX DataStore API query.

Available Data and Resource IDs

The DataStore API works at the resource level, meaning you query a specific file, not the entire dataset. DataStore API access is currently available for a subset of resources on HDX. We’re working on expanding coverage to include all structured CSV resources.

DataStore Resource ID: HDX UI

Here's how to find resources active on DataStore through the HDX UI:

  1. Navigate to the Available Datasets via DataStore API Page: Go to the discovery page that lists the available datasets for API access.

  2. Find your dataset: On the actual dataset page, each data resource is listed with a name, file format, and additional indicators (e.g. API access).

  3. Access DataStore API endpoint: For the individual resource available within HDX’s DataStore API, you’ll see a link to the relevant API(s) along with the resource ID. This information is essential for querying the resource programmatically.

DataStore Resource ID: HDX CKAN API

Resource ids can also be found in the HDX CKAN API response. See HDX System documentation for finding the resource_id through the HDX CKAN API.

From the response, identify the resource that corresponds to the structured data you're interested in. Sometimes datasets include multiple resources (e.g. data files, documentation). Inspect each item in the resources array and look for one where "datastore_active": true (if present), which indicates DataStore API compatibility. Note the id of that resource since this is your resource_id.

How to Query the Datastore

There are three ways that the Datastore API can be queried, each accessed via it's own endpoint. See below:

Native (datastore_search): This endpoint provides a simple, URL-based interface for querying records within a data resource. If you are starting out with APIs, this is the one you should use. It supports standard functionality including filtering by field values, text search, pagination, field selection, and sorting.

SQL (datastore_search_sql): This endpoint allows for more advanced querying using full SQL syntax. It is useful when complex conditions like aggregations, or joins within multiple resources are needed. It offers greater flexibility but also requires a proper understanding of SQL.

Information (datastore_info): This endpoint returns information about the data within the resource like column names and data types. This is helpful to better understand the data instead of calling the entire data table.

Use datastore_search when working with simple filters, browsing data, or building lightweight applications or a Power BI dashboard. Use datastore_search_sql for more analytical purposes or when multiple conditions and calculations are required. Use datastore_info for simple information on the data resources.

For more detailed information, please see the CKAN DataStore API documentation.

API Requests: Native

With the endpoint, token, and resource_id, you can now query data using standard API calls by configuring the URL like follows:

Endpoint

https://data.humdata.org/api/3/action/datastore_search

Example: Retrieve the rows in a dataset where the country is “Kenya” for 2024.

https://data.humdata.org/api/3/action/datastore_search?id=<resource_id>&filters={"country":"Kenya",”year”:2024}

Common Parameters (see the CKAN DataStore documentation for more details)

Parameter
Descriptions

resource_id

[Required] The resource unique identifier

limit

Maximum number of results to return

offset

Start index for pagination

q

Search across all data (can pass as dictionary to limit to a specific column)

filters

fields

Limit which columns are returned

sort

Sort results

For more advanced API queries, the SQL API supports a full subset of PostgreSQL syntax. It's ideal for advanced queries, aggregations, joins (within limits), and complex filters. It also requires the endpoint, token, and resource_id.

Endpoint

GET /datastore_search_sql

Parameters: sql (Required) A SQL query string. Must reference the correct resource (table name is the resource_id).

Example: WHERE clause

https:/data.humdata.org/api/3/action/datastore_search_sql?sql=SELECT+*+FROM+"id"+WHERE+country='Kenya'+AND+year=2024

Example SQL Query

SELECT country, AVG(value) FROM "id" WHERE indicator = 'population' AND year = 2024 GROUP BY country

Note: Be sure to URL-encode your SQL queries when sending them via GET (e.g. urlib.parse in Python).

API Requests: Datastore Info

This is a basic information endpoint which simply returns information about the data resource. This can be used to learn more about the column names and data types by retrieving a dictionary describing the columns and their types.

Endpoint

https://data.humdata.org/api/3/action/datastore_info

Parameters

Resource ID of datastore active resource.

Example URL

https://data.humdata.org/api/3/action/datastore_info?id=<resource_id>

Code Examples

The HDX DataStore API can be used with many different coding languages and processes in the humanitarian sector. Be mindful of query limits and performance, especially when working with large datasets. See specific code examples below.

Note that you can access the API in your browser if you are signed in to HDX.

See Python specific examples per endpoint below.

Native Query

This code searches an HDX resource where the country field is exactly ‘Kenya’, retrieves up to 5 results, and prints the total number of matches in the dataset.

import requests

api_token = "API_TOKEN"  # Replace with your actual token
resource_id = "RESOURCE_ID" # Replace with actual resource id
url = "https://data.humdata.org/api/3/action/datastore_search"

params = {
    "resource_id": resource_id,
    "filters": {"country": "Kenya"},
    "limit": 5
}

headers = {
    "Authorization": api_token
}

# Native query: Get first 5 records where country is 'kenya'
response = requests.get(url, headers=headers, params=params)
data = response.json()

print("Total results:", data["result"]["total"])

SQL Query

This code runs a SQL query on an HDX resource to return the first 5 rows where the country field contains “kenya” (case-insensitive search) and then prints how many rows were retrieved.

import requests

api_token = "API_TOKEN"  # Replace with your actual token
resource_id = "RESOURCE_ID" # Replace with actual resource id
url = "https://data.humdata.org/api/3/action/datastore_search_sql"

# SQL query: Get first 5 records where country contains 'kenya'
sql = f'SELECT * FROM "{resource_id}" WHERE country ILIKE \'%kenya%\' LIMIT 5'

headers = {
    "Authorization": api_token
}

response = requests.get(url, headers=headers, params={"sql": sql})
data = response.json()

print("Total results returned:", len(data["result"]["records"]))

Integration with Tools

The DataStore API can be integrated into various tools to enable automated, repeatable data retrieval and analysis workflows. See examples below.

Don’t see your tool listed but want help? Let us know by emailing hdx@un.org!

Power BI

Follow this section to connect HDX’s DataStore API to PowerBI as a data source. This can also be used as a Power Query in other Microsoft tools, such as Fabric.

  1. Open PowerBI Desktop. Within the PowerBI Desktop application (not the web version), go to Home → Get data. Select “Blank Query”. This will open the Power Query Editor window.

  2. Open Blank Query. Click “Advanced Editor” in the top rail.

  3. Paste PowerQuery with Custom Fields. In the editor, copy and paste this PowerQuery M script. You must replace the following with your own information.

    1. BaseUrl (Endpoint)

    2. ResourceID

    3. ApiToken (Your own API token)

  4. Set the connection to Anonymous. If prompted for credentials, choose “Anonymous”. If no prompt appears but there is an error, ensure you go to File → Options and settings → Data source settings. Find the entry for your data and clear permissions. Re-run the query which should prompt you to choose Anonymous.

  5. Apply the Query. Once the preview loads in the Power Query Editor, click “Close & Apply”. Power BI should fetch the full dataset using the pagination logic and filters.

  6. [Optional] Add a Filter. If you would like to apply a filter, you need to update the PowerQuery to include the filter in the config. See example here for an organization filter to IOM. Note this will need to be configured per the resource schema.

    1. OrgFilter (Param filter)

Python Notebook

We provide a ready-to-run Python ipynb notebook showing how to query the DataStore API (native and SQL) for Python scripts and pipelines. You can open it directly in your browser via GitHub and test queries without needing to install anything locally.

Open the Python notebook here.

Important:

  • You must provide your own API token in the notebook.

  • Never commit your token or share it publicly! Treat it like a password.

The notebook covers:

  • Searching for resources using the HDX CKAN API

  • Making a minimal native query “probe” call with limit=1

  • Use SQL query to join two resources with a common column

  • Filter in both native and SQL queries

  • Fetching data with filters and pagination

Troubleshooting and Error Handling

See troubleshooting recommendations and common errors by symptom with likely cause and fix below. Use the diagnosis flow below when a request fails or the response is empty.

Are you getting an error not listed here? Let us know! Email us at hdx@un.org

Quick Diagnosis Flow

Start with these simple steps to resolve your issues. See more detailed information per error code below.

  1. Test the resource with a minimal native call using limit equals 1.

  2. If it succeeds, add filters gradually.

  3. If native works but SQL fails, validate quoting and URL encoding of the SQL.

  4. If responses are empty, remove offset and lower limit to confirm presence of data.

  5. If performance degrades, page smaller and add filters.

If all else fails, re-confirm resource_id and datastore_active and recheck authentication with a valid API token.

Pagination

Both DataStore APIs (native and SQL) support pagination which is a technique used in API design and development to retrieve large data sets in a structured and manageable manner. The default limit is 100 records with a maximum limitation of 32,000 for both native and SQL queries.

It is recommended to use pagination to retrieve larger datasets and avoid timeouts or memory issues. To further optimize performance, use the HDX API’s filtering capabilities to limit the size of the data payload returned in each request.

Pagination is supported via the limit and offset parameters:

  • limit: Number of records to return per page (default is 100, maximum is 32,000).

  • offset: The starting point (index) for the next page of results.

This example retrieves 100 records starting from the 201st record (offset is zero-based): https://data.humdata.org/api/3/action/datastore_search?id=<resource_id>&limit=100&offset=200

To paginate through the entire dataset:

  1. Start with offset=0.

  2. Increment the offset by the limit (e.g., offset=100, offset=200, etc.) until result["records"] is empty.

SQL Query (datastore_search_sql)

Pagination is supported via SQL's LIMIT and OFFSET clauses:

Example: SELECT * FROM "<resource_id>" WHERE country = 'Kenya' LIMIT 100 OFFSET 200

Encoded URL: https://data.humdata.org/api/3/action/datastore_search_sql?sql=SELECT+*+FROM+"<resource_id>"+WHERE+country='Kenya'+LIMIT+100+OFFSET+200

  • Always use ORDER BY with pagination in SQL to ensure consistent result ordering across pages.

  • SQL-based pagination is more flexible but puts more load on the server; use filters and limits strategically.

  • Use SQL filters to retrieve only the data you need and reduce system load.

Authentication and Permissions

  • Symptom: 403 Forbidden

  • Likely cause: Missing or invalid API token in the Authorization header

  • Fix: Add a valid token and retry. Example with curl:

curl -H "Authorization: <your-api-token>" \ "https://data.humdata.org/api/3/action/datastore_search?resource_id=<id>&limit=1"

If you are testing in a browser, make sure you are logged in to HDX as described in the Authentication section.

Resource and Dataset Issues

  • Symptom: 404 Not Found or { "success": false, "error": "Not found" }

  • Likely cause: Wrong resource_id or the resource was removed or unpublished

  • Fix: Confirm the resource_id from the dataset page or via package_show and verify the resource still exists. Also verify that the resource has DataStore API access enabled. Look for datastore_active in the CKAN metadata.

  • Symptom: Empty records with total greater than zero

  • Likely cause: Using an offset beyond the number of rows or incompatible filter values

  • Fix: Remove filters and set limit to a small value to confirm data is present: .../datastore_search?resource_id=<id>&limit=5

  • Then add filters back one at a time.

SQL Query Errors

  • Symptom: { "success": false, "error": "Error parsing query" }

  • Likely cause: Unquoted resource_id or unencoded SQL

  • Fix: Quote the table name and URL encode the entire SQL: sql=SELECT%20*%20FROM%20"%3Cresource_id%3E"%20WHERE%20country%3D'Kenya'%20LIMIT%205

  • Symptom: Syntax error near ORDER or LIMIT

  • Likely cause: ORDER BY must reference valid column names and comes before LIMIT and OFFSET

  • Fix: Validate column names through a small select first, then add ORDER BY, then LIMIT and OFFSET as shown in the SQL examples.

Performance and Rate Limiting

  • Symptom: 429 Too Many Requests

  • Likely cause: High request concurrency or scraping patterns

  • Fix: Back off and retry with exponential backoff, reduce parallelism, and page through results. The user’s IP is rate limited at 60 requests per minute so plan for modest throughput.

Data Types and Schema Drift

For alerts on any changes to the data resources used within the Datastore API, enable notifications to be alerted about data schema changes. See more information here.

  • Symptom: Type errors or unexpected nulls

  • Likely cause: Mismatched types in filters or recent schema change in the source file

  • Fix: Inspect a sample of rows first to confirm types. If year is stored as text, pass "2024" rather than 2024. If the maintainer updated the schema, refresh your assumptions and adjust the query accordingly.

  • Symptom: Date filters fail to match

  • Likely cause: Inconsistent date formats in source

  • Fix: Use exact string matching that reflects the stored format, or use SQL functions where supported to normalize formats.

Feedback and Support

We are always adding new features and additional datasets into the HDX DataStore API. If you would like to request either a new feature or think a critical dataset is missing, please let us know!

Report bugs and request features by emailing hdx@un.org.

Last updated

Was this helpful?