Open Data Maker exposes a read-only HTTP API for querying available datasets. This document explains:
- How to define and execute queries as URLs
- Refining query results using option parameters
- Extracting query results in JSON and CSV format
- Detecting query errors
Each query is expressed as a URL, containing:
- The URL Path to the Open Data Maker service, e.g.
https://collegescorecard.ed.gov/
(for Open Data Maker running its own server), orhttps://api.data.gov/ed/collegescorecard/v1/
(for Open Data Maker proxied through an API gateway) - The API Version String. Currently the only supported version string is:
v1
- The Endpoint representing a particular dataset, e.g.
schools
. Endpoint names are usually plural. - The Format for the result data. The default output format is JSON (JavaScript Object Notation); CSV is also available.
- The Query String containing a set of named key-value pairs that
represent the query, which incude
- Field Parameters, specifying a value (or set of values) to match against a particular field, and
- Option Parameters, which affect the filtering and output of the
entire query. Option Parameter names are prefixed with an underscore (
_
).
Here's an example query URL:
https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=2,3&_fields=id,school.name,2013.student.size
In this query URL:
https://api.data.gov/ed/collegescorecard/v1/
is the URL Path.v1
is the API Version String, followed by/
, which separates it from the Endpoint.schools
is the Endpoint. Note the plural..json
is the Format. Note the dot between the Endpoint and Format. Also note that, since JSON is the default output format, we didn't have to specify it.- In keeping with standard URI Query String syntax, the
?
and&
characters are used to begin and separate the list of query parameters. school.degrees_awarded.predominant=2,3
is a Field Parameter. In this case, it's searching for records which have aschool.degrees_awarded.predominant
value of either2
or3
._fields=id,school.name,2013.student.size
is an Option Parameter, as denoted by the initial underscore character._fields
is used to limit the output fields to those in the given list. We strongly recommend using the_fields
parameter to reduce the amount of data returned by the API, thus increasing performance.
Here's an example of the JSON document that Open Data Maker would provide in response to the above query:
{
"metadata": {
"total": 3667,
"page": 0,
"per_page": 20
},
"results": [
{
"id": 190752,
"school.name": "Yeshiva of Far Rockaway Derech Ayson Rabbinical Seminary",
"2013.student.size": 57
},
{
"id": 407009,
"school.name": "Arizona State University-West",
"2013.student.size": 3243
},
{
"id": 420574,
"school.name":"Arizona State University-Polytechnic",
"2013.student.size": 3305
}
// ... (further results removed) ...
]
}
A successful query will return a JSON with two top-level elements:
metadata
: A JSON Object containing information about the results returned. The metadata fields are:total
: The total number of records matching the querypage
: The page number for this result setper_page
: The number of records returned in a single result set. (For more information about thepage
andper_page
fields, see the section on Pagination)
results
: A JSON Array of record objects. Due to the use of the_fields
option in this query, there are only three fields in each record - the three fields specified in the_fields
parameter. When the_fields
parameter is omitted, the full record is provided.
Let's change the query so as to generate an error when it's executed:
https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=frog&_fields=id,school.name,wombat
This is the JSON document returned:
{
"errors": [
{
"error": "field_not_found",
"input": "wombat",
"message": "The input field 'wombat' (in the fields parameter) is not a field in this dataset."
},
{
"error": "parameter_type_error",
"parameter": "school.degrees_awarded.predominant",
"input": "frog",
"expected_type": "integer",
"input_type": "string",
"message": "The parameter 'school.degrees_awarded.predominant' expects a value of type integer, but received 'frog' which is a value of type string."
}
]
}
When failing to execute a query, Open Data Maker will attempt to return a JSON error document that explains the problem. Points to note:
- The HTTP response status will be in the 400 or 500 range, indicating a problem.
- The JSON document contains a single top-level element,
errors
, containing a JSON Array of error objects. Instead of simply returning the first error encountered, Open Data Maker attempts to list all the problems that it detected. - Error objects always contain these two elements:
error
: a symbolic error code for the specific error that occurred.message
: an English description of the error.
- Error objects may also contain these fields:
input
: the provided input which triggered the error.parameter
: the parameter in which theinput
was supplied.input_type
& expected_type`: In the case of a type mismatch, these fields list the data types that were provided and expected.
Parameter names without an underscore prefix are assumed to be field names in the dataset. Supplying a value to a field parameter acts as a query filter, and only returns records where the given field exactly matches the given value.
For example: Use the parameter school.region_id=6
to only fetch records with a school.region_id
value of 6
.
Certain text fields in the dataset - those with the autocomplete
data type - allow querying with a list of words. To search for a given word or string of words in those fields, just provide a list of space-separated words. This will return all records where the given field contains the given words as part of a string. Note that all given words have to be at least three characters long.
For example: To search for school names containing the words New York
, use this parameter: school.name=New%20York
(%20
is a URL-encoded space) This will match all of these names:
New York College of Health Professions
American Academy of Dramatic Arts-New York
School of Professional Horticulture at the New York Botanical Garden
The New College of York
(because the parameter words don't have to be found together)Royal College of New Yorkminster
(because parameter words are matched as parts of other words)
but not this name:
New England School of Arts
(becauseYork
is missing)
To filter by a set of strings or integers you can provide a comma-separated list of values. This will query the field for an exact match with any of the supplied values.
For example: school.degrees_awarded.predominant=2,3,4
will match records with a school.degrees_awarded.predominant
value of 2
, 3
or 4
.
Note: Value lists with wildcards or floating-point numbers are not currently supported.
To exclude a set of records from results, use a negative match (also known as an inverted match). Append the characters __not
to the parameter name to specify a negative match.
For example: school.region_id__not==5
matches on records where the school.region_id
does not equal 5
.
To match on field values in a particular numeric range, use a range match. Append the characters __range
to the parameter name to specify a range match, and provide two numbers separated by two periods (..
).
For example: 2013.student.size__range=100..500
matches on schools which had between 100 and 500 students in 2013.
Open-ended ranges can be performed by omitting one side of the range. For example: 2013.student.size__range=1000..
matches on schools which had over 1000 students.
You can even supply a list of ranges, separated by commas. For example, For example: 2013.student.size__range=..100,1000..2000,5000..
matches on schools which had under 100 students, between 1000 and 2000 students, or over 5000 students.
- Both integer and floating-point ranges are supported.
- The left-hand number in a range must be lower than the right-hand number.
- Ranges are inclusive. For example, the range
3..7
matches both3
and7
, as well as the numbers in between.
You can perform extra refinement and organisation of search results using option parameters. These special parameters have names beginning with an underscore character (_
).
By default, records returned in the query response include all their stored fields. However, you can limit the fields returned with the _fields
option parameter. This parameter takes a comma-separated list of field names. For example: _fields=id,school.name,school.state
will return result records that only contain those three fields.
Requesting specific fields in the response will significantly improve performance and reduce JSON traffic, and is recommended.
By default, results are returned in pages of 20 records at a time. To retrieve pages after the first, set the _page
option parameter to the number of the page you wish to retrieve. Page numbers start at zero; so, to return records 21 through 40, use _page=1
. Remember that the total number of records available for a given query is given in the total
field of the top-level metadata
object.
You can also change the number of records returned per page using the _per_page
option parameter, up to a maximum of 100 records. Bear in mind, however, that large result pages will increase the amount of JSON returned and reduce the performance of the API.
To sort results by a given field, use the _sort
option parameter. For example, _sort=population
will return records sorted by population size, in ascending order.
By default, using the _sort_
option returns records sorted into ascending order, but you can specify ascending or descending order by appending :asc
or :desc
to the field name. For example: _sort=population:desc
Note: Sorting is only availble on fields with the data type integer
, float
, autocomplete
or name
.
When the dataset includes a location
at the root level (location.lat
and
location.lon
) then the documents will be indexed geographically. You can use the _zip
and _distance
options to narrow query results down to those within a geographic area. For example, _zip=12345&_distance=10mi
will return only those results within 10 miles of the center of the given zip code.
- By default, any number passed in the
_distance
parameter is treated as a number of miles, but you can specify miles or kilometers by appendingmi
orkm
respectively. - Distances are calculated from the center of the given zip code, not the boundary.
- Only U.S. zip codes are supported.