Data Resource
| Authors | Rufus Pollock, Paul Walsh, Adam Kariv, Evgeny Karev, Peter Desmet, Data Package Working Group |
|---|---|
| Profile | /profiles/2.0/dataresource.json |
A simple format to describe and package a single data resource such as an individual table or file. The essence of a Data Resource is a locator for the data it describes. A range of other properties can be declared to provide a richer set of metadata.
Language
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.
Descriptor
A Data Resource descriptor MUST be a descriptor as per the Descriptor definition. The standard properties that can be included in a descriptor are defined in the Properties section.
An example of a Data Resource descriptor:
{
  "name": "solar-system",
  "path": "http://example.com/solar-system.csv",
  "title": "The Solar System",
  "description": "My favourite data about the solar system.",
  "format": "csv",
  "mediatype": "text/csv",
  "encoding": "utf-8",
  "bytes": 1,
  "hash": ...,
  "schema": ...,
  "sources": [ ... ],
  "licenses": [ ... ]
}
Properties
Standard properties of the descriptor are described below. A descriptor MAY include any number of properties in addition to the required and optional properties described below.
General
The properties below are applicable to any Data Resource.
name [required]
A resource MUST contain a name property. The name is a simple name or identifier to be used for this resource.
- It MUST be unique amongst all resources in this data package.
- It SHOULD be human-readable and consist only of lowercase English alphanumeric characters plus ., - and _.
- It would be usual for the name to correspond to the file name (minus the extension) of the data file the resource describes.
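The name rules above can be checked mechanically. The following is a minimal sketch, not part of the specification: the function name `check_resource_names` and the split into errors (MUST rules) and warnings (the SHOULD rule) are illustrative assumptions.

```python
import re

# Lowercase ASCII letters, digits, ".", "-" and "_" (the SHOULD rule).
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def check_resource_names(resources):
    """Return (errors, warnings) for the name rules of a list of resources.

    Hypothetical helper: MUST violations are reported as errors,
    SHOULD violations as warnings.
    """
    errors, warnings = [], []
    seen = set()
    for index, resource in enumerate(resources):
        name = resource.get("name")
        if not isinstance(name, str) or not name:
            errors.append(f"resource {index}: missing name")
            continue
        if name in seen:
            errors.append(f"resource {index}: duplicate name {name!r}")
        seen.add(name)
        if not NAME_PATTERN.fullmatch(name):
            warnings.append(f"resource {index}: name {name!r} is not a simple lowercase identifier")
    return errors, warnings
```

A caller would typically reject the package on any error and only report the warnings.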
path or data [required]
A resource MUST contain a property describing the location of the data associated to the resource. The location of resource data MUST be specified by the presence of one (and only one) of these two properties:
- path: for data in files located online or locally on disk.
- data: for data inline in the descriptor itself.
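The "one (and only one)" rule amounts to an exclusive-or check on the two keys. A minimal sketch, where the function name `data_location` is a hypothetical helper and not something defined by the specification:

```python
def data_location(resource):
    """Return "path" or "data", whichever of the two the resource declares.

    Raises ValueError when neither or both properties are present.
    """
    has_path = "path" in resource
    has_data = "data" in resource
    if has_path == has_data:
        raise ValueError("a resource must have exactly one of 'path' or 'data'")
    return "path" if has_path else "data"
```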
Single File
If a resource has only a single file then path MUST be a string that is a “url-or-path” as defined in the URL or Path definition.
Multiple Files
Usually, a resource will have only a single file associated to it. However, sometimes it can be convenient to have a single resource whose data is split across multiple files — perhaps the data is large and having it in one file would be inconvenient.
To support this use case the path property MAY be an array of strings rather than a single string:
{ "path": ["myfile1.csv", "myfile2.csv"] }
It is NOT permitted to mix fully qualified URLs and relative paths in a path array: strings MUST either all be relative paths or all URLs.
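The no-mixing rule can be enforced while normalizing path to a list. This is a sketch under one stated assumption: a string counts as a URL when it starts with http:// or https://, and as a relative path otherwise. The names `is_url` and `normalize_paths` are illustrative.

```python
def is_url(value):
    # Assumption: only http(s) strings are treated as fully qualified URLs.
    return value.startswith(("http://", "https://"))


def normalize_paths(path):
    """Return path as a list of strings, rejecting mixed URLs and relative paths."""
    paths = [path] if isinstance(path, str) else list(path)
    if not all(isinstance(item, str) for item in paths):
        raise ValueError("path must be a string or an array of strings")
    # The set holds True for URLs and False for relative paths;
    # two members means both kinds are present.
    kinds = {is_url(item) for item in paths}
    if len(kinds) > 1:
        raise ValueError("path array must not mix URLs and relative paths")
    return paths
```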
Inline Data
Rather than being stored in external files, resource data can be shipped inline on a Resource using the data property.
The value of the data property can be any type of data. However, JSON has no native binary type, so binary data needs to be encoded as a string (e.g. to Base64). Information on the type and encoding of the value of the data property SHOULD be provided by the format (or mediatype) property and the encoding property.
The value of the data property MUST be either:
- JSON array or object: the data is then assumed to be JSON data and SHOULD be processed as such
- JSON string: in this case the format or mediatype properties MUST be provided.
Thus, if no format or mediatype property is provided, a consumer of a resource object MAY assume that the data is JSON and attempt to process it as such.
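One way a consumer might apply these rules is sketched below. The function name `inline_data_kind` and its return convention (the string "json", or the declared format or mediatype) are assumptions made for illustration only.

```python
def inline_data_kind(resource):
    """Decide how the inline data of a resource should be interpreted.

    Returns "json" for arrays and objects, and the declared format
    (or, failing that, mediatype) for strings.
    """
    data = resource["data"]
    if isinstance(data, (list, dict)):
        # JSON array or object: assumed to be JSON data.
        return "json"
    if isinstance(data, str):
        declared = resource.get("format") or resource.get("mediatype")
        if not declared:
            raise ValueError("string data requires a format or mediatype property")
        return declared
    raise ValueError("data must be a JSON array, object or string")
```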
For example, inline JSON:
{
  "resources": [
    {
      "format": "json",
      "data": [{ "a": 1, "b": 2 }]
    }
  ]
}
Or inline CSV:
{
  "resources": [
    {
      "format": "csv",
      "data": "A,B,C\n1,2,3\n4,5,6"
    }
  ]
}
type
A Data Resource descriptor MAY contain a property type that MUST be a string with the following possible values:
- table: indicates that the resource is tabular as per Tabular Data definition. Please read more about Tabular Resource properties.
If property type is not provided, the resource is considered to be a non-specific file. An implementation MAY provide some additional interfaces, for example, tabular, to non-specific files if type can be detected from the data source or format.
$schema
A root level Data Resource descriptor MAY have a $schema property that MUST be a profile as per Profile definition that MUST include all the metadata constraints required by this specification.
The default value is https://datapackage.org/profiles/1.0/dataresource.json and the recommended value is https://datapackage.org/profiles/2.0/dataresource.json.
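The difference between the default and the recommended value matters when a descriptor omits $schema. A minimal sketch of how an implementation might resolve the profile; the function name `resolve_profile` is hypothetical:

```python
DEFAULT_PROFILE = "https://datapackage.org/profiles/1.0/dataresource.json"
RECOMMENDED_PROFILE = "https://datapackage.org/profiles/2.0/dataresource.json"


def resolve_profile(descriptor):
    """Return the profile a descriptor should be validated against.

    A descriptor without $schema falls back to the default profile,
    not to the recommended one.
    """
    return descriptor.get("$schema", DEFAULT_PROFILE)
```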
title
Title or label for the resource.
description
Description of the resource.
format
The format is expected to be the standard file extension for this type of resource, for example csv, xls or json.
mediatype
The mediatype/mimetype of the resource e.g. text/csv, or application/vnd.ms-excel. Mediatypes are maintained by the Internet Assigned Numbers Authority (IANA) in a media type registry.
encoding
The character encoding of the resource’s data file (only applicable for textual files). The value SHOULD be one of the “Preferred MIME Names” for a character encoding registered with IANA. If no value for this property is specified then the encoding SHOULD be detected on the implementation level. It is RECOMMENDED to use UTF-8 (without BOM) as a default encoding for textual files.
bytes
Size of the file in bytes.
hash
The MD5 hash for this resource. Other algorithms can be indicated by prefixing the hash’s value with the algorithm name in lower-case. For example:
{ "hash": "sha1:8843d7f92416211de9ebb963ff4ce28125932878" }
sources
List of data sources as for Data Package. If not specified the resource inherits from the data package.
licenses
List of licenses as for Data Package. If not specified the resource inherits from the data package.
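The inheritance rule for sources and licenses can be sketched as a simple lookup with fallback. The helper name `effective_property` is hypothetical, and returning an empty list when neither the resource nor the package declares the property is an assumption of this sketch.

```python
def effective_property(package, resource, key):
    """Return the resource's own value for key, else the package's.

    key is expected to be "sources" or "licenses".
    """
    if key in resource:
        return resource[key]
    # Assumption: an empty list stands for "nothing declared anywhere".
    return package.get(key, [])
```

For example, a resource without its own licenses property would report the licenses of the enclosing data package.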
Tabular
The properties below are applicable to any Tabular Data Resource.
path or data [required]
If the path property is used for providing data then it MUST contain Tabular Data.
If the data property is used for providing data for a Tabular Data Resource then it MUST be an array where each item in the array MUST be either:
- an array where each entry in the array is the value for that cell in the table OR
- an object where each key corresponds to the header for that row and the value corresponds to the cell value for that row for that header.
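Both inline layouts can be normalized to the same row representation. The sketch below makes two assumptions that the text above does not state: in the array-of-arrays layout the first inner array is treated as the header row, and mixing arrays and objects in one data value is rejected. The function name `rows_as_dicts` is illustrative.

```python
def rows_as_dicts(data):
    """Normalize inline tabular data to a list of {header: value} dicts."""
    if not isinstance(data, list):
        raise TypeError("tabular inline data must be an array")
    if all(isinstance(item, dict) for item in data):
        # Array of objects: each item is already a header-to-value mapping.
        return [dict(item) for item in data]
    if all(isinstance(item, list) for item in data):
        # Array of arrays: assume the first row holds the headers.
        header, *body = data
        return [dict(zip(header, row)) for row in body]
    raise ValueError("items must be either all arrays or all objects")
```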
Array of arrays example:
[
  ["A", "B", "C"],
  [1, 2, 3],
  [4, 5, 6]
]
Array of objects example:
[
  { "A": 1, "B": 2, "C": 3 },
  { "A": 4, "B": 5, "C": 6 }
]
dialect
A Tabular Data Resource MAY have a dialect property to describe a tabular dialect of the resource data. If provided, the dialect property MUST be a Table Dialect descriptor in the form of an object or URL-or-Path.
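To illustrate how a dialect might influence parsing, here is a minimal sketch using Python's standard csv module. It handles only a dialect given as an object (not as a URL-or-Path), reads only its delimiter property, and assumes a comma when no delimiter is declared. The function name `read_rows` is hypothetical.

```python
import csv
import io


def read_rows(text, dialect=None):
    """Parse CSV text into a list of rows using the dialect's delimiter."""
    # Assumption: fall back to "," when the dialect declares no delimiter.
    delimiter = (dialect or {}).get("delimiter", ",")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```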
An example of a resource with a dialect:
{
  "name": "table",
  "type": "table",
  "path": "table.csv",
  "dialect": { "delimiter": ";" }
}
schema
A Tabular Data Resource SHOULD have a schema property to describe a tabular schema of the resource data. If provided, the schema property MUST be a Table Schema descriptor in the form of an object or URL-or-Path.
An example of a resource with a schema:
{
  "name": "table",
  "type": "table",
  "path": "table.csv",
  "schema": {
    "fields": [
      { "name": "id", "type": "integer" },
      { "name": "name", "type": "string" }
    ]
  }
}
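As an illustration of what a consumer could do with such a schema, the sketch below casts the string values of a parsed row to the declared field types. It covers only the two types used in the example (integer and string) and leaves values of any other or undeclared type unchanged; full Table Schema casting involves more than this. The names `CASTERS` and `cast_row` are hypothetical.

```python
# Only the two field types from the example above are handled.
CASTERS = {"integer": int, "string": str}


def cast_row(row, schema):
    """Cast the values of a {field name: raw value} row per the schema."""
    cast = {}
    for field in schema["fields"]:
        name = field["name"]
        caster = CASTERS.get(field.get("type"))
        value = row[name]
        # Unknown or missing types: keep the raw value as it is.
        cast[name] = caster(value) if caster else value
    return cast
```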