Data Resource
| Authors | Rufus Pollock, Paul Walsh, Adam Kariv, Evgeny Karev, Peter Desmet, Data Package Working Group |
|---|---|
| Profile | https://datapackage.org/profiles/2.0/dataresource.json |
A simple format to describe and package a single data resource such as an individual table or file. The essence of a Data Resource is a locator for the data it describes. A range of other properties can be declared to provide a richer set of metadata.
Language
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.
Descriptor
A Data Resource descriptor MUST be a descriptor as per the Descriptor definition. The standard properties that can be included in a descriptor are defined in the Properties section.
An example of a Data Resource descriptor:
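The sketch below is illustrative rather than normative. It writes a descriptor as a Python dict and serializes it to JSON; the resource name, URL, and title are hypothetical values, and only properties defined in the Properties section are used.

```python
import json

# Hypothetical resource: the name, path, and title are illustrative values.
resource = {
    "name": "solar-system",
    "path": "http://example.com/solar-system.csv",
    "title": "The Solar System",
    "format": "csv",
    "mediatype": "text/csv",
    "encoding": "utf-8",
}

# A descriptor is plain JSON, so it round-trips through the json module.
descriptor_json = json.dumps(resource, indent=2)
print(descriptor_json)
```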
Properties
Standard properties of the descriptor are described below. A descriptor MAY include any number of properties in addition to those described below as required and optional properties.
General
The properties below are applicable to any Data Resource.
name [required]
A resource MUST contain a name property. The name is a simple name or identifier to be used for this resource.
- It MUST be unique amongst all resources in this data package.
- It SHOULD be human-readable and consist only of lowercase English alphanumeric characters plus ".", "-" and "_".
- It would be usual for the name to correspond to the file name (minus the extension) of the data file the resource describes.
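The naming rules above can be checked mechanically. The following is a minimal sketch, not a reference implementation: the function name check_names and the split into errors (MUST rules) and warnings (the SHOULD rule) are assumptions made for illustration.

```python
import re

# Lowercase English alphanumerics plus ".", "-" and "_" (the SHOULD rule).
RECOMMENDED_NAME = re.compile(r"[a-z0-9._-]+")


def check_names(resources):
    """Return (errors, warnings) for the name property of each resource."""
    errors, warnings = [], []
    seen = set()
    for index, resource in enumerate(resources):
        name = resource.get("name")
        if not isinstance(name, str) or not name:
            errors.append(f"resource {index}: name is required")
            continue
        if name in seen:
            errors.append(f"resource {index}: duplicate name {name!r}")
        seen.add(name)
        if not RECOMMENDED_NAME.fullmatch(name):
            warnings.append(f"resource {index}: name {name!r} is not in the recommended form")
    return errors, warnings


errors, warnings = check_names([
    {"name": "solar-system"},
    {"name": "solar-system"},  # duplicate name: error
    {"name": "Solar System"},  # uppercase and space: warning only
    {},                        # missing name: error
])
```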
path or data [required]
A resource MUST contain a property describing the location of the data associated with the resource. The location of resource data MUST be specified by the presence of one (and only one) of these two properties:
- path: for data in files located online or locally on disk.
- data: for data inline in the descriptor itself.
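The "one and only one" rule can be expressed as a small check. This is a sketch under the assumption that a descriptor has already been parsed into a Python dict; the helper name data_location is hypothetical.

```python
def data_location(resource):
    """Return ("path", value) or ("data", value); raise if not exactly one is set."""
    has_path = "path" in resource
    has_data = "data" in resource
    # Both present or both absent violates the rule.
    if has_path == has_data:
        raise ValueError("a resource must contain exactly one of 'path' or 'data'")
    key = "path" if has_path else "data"
    return key, resource[key]


print(data_location({"name": "a", "path": "a.csv"}))
```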
Single File
If a resource has only a single file then path MUST be a string that is a “url-or-path” as defined in the URL or Path definition.
Multiple Files
Usually, a resource will have only a single file associated with it. However, sometimes it can be convenient to have a single resource whose data is split across multiple files, for example when the data is large and keeping it in one file would be inconvenient.
To support this use case the path property MAY be an array of strings rather than a single string:
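A minimal sketch of a multi-file resource, written as a Python dict. The resource name and file names are hypothetical, and resource_paths is an illustrative helper that lets a consumer treat the single-string and array forms uniformly.

```python
# Hypothetical resource whose data is split across two files.
resource = {
    "name": "my-data",
    "path": ["my-data-part1.csv", "my-data-part2.csv"],
}


def resource_paths(resource):
    """Normalize the path property to a list of strings."""
    path = resource["path"]
    return [path] if isinstance(path, str) else list(path)


print(resource_paths(resource))
```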
It is NOT permitted to mix fully qualified URLs and relative paths in a path array: strings MUST either all be relative paths or all URLs.
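The no-mixing rule could be checked as follows. This sketch assumes that a fully qualified URL can be recognized by an http:// or https:// prefix; the authoritative rules live in the URL or Path definition, and both helper names are hypothetical.

```python
def is_url(value):
    # Assumption: a fully qualified URL starts with http:// or https://.
    return value.startswith(("http://", "https://"))


def check_path_array(paths):
    """Raise ValueError if the array mixes URLs and relative paths."""
    kinds = {is_url(path) for path in paths}
    if len(kinds) > 1:
        raise ValueError("path array mixes URLs and relative paths")
    return paths
```

For instance, check_path_array(["part1.csv", "part2.csv"]) returns its input unchanged, while an array containing both "part1.csv" and "https://example.com/part2.csv" raises ValueError.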
Inline Data
Rather than being stored in external files, resource data can be shipped inline on a Resource using the data property.
The value of the data property can be any type of data. However, JSON has no binary type, so binary data has to be encoded as a string (e.g. using Base64). Information on the type and encoding of the value of the data property SHOULD be provided by the format (or mediatype) property and the encoding property.
The value of the data property MUST be either:
- JSON array or object: the data is then assumed to be JSON data and SHOULD be processed as such
- JSON string: in this case the format or mediatype properties MUST be provided.
Thus, if no format or mediatype property is provided, a consumer of a resource object MAY assume that the data is JSON and attempt to process it as such.
For example, inline JSON:
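A minimal sketch, written as a Python dict rather than raw JSON. The resource name and the values are hypothetical.

```python
# Hypothetical resource with inline JSON data (an array of objects).
resource = {
    "name": "inline-json",
    "format": "json",
    "data": [
        {"a": 1, "b": 2},
        {"a": 3, "b": 4},
    ],
}

# The data is an array, so it can be processed as JSON data directly.
rows = resource["data"]
total_a = sum(row["a"] for row in rows)
```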
Or inline CSV:
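A minimal sketch with hypothetical values. Because data is a string here, the format property is provided, and the consumer uses it to decide how to parse the string; Python's standard csv module is used for the parsing.

```python
import csv
import io

# Hypothetical resource with CSV text shipped inline as a JSON string.
resource = {
    "name": "inline-csv",
    "format": "csv",
    "data": "A,B,C\n1,2,3\n4,5,6",
}

# A string value needs format (or mediatype) to say how it should be read.
if isinstance(resource["data"], str) and resource.get("format") == "csv":
    rows = list(csv.reader(io.StringIO(resource["data"])))
else:
    rows = resource["data"]
```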
type
A Data Resource descriptor MAY contain a property type that MUST be a string with the following possible values:
- table: indicates that the resource is tabular as per Tabular Data definition. Please read more about Tabular Resource properties.
If property type is not provided, the resource is considered to be a non-specific file. An implementation MAY provide some additional interfaces, for example, tabular, to non-specific files if type can be detected from the data source or format.
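One way an implementation might read the type property is sketched below. The helper name is_tabular is hypothetical, and rejecting values other than "table" is this sketch's reading of the rule above rather than a behaviour the specification spells out.

```python
def is_tabular(resource):
    """True when the descriptor declares itself tabular via type."""
    resource_type = resource.get("type")
    if resource_type is None:
        return False  # no type: treated as a non-specific file
    if resource_type != "table":
        # Assumption: any value other than "table" is treated as invalid.
        raise ValueError(f"unsupported type: {resource_type!r}")
    return True
```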
$schema
A root level Data Resource descriptor MAY have a $schema property that MUST be a profile as per Profile definition that MUST include all the metadata constraints required by this specification.
The default value is https://datapackage.org/profiles/1.0/dataresource.json and the recommended value is https://datapackage.org/profiles/2.0/dataresource.json.
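The default can be applied when a descriptor omits $schema. A minimal sketch, assuming the descriptor is a Python dict; resolve_profile is a hypothetical helper name.

```python
DEFAULT_PROFILE = "https://datapackage.org/profiles/1.0/dataresource.json"
RECOMMENDED_PROFILE = "https://datapackage.org/profiles/2.0/dataresource.json"


def resolve_profile(resource):
    """Return the profile URL a descriptor should be validated against."""
    return resource.get("$schema", DEFAULT_PROFILE)


# A descriptor that opts in to the recommended profile.
resource = {"name": "a", "path": "a.csv", "$schema": RECOMMENDED_PROFILE}
print(resolve_profile(resource))
```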
title
Title or label for the resource.
description
Description of the resource.
format
Would be expected to be the standard file extension for this type of resource. For example, csv, xls, json etc.
mediatype
The mediatype/mimetype of the resource e.g. text/csv, or application/vnd.ms-excel. Mediatypes are maintained by the Internet Assigned Numbers Authority (IANA) in a media type registry.
encoding
The character encoding of the resource’s data file (only applicable for textual files). The value SHOULD be one of the “Preferred MIME Names” for a character encoding registered with IANA. If no value for this property is specified then the encoding SHOULD be detected at the implementation level. It is RECOMMENDED to use UTF-8 (without BOM) as a default encoding for textual files.
bytes
Size of the file in bytes.
hash
The MD5 hash for this resource. Other algorithms can be indicated by prefixing the hash’s value with the algorithm name in lower-case. For example:
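The sketch below assumes that the algorithm name and the digest are separated by a colon, giving values of the form sha256:<digest>; that separator is not stated in the text above. The helper names format_hash and parse_hash are hypothetical, and the digests come from Python's standard hashlib module.

```python
import hashlib


def format_hash(content, algorithm="md5"):
    """Return a hash value; non-MD5 digests get an 'algorithm:' prefix."""
    digest = hashlib.new(algorithm, content).hexdigest()
    # Assumption: a colon separates the algorithm name from the digest.
    return digest if algorithm == "md5" else f"{algorithm}:{digest}"


def parse_hash(value):
    """Split a hash value into (algorithm, digest), defaulting to md5."""
    algorithm, separator, digest = value.partition(":")
    return (algorithm, digest) if separator else ("md5", value)


print(format_hash(b"example file contents", "sha256"))
```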
sources
List of data sources as for Data Package. If not specified the resource inherits from the data package.
licenses
List of licenses as for Data Package. If not specified the resource inherits from the data package.
Tabular
The properties below are applicable to any Tabular Data Resource.
path or data [required]
If the path property is used for providing data then it MUST contain Tabular Data.
If the data property is used for providing data for a Tabular Data Resource then it MUST be an array where each item in the array MUST be either:
- an array where each entry in the array is the value for that cell in the table OR
- an object where each key corresponds to the header for that row and the value corresponds to the cell value for that row for that header.
Array of arrays example:
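A minimal sketch with hypothetical values, written as a Python dict. It assumes that the first inner array holds the header names, which the text above does not state explicitly.

```python
# Hypothetical tabular resource with inline data as an array of arrays.
resource = {
    "name": "inline-arrays",
    "type": "table",
    "data": [
        ["id", "name"],
        [1, "apple"],
        [2, "orange"],
    ],
}

# Assumption: the first inner array holds the header names.
header, *records = resource["data"]
rows = [dict(zip(header, record)) for record in records]
```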
Array of objects example:
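A minimal sketch with hypothetical values, written as a Python dict. Each object is one row, with keys acting as headers.

```python
# Hypothetical tabular resource with inline data as an array of objects.
resource = {
    "name": "inline-objects",
    "type": "table",
    "data": [
        {"id": 1, "name": "apple"},
        {"id": 2, "name": "orange"},
    ],
}

# Each key is a header and each value is the cell under that header.
header = list(resource["data"][0])
cells = [[row[key] for key in header] for row in resource["data"]]
```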
dialect
A Tabular Data Resource MAY have a dialect property to describe a tabular dialect of the resource data. If provided, the dialect property MUST be a Table Dialect descriptor in the form of an object or URL-or-Path.
An example of a resource with a dialect:
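A minimal sketch, assuming that the dialect is given inline as an object and that it carries a delimiter property as described in the Table Dialect specification. The resource name, the file name, the sample text, and the helper read_rows are all hypothetical.

```python
import csv
import io

# Hypothetical tabular resource stored in a semicolon-delimited file.
resource = {
    "name": "fruits",
    "type": "table",
    "path": "fruits.csv",
    # Assumption: "delimiter" is a Table Dialect property.
    "dialect": {"delimiter": ";"},
}


def read_rows(text, dialect):
    """Parse delimited text using the delimiter from an inline dialect object."""
    delimiter = dialect.get("delimiter", ",")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Stand-in for the contents of fruits.csv.
sample = "id;name\n1;apple\n2;orange"
rows = read_rows(sample, resource["dialect"])
```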
schema
A Tabular Data Resource SHOULD have a schema property to describe a tabular schema of the resource data. If provided, the schema property MUST be a Table Schema descriptor in the form of an object or URL-or-Path.
An example of a resource with a schema:
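A minimal sketch, assuming an inline Table Schema object that lists fields, each with a name and a type, as described in the Table Schema specification. The resource values, the PYTHON_TYPES mapping, and the helper rows_match_schema are hypothetical and cover only the two field types used here.

```python
# Hypothetical tabular resource with inline data and an inline schema.
resource = {
    "name": "fruits",
    "type": "table",
    "data": [
        {"id": 1, "name": "apple"},
        {"id": 2, "name": "orange"},
    ],
    # Assumption: a Table Schema lists fields, each with a name and a type.
    "schema": {
        "fields": [
            {"name": "id", "type": "integer"},
            {"name": "name", "type": "string"},
        ]
    },
}

# Illustrative mapping for the two field types used above.
PYTHON_TYPES = {"integer": int, "string": str}


def rows_match_schema(resource):
    """Check that every inline row has the schema's fields with matching types."""
    fields = resource["schema"]["fields"]
    expected_names = {field["name"] for field in fields}
    for row in resource["data"]:
        if set(row) != expected_names:
            return False
        for field in fields:
            if not isinstance(row[field["name"]], PYTHON_TYPES[field["type"]]):
                return False
    return True
```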