Changelog
This document includes all meaningful changes made to the Data Package standard. It does not cover changes made to other documents like Recipes or Guides.
v2.0
This release includes a rich set of specification improvements to make Data Package a finished product (see announcement). All changes were reviewed and accepted by the Data Package Working Group.
June 26, 2024
Tabular Data Package (removed)
The Tabular Data Package (package.profile: "tabular-data-package") is removed. It does not add any benefits over defining type: "table" (previously resource.profile: "tabular-data-resource") for its resources, which is more modular (#52).
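To illustrate this change, the Python sketch below migrates a v1 descriptor held as a plain dict. The helper name migrate_tabular_profiles is hypothetical. Only the two profile values named above are handled, and any other profile is left untouched.

```python
# Hypothetical migration helper: rewrites a v1 tabular package descriptor
# (a plain dict) so that resources use type: "table" instead of the removed
# profiles. Only the two profile values named in the changelog are handled.


def migrate_tabular_profiles(package: dict) -> dict:
    """Return a migrated copy; the input descriptor is not modified."""
    migrated = dict(package)
    # The package-level tabular profile no longer exists in v2.
    if migrated.get("profile") == "tabular-data-package":
        del migrated["profile"]
    resources = []
    for resource in package.get("resources", []):
        resource = dict(resource)  # copy so the original stays intact
        if resource.get("profile") == "tabular-data-resource":
            del resource["profile"]
            resource["type"] = "table"
        resources.append(resource)
    migrated["resources"] = resources
    return migrated


v1 = {
    "profile": "tabular-data-package",
    "resources": [
        {"name": "cities", "path": "cities.csv", "profile": "tabular-data-resource"}
    ],
}
print(migrate_tabular_profiles(v1))
# {'resources': [{'name': 'cities', 'path': 'cities.csv', 'type': 'table'}]}
```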
package.$schema (new)
$schema replaces the profile property and allows easier extension and versioning (#42).
package.contributors (updated)
contributors was updated:
- contributor.title is no longer required (#7).
- contributor.givenName and contributor.familyName are new properties to specify the given and family name of a contributor, if it is a person (#20).
- contributor.role has been deprecated in favour of contributor.roles, see below (#18).
- contributor.roles is a new property that allows specifying multiple roles per contributor, rather than having to duplicate the contributor. It recommends following an established vocabulary and has suggested values that differ from those of the deprecated contributor.role (#18).
package.version (updated)
version is now included in the specification, while in Data Package v1 it was erroneously only part of the documentation (#3).
package.sources (updated)
sources was updated:
- source.title is no longer required (#7).
- source.version is a new property to specify which version of a source was used (#10).
resource.name (updated)
name now allows any string. It previously required the name to consist only of lowercase alphanumeric characters plus ., - and _. The property is still required and must be unique among resources (#27).
resource.path (updated)
path now explicitly forbids hidden folders (folders whose names start with a dot, .) (#19).
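A minimal check for this rule might look like the following. It assumes a local, POSIX-style path (URLs are not handled) and inspects only the folder components, not the file name. has_hidden_folder is a hypothetical name. Note that .. also starts with a dot and is therefore flagged as well.

```python
from pathlib import PurePosixPath


def has_hidden_folder(path: str) -> bool:
    """True if any folder in a POSIX-style resource path starts with a dot."""
    # Every part except the last is a folder; the last part is the file name,
    # which this sketch deliberately does not check.
    folders = PurePosixPath(path).parts[:-1]
    return any(part.startswith(".") for part in folders)


print(has_hidden_folder("data/cities.csv"))         # False
print(has_hidden_folder("data/.cache/cities.csv"))  # True
```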
resource.type (new)
type allows specifying the resource type (#51). resource.type: "table" replaces resource.profile: "tabular-data-resource".
resource.$schema (new)
$schema replaces the profile property and allows easier extension and versioning (#42). See also resource.type.
resource.encoding (updated)
encoding’s definition has been updated to support binary formats like Parquet (#15).
resource.sources (updated)
sources now inherits from a containing data package (#57).
Table Dialect (new)
Table Dialect is a new specification that supersedes and extends the CSV Dialect specification. It supports other formats like JSON or Excel (#41).
dialect.$schema (new)
$schema allows extension and versioning (#42).
dialect.table (new)
table allows specifying a table in a database (#64).
schema.$schema (new)
$schema allows extension and versioning (#42).
schema.fieldsMatch (new)
fieldsMatch allows specifying how fields in a Table Schema match the fields in the data source. The default (exact) matches the Data Package v1 behaviour, but other values (e.g. subset, superset) allow defining fewer or more fields and matching on field names. This new property extends and makes explicit the schema_sync option in Frictionless Framework (#39).
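The sketch below shows one way a reader could map schema fields to data columns for the three values named above. It assumes that exact maps fields by position (as in v1), while subset and superset map by name. The function match_fields and its arguments are hypothetical, and any other values the specification may define are not handled.

```python
def match_fields(schema_fields, header, fields_match="exact"):
    """Map schema field names to column indexes, or raise ValueError.

    schema_fields: field names from the Table Schema (hypothetical input).
    header: column labels found in the data source.
    """
    if fields_match == "exact":
        # v1 behaviour: same number of fields, mapped by position only.
        if len(schema_fields) != len(header):
            raise ValueError("exact: field count differs")
        return {name: i for i, name in enumerate(schema_fields)}
    positions = {label: i for i, label in enumerate(header)}
    if fields_match == "subset":
        # Schema defines fewer fields: each of them must exist in the data.
        missing = [n for n in schema_fields if n not in positions]
        if missing:
            raise ValueError(f"subset: missing {missing}")
        return {n: positions[n] for n in schema_fields}
    if fields_match == "superset":
        # Schema defines more fields: the data must not contain extra ones.
        extra = [label for label in header if label not in schema_fields]
        if extra:
            raise ValueError(f"superset: unexpected {extra}")
        return {n: positions[n] for n in schema_fields if n in positions}
    raise ValueError(f"unsupported fieldsMatch: {fields_match}")


print(match_fields(["id"], ["id", "name"], "subset"))  # {'id': 0}
```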
schema.missingValues (updated)
missingValues now allows specifying labeled missingness (#68).
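Assuming labeled missing values are written as objects with value and label keys (for example {"value": "-99", "label": "refused"}) alongside plain strings, a reader could normalize both forms into a single lookup table. normalize_missing_values and read_cell are hypothetical helpers.

```python
def normalize_missing_values(missing_values):
    """Return {value: label or None} from plain strings and labeled objects."""
    normalized = {}
    for item in missing_values:
        if isinstance(item, str):
            normalized[item] = None  # plain missing value without a label
        else:
            # Assumed labeled form: {"value": ..., "label": ...}
            normalized[item["value"]] = item.get("label")
    return normalized


def read_cell(raw, missing):
    """Return (None, label) for a missing value, otherwise (raw, None)."""
    if raw in missing:
        return None, missing[raw]
    return raw, None


missing = normalize_missing_values(["", {"value": "-99", "label": "refused"}])
print(read_cell("-99", missing))  # (None, 'refused')
```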
schema.primaryKey (updated)
primaryKey should now always be an array of strings, not a string (#28).
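Readers that still meet v1 descriptors may want to accept both forms. A minimal normalizer, where normalize_key is a hypothetical name:

```python
def normalize_key(key):
    """Return a key as a list of field names, accepting the v1 string form."""
    # v2 descriptors should always use a list; older ones may use a string.
    if isinstance(key, str):
        return [key]
    return list(key)


schema = {"fields": [{"name": "id", "type": "integer"}], "primaryKey": "id"}
print(normalize_key(schema["primaryKey"]))  # ['id']
```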
schema.uniqueKeys (new)
uniqueKeys allows specifying which fields are required to have unique logical values. It is an alternative to field.constraints.unique and is modelled after the corresponding SQL feature (#30).
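Below is a sketch of a uniqueKeys check over rows represented as dicts. check_unique_keys is a hypothetical name. Skipping keys that contain a missing value (None) is an assumption borrowed from SQL's UNIQUE semantics, not a rule stated in this entry.

```python
def check_unique_keys(rows, unique_keys):
    """Return (key fields, row index) pairs for rows repeating an earlier key."""
    duplicates = []
    for fields in unique_keys:
        seen = set()
        for index, row in enumerate(rows):
            value = tuple(row.get(name) for name in fields)
            # Assumption (SQL-style): keys containing a null are not compared.
            if None in value:
                continue
            if value in seen:
                duplicates.append((tuple(fields), index))
            seen.add(value)
    return duplicates


rows = [
    {"city": "Paris", "country": "FR"},
    {"city": "Paris", "country": "US"},
    {"city": "Paris", "country": "FR"},
]
print(check_unique_keys(rows, [["city", "country"]]))  # [(('city', 'country'), 2)]
```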
schema.foreignKeys (updated)
foreignKeys was updated:
- foreignKeys.fields should now always be an array of strings, not a string (#28).
- foreignKeys.reference.resource can now be omitted for self-referencing foreign keys. Previously it required setting resource to an empty string (#29).
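Resolving the target of a foreign key then needs a fallback to the current resource. The hypothetical helper below also accepts the v1 empty string, so it works for descriptors of either version.

```python
def reference_target(foreign_key, current_resource):
    """Name of the resource a foreign key points to (missing or "" means self)."""
    reference = foreign_key["reference"]
    # v2: omitted "resource" marks a self-reference; v1 used "" for the same.
    return reference.get("resource") or current_resource


fk = {"fields": ["parent_id"], "reference": {"fields": ["id"]}}
print(reference_target(fk, "people"))  # people
```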
field.categories (new)
categories adds support for categorical data for the string and integer field types (#68).
field.categoriesOrdered (new)
categoriesOrdered adds support for ordered categorical data for the string and integer field types (#68).
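Here is a sketch that reads both forms of categories and, when categoriesOrdered is true, compares two values by their position in the list. It assumes categories is either a list of plain values or a list of objects with a value key and an optional label. The helper names are hypothetical.

```python
def category_values(categories):
    """Allowed values from plain values or {"value": ..., "label": ...} objects."""
    return [c["value"] if isinstance(c, dict) else c for c in categories]


def compare_categories(a, b, field):
    """Return -1, 0 or 1 following the order of field["categories"]."""
    if not field.get("categoriesOrdered", False):
        raise ValueError("categories are unordered; only equality is meaningful")
    values = category_values(field["categories"])
    # list.index raises ValueError for values outside the categories.
    ia, ib = values.index(a), values.index(b)
    return (ia > ib) - (ia < ib)


size = {
    "name": "size",
    "type": "string",
    "categories": ["small", "medium", "large"],
    "categoriesOrdered": True,
}
print(compare_categories("small", "large", size))  # -1
```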
field.missingValues (new)
missingValues allows specifying missing values per field, and overrides missingValues specified at a resource level (#24).
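The override rule can be sketched as a lookup that prefers the field's own list. effective_missing_values is a hypothetical helper. The fallback of [""] (the empty string as the only missing value) is assumed to be the default when no missingValues is given anywhere.

```python
# Assumption: default used when neither the field nor the schema sets one.
DEFAULT_MISSING = [""]


def effective_missing_values(schema, field):
    """Field-level missingValues, if present, replace the schema-level list."""
    if "missingValues" in field:
        return field["missingValues"]
    return schema.get("missingValues", DEFAULT_MISSING)


schema = {
    "missingValues": ["", "NA"],
    "fields": [
        {"name": "age", "type": "integer", "missingValues": ["-1"]},
        {"name": "name", "type": "string"},
    ],
}
print(effective_missing_values(schema, schema["fields"][0]))  # ['-1']
```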
integer field type (updated)
integer now has a groupChar property. It was already available for number (#6).
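A minimal parser honouring groupChar might look like this. parse_integer is hypothetical and only covers an optional sign followed by ASCII digits.

```python
import re
from typing import Optional


def parse_integer(raw: str, group_char: Optional[str] = None) -> int:
    """Parse an integer cell, removing an optional grouping character first."""
    if group_char:
        raw = raw.replace(group_char, "")
    # Reject anything but an optional sign and ASCII digits (int() alone
    # would also accept underscores and surrounding whitespace).
    if not re.fullmatch(r"[+-]?[0-9]+", raw):
        raise ValueError(f"not an integer: {raw!r}")
    return int(raw)


print(parse_integer("1,000,000", group_char=","))  # 1000000
```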
list field type (new)
list allows specifying fields containing collections of primary values separated by a delimiter (e.g. value1,value2) (#38).
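Below is a sketch of splitting a list cell. It assumes a comma as the default delimiter and a caller-supplied item_parser standing in for the item type. Both the function and that parameter are hypothetical. Missing values are assumed to have been handled before this step.

```python
def parse_list(raw, delimiter=",", item_parser=str):
    """Split a list cell on its delimiter and parse each item."""
    return [item_parser(item) for item in raw.split(delimiter)]


print(parse_list("value1,value2"))       # ['value1', 'value2']
print(parse_list("1;2;3", ";", int))     # [1, 2, 3]
```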
datetime field type (updated)
datetime’s default format is now extended to allow optional milliseconds and timezone parts (#23).
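The extended default format can be sketched with a regular expression: a YYYY-MM-DDThh:mm:ss core, an optional three-digit milliseconds part and an optional Z or ±hh:mm offset. The exact pattern is an assumption based on this entry, not a copy of the specification's grammar, and parse_default_datetime is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed pattern: date, "T", time, optional .sss, optional Z or +hh:mm/-hh:mm.
DATETIME_RE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{3}))?"          # optional milliseconds
    r"(Z|[+-]\d{2}:\d{2})?"    # optional timezone
)


def parse_default_datetime(raw):
    m = DATETIME_RE.fullmatch(raw)
    if m is None:
        raise ValueError(f"not a default-format datetime: {raw!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    millis, tz = m.group(7), m.group(8)
    tzinfo = None  # no timezone part: return a naive datetime
    if tz == "Z":
        tzinfo = timezone.utc
    elif tz:
        sign = 1 if tz[0] == "+" else -1
        offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
        tzinfo = timezone(sign * offset)
    micro = int(millis) * 1000 if millis else 0
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tzinfo)


print(parse_default_datetime("2024-06-26T12:30:00.250Z").isoformat())
```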
geopoint field type (updated)
geopoint’s definition now clarifies that floating point numbers can be used for coordinate definitions (#14).
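Assuming the default geopoint format is a "lon, lat" string, here is a sketch that accepts floating point coordinates. parse_geopoint is hypothetical, and the range check is an extra safeguard rather than something this entry requires.

```python
def parse_geopoint(raw):
    """Parse the default "lon, lat" form into a (lon, lat) pair of floats."""
    # Unpacking raises ValueError unless there is exactly one comma.
    lon_text, lat_text = raw.split(",")
    lon, lat = float(lon_text), float(lat_text)  # float() strips whitespace
    # Extra safeguard: longitude and latitude ranges in degrees.
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError(f"coordinates out of range: {raw!r}")
    return lon, lat


print(parse_geopoint("90.50, 45.50"))  # (90.5, 45.5)
```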
any field type (updated)
any is now the default field type and clarifies that the field type should not be inferred if not provided (#13).
minimum and maximum field constraints (updated)
minimum and maximum are now extended to support the duration field type (#8).
exclusiveMinimum and exclusiveMaximum field constraints (new)
exclusiveMinimum and exclusiveMaximum can be used to specify exclusive minimum and maximum values (#11).
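Below is a sketch of evaluating all four range constraints together. It assumes the bounds and the value have already been parsed into comparable Python values. check_bounds is hypothetical. The exclusive variants differ only in using strict comparisons.

```python
def check_bounds(value, constraints):
    """Return the names of the range constraints that value violates."""
    checks = {
        "minimum": lambda bound: value >= bound,
        "maximum": lambda bound: value <= bound,
        "exclusiveMinimum": lambda bound: value > bound,  # strict
        "exclusiveMaximum": lambda bound: value < bound,  # strict
    }
    return [
        name
        for name, ok in checks.items()
        if name in constraints and not ok(constraints[name])
    ]


print(check_bounds(0, {"minimum": 0}))           # []
print(check_bounds(0, {"exclusiveMinimum": 0}))  # ['exclusiveMinimum']
```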
jsonSchema field constraint (new)
jsonSchema can be used for the object and array field types (#32).
v1.0
September 5, 2017
Please refer to the Data Package (v1) website.