Changelog

This document includes all meaningful changes made to the Data Package standard. It does not cover changes made to other documents like Recipes or Guides.

v2.0

This release includes a rich set of specification improvements to make Data Package a finished product (see announcement). All changes were reviewed and accepted by the Data Package Working Group.

June 26, 2024

Tabular Data Package (removed)

The Tabular Data Package (package.profile: "tabular-data-package") is removed. It adds no benefit over setting type: "table" (previously resource.profile: "tabular-data-resource") on each resource, which is more modular (#52).

package.$schema (new)

$schema replaces the profile property and allows easier extension and versioning (#42).

package.contributors (updated)

contributors was updated:

  • contributor.title is no longer required (#7).
  • contributor.givenName and contributor.familyName are new properties to specify the given and family name of the contributor, if it is a person (#20).
  • contributor.role has been deprecated in favour of contributor.roles, see further (#18).
  • contributor.roles is a new property that allows specifying multiple roles per contributor, rather than having to duplicate the contributor. It is recommended to follow an established vocabulary. The suggested values differ from those of the deprecated contributor.role (#18).
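
As an illustration of the role to roles change, here is a minimal sketch of how a v1 contributor might be migrated. The helper name migrate_contributor is hypothetical, and copying the old role value verbatim is an assumption: since the suggested values for roles differ from those of role, a real migration may need to map values to a chosen vocabulary.

```python
def migrate_contributor(contributor: dict) -> dict:
    """Return a copy with the deprecated `role` string moved into a `roles` list."""
    migrated = {k: v for k, v in contributor.items() if k != "role"}
    # Assumption: the old value is kept as-is; it is not mapped to a vocabulary.
    if "role" in contributor and "roles" not in contributor:
        migrated["roles"] = [contributor["role"]]
    return migrated


old = {"givenName": "Jane", "familyName": "Doe", "role": "author"}
new = migrate_contributor(old)
```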

package.version (updated)

version is now included in the specification, while in Data Package v1 it was erroneously only part of the documentation (#3).

package.sources (updated)

sources was updated:

  • source.title is no longer required (#7).
  • source.version is a new property to specify which version of a source was used (#10).

resource.name (updated)

name now allows any string. It previously required the name to only consist of lowercase alphanumeric characters plus ., - and _. The property is still required and must be unique among resources (#27).

resource.path (updated)

path now explicitly forbids hidden folders (starting with dot .) (#19).
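
A check for this rule could look roughly like the sketch below. The function name has_hidden_folder is hypothetical, and the sketch assumes local, relative, POSIX-style paths; it only inspects folder segments, not the file name itself, and it is not meant for URLs.

```python
from pathlib import PurePosixPath


def has_hidden_folder(path: str) -> bool:
    """True if any folder segment of a relative POSIX-style path starts with a dot."""
    # Every part except the last one is treated as a folder.
    folders = PurePosixPath(path).parts[:-1]
    return any(part.startswith(".") for part in folders)
```

With this sketch, has_hidden_folder("data/.cache/table.csv") is true, while has_hidden_folder("data/table.csv") is false.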

resource.type (new)

type specifies the resource type (#51). resource.type: "table" replaces resource.profile: "tabular-data-resource".

resource.$schema (new)

$schema replaces the profile property and allows easier extension and versioning (#42). See also resource.type.
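
Taken together, the profile-related changes (Tabular Data Package removed, $schema replacing profile, type: "table" replacing the tabular resource profile) suggest a migration along the following lines. This is only a sketch: migrate_profiles is a hypothetical helper, the $schema URL is an assumed value that should be checked against the standard, and the code assumes the descriptor only uses the built-in v1 profile names.

```python
# Assumed v2 profile URL; verify the exact value in the standard.
V2_PACKAGE_SCHEMA = "https://datapackage.org/profiles/2.0/datapackage.json"


def migrate_profiles(package: dict) -> dict:
    """Return a copy of a v1 descriptor that uses $schema and type instead of profile."""
    # Assumption: only built-in v1 profile names occur, so `profile` can be dropped.
    migrated = {k: v for k, v in package.items() if k != "profile"}
    migrated["$schema"] = V2_PACKAGE_SCHEMA
    resources = []
    for resource in package.get("resources", []):
        new_resource = {k: v for k, v in resource.items() if k != "profile"}
        if resource.get("profile") == "tabular-data-resource":
            new_resource["type"] = "table"
        resources.append(new_resource)
    migrated["resources"] = resources
    return migrated


v1 = {
    "name": "example",
    "profile": "tabular-data-package",
    "resources": [
        {"name": "data", "path": "data.csv", "profile": "tabular-data-resource"}
    ],
}
v2 = migrate_profiles(v1)
```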

resource.encoding (updated)

encoding’s definition has been updated to support binary formats like Parquet (#15).

resource.sources (updated)

sources now inherits from a containing data package (#57).

Table Dialect (new)

Table Dialect is a new specification that supersedes and extends the CSV Dialect specification. It supports other formats like JSON or Excel (#41).

dialect.$schema (new)

$schema allows extension and versioning (#42).

dialect.table (new)

table specifies a table in a database (#64).

schema.$schema (new)

$schema allows extension and versioning (#42).

schema.fieldsMatch (new)

fieldsMatch specifies how fields in a Table Schema match the fields in the data source. The default (exact) matches the Data Package v1 behaviour, but other values (e.g. subset, superset) allow defining fewer or more fields and matching on field names. This new property extends and makes explicit the schema_sync option in Frictionless Framework (#39).
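
One possible reading of these modes is sketched below. The function fields_match is hypothetical, and the exact semantics are assumptions drawn from the description above: exact is taken to mean the same names in the same order, subset means the schema defines fewer fields than the data may contain, and superset means the schema defines more. The specification itself is authoritative and lists further values.

```python
def fields_match(schema_fields: list, data_fields: list, mode: str = "exact") -> bool:
    """Compare schema field names with data source field names (illustrative only)."""
    if mode == "exact":
        # Assumption: same names, same order.
        return schema_fields == data_fields
    if mode == "subset":
        # The schema defines fewer fields: each one must be present in the data.
        return set(schema_fields) <= set(data_fields)
    if mode == "superset":
        # The schema defines more fields: the data may only use defined ones.
        return set(data_fields) <= set(schema_fields)
    raise ValueError(f"mode not covered by this sketch: {mode}")
```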

schema.missingValues (updated)

missingValues now allows specifying labeled missingness (#68).
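
As a rough illustration, labeled missing values could be read as follows. The object shape with value and label keys is an assumption about how labels are expressed, and missing_value_labels is a hypothetical helper; plain strings are treated as unlabeled missing values.

```python
def missing_value_labels(missing_values: list) -> dict:
    """Map each missing value to its label (None when no label is given)."""
    labels = {}
    for entry in missing_values:
        if isinstance(entry, dict):
            # Assumed labeled form: {"value": ..., "label": ...}
            labels[entry["value"]] = entry.get("label")
        else:
            labels[entry] = None
    return labels


labels = missing_value_labels(
    [{"value": "-99", "label": "REFUSED"}, {"value": "", "label": "OMITTED"}]
)
```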

schema.primaryKey (updated)

primaryKey should now always be an array of strings, not a string (#28).
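
Software that still needs to read v1 descriptors may want to accept both forms. A minimal sketch, using a hypothetical helper normalize_key:

```python
def normalize_key(key) -> list:
    """Return the key as a list of field names, accepting the v1 string form."""
    return [key] if isinstance(key, str) else list(key)
```

Here normalize_key("id") gives ["id"], and a list such as ["a", "b"] is returned unchanged.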

schema.uniqueKeys (new)

uniqueKeys specifies which fields are required to have unique logical values. It is an alternative to field.constraints.unique and is modelled after the corresponding SQL feature (#30).
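
The sketch below shows how such a check might work on already-parsed rows. It assumes that uniqueKeys is a list of keys, each key being a list of field names, and that rows containing a null in a key are skipped, as in SQL; both points are assumptions, and find_unique_key_violations is a hypothetical name.

```python
def find_unique_key_violations(rows: list, unique_keys: list) -> list:
    """Return (key, values) pairs for rows that repeat an earlier combination."""
    violations = []
    for key in unique_keys:
        seen = set()
        for row in rows:
            values = tuple(row.get(name) for name in key)
            if None in values:
                continue  # Assumption: SQL-like, nulls never collide.
            if values in seen:
                violations.append((tuple(key), values))
            seen.add(values)
    return violations


rows = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 1, "b": "x"},
    {"a": None, "b": "x"},
    {"a": None, "b": "x"},
]
violations = find_unique_key_violations(rows, [["a", "b"]])
```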

schema.foreignKeys (updated)

foreignKeys was updated:

  • It should now always be an array of strings, not a string (#28).
  • foreignKeys.reference.resource can now be omitted for self-referencing foreign keys. Previously it required setting resource to an empty string (#29).
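
To handle both the v1 convention (empty string) and the v2 convention (property omitted) for self-referencing foreign keys, an implementation could resolve the target resource roughly as follows. The helper name referenced_resource is hypothetical.

```python
def referenced_resource(foreign_key: dict, current_resource: str) -> str:
    """Return the name of the resource a foreign key points to."""
    target = foreign_key["reference"].get("resource")
    # A missing (v2) or empty (v1) value means the key references its own resource.
    return target or current_resource


self_reference = {"fields": ["parent_id"], "reference": {"fields": ["id"]}}
```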

field.categories (new)

categories adds support for categorical data for the string and integer field types (#68).

field.categoriesOrdered (new)

categoriesOrdered adds support for ordered categorical data for the string and integer field types (#68).
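
For illustration, a field with ordered categories might be described and used as below. The sketch only covers categories given as a plain list of values, and category_rank is a hypothetical helper, not part of the standard.

```python
size_field = {
    "name": "size",
    "type": "string",
    "categories": ["small", "medium", "large"],
    "categoriesOrdered": True,
}


def category_rank(field: dict, value) -> int:
    """Return the position of a value among the declared categories.

    Raises ValueError if the value is not a declared category.
    """
    return field["categories"].index(value)
```

When categoriesOrdered is true, ranks obtained this way can be compared, so "small" sorts before "large" in this example.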

field.missingValues (new)

missingValues specifies missing values per field, and overrides missingValues specified at a resource level (#24).
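
The override could be resolved as in this sketch. It assumes that the table-wide list is stored under missingValues on the Table Schema and defaults to a single empty string when absent; effective_missing_values is a hypothetical helper.

```python
def effective_missing_values(schema: dict, field: dict) -> list:
    """Return the missing values that apply to a field."""
    if "missingValues" in field:
        # A field-level list replaces the table-wide list; it is not merged with it.
        return field["missingValues"]
    # Assumed default when nothing is declared: the empty string.
    return schema.get("missingValues", [""])
```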

integer field type (updated)

integer now has a groupChar property. It was already available for number (#6).
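
A parser honouring groupChar might look like this minimal sketch. The function parse_integer is hypothetical, and other integer options are ignored.

```python
def parse_integer(text: str, group_char: str = "") -> int:
    """Parse an integer, removing the grouping character first if one is set."""
    if group_char:
        text = text.replace(group_char, "")
    return int(text)
```

With group_char set to ",", the text "1,234,567" parses to 1234567; without it, the same text raises a ValueError.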

list field type (new)

list describes fields containing collections of primary values separated by a delimiter (e.g. value1,value2) (#38).
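
Parsing such a field can be sketched as follows. The function parse_list and its arguments are hypothetical; the delimiter and the per-item parser stand in for whatever the field descriptor declares, and the comma default is an assumption based on the example above.

```python
def parse_list(text: str, delimiter: str = ",", item_parser=str) -> list:
    """Split a cell on the delimiter and parse each item."""
    return [item_parser(item) for item in text.split(delimiter)]
```

For example, parse_list("value1,value2") gives ["value1", "value2"], and parse_list("1;2;3", ";", int) gives [1, 2, 3].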

datetime field type (updated)

datetime’s default format is now extended to allow optional milliseconds and timezone parts (#23).

geopoint field type (updated)

geopoint’s definition now clarifies that floating point numbers can be used for coordinate definitions (#14).

any field type (updated)

any is now the default field type. Its definition also clarifies that the field type should not be inferred if not provided (#13).

minimum and maximum field constraints (updated)

minimum and maximum are now extended to support the duration field type (#8).

exclusiveMinimum and exclusiveMaximum field constraints (new)

exclusiveMinimum and exclusiveMaximum can be used to specify exclusive minimum and maximum values (#11).
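
The difference between inclusive and exclusive bounds can be sketched as below. The helper within_bounds is hypothetical and assumes the value and the bounds are already parsed into comparable Python values.

```python
import operator

# Each constraint name maps to the comparison the value must satisfy.
BOUND_CHECKS = {
    "minimum": operator.ge,
    "maximum": operator.le,
    "exclusiveMinimum": operator.gt,
    "exclusiveMaximum": operator.lt,
}


def within_bounds(value, constraints: dict) -> bool:
    """True if the value satisfies every bound present in the constraints."""
    return all(
        check(value, constraints[name])
        for name, check in BOUND_CHECKS.items()
        if name in constraints
    )
```

A value of 0 passes {"minimum": 0} but fails {"exclusiveMinimum": 0}.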

jsonSchema field constraint (new)

jsonSchema can be used for the object and array field types (#32).

v1.0

September 5, 2017

Please refer to the Data Package (v1) website.