Changelog
This document includes all meaningful changes made to the Data Package Standard specifications. It does not cover changes made to other documents like Recipes or Guides.
v2.0-draft
April 1, 2024
Overview
The Data Package (v2) draft release includes a rich set of specification improvements accepted by the Data Package Working Group during the active phase of the Data Package (v2) work.
Data Package
version (updated)
version is now included in the specification, while in Data Package v1 it was erroneously only part of the documentation (#3).
contributors (updated)
contributors was updated:
- contributor.title is no longer required (#7).
- contributor.givenName and contributor.familyName are new properties to specify the given and family name of a contributor, if it is a person (#20).
- contributor.role has been deprecated in favour of contributor.roles, see further (#18).
- contributor.roles is a new property that allows specifying multiple roles per contributor, rather than having to duplicate the contributor. It recommends following an established vocabulary and has suggested values that are different from the deprecated contributor.role (#18).
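As an illustration of the contributor changes, the following is a minimal sketch of moving a v1-style contributor to the v2 properties. The helper name upgrade_contributor and the sample values are hypothetical, and the changelog does not prescribe a migration procedure. Because the suggested values for roles differ from those of the deprecated role, a real migration would probably also need to map the old value to the new vocabulary; this sketch copies it unchanged.

```python
def upgrade_contributor(contributor: dict) -> dict:
    """Return a copy that uses `roles` (a list) instead of the deprecated `role`."""
    upgraded = dict(contributor)
    role = upgraded.pop("role", None)
    if role is not None and "roles" not in upgraded:
        # Assumption: the old single role is carried over as the only entry,
        # without translating it to the vocabulary suggested for `roles`.
        upgraded["roles"] = [role]
    return upgraded


# Hypothetical v1-style contributor.
v1_contributor = {"title": "Jane Doe", "role": "author"}
v2_contributor = upgrade_contributor(v1_contributor)
```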
sources (updated)
sources was updated:
- source.title is no longer required (#7).
- source.version is a new property to specify which version of a source was used (#10).
Data Resource
name (updated)
name now allows any string. It previously required the name to consist only of lowercase alphanumeric characters plus ., - and _. The property is still required and must be unique among resources (#27).
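The difference between the old and new rule can be sketched as follows. The regular expression is one reading of the v1 wording above, and check_resource_names is a hypothetical helper, not part of any library.

```python
import re

# One reading of the v1 rule: lowercase alphanumerics plus ".", "-" and "_".
V1_NAME = re.compile(r"^[a-z0-9._-]+$")


def check_resource_names(resources: list[dict]) -> list[str]:
    """Apply the v2 rule: name is required, any string, unique among resources."""
    errors = []
    seen = set()
    for index, resource in enumerate(resources):
        name = resource.get("name")
        if not isinstance(name, str):
            errors.append(f"resource {index}: name is required and must be a string")
            continue
        if name in seen:
            errors.append(f"resource {index}: duplicate name {name!r}")
        seen.add(name)
    return errors
```

A name such as "My Data" fails the v1 pattern but passes the v2 check, as long as no other resource uses it.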
path (updated)
path now explicitly forbids hidden folders (starting with a dot .) (#19).
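A check for this rule might look like the sketch below. It assumes POSIX-style relative paths separated by "/", treats a lone "." segment as the current directory rather than a hidden folder, and only inspects folder segments, not the file name. The function name has_hidden_folder is hypothetical.

```python
def has_hidden_folder(path: str) -> bool:
    """Return True if any folder segment of a relative path starts with a dot."""
    segments = path.split("/")
    folders = segments[:-1]  # the last segment is taken to be the file name
    # Assumption: "." alone means "current directory" and is not flagged.
    return any(s.startswith(".") and s != "." for s in folders)
```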
encoding (updated)
encoding’s definition has been updated to support binary formats like Parquet (#15).
Table Dialect
Table Dialect is a new specification that supersedes and extends the CSV Dialect specification. It supports other formats like JSON or Excel (#41).
Table Schema
Schema
fieldsMatch (new)
fieldsMatch allows specifying how fields in a Table Schema match the fields in the data source. The default (exact) matches the Data Package v1 behaviour, but other values (e.g. subset, superset) allow defining fewer or more fields and matching on field names. This new property extends and makes explicit the schema_sync option in Frictionless Framework (#39).
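The three values named above can be sketched as a comparison of field names. This is one interpretation of the wording in this entry, not the normative definition: exact is taken to mean the same names in the same order, subset to mean the schema defines fewer fields than the data has, and superset to mean it defines more. The function name fields_match is hypothetical, and values other than these three are not handled.

```python
def fields_match(schema_names: list[str], data_names: list[str], mode: str = "exact") -> bool:
    """Compare schema field names with data field names under a fieldsMatch mode."""
    if mode == "exact":
        # Same fields in the same order (the v1 behaviour).
        return schema_names == data_names
    if mode == "subset":
        # Schema defines fewer fields: each schema field must exist in the data.
        return set(schema_names) <= set(data_names)
    if mode == "superset":
        # Schema defines more fields: each data field must exist in the schema.
        return set(data_names) <= set(schema_names)
    raise ValueError(f"unsupported fieldsMatch value: {mode!r}")
```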
primaryKey (updated)
primaryKey should now always be an array of strings, not a string (#28).
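Tools that still read v1 descriptors may want to accept both forms. A minimal sketch, with a hypothetical helper name:

```python
def normalize_key(key) -> list[str]:
    """Return a key as a list of field names, wrapping a v1-style single string."""
    if isinstance(key, str):
        return [key]
    return list(key)
```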
uniqueKeys (new)
uniqueKeys allows specifying which fields are required to have unique logical values. It is an alternative to field.constraints.unique and is modelled after the corresponding SQL feature (#30).
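The following sketch shows what such a check could look like. It assumes uniqueKeys is a list of keys, each key being a list of field names, and it assumes SQL-like handling of nulls, where rows with a null in the key are not compared. Both assumptions, and the function name, are illustrative and should be checked against the specification.

```python
def find_unique_key_violations(rows: list[dict], unique_keys: list[list[str]]) -> list[tuple]:
    """Return (key, values) pairs for every repeated combination of key values."""
    violations = []
    for key in unique_keys:
        seen = set()
        for row in rows:
            values = tuple(row.get(name) for name in key)
            # Assumption: as in SQL, rows with a null in the key are skipped.
            if None in values:
                continue
            if values in seen:
                violations.append((tuple(key), values))
            seen.add(values)
    return violations


# Hypothetical data: the combination (1, "x") appears twice.
rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 1, "b": "x"}]
violations = find_unique_key_violations(rows, [["a", "b"]])
```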
foreignKeys (updated)
foreignKeys was updated:
- It should now always be an array of strings, not a string (#28).
- foreignKeys.reference.resource can now be omitted for self-referencing foreign keys. Previously it required setting resource to an empty string (#29).
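A self-referencing foreign key without resource could be written as in the sketch below. The field names id and parent_id are made up for the example, and is_self_reference is a hypothetical helper that also accepts the older empty-string form.

```python
# Hypothetical schema for a table whose rows point at a parent row.
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "parent_id", "type": "integer"},
    ],
    "primaryKey": ["id"],
    "foreignKeys": [
        {
            "fields": ["parent_id"],
            # No "resource" entry: the reference points back at this table.
            "reference": {"fields": ["id"]},
        }
    ],
}


def is_self_reference(foreign_key: dict) -> bool:
    """True if the foreign key refers to the table it is defined on."""
    resource = foreign_key["reference"].get("resource")
    # The empty string is the form that was previously required.
    return resource is None or resource == ""
```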
Fields
missingValues (new)
missingValues allows specifying missing values per field, and overrides missingValues specified at a resource level (#24).
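The precedence described here can be sketched as a simple lookup. The helper name and the sample values are hypothetical; the only behaviour taken from this entry is that a field-level list wins over the one defined at the higher level.

```python
def effective_missing_values(field: dict, table_level_values: list[str]) -> list[str]:
    """Return the missing values that apply to one field."""
    if "missingValues" in field:
        # A field-level list replaces the higher-level list entirely.
        return field["missingValues"]
    return table_level_values


table_level = ["", "NA"]
plain_field = {"name": "age", "type": "integer"}
custom_field = {"name": "temperature", "type": "number", "missingValues": ["-999"]}
```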
Field Types
integer (updated)
integer now has a groupChar property. It was already available for number (#6).
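A reader might apply groupChar roughly as follows. This is a lenient sketch with a hypothetical function name: it strips the grouping character before conversion and does not verify that groups are placed every three digits.

```python
def parse_integer(text: str, group_char: str = "") -> int:
    """Parse an integer string, ignoring an optional grouping character."""
    if group_char:
        text = text.replace(group_char, "")
    return int(text)
```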
list (new)
list allows specifying fields containing collections of primary values separated by a delimiter (e.g. value1,value2) (#38).
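Reading such a field amounts to splitting on the delimiter and converting each item. In the sketch below, the parameter names delimiter and item_parser are assumptions chosen for illustration, not property names confirmed by this changelog.

```python
from typing import Callable


def parse_list(text: str, delimiter: str = ",", item_parser: Callable = str) -> list:
    """Split a serialized list and convert each item with item_parser."""
    return [item_parser(item) for item in text.split(delimiter)]
```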
datetime (updated)
datetime’s default format is now extended to allow optional milliseconds and timezone parts (#23).
geopoint (updated)
geopoint’s definition now clarifies that floating point numbers can be used for coordinate definitions (#14).
any (updated)
any is now the default field type, and its definition clarifies that the field type should not be inferred if not provided (#13).
Field Constraints
minimum and maximum (updated)
minimum and maximum are now extended to support the duration field type (#8).
exclusiveMinimum and exclusiveMaximum (new)
exclusiveMinimum and exclusiveMaximum can be used to specify exclusive minimum and maximum values (#11).
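The difference from the inclusive constraints is only whether the bound itself is allowed. A minimal sketch, with a hypothetical function name, for values that support ordinary comparison:

```python
def check_bounds(value, constraints: dict) -> bool:
    """Check a value against inclusive and exclusive bound constraints."""
    if "minimum" in constraints and value < constraints["minimum"]:
        return False
    if "maximum" in constraints and value > constraints["maximum"]:
        return False
    # Exclusive bounds also reject the bound value itself.
    if "exclusiveMinimum" in constraints and value <= constraints["exclusiveMinimum"]:
        return False
    if "exclusiveMaximum" in constraints and value >= constraints["exclusiveMaximum"]:
        return False
    return True
```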
jsonSchema (new)
jsonSchema can be used for the object and array field types (#32).
v1.0
September 5, 2017
Please refer to the Data Package (v1) website.