Comparison with CSVW
Authors | Peter Desmet, Evgeny Karev, Sara Petti |
---|
In 2016, a working group chartered by W3C created a set of recommendations for documenting and accessing CSV files, collectively called CSV on the Web (CSVW). These show extensive similarities with the Data Package standard.
Below we give an overview of the (dis)similarities between CSVW and Data Package.
Scope
CSVW is a set of specifications (i.e. Model for Tabular Data and Metadata on the Web and Metadata Vocabulary for Tabular Data) that describe delimited text files (CSV, TSV) only.
Data Package is a set of specifications to describe datasets, data files and tabular data. It can describe data files that are not tabular and includes extra specifications for those that are (i.e. Table Dialect and Table Schema).
Maintenance
CSVW was published in 2016, and its working group was closed at that time. There is currently a community group, but it does not have the mandate to make changes to the standard.
Data Package was published in 2012, updated to v1.0 in 2017 and v2.0 in 2024. It is maintained by the Open Knowledge Foundation (OKFN). Requests for changes can be proposed by anyone as a GitHub pull request and are decided upon by a working group which includes members from several organizations.
Adoption and Software Support
We could not find documentation on CSVW adoption, but Swirrl maintains a website at csvw.org that explains the standard and provides an overview of software tools. A number of these are conversion tools, but software libraries are available for 4 programming languages (Java, Python, R, Ruby), with Python the most popular (37 stars). Of note is CSV lint: an online tool to validate CSV files.
Data Package has been adopted by a wide range of organizations and projects, many of which have extended the standard and created software. Software libraries are available for 9 programming languages (Go, Java, JavaScript, Julia, PHP, Python, R, Ruby, Swift), with Python the most popular (708 stars). These are all maintained as open source software under the Frictionless Data organization on GitHub. Of note is the Open Data Editor: a fully-featured Data Package editor for non-technical users.
Extensibility and Use of Other Standards
CSVW makes extensive use of other standards (such as JSON-LD, XML Schema Datatypes, Compact URIs) and defines how to use those. It also has specifications on how to transform CSVW to JSON and RDF. CSVW is not defined as a JSON Schema.
Data Package defines its own properties for commonly used metadata (such as description, contributors, licenses, sources), none of which are required. Users can include properties from other standards as custom properties. Data Package is defined as a JSON Schema and is designed to be extensible: developers can add or extend properties by referencing a JSON Schema in $schema, and validation software then picks these up automatically. There is also software available to convert Data Package to other metadata standards, such as DataCite and DCAT.
Linking Data with Metadata
CSVW defines different methods of locating metadata.
Data Package metadata are described in a descriptor file named datapackage.json. This file links to the data file(s) using resource.path and can reference external dialects, schemas, and (domain-specific) specifications.
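As a minimal sketch of this linkage, the Python snippet below builds a descriptor as a plain dictionary and serializes it to JSON. The package name, resource name, file paths and schema path are hypothetical placeholders; only the property names (resources, name, type, path, dialect, schema) come from Data Package.

```python
import json

# Hypothetical descriptor: all names and paths are illustrative only.
descriptor = {
    "name": "example-package",
    "resources": [
        {
            "name": "observations",
            "type": "table",
            # resource.path links the metadata to the data file(s).
            "path": ["observations-2023.csv", "observations-2024.csv"],
            # The dialect is given inline; the schema is referenced as an external file.
            "dialect": {"delimiter": ";"},
            "schema": "observations-schema.json",
        }
    ],
}

# The result would normally be saved as datapackage.json next to the data files.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Listing two files under path assumes both share the same structure, dialect and schema.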
Property Comparison
Below is a list of all properties defined in CSVW’s Metadata Vocabulary for Tabular Data (version 20151217) and how these are supported in Data Package (v2.0).
Property Syntax
CSVW property | Data Package support | Details |
---|---|---|
Array properties | Yes | |
Link properties | Partial | URLs and paths are supported, but not with a @base URL in @context |
URI template properties | No | |
Column reference properties | Yes | E.g. schema.primaryKey |
Object properties | Yes | E.g. resource.schema |
Natural language properties | No | Language-tagged values are not supported; see lang under Inherited Properties |
Atomic properties | Yes | |
Top-level Properties
CSVW property | Data Package support | Details |
---|---|---|
@context | No | |
Table Groups
CSVW property | Data Package support | Details |
---|---|---|
tables | Yes | Tables can be defined as package.resources with "type": "table". Tables that are the same in terms of structure, format, dialect, etc. can be defined as a single resource with multiple files in resource.path. Implementations will concatenate those tables. |
dialect | Yes | As resource.dialect for multiple files in resource.path |
notes | Custom property | |
tableDirection | No | |
tableSchema | Yes | As resource.schema for multiple files in resource.path |
transformations | No | |
@id | Custom property | |
@type | No | |
Tables
CSVW property | Data Package support | Details |
---|---|---|
url | Yes | As resource.path |
dialect | Yes | As resource.dialect |
notes | Custom property | |
suppressOutput | No | |
tableDirection | No | |
tableSchema | Yes | As resource.schema |
transformations | No | |
@id | Custom property | Tables are identified by resource.name |
@type | Yes | As resource.type |
Schemas
CSVW property | Data Package support | Details |
---|---|---|
columns | Yes | As schema.fields |
foreignKeys | Yes | As schema.foreignKeys |
primaryKey | Yes | As schema.primaryKey |
rowTitles | No | Titles are defined per field as field.title |
@id | Custom property | |
@type | No | |
Columns
CSVW property | Data Package support | Details |
---|---|---|
name | Yes | As field.name |
suppressOutput | No | |
titles | Partial | As field.title (a single value) |
virtual | No | |
@id | Custom property | |
@type | No | |
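The schema and column mappings above can be sketched as a small conversion function. This is an illustration under stated assumptions, not an existing library API: the function name is hypothetical, every CSVW column is assumed to carry a name, and titles given as language maps are ignored.

```python
def csvw_schema_to_table_schema(csvw_schema: dict) -> dict:
    """Sketch: map a CSVW schema to a Table Schema following the tables above."""
    fields = []
    for column in csvw_schema.get("columns", []):
        # Assumption: each column has a "name" (CSVW itself does not require it).
        field = {"name": column["name"]}
        titles = column.get("titles")
        # field.title holds a single value, so only the first title is kept.
        if isinstance(titles, str):
            field["title"] = titles
        elif isinstance(titles, list) and titles:
            field["title"] = titles[0]
        fields.append(field)

    schema = {"fields": fields}
    primary_key = csvw_schema.get("primaryKey")
    if primary_key:
        # Normalize a single column reference to a list of field names.
        if isinstance(primary_key, str):
            schema["primaryKey"] = [primary_key]
        else:
            schema["primaryKey"] = list(primary_key)
    return schema
```

Unsupported properties such as virtual and suppressOutput are dropped silently here; a real converter would probably want to report them.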
Inherited Properties
Data Package properties do not inherit from their parent, unless otherwise specified (e.g. resource.sources). The properties listed below only exist at one level in Data Package, except for missingValues.
CSVW property | Data Package support | Details |
---|---|---|
aboutUrl | Custom property | |
datatype | Yes | As field.type |
default | No | |
lang | No | Suggested as recipe |
null | Yes | As field.missingValues and schema.missingValues with options to define multiple values and labels |
ordered | Yes | As field.categoriesOrdered |
propertyUrl | Partial | As field.rdfType |
required | Yes | As required field constraint |
separator | Yes | As delimiter in list field type |
textDirection | No | |
valueUrl | No | |
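Two of the mappings above, null to field.missingValues and required to the required field constraint, can be sketched as follows. The function name is hypothetical and the input is assumed to be a single CSVW column description with a name.

```python
def convert_null_and_required(csvw_column: dict) -> dict:
    """Sketch: carry CSVW null and required over to a Data Package field."""
    field = {"name": csvw_column["name"]}

    if "null" in csvw_column:
        null = csvw_column["null"]
        # CSVW allows a single string or a list; missingValues is written as a list.
        field["missingValues"] = [null] if isinstance(null, str) else list(null)

    if csvw_column.get("required"):
        # required becomes a field constraint rather than a top-level property.
        field["constraints"] = {"required": True}

    return field
```

The same null handling would apply at schema level, writing to schema.missingValues instead.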
Common Properties
Common properties can be added in Data Package as custom properties. As in CSVW, prefixed names are recommended for these. Note however that Data Package has its own definitions for often-used properties (e.g. description), which can exist at all levels.
Dialect Descriptions
CSVW property | Data Package support | Details |
---|---|---|
commentPrefix | Yes | As dialect.commentChar |
delimiter | Yes | As dialect.delimiter |
doubleQuote | Yes | As dialect.doubleQuote |
encoding | Yes | As resource.encoding |
header | Yes | As dialect.header |
headerRowCount | Yes | As dialect.headerRows |
lineTerminators | Yes | As dialect.lineTerminator |
quoteChar | Yes | As dialect.quoteChar |
skipBlankRows | No | |
skipColumns | No | |
skipInitialSpace | Yes | As dialect.skipInitialSpace |
skipRows | Yes | As dialect.commentRows |
trim | No | |
@id | Custom property | |
@type | No | |
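A hedged sketch of the dialect mapping above is shown below. The function and constant names are hypothetical. It assumes that dialect.headerRows is a list of 1-based row numbers, so a headerRowCount of n becomes rows 1 to n, and it leaves out lineTerminators and skipRows, whose value formats differ between the two standards.

```python
# CSVW dialect properties whose values can be copied over unchanged.
RENAMED = {
    "commentPrefix": "commentChar",
    "delimiter": "delimiter",
    "doubleQuote": "doubleQuote",
    "header": "header",
    "quoteChar": "quoteChar",
    "skipInitialSpace": "skipInitialSpace",
}


def convert_dialect(csvw_dialect: dict) -> tuple[dict, dict]:
    """Sketch: return (dialect, resource_properties) for a CSVW dialect."""
    dialect = {RENAMED[key]: value for key, value in csvw_dialect.items() if key in RENAMED}

    count = csvw_dialect.get("headerRowCount")
    if count is not None:
        if count > 0:
            # Assumption: headerRows lists 1-based row numbers.
            dialect["headerRows"] = list(range(1, count + 1))
        else:
            dialect["header"] = False

    # encoding is a resource property in Data Package, not a dialect property.
    resource_properties = {}
    if "encoding" in csvw_dialect:
        resource_properties["encoding"] = csvw_dialect["encoding"]

    return dialect, resource_properties
```

Properties without a Data Package counterpart (skipBlankRows, skipColumns, trim) are dropped by this sketch.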
Transformation Definitions
Not supported in Data Package.
Data Types
CSVW defines data types as built-in data types and derived data types. A derived data type extends a built-in data type with formats, constraints, etc. Data Package does not make that distinction, but rather defines a number of Field Types. Depending on the type, it can be extended with a format and constraints.
CSVW property | Data Package support | Details |
---|---|---|
base | No | All types are defined as field.type |
format | Yes | As field.format |
length | No | |
minLength | Yes | As minLength field constraint |
maxLength | Yes | As maxLength field constraint |
minimum | Yes | As minimum field constraint |
maximum | Yes | As maximum field constraint |
minInclusive | Yes | As minimum field constraint |
maxInclusive | Yes | As maximum field constraint |
minExclusive | Yes | As exclusiveMinimum field constraint |
maxExclusive | Yes | As exclusiveMaximum field constraint |
@id | Custom property | |
@type | No | |
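The constraint rows of this table amount to a renaming, which the sketch below expresses as a lookup. The function and constant names are hypothetical. Only constraints are handled: base, format and length are skipped because they need type-specific treatment.

```python
# CSVW derived-datatype constraint -> Data Package field constraint.
CONSTRAINT_MAP = {
    "minLength": "minLength",
    "maxLength": "maxLength",
    "minimum": "minimum",
    "maximum": "maximum",
    "minInclusive": "minimum",
    "maxInclusive": "maximum",
    "minExclusive": "exclusiveMinimum",
    "maxExclusive": "exclusiveMaximum",
}


def convert_constraints(csvw_datatype: dict) -> dict:
    """Sketch: build field.constraints from a CSVW derived datatype."""
    return {
        CONSTRAINT_MAP[key]: value
        for key, value in csvw_datatype.items()
        if key in CONSTRAINT_MAP
    }
```

For example, a CSVW datatype with minInclusive 0 and maxExclusive 10 would yield the constraints minimum 0 and exclusiveMaximum 10.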