
Comparison with CSVW

Authors: Peter Desmet, Evgeny Karev, Sara Petti

In 2015, a working group chartered by the W3C published a set of recommendations for documenting and accessing CSV files, collectively called CSV on the Web (CSVW). These show extensive similarities with the Data Package standard.

Below we give an overview of the (dis)similarities between CSVW and Data Package.

Scope

CSVW is a set of specifications to describe delimited text files (CSV, TSV) only, namely the Model for Tabular Data and Metadata on the Web and the Metadata Vocabulary for Tabular Data.

Data Package is a set of specifications to describe datasets, data files and tabular data. It can describe data files that are not tabular and includes extra specifications for those that are (i.e. Table Dialect and Table Schema).

Maintenance

CSVW was published as a W3C Recommendation in December 2015. Its working group has since been closed. There is currently a community group, but it does not have the mandate to make changes to the standard.

Data Package was published in 2012, updated to v1.0 in 2017 and v2.0 in 2024. It is maintained by the Open Knowledge Foundation (OKFN). Requests for changes can be proposed by anyone as a GitHub pull request and are decided upon by a working group which includes members from several organizations.

Adoption and Software Support

We could not find documentation on CSVW adoption, but Swirrl maintains a website at csvw.org that explains the standard and provides an overview of software tools. A number of these are conversion tools, but software libraries are available for four programming languages (Java, Python, R, Ruby), with the Python library being the most popular (37 GitHub stars). Of note is CSVLint, an online tool to validate CSV files.

Data Package has been adopted by a wide range of organizations and projects, many of which have extended the standard and created software. Software libraries are available for nine programming languages (Go, Java, JavaScript, Julia, PHP, Python, R, Ruby, Swift), with the Python library being the most popular (708 GitHub stars). These are all maintained as open source software under the Frictionless Data organization on GitHub. Of note is the Open Data Editor, a fully featured Data Package editor for non-technical users.

Extensibility and Use of Other Standards

CSVW makes extensive use of other standards (such as JSON-LD, XML Schema Datatypes, Compact URIs) and defines how to use those. It also has specifications on how to transform CSVW to JSON and RDF. CSVW is not defined as a JSON Schema.

Data Package defines its own properties for frequently used metadata (such as description, contributors, licenses, sources), none of which are required. Users can include properties from other standards as custom properties. Data Package is defined as a JSON Schema and is designed to be extensible: developers can add or extend properties by referencing a JSON Schema in $schema, which validation software then picks up automatically. There is also software available to convert Data Package to other metadata standards, such as DataCite and DCAT.

Linking Data with Metadata

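CSVW defines several methods of locating the metadata for a CSV file: metadata supplied directly by the user, an HTTP Link header, default locations next to the CSV file, and metadata embedded in the file itself.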

Data Package metadata are described in a descriptor file named datapackage.json. This file links to the data file(s) using resource.path and can reference external dialects, schemas, and (domain-specific) specifications.
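The difference in how each standard finds its files can be sketched in a few lines of Python (3.9+). This is a minimal illustration, not a full implementation of either specification. The hypothetical csvw_default_metadata_urls only covers CSVW's default location templates ({+url}-metadata.json and csv-metadata.json, resolved against the CSV file's URL with any fragment removed); Link headers, a /.well-known/csvm site-wide configuration and embedded metadata are ignored. The hypothetical datapackage_data_urls resolves each resource.path relative to the location of datapackage.json. The example.org URLs are made up.

```python
from urllib.parse import urldefrag, urljoin


def csvw_default_metadata_urls(csv_url: str) -> list[str]:
    """Candidate metadata URLs from CSVW's default location templates.

    Only "{+url}-metadata.json" and "csv-metadata.json" are covered; Link
    headers, /.well-known/csvm and embedded metadata are not.
    """
    url, _fragment = urldefrag(csv_url)  # the fragment is not part of {+url}
    return [url + "-metadata.json", urljoin(url, "csv-metadata.json")]


def datapackage_data_urls(descriptor_url: str, descriptor: dict) -> list[str]:
    """Resolve every resource.path in a datapackage.json against its location."""
    urls = []
    for resource in descriptor.get("resources", []):
        paths = resource.get("path", [])  # inline resources have no path
        if isinstance(paths, str):  # a single path or a list of paths
            paths = [paths]
        urls.extend(urljoin(descriptor_url, path) for path in paths)
    return urls


print(csvw_default_metadata_urls("http://example.org/data/trees.csv"))
# ['http://example.org/data/trees.csv-metadata.json', 'http://example.org/data/csv-metadata.json']
print(datapackage_data_urls(
    "http://example.org/pkg/datapackage.json",
    {"resources": [{"name": "trees", "path": ["data/trees-2023.csv", "data/trees-2024.csv"]}]},
))
# ['http://example.org/pkg/data/trees-2023.csv', 'http://example.org/pkg/data/trees-2024.csv']
```

In CSVW the metadata is found from the data, while in Data Package the data is found from the metadata.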

Property Comparison

Below is a list of all properties defined in CSVW’s Metadata Vocabulary for Tabular Data (version 20151217) and how these are supported in Data Package (v2.0).

Property Syntax

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| Array properties | Yes | |
| Link properties | Partial | URLs and paths are supported, but not with a @base URL in @context |
| URI template properties | No | |
| Column reference properties | Yes | E.g. schema.primaryKey |
| Object properties | Yes | E.g. resource.schema |
| Natural language properties | Yes | E.g. resource.title |
| Atomic properties | Yes | |

Top-level Properties

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| @context | No | |

Table Groups

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| tables | Yes | Tables can be defined as package.resources with "type": "table". Tables that are the same in terms of structure, format, dialect, etc. can be defined as a single resource with multiple files in resource.path. Implementations will concatenate those tables. |
| dialect | Yes | As resource.dialect for multiple files in resource.path |
| notes | Custom property | |
| tableDirection | No | |
| tableSchema | Yes | As resource.schema for multiple files in resource.path |
| transformations | No | |
| @id | Custom property | |
| @type | No | |
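Tables with the same structure can be described as one resource whose resource.path lists several files, which implementations then concatenate. A minimal sketch of that concatenation, assuming every file uses the default CSV dialect and starts with the same single header row; concatenate_tables is a hypothetical helper, not part of any Frictionless library.

```python
import csv
import io


def concatenate_tables(texts: list[str]) -> list[list[str]]:
    """Concatenate CSV texts that share one header row into a single table.

    The header is kept once; the header rows of later files are dropped.
    """
    rows: list[list[str]] = []
    header = None
    for text in texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:  # empty file: nothing to add
            continue
        if header is None:
            header = file_header
            rows.append(header)
        elif file_header != header:
            raise ValueError(f"header mismatch: {file_header} != {header}")
        rows.extend(reader)
    return rows


parts = ["id,height\n1,12.5\n2,8.0\n", "id,height\n3,15.2\n"]
print(concatenate_tables(parts))
# [['id', 'height'], ['1', '12.5'], ['2', '8.0'], ['3', '15.2']]
```

Raising on a header mismatch reflects the requirement that the files share one structure; a real implementation would compare against resource.schema instead.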

Tables

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| url | Yes | As resource.path |
| dialect | Yes | As resource.dialect |
| notes | Custom property | |
| suppressOutput | No | |
| tableDirection | No | |
| tableSchema | Yes | As resource.schema |
| transformations | No | |
| @id | Custom property | Tables are identified by resource.name |
| @type | Yes | As resource.type |
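Following the mapping in the table above, a CSVW table description can be turned into a Data Package resource. A minimal sketch, assuming Python 3.9+: csvw_table_to_resource is hypothetical, and deriving resource.name from the file name is an illustrative rule, since CSVW tables have no name. The dialect and schema are passed through unchanged, so their property names would still need mapping.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def csvw_table_to_resource(table: dict) -> dict:
    """Map a CSVW table description to a Data Package resource (sketch)."""
    url = table["url"]
    # Hypothetical naming rule: CSVW has no name, so derive one from the file name
    name = PurePosixPath(urlparse(url).path).stem.lower()
    resource = {"name": name, "type": "table", "path": url}
    if "dialect" in table:
        # Passed through as-is; CSVW dialect property names still need renaming
        resource["dialect"] = table["dialect"]
    if "tableSchema" in table:
        # An inline object or a URL; CSVW columns still need mapping to fields
        resource["schema"] = table["tableSchema"]
    for key in ("notes", "@id"):
        if key in table:
            resource[key] = table[key]  # kept as custom properties
    # suppressOutput, tableDirection and transformations have no equivalent: dropped
    return resource


print(csvw_table_to_resource({"url": "data/Trees.csv", "@type": "Table", "tableSchema": "trees-schema.json"}))
# {'name': 'trees', 'type': 'table', 'path': 'data/Trees.csv', 'schema': 'trees-schema.json'}
```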

Schemas

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| columns | Yes | As schema.fields |
| foreignKeys | Yes | As schema.foreignKeys |
| primaryKey | Yes | As schema.primaryKey |
| rowTitles | No | Titles are defined per field as field.title |
| @id | Custom property | |
| @type | No | |

Columns

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| name | Yes | As field.name |
| suppressOutput | No | |
| titles | Partial | As field.title (a single value) |
| virtual | No | |
| @id | Custom property | |
| @type | No | |
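The schema and column mappings above can be combined into one conversion step. A minimal sketch with hypothetical function names: since field.title holds a single value, only the first of several CSVW titles is kept; the fallback from a missing column name to its first title is an assumption; virtual columns are skipped; and a referenced CSVW table URL is reused as resource name, which a real converter would need to replace with the actual resource.name.

```python
def first_title(titles):
    """Reduce a CSVW natural language value (string, list or language map) to one string."""
    if isinstance(titles, dict):  # language map, e.g. {"en": "Height", "nl": "Hoogte"}
        titles = next(iter(titles.values()), None)
    if isinstance(titles, list):
        titles = titles[0] if titles else None
    return titles


def csvw_schema_to_table_schema(schema: dict) -> dict:
    """Map a CSVW schema (columns, primaryKey, foreignKeys) to a Table Schema (sketch)."""
    fields = []
    for column in schema.get("columns", []):
        if column.get("virtual"):
            continue  # virtual columns have no Data Package equivalent
        title = first_title(column.get("titles"))
        # Assumption: use the first title as name when the column has no name
        field = {"name": column.get("name", title)}
        if title is not None:
            field["title"] = title  # only a single title survives
        fields.append(field)
    result = {"fields": fields}
    if "primaryKey" in schema:
        result["primaryKey"] = schema["primaryKey"]
    for fk in schema.get("foreignKeys", []):
        ref = fk["reference"]
        reference = {"fields": ref["columnReference"]}
        if "resource" in ref:
            # Assumption: the CSVW table URL is reused as the resource name;
            # references via schemaReference are not handled
            reference["resource"] = ref["resource"]
        result.setdefault("foreignKeys", []).append(
            {"fields": fk["columnReference"], "reference": reference}
        )
    return result


print(csvw_schema_to_table_schema({
    "columns": [{"name": "id", "titles": ["ID", "Identifier"]}, {"titles": {"en": "Height", "nl": "Hoogte"}}],
    "primaryKey": "id",
}))
# {'fields': [{'name': 'id', 'title': 'ID'}, {'name': 'Height', 'title': 'Height'}], 'primaryKey': 'id'}
```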

Inherited Properties

Data Package properties do not inherit from their parent, unless otherwise specified (e.g. resource.sources). The properties listed below only exist at one level in Data Package, except for missingValues.

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| aboutUrl | Custom property | |
| datatype | Yes | As field.type |
| default | No | |
| lang | No | Suggested as a recipe |
| null | Yes | As field.missingValues and schema.missingValues, with options to define multiple values and labels |
| ordered | Yes | As field.categoriesOrdered |
| propertyUrl | Partial | As field.rdfType |
| required | Yes | As the required field constraint |
| separator | Yes | As delimiter of the list field type |
| textDirection | No | |
| valueUrl | No | |
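CSVW lets a column inherit these properties from its schema, table or table group, whereas Data Package (apart from missingValues) expects them on the field itself, so a converter has to resolve the inheritance first. A minimal sketch, assuming that the nearest level that sets a value wins; the function names and the partial type mapping are hypothetical, and derived datatypes are reduced to their base type.

```python
INHERITED = ("null", "required", "separator", "datatype")

# Partial, hypothetical mapping from CSVW base datatypes to Data Package field types
BASE_TYPES = {"string": "string", "integer": "integer", "decimal": "number",
              "boolean": "boolean", "date": "date", "dateTime": "datetime"}


def resolve_inherited(column: dict, *ancestors: dict) -> dict:
    """Resolve CSVW inherited properties; the nearest level that sets a value wins.

    ancestors are ordered from nearest to farthest (schema, table, table group).
    """
    resolved = {}
    for level in (column, *ancestors):
        for key in INHERITED:
            if key in level and key not in resolved:
                resolved[key] = level[key]
    return resolved


def inherited_to_field(name: str, props: dict) -> dict:
    """Write resolved values explicitly on a field, since Data Package does not inherit them."""
    datatype = props.get("datatype", "string")
    if isinstance(datatype, dict):  # derived datatype: only its base is used here
        datatype = datatype.get("base", "string")
    item_type = BASE_TYPES.get(datatype, "any")
    if props.get("separator") is not None:
        field = {"name": name, "type": "list",
                 "delimiter": props["separator"], "itemType": item_type}
    else:
        field = {"name": name, "type": item_type}
    if "null" in props:
        null = props["null"]
        field["missingValues"] = [null] if isinstance(null, str) else list(null)
    if props.get("required"):
        field["constraints"] = {"required": True}
    return field


group = {"null": "NA"}
props = resolve_inherited({"name": "tags", "separator": ";"}, {}, {}, group)
print(inherited_to_field("tags", props))
# {'name': 'tags', 'type': 'list', 'delimiter': ';', 'itemType': 'string', 'missingValues': ['NA']}
```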

Common Properties

Common properties can be added to Data Package as custom properties, for which Data Package, just like CSVW, recommends prefixed names. Note, however, that Data Package has its own definitions for frequently used properties (e.g. description), which can exist at all levels.

Dialect Descriptions

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| commentPrefix | Yes | As dialect.commentChar |
| delimiter | Yes | As dialect.delimiter |
| doubleQuote | Yes | As dialect.doubleQuote |
| encoding | Yes | As resource.encoding |
| header | Yes | As dialect.header |
| headerRowCount | Yes | As dialect.headerRows |
| lineTerminators | Yes | As dialect.lineTerminator |
| quoteChar | Yes | As dialect.quoteChar |
| skipBlankRows | No | |
| skipColumns | No | |
| skipInitialSpace | Yes | As dialect.skipInitialSpace |
| skipRows | Yes | As dialect.commentRows |
| trim | No | |
| @id | Custom property | |
| @type | No | |
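Most dialect properties are simple renames, but a few change shape: encoding moves to the resource, CSVW allows several line terminators where Table Dialect takes one, and CSVW's counts (skipRows, headerRowCount) become lists of row numbers (commentRows, headerRows). A minimal sketch, assuming Python 3.9+ and assuming Table Dialect row numbers count every physical row starting at 1; csvw_dialect_to_dp is a hypothetical name.

```python
# CSVW dialect properties that map one-to-one onto Table Dialect (per the table above)
RENAMED = {
    "commentPrefix": "commentChar",
    "delimiter": "delimiter",
    "doubleQuote": "doubleQuote",
    "header": "header",
    "quoteChar": "quoteChar",
    "skipInitialSpace": "skipInitialSpace",
}


def csvw_dialect_to_dp(dialect: dict) -> tuple[dict, dict]:
    """Split a CSVW dialect into a Table Dialect and resource-level properties (sketch)."""
    dp = {new: dialect[old] for old, new in RENAMED.items() if old in dialect}
    resource = {}
    if "encoding" in dialect:
        resource["encoding"] = dialect["encoding"]  # encoding lives on the resource
    terminators = dialect.get("lineTerminators")
    if terminators is not None:
        # CSVW allows several terminators; Table Dialect takes a single string
        dp["lineTerminator"] = terminators if isinstance(terminators, str) else terminators[0]
    skip = dialect.get("skipRows", 0)
    if skip:
        dp["commentRows"] = list(range(1, skip + 1))
    # Assumption: Table Dialect row numbers count every physical row, starting at 1
    count = dialect.get("headerRowCount", 1 if dialect.get("header", True) else 0)
    if count == 0:
        dp["header"] = False
    elif count != 1 or skip:
        dp["headerRows"] = list(range(skip + 1, skip + count + 1))
    # skipBlankRows, skipColumns and trim have no equivalent and are dropped
    return dp, resource


dialect, resource = csvw_dialect_to_dp(
    {"delimiter": ";", "encoding": "utf-16", "lineTerminators": ["\r\n", "\n"], "skipRows": 2}
)
print(dialect)   # {'delimiter': ';', 'lineTerminator': '\r\n', 'commentRows': [1, 2], 'headerRows': [3]}
print(resource)  # {'encoding': 'utf-16'}
```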

Transformation Definitions

Not supported in Data Package.

Data Types

CSVW defines data types as built-in data types and derived data types. A derived data type extends a built-in data type with formats, constraints, etc. Data Package does not make that distinction, but rather defines a number of Field Types. Depending on the type, it can be extended with a format and constraints.

| CSVW property | Data Package support | Details |
| --- | --- | --- |
| base | No | All types are defined as field.type |
| format | Yes | As field.format |
| length | No | |
| minLength | Yes | As minLength field constraint |
| maxLength | Yes | As maxLength field constraint |
| minimum | Yes | As minimum field constraint |
| maximum | Yes | As maximum field constraint |
| minInclusive | Yes | As minimum field constraint |
| maxInclusive | Yes | As maximum field constraint |
| minExclusive | Yes | As exclusiveMinimum field constraint |
| maxExclusive | Yes | As exclusiveMaximum field constraint |
| @id | Custom property | |
| @type | No | |
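Because Data Package folds CSVW's built-in and derived datatypes into a field type plus constraints, a derived datatype can be flattened as sketched below. The type mapping is partial and hypothetical, and expressing length as equal minLength and maxLength is an illustrative workaround rather than part of either standard. format is deliberately not copied: its syntax can differ per type (for dates, CSVW uses Unicode patterns such as dd.MM.yyyy, while Data Package uses strptime-style patterns such as %d.%m.%Y).

```python
# Partial, hypothetical mapping from CSVW base datatypes to Data Package field types
BASE_TYPES = {
    "string": "string", "integer": "integer", "decimal": "number", "double": "number",
    "boolean": "boolean", "date": "date", "dateTime": "datetime", "time": "time",
    "gYear": "year", "gYearMonth": "yearmonth", "duration": "duration",
}

# CSVW facets and the field constraints they correspond to (per the table above)
CONSTRAINTS = {
    "minLength": "minLength", "maxLength": "maxLength",
    "minimum": "minimum", "maximum": "maximum",
    "minInclusive": "minimum", "maxInclusive": "maximum",
    "minExclusive": "exclusiveMinimum", "maxExclusive": "exclusiveMaximum",
}


def csvw_datatype_to_field(name: str, datatype) -> dict:
    """Turn a built-in or derived CSVW datatype into a Data Package field (sketch)."""
    if isinstance(datatype, str):  # built-in datatype given by name
        datatype = {"base": datatype}
    base = datatype.get("base", "string")  # CSVW's default base is string
    field = {"name": name, "type": BASE_TYPES.get(base, "any")}
    constraints = {dp: datatype[csvw] for csvw, dp in CONSTRAINTS.items() if csvw in datatype}
    if "length" in datatype:
        # Hypothetical workaround: an exact length as equal min and max lengths
        constraints.setdefault("minLength", datatype["length"])
        constraints.setdefault("maxLength", datatype["length"])
    if constraints:
        field["constraints"] = constraints
    # format is not copied: its syntax can differ between the standards per type
    return field


print(csvw_datatype_to_field("height", {"base": "decimal", "minInclusive": 0, "maxExclusive": 120}))
# {'name': 'height', 'type': 'number', 'constraints': {'minimum': 0, 'exclusiveMaximum': 120}}
```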