
Comparison with MediaWiki Tabular Data

Authors Jakob Voß

MediaWiki is the software used to run Wikipedia and related projects of the Wikimedia Foundation, including the media file repository Wikimedia Commons. Commons hosts mostly images but also some records with tabular data. The MediaWiki Tabular Data Model was inspired by Data Package version 1, but it differs slightly from the current Data Package specification, as described below.

Property Comparison

A MediaWiki tabular data page describes and contains an individual table of data, similar to a Data Resource with inline tabular data. Both are serialized as JSON objects, but the former is stored as a page with a unique name in a MediaWiki instance (such as Wikimedia Commons).

Top-level Properties

MediaWiki Tabular Data has three required and two optional top-level properties. Most of these properties map to corresponding properties of a Data Resource:

MediaWiki Tabular Data | Data Package Table Schema
- (implied by page name) | name (required) is a string
description (optional) is a localized string | description (optional) is a CommonMark string
data (required) | data (optional)
license (required) is the string CC0-1.0 or another known identifier | licenses (optional) is an array
schema (required) as described below | schema (optional) can have multiple forms
sources (optional) is a string with Wiki markup | sources (optional) is an array of objects

The differences are:

  • the name property does not exist but can be implied from the page name
  • the description and sources properties have a different format
  • the data property is always an array of arrays, and the data types of individual values can differ
  • the schema property is required, and it differs in the definition of schema properties
  • there is no licenses property; instead, license is a plain string, fixed to CC0-1.0 (other license identifiers may be possible)
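
The mapping above can be sketched in Python. This is only an illustration, not an official converter: the function names `pick_language` and `to_data_resource` are hypothetical, and several choices are assumptions, namely reducing localized strings to a single language, wrapping the Wiki markup of sources unchanged in a `title`, mapping the `localized` type to `object`, and emitting rows as objects keyed by field name.

```python
def pick_language(localized, lang="en"):
    """Return one string from a {language code: string} mapping."""
    if lang in localized:
        return localized[lang]
    # Assumption: fall back to the alphabetically first language code.
    return localized[sorted(localized)[0]]


def to_data_resource(page_name, page, lang="en"):
    """Sketch: map a MediaWiki Tabular Data object to a Data Resource dict."""
    fields = []
    for field in page["schema"]["fields"]:
        converted = {
            "name": field["name"],
            # "localized" has no Table Schema counterpart; "object" is one
            # possible stand-in (an assumption, not part of either spec).
            "type": "object" if field["type"] == "localized" else field["type"],
        }
        if "title" in field:
            converted["title"] = pick_language(field["title"], lang)
        fields.append(converted)

    names = [f["name"] for f in fields]
    resource = {
        "name": page_name,  # implied by the page name
        # Emit each row as an object keyed by field name.
        "data": [dict(zip(names, row)) for row in page["data"]],
        "schema": {"fields": fields},
        "licenses": [{"name": page["license"]}],
    }
    if "description" in page:
        resource["description"] = pick_language(page["description"], lang)
    if "sources" in page:
        # Wiki markup is kept verbatim; converting it is out of scope here.
        resource["sources"] = [{"title": page["sources"]}]
    return resource
```

The conversion is lossy in one direction: translations other than the selected language are dropped from description and title, so a round trip back to MediaWiki Tabular Data would not restore them.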

Data Types

Tabular Data supports four data types that overlap with Table Schema data types:

  • number: a subset of Table Schema number (no NaN, INF, or -INF)
  • boolean: same as Table Schema boolean
  • string: a subset of Table Schema string (limited to 400 characters at most and must not include \n or \t)
  • localized: an object that maps language codes to strings, with the same limitations as the string type. This type is not supported in Table Schema.

Individual values in a MediaWiki Tabular Data table can always be null, while in Table Schema you need to explicitly list values that should be considered missing in schema.missingValues.
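
As a rough illustration of these constraints, the following Python sketch checks a single cell value against one of the four types. The function names are hypothetical, language codes are not validated, and counting the 400-character limit with Python's `len` (code points) is an assumption, since MediaWiki may measure length differently.

```python
import math


def is_valid_string(value):
    """String rule: at most 400 characters, no newline or tab."""
    return (
        isinstance(value, str)
        and len(value) <= 400
        and "\n" not in value
        and "\t" not in value
    )


def is_valid_value(value, type_name):
    """Check one cell value against a MediaWiki Tabular Data type."""
    if value is None:
        # null is always allowed, regardless of the declared type.
        return True
    if type_name == "number":
        if isinstance(value, bool):
            # bool is a subclass of int in Python, so reject it explicitly.
            return False
        if isinstance(value, int):
            return True
        # Floats must be finite: no NaN, INF, or -INF.
        return isinstance(value, float) and math.isfinite(value)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "string":
        return is_valid_string(value)
    if type_name == "localized":
        return isinstance(value, dict) and all(
            isinstance(code, str) and is_valid_string(text)
            for code, text in value.items()
        )
    return False  # unknown type name
```

For example, `is_valid_value(float("nan"), "number")` is rejected, while `is_valid_value(None, "number")` is accepted because null needs no entry comparable to `schema.missingValues`.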

Schema Properties

The schema property of MediaWiki Tabular Data contains an object with a fields property, just like Table Schema, but no other properties are allowed. The elements of the fields array resemble Table Schema field descriptors but are limited to three properties with different value spaces:

MediaWiki Tabular Data | Data Package Table Schema
name (required) must be a string matching ^[a-zA-Z_][a-zA-Z_0-9]* | name (required) can be any string
type (required) is one of the Data Types above | type (optional) with different data types
title (optional) is a localized string | title (optional) is a plain string
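
The MediaWiki side of this comparison can be expressed as a small Python check. The sketch below uses a hypothetical function name, `field_errors`, and assumes that the name pattern is meant to match the whole name, which the pattern as quoted does not state explicitly.

```python
import re

# Assumption: the pattern must match the entire field name.
NAME_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")
ALLOWED_KEYS = {"name", "type", "title"}
TYPES = {"number", "boolean", "string", "localized"}


def field_errors(field):
    """Return a list of problems found in one field descriptor (a dict)."""
    errors = []
    extra = set(field) - ALLOWED_KEYS
    if extra:
        errors.append(f"unexpected properties: {sorted(extra)}")
    name = field.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        errors.append("name is required and must match the identifier pattern")
    type_name = field.get("type")
    if not isinstance(type_name, str) or type_name not in TYPES:
        errors.append("type is required and must be a Tabular Data type")
    title = field.get("title")
    if title is not None and not isinstance(title, dict):
        errors.append("title must be a localized string (an object)")
    return errors
```

A descriptor such as `{"name": "pop_2020", "type": "number"}` yields an empty list, whereas a name starting with a digit or a Table Schema type such as `date` is reported.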