Dataset metadata

Each dataset includes extensive metadata, covering the content, origin, data quality, temporal coverage and export forms of the relevant data source or collection.

Data catalogs

Metadata is published in two different forms: as a dataset index file, or as a data catalog. Catalogs combine the metadata for multiple datasets into one file, with the metadata for each dataset included in an array named `datasets`. The main data catalog is published with each update of the default collection, and is located at:

https://data.opensanctions.org/datasets/latest/index.json

Polling this file at regular intervals (e.g. every 30 minutes) is the recommended way to find out whether updated data has been released. Please make sure that any integration handles both the addition and the removal of datasets in the catalog.
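As a rough illustration of the polling advice above, the following sketch compares two catalog snapshots and reports which datasets were added, removed or updated. The helper names (`fetch_catalog`, `dataset_versions`, `diff_catalogs`) are hypothetical and not part of any official client; the sketch assumes that each entry in `datasets` carries the `name` and `version` fields described in the field table on this page.

```python
import json
from urllib.request import urlopen

CATALOG_URL = "https://data.opensanctions.org/datasets/latest/index.json"


def fetch_catalog(url=CATALOG_URL):
    """Download and parse the catalog JSON (hypothetical helper)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def dataset_versions(catalog):
    """Map each dataset name in a catalog to its version string."""
    return {ds["name"]: ds.get("version") for ds in catalog.get("datasets", [])}


def diff_catalogs(previous, current):
    """Return (added, removed, updated) dataset names between two snapshots."""
    old = dataset_versions(previous)
    new = dataset_versions(current)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A dataset counts as updated when its version string differs.
    updated = sorted(n for n in new.keys() & old.keys() if new[n] != old[n])
    return added, removed, updated
```

An integration could store the previous snapshot on disk, call `fetch_catalog()` on a timer, and act on each of the three lists returned by `diff_catalogs`.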

Dataset metadata

The most important piece of metadata for any dataset is its name. Names are lowercase, underscore-linked short identifiers (e.g. `us_ofac_sdn`) used in the actual entity data to reference a data source. The dataset name is also reflected in the URL of the dataset profile on this website, and can be used to mint a URL for the rest of the metadata like this:

https://data.opensanctions.org/datasets/latest/us_ofac_sdn/index.json
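A minimal sketch of minting that URL from a dataset name might look as follows. The function name `dataset_index_url` is hypothetical, and the name check (lowercase letters, digits and underscores only) is an assumption based on the description above rather than a documented rule.

```python
import re

BASE_URL = "https://data.opensanctions.org/datasets/latest"


def dataset_index_url(name, base=BASE_URL):
    """Build the metadata index URL for a dataset name."""
    # Assumed naming rule: lowercase letters, digits and underscores.
    if not re.fullmatch(r"[a-z0-9_]+", name):
        raise ValueError(f"unexpected dataset name: {name!r}")
    return f"{base}/{name}/index.json"
```

Calling `dataset_index_url("us_ofac_sdn")` yields the URL shown above.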

Inside the metadata index file (or a catalog entry), the following fields can be found:

Section	Field	Description
`run_time`		Timestamp when the entire index was last updated
`dataset`
	`name`	Dataset’s unique identifier
	`title`	Human-readable title
	`summary`	Short summary string
	`description`	Detailed description of the dataset in markdown syntax
	`tags`	List of tags (see below).
	`index_url`	URL to dataset metadata file
	`version`	Latest dataset version. Each data update produces a new version ID, and version IDs can be relied on to be sortable strings.
	`entity_count`	Number of entities included in the export
	`target_count`	Number of targets included in the export
	`thing_count`	Number of things included in the export. Things are a subset of entities that represent physical objects, e.g. people, companies, vessels, etc.
	`last_change`	Timestamp when any entity in the dataset last changed. This marks when the system discovered the change, not when the change was published at the source. Note that changes to our data cleaning tools may also be reflected here.
	`last_export`	Timestamp of the most recent dataset crawl and export. This is the time when that process was started, not when the resulting data was uploaded to our public archive.
	`type`	Dataset type (`source`, `collection` or `external`, the latter for enrichment sources)
	`datasets`	All data sources (and enrichment datasets) included in this collection
	`statistics_url`	URL of a JSON file with detailed summary statistics about the dataset.
	`delta_url`	Index of version-to-version delta URLs for incremental updates.
	`resources`	Array of objects describing associated files, including exports and source data.
	`children`	(For internal use)
	`entry_point`	(For internal use)
	`issues_url`	(For internal use)
	`issue_levels`	(For internal use)
	`updated_at`	Use `last_export` instead.
`coverage`		Coverage metadata object.
	`start`	Date of the first time the dataset was included in the database.
	`countries`	Not used. Look into `statistics_url` instead.
	`frequency`	One of: `never`, `hourly`, `daily`, `weekly`, `monthly`, `annually`
	`schedule`	A more precise (cron-style) specification of the update frequency
`publisher`
	`name`	Publishing source name
	`acronym`	Publishing source acronym (e.g. OFAC)
	`description`	Detailed description of publishing source, uses markdown.
	`url`	Link to the publisher's home page
	`country`	Originating country (code) of publishing source
	`country_label`	Originating country (name) of publishing source
	`official`	`true` if the publisher is a government or inter-governmental organization.
`resources`
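Because version IDs are described above as sortable strings, a plain string comparison should be enough to decide whether a metadata entry is newer than the one processed last. The sketch below assumes `version` sits directly on the metadata entry; the function name `has_new_version` and any version values passed to it are hypothetical.

```python
def has_new_version(entry, last_seen_version):
    """Return True if the entry's version sorts after the last one seen."""
    version = entry.get("version")
    if version is None:
        # No version published for this entry; nothing to compare.
        return False
    if last_seen_version is None:
        # Nothing processed yet, so any version counts as new.
        return True
    # Version IDs are sortable strings, so lexicographic order applies.
    return version > last_seen_version
```

Storing the last processed `version` per dataset `name` keeps this check independent of the timestamp fields.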

Dataset tags

Tags help describe and organize datasets. They are assigned based on the characteristics and content of each data source, such as target country, legal jurisdiction, or list type (e.g. sanctions, PEPs, regulatory actions). Tags make it easier to filter, explore, and compare data across sources.

Tag	Description
`list.pep`	Politically exposed persons (PEPs).
`list.regulatory`	Regulatory watchlists and compliance registers.
`list.risk`	Risk-related datasets used for screening and due diligence.
`list.sanction`	Official sanctions lists.
`list.sanction.counter`	Sanctions issued by states with weak democratic institutions.
`list.sanction.eu`	European Union sanctions.
`list.wanted`	Individuals sought by law enforcement.
`sector.banking`	Banking and financial institutions reference data.
`sector.financial`	Broader financial services sector.
`sector.maritime`	Maritime sanctions and datasets that mention shipping vessels.
`sector.medical`	Healthcare-related regulatory actions.
`sector.securities`	Securities and capital markets.
`sector.usmed.debarment`	U.S. providers excluded from federal healthcare programs.
`issuer.west`	Originates from or aligned with a "Western" government coalition (especially in the context of the Russian invasion of Ukraine).
`juris.eu`	EU jurisdiction: relevant for entities regulated under EU financial market rules.
`risk.klepto`	Involvement in kleptocracy or grand corruption.
`target.*`	Identifies the primary country or jurisdiction targeted by the dataset.
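To illustrate filtering by tag, the sketch below selects dataset names from a catalog whose `tags` list matches a shell-style pattern such as `list.sanction` or `target.*`. The function name `datasets_matching_tag` is hypothetical, and the sketch assumes each catalog entry exposes `name` and `tags` as listed in the field table.

```python
from fnmatch import fnmatchcase


def datasets_matching_tag(catalog, pattern):
    """Return names of datasets with at least one tag matching the pattern."""
    matches = []
    for dataset in catalog.get("datasets", []):
        tags = dataset.get("tags") or []
        # fnmatchcase does case-sensitive matching with * as a wildcard.
        if any(fnmatchcase(tag, pattern) for tag in tags):
            matches.append(dataset["name"])
    return matches
```

An exact tag such as `list.pep` matches only that tag, while a pattern like `list.sanction*` also picks up the more specific `list.sanction.eu` and `list.sanction.counter`.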