Dataset metadata

Each dataset includes extensive metadata, covering the content, origin, data quality, temporal coverage and export forms of the relevant data source or collection.

Data catalogs

Metadata is published in two different forms: as a dataset index file, or as a data catalog. Catalogs combine the metadata for multiple datasets into one file, with metadata for each dataset included in an array named datasets. The main data catalog is published with each update of the default collection, and located at:

https://data.opensanctions.org/datasets/latest/index.json

Polling this file at regular intervals (e.g. every 30 minutes) is the recommended way to find out whether updated data has been released. Please make sure that any integration handles both the addition and the removal of datasets in the catalog.
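A minimal sketch of such a polling check, using only the Python standard library. It assumes that each entry in the datasets array carries the name and version fields described in this document. The function names (fetch_catalog, dataset_versions, diff_catalogs) are illustrative and not part of any official client.

```python
import json
import urllib.request

CATALOG_URL = "https://data.opensanctions.org/datasets/latest/index.json"


def fetch_catalog(url: str = CATALOG_URL) -> dict:
    """Download and parse the catalog JSON (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def dataset_versions(catalog: dict) -> dict:
    """Map each dataset name to its version, read from the `datasets` array."""
    return {ds["name"]: ds.get("version") for ds in catalog.get("datasets", [])}


def diff_catalogs(previous: dict, current: dict) -> dict:
    """Compare two catalog snapshots by dataset name and version."""
    old = dataset_versions(previous)
    new = dataset_versions(current)
    return {
        # Present now, absent in the earlier snapshot.
        "added": sorted(new.keys() - old.keys()),
        # Present earlier, gone from the current catalog.
        "removed": sorted(old.keys() - new.keys()),
        # Present in both, but with a different version ID.
        "updated": sorted(n for n in new.keys() & old.keys() if new[n] != old[n]),
    }
```

An integration could store the result of fetch_catalog() after each poll and pass the stored and fresh snapshots to diff_catalogs(). The "removed" list covers the case where a dataset disappears from the catalog, which is easy to overlook when only checking for new versions.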

Dataset metadata

The most important piece of metadata for any dataset is its name. Names are short, lowercase identifiers joined by underscores (e.g. us_ofac_sdn) and are used in the actual entity data to reference a data source. The dataset name also appears in the URL of the dataset profile on this website, and can be used to construct the URL of the rest of the metadata like this:

https://data.opensanctions.org/datasets/latest/us_ofac_sdn/index.json
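The URL pattern above can be sketched as a small helper. The function name dataset_index_url is hypothetical, and the validation pattern is an assumption based on the description of names as lowercase and underscore-linked; whether digits are allowed is a guess and may need adjusting.

```python
import re

BASE_URL = "https://data.opensanctions.org/datasets/latest"

# Assumed shape of a dataset name: lowercase alphanumeric parts joined by
# underscores. This pattern is an illustration, not an official rule.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")


def dataset_index_url(name: str) -> str:
    """Build the metadata index URL for a dataset name."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"Unexpected dataset name: {name!r}")
    return f"{BASE_URL}/{name}/index.json"
```

For example, dataset_index_url("us_ofac_sdn") returns the URL shown above.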

Inside the metadata index file (or a catalog entry), the following fields can be found:

| Section | Field | Description |
| --- | --- | --- |
| | run_time | Timestamp when the entire index was last updated |
| dataset | | |
| | name | Dataset’s unique identifier |
| | title | Human-readable title |
| | summary | Short summary string |
| | description | Detailed description of the dataset in markdown syntax |
| | tags | List of tags (see below) |
| | index_url | URL to dataset metadata file |
| | version | Latest dataset version. Each data update produces a new version ID, and version IDs can be relied on to be sortable strings. |
| | entity_count | Number of entities included in the export |
| | target_count | Number of targets included in the export |
| | thing_count | Number of things included in the export. Things are a subset of entities that represent physical objects, e.g. people, companies, vessels, etc. |
| | last_change | Timestamp when any entity in the dataset last changed. This marks when the system discovered the change, not when it was published at the source. Note that changes to our data cleaning tools may also result in changes reflected here. |
| | last_export | Timestamp of the most recent dataset crawl and export. This is the time when the process in question was started, not when the resulting data was uploaded to our public archive. |
| | type | Dataset type (source, collection or external - the latter for enrichment sources) |
| | datasets | All data sources (and enrichment datasets) included in this collection |
| | statistics_url | URL of a JSON file with detailed summary statistics about the dataset |
| | delta_url | Index of version-to-version delta URLs for incremental updates |
| | resources | Array of objects describing associated files, including exports and source data |
| | children | (For internal use) |
| | entry_point | (For internal use) |
| | issues_url | (For internal use) |
| | issue_levels | (For internal use) |
| | updated_at | Use last_export instead. |
| coverage | | Coverage metadata object |
| | start | Date of the first time the dataset was included in the database |
| | countries | Not used. Look into statistics_url instead. |
| | frequency | One of: never, hourly, daily, weekly, monthly, annually |
| | schedule | A more precise (cron-style) specification of the update frequency |
| publisher | | |
| | name | Publishing source name |
| | acronym | Publishing source acronym (e.g. OFAC) |
| | description | Detailed description of publishing source, in markdown syntax |
| | url | Link to the publisher's home page |
| | country | Originating country (code) of publishing source |
| | country_label | Originating country (name) of publishing source |
| | official | true if the publisher is a government or inter-governmental organization |
| resources | | |
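As a rough illustration of how these fields might be consumed, the sketch below compares version IDs as plain strings (relying on the statement that they are sortable) and pulls a few fields into a flat summary. It assumes that publisher and coverage appear as nested objects inside the dataset entry, which should be verified against a real index file. The function names and the sample values in the usage note are hypothetical.

```python
from typing import Optional


def has_new_version(entry: dict, last_seen: Optional[str]) -> bool:
    """Return True if the entry's version sorts after the last version seen."""
    # Version IDs are documented as sortable strings, so a plain string
    # comparison is enough; no parsing of the ID format is attempted.
    return last_seen is None or entry["version"] > last_seen


def summarize(entry: dict) -> dict:
    """Collect a few commonly used fields from a dataset metadata entry."""
    # Assumption: `publisher` and `coverage` are nested objects in the entry.
    publisher = entry.get("publisher") or {}
    coverage = entry.get("coverage") or {}
    return {
        "name": entry["name"],
        "title": entry.get("title"),
        "version": entry.get("version"),
        "entity_count": entry.get("entity_count"),
        "official": publisher.get("official"),
        "frequency": coverage.get("frequency"),
    }
```

Calling has_new_version(entry, stored_version) before downloading any resources lets a consumer skip datasets that have not changed since the last run. Missing optional fields come back as None rather than raising an error.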

Dataset tags

Tags help describe and organize datasets. They are assigned based on the characteristics and content of each data source, such as target country, legal jurisdiction, or list type (e.g. sanctions, PEPs, regulatory actions). Tags make it easier to filter, explore, and compare data across sources.

| Tag | Description |
| --- | --- |
| list.pep | Politically exposed persons (PEPs). |
| list.regulatory | Regulatory watchlists and compliance registers. |
| list.risk | Risk-related datasets used for screening and due diligence. |
| list.sanction | Official sanctions lists. |
| list.sanction.counter | Sanctions issued by states with weak democratic institutions. |
| list.sanction.eu | European Union sanctions. |
| list.wanted | Individuals sought by law enforcement. |
| sector.banking | Banking and financial institutions reference data. |
| sector.financial | Broader financial services sector. |
| sector.maritime | Maritime sanctions and datasets that mention shipping vessels. |
| sector.medical | Healthcare-related regulatory actions. |
| sector.securities | Securities and capital markets. |
| sector.usmed.debarment | U.S. providers excluded from federal healthcare programs. |
| issuer.west | Originates from or aligned with a "Western" government coalition (especially in the context of the Russian invasion of Ukraine). |
| juris.eu | EU jurisdiction: relevant for entities regulated under EU financial market rules. |
| risk.klepto | Involvement in kleptocracy or grand corruption. |
| target.* | Identifies the primary country or jurisdiction targeted by the dataset. |
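Filtering a catalog by tag could look like the following sketch. It assumes each dataset entry has a tags list of strings, and it uses shell-style wildcards so that a pattern such as target.* matches any tag with that prefix. The function name datasets_with_tag is hypothetical, and a concrete tag such as target.ru is an assumed example of how the target.* family is filled in.

```python
from fnmatch import fnmatchcase


def datasets_with_tag(catalog: dict, pattern: str) -> list:
    """Return the names of datasets that have a tag matching the pattern.

    The pattern uses shell-style wildcards: "list.sanction" matches only
    that exact tag, while "list.sanction*" also matches "list.sanction.eu".
    """
    return [
        ds["name"]
        for ds in catalog.get("datasets", [])
        if any(fnmatchcase(tag, pattern) for tag in ds.get("tags", []))
    ]
```

Datasets without a tags field are simply skipped, so the helper works on catalogs where only some entries are tagged.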