Real world identifiers

To link statements to the real-world organisations and people they relate to, statements may include a range of identifying information. We use a common identifier object with two required properties and one optional property.

  • scheme must be a value from a codelist of known identifier sources. Separate codelists exist for entities and persons.
  • id must be the value assigned to the relevant entity or person in that scheme.
  • uri may be used to provide a canonical URI from this scheme.

For example, if a source system holds:

  • A registered company number; and
  • A VAT number;

for a company, two entries could be created in the Entity/identifiers array, as in the example below:

[
    {
        "scheme":"GB-COH",
        "id":"012345678"
    },
    {
        "scheme":"GB-VAT",
        "id":"65251235"
    }
]
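The required and optional properties described above can be checked mechanically. The following is a minimal Python sketch, not part of the standard: the function name identifier_problems is hypothetical, and the check covers only the presence and type of properties. It does not test whether scheme appears in the relevant codelist.

```python
from typing import Any, Dict, List

# The two properties every identifier object must carry.
REQUIRED_PROPERTIES = ("scheme", "id")


def identifier_problems(identifier: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one identifier object.

    Hypothetical helper: it checks only that 'scheme' and 'id' are
    non-empty strings and that 'uri', if present, is a string.
    """
    problems = []
    for prop in REQUIRED_PROPERTIES:
        value = identifier.get(prop)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty required property: {prop}")
    if "uri" in identifier and not isinstance(identifier["uri"], str):
        problems.append("uri must be a string when present")
    return problems


# The two entries from the example above.
identifiers = [
    {"scheme": "GB-COH", "id": "012345678"},
    {"scheme": "GB-VAT", "id": "65251235"},
]

for entry in identifiers:
    print(entry["scheme"], identifier_problems(entry))
```

Note that id is kept as a string, so leading zeros such as those in the company number survive serialisation.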

Entity Identifiers

The values for scheme within an entity statement identifier should be drawn from the http://org-id.guide codelist. This contains details of hundreds of company registers and other identifier sources.

Where the publisher is providing an internal identifier, the publisher should either:

  • Publish their full list of internal identifiers, and register this list with the http://org-id.guide codelist; or
  • Use MISC-{Publisher_Name} as the scheme

Person Identifiers

System identifiers

If the source system has assigned a unique identifier to individual persons, and this identifier can be published, then this should be included with the scheme ‘MISC-{Publisher_Name}’.

For example, a beneficial ownership reporting system may maintain a database table of ‘person’ records, each with its identifier as a primary key. So that users can recognise references to the same person mentioned in separate statements, this identifier should be included in the published data, either in raw form, or modified to ensure a unique value.
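As an illustration, a publisher could turn a primary key into an identifier object along the following lines. This is a sketch under assumptions: the helper name system_identifier, the publisher name Example_Registry and the key 4521 are all hypothetical, and the key is published in raw form.

```python
from typing import Dict, Union


def system_identifier(publisher_name: str, record_key: Union[int, str]) -> Dict[str, str]:
    """Build an identifier object for a person record held in a source system.

    Hypothetical helper: the scheme follows the MISC-{Publisher_Name}
    pattern and the primary key is published unchanged, as a string.
    """
    return {"scheme": f"MISC-{publisher_name}", "id": str(record_key)}


# Two statements that mention the same person carry the same identifier,
# so users of the data can recognise them as referring to one individual.
first_mention = system_identifier("Example_Registry", 4521)
second_mention = system_identifier("Example_Registry", 4521)
print(first_mention)
```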

Shared identifiers

If the source system has collected one or more known identification numbers for a person, and these can be published without privacy or security risks, then these should also be included in the Person/identifiers array.

The values for scheme within a person statement should be based on the following pattern:

{JURISDICTION}-{TYPE}

Where jurisdiction is expressed using the extended ISO three-letter country codes list proposed in ICAO Document 9303 §5 (pages 22-29).

For example, a passport number from Afghanistan would have the scheme:

AFG-PASSPORT

Where the publisher is providing an internal identifier, this should use ‘MISC-{Publisher_Name}’ as the scheme.
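The {JURISDICTION}-{TYPE} pattern can be assembled and sanity-checked with a few lines of code. The sketch below is illustrative only: person_scheme is a hypothetical helper, the jurisdiction check tests only the shape of the code (one to three uppercase letters, an assumption made to allow for the shorter codes in the ICAO list) rather than membership of that list, and the set of types reflects the identification types documented on this page.

```python
import re

# Identification types documented on this page.
DOCUMENTED_TYPES = {"PASSPORT", "TAXID", "IDCARD"}

# Shape check only (assumption): one to three uppercase letters.
# This does not confirm that the code appears in ICAO Doc 9303.
_JURISDICTION_SHAPE = re.compile(r"[A-Z]{1,3}")


def person_scheme(jurisdiction: str, id_type: str) -> str:
    """Return a scheme value following the {JURISDICTION}-{TYPE} pattern."""
    if not _JURISDICTION_SHAPE.fullmatch(jurisdiction):
        raise ValueError(f"unexpected jurisdiction code: {jurisdiction!r}")
    if id_type not in DOCUMENTED_TYPES:
        raise ValueError(f"undocumented identifier type: {id_type!r}")
    return f"{jurisdiction}-{id_type}"


print(person_scheme("AFG", "PASSPORT"))
```

The identifier value itself belongs in the id property of the identifier object, not in the scheme.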

Warning

When using BODS to provide open data, it is important to ensure any person identifiers are suitable for publication under national laws and data protection frameworks.

Most of the identifier types listed below are not suitable for publication as part of an open dataset.

The following identification types are currently documented. Suggestions for new types should be made through the issue tracker.

PASSPORT

Passport numbers should follow the format of the identifier (second) line in a machine-readable passport (see Appendix B to Part 4 of ICAO Doc 9303) including at least the document number.

Parsers should be able to extract the document number from the first 9 characters, and to access any subsequent information supplied according to the ICAO format.
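A parser along these lines might look like the sketch below. It assumes the id value begins with the 9-character document number field of the machine-readable line, padded with the '<' filler character where the number is shorter, and optionally followed by the rest of the line. The function names are hypothetical. The check digit calculation uses the ICAO weighting of 7, 3, 1, with letters valued 10 to 35 and the filler valued 0. The sample value is a specimen-style line, not a real passport.

```python
DIGITS = "0123456789"


def document_number(passport_id: str) -> str:
    """Extract the document number from the first 9 characters."""
    if len(passport_id) < 9:
        raise ValueError("expected at least 9 characters")
    # '<' is the filler used to pad document numbers shorter than 9 characters.
    return passport_id[:9].rstrip("<")


def check_digit(field: str) -> int:
    """Compute the check digit for a machine-readable field (weights 7, 3, 1)."""
    weights = (7, 3, 1)
    total = 0
    for position, char in enumerate(field):
        if char in DIGITS:
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10
        elif char == "<":
            value = 0
        else:
            raise ValueError(f"unexpected character: {char!r}")
        total += value * weights[position % 3]
    return total % 10


# Specimen-style second line: document number, check digit, then further fields.
sample = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
number_field = sample[:9]
print(document_number(sample))
# When the check digit is supplied (character 10), it can be verified.
print(check_digit(number_field) == int(sample[9]))
```

Verifying the check digit is optional, since an id that contains only the document number has no tenth character to compare against.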

TAXID

Country taxpayer identification systems vary. Where specific guidance on including numbers from a particular jurisdiction is required, this may be included here in future.

IDCARD

Country ID card systems vary. Where specific guidance on including numbers from a particular jurisdiction is required, this may be included here in future.