OBIS-0002: Address Attribution Data Model

Status: Draft · Version: v1-draft · Date: 2026-05-19 · Editor: Bernhard Haslhofer · Focus area: Investigations and forensics

Abstract

This document specifies a data model for the portable exchange of blockchain address attributions. The model defines a small, vendor-neutral vocabulary covering addresses, clusters, entities, attribution claims, supporting evidence, provenance, confidence, and revocation. It is intended to enable investigators, researchers, and supervisors operating across different tools and organisations to exchange attribution data without semantic loss.

1. Introduction

Attribution — binding a pseudonymous blockchain address or cluster of addresses to a real-world actor — is the core analytical activity in blockchain investigations, supervision, and research. Today, this work is conducted within isolated tool ecosystems. Each platform defines its own representation of an address, an entity, an attribution, and the evidence that supports it. Identical concepts carry different names; semantically distinct concepts carry the same name. Attributions produced in one ecosystem are difficult to verify or reuse in another, and the resulting fragmentation impedes cross-organisation casework, supervisory comparability, and reproducible research.

This document does not prescribe how attributions are produced. It specifies how they are represented when exchanged.

2. Terminology

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as described in RFC 2119.

  • Address. A unit of value receipt or control on a blockchain ledger, identified by a ledger-native string and the ledger on which it is valid.
  • Cluster. A set of addresses that, by application of a stated heuristic, are treated as controlled by a single entity.
  • Entity. A real-world actor — natural person, legal person, organisational unit, or service — to which addresses or clusters may be attributed.
  • Attribution. A claim, made by a stated attributor, that an address or cluster is associated with a stated entity.
  • Evidence. Material on which an attribution rests. Evidence may be public (a URL to a press release, court filing, indictment, or blog post) or non-public (an opaque reference to internal records).
  • Provenance. The record of who made the attribution, when, by what method, and on what evidence.
  • Confidence. The attributor’s stated degree of certainty in the attribution, expressed using the levels defined in §7.
  • Attributor. The organisation or individual asserting the attribution.

3. Identifiers

3.1 Chains and addresses

Chains and addresses are identified using the CAIP-2 and CAIP-10 namespaces.

  • A chain identifier has the form <namespace>:<reference>, e.g. eip155:1 (Ethereum mainnet), bip122:000000000019d6689c085ae165831e93 (Bitcoin mainnet).
  • An address identifier has the form <chain_id>:<address>, e.g. eip155:1:0xabc..., bip122:000000000019d6689c085ae165831e93:1A1zP....

Implementations MUST preserve the CAIP-10 form when exchanging addresses across organisations. They MAY maintain alternative internal representations.
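As an illustration, the following Python sketch splits a chain or address identifier into its parts. The regular expressions reflect the character classes given in the CAIP-2 and CAIP-10 specifications as understood at the time of writing and should be checked against the current CAIP text. The function names and the sample Ethereum address are hypothetical and not part of this document.

```python
import re
from typing import Tuple

# Assumed character classes, taken from the CAIP-2 and CAIP-10 texts.
_CHAIN = r"(?P<namespace>[-a-z0-9]{3,8}):(?P<reference>[-_a-zA-Z0-9]{1,32})"
CAIP2_RE = re.compile(_CHAIN)
CAIP10_RE = re.compile(_CHAIN + r":(?P<address>[-.%a-zA-Z0-9]{1,128})")


def parse_chain_id(value: str) -> Tuple[str, str]:
    """Return (namespace, reference) for a CAIP-2 chain identifier."""
    m = CAIP2_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a CAIP-2 chain identifier: {value!r}")
    return m["namespace"], m["reference"]


def parse_address_id(value: str) -> Tuple[str, str]:
    """Return (chain_id, address) for a CAIP-10 address identifier."""
    m = CAIP10_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a CAIP-10 address identifier: {value!r}")
    return f"{m['namespace']}:{m['reference']}", m["address"]


# Hypothetical Ethereum address, used only to exercise the parser.
sample = "eip155:1:0x" + "ab" * 20
print(parse_chain_id("bip122:000000000019d6689c085ae165831e93"))
print(parse_address_id(sample))
```

Keeping the chain identifier and the address together in one string, and splitting only at the point of use, is one way to satisfy the requirement that the CAIP-10 form survives exchange unchanged.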

3.2 Clusters

A cluster identifier is an opaque URI in the attributor’s namespace, e.g. https://attributor.example/clusters/abc123. The identifier MUST be stable across exchanges: it does not change when the cluster’s membership changes, and a new identifier is minted only for a structurally different cluster.

3.3 Entities

An entity identifier is an opaque URI in the attributor’s namespace, e.g. https://attributor.example/entities/binance. Where multiple attributors maintain independent identifiers for the same real-world entity, equivalence is asserted by an attributor cross-reference (§5.4) rather than by a global identifier.

4. Entity types

An entity carries a primary type drawn from the closed initial vocabulary below. Implementations MUST preserve the type when exchanging entities. Where an entity does not match any initial type, the value unknown_service SHOULD be used and a free-text subtype field MAY carry an extension label.

| Type | Description |
| --- | --- |
| exchange | Centralised virtual-asset exchange or trading service. |
| mixer | Mixing or anonymising service. |
| bridge | Cross-chain bridge or wrapping service. |
| miner | Block producer (PoW miner, PoS validator, or mining pool). |
| payment_processor | Merchant or remittance payment processor. |
| gambling_service | Betting, casino, or prediction-market service. |
| darknet_market | Marketplace operating predominantly on a darknet. |
| ransomware | Address controlled by a ransomware operator. |
| scam | Address controlled by a scam operator (phishing, rug pull, fraudulent ICO). |
| sanctioned_entity | Entity designated by a competent sanctioning authority. |
| smart_contract | Programmable on-chain logic (DeFi protocol, DAO treasury, token contract). |
| individual | Natural person. Subject to the privacy provisions of §10. |
| legal_entity | Non-exchange organisation (foundation, NGO, corporation). |
| unknown_service | Recognised as an organisational actor but type undetermined. |

The vocabulary is intentionally small and intended to be extended through future revisions of this document rather than ad hoc per-implementation values.

5. Data model

5.1 Attribution

An attribution contains the following fields. Fields marked yes in the Required column MUST be present:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | URI | yes | Stable identifier for the attribution, in the attributor’s namespace. |
| subject | URI | yes | CAIP-10 address identifier or cluster identifier (§3). |
| entity | URI | yes | Entity identifier (§3.3). |
| confidence | enum | yes | Value from §7. |
| provenance | object | yes | Provenance record (§6). |
| evidence | array | recommended | One or more evidence references (§5.3). |
| valid_from | date | recommended | First date on which the attribution is asserted to hold. |
| valid_to | date | optional | Last date on which the attribution is asserted to hold, if known. |
| revoked_at | datetime | optional | Time at which the attribution was withdrawn. |
| revocation_reason | string | optional | Free-text reason for revocation. |

Attribution to an entity of type individual MUST include evidence and SHOULD carry confidence high or vetted. See §10.
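The field requirements above, together with the rule for individual entities, can be sketched as a small Python check. This is a minimal illustration rather than a normative validator: the function name is hypothetical, and because an attribution carries only the entity URI, the sketch assumes the caller has already looked up the entity's type and passes it in.

```python
from typing import List, Optional, Tuple

# Fields marked "yes" in the Required column of the attribution table.
REQUIRED_FIELDS = ("id", "subject", "entity", "confidence", "provenance")


def check_attribution(
    record: dict, entity_type: Optional[str] = None
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): MUST-level and SHOULD-level findings."""
    errors = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in record
    ]
    warnings: List[str] = []
    if entity_type == "individual":
        # MUST include evidence when the entity is a natural person.
        if not record.get("evidence"):
            errors.append("attribution to an individual MUST include evidence")
        # SHOULD carry confidence high or vetted.
        if record.get("confidence") not in ("high", "vetted"):
            warnings.append(
                "attribution to an individual SHOULD carry confidence high or vetted"
            )
    return errors, warnings


# Hypothetical record, used only to exercise the check.
example = {
    "id": "https://attributor.example/attributions/0002",
    "subject": "https://attributor.example/clusters/c-0001",
    "entity": "https://attributor.example/entities/person-1",
    "confidence": "medium",
    "provenance": {},
}
print(check_attribution(example, entity_type="individual"))
```

Separating errors from warnings keeps the MUST and SHOULD requirements distinguishable, so a receiving party can reject on the former and merely flag the latter.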

5.2 Cluster

A cluster contains the following fields. Fields marked yes in the Required column MUST be present:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | URI | yes | Stable cluster identifier (§3.2). |
| chain | string | yes | CAIP-2 chain identifier. |
| heuristic | string | yes | Named clustering heuristic. See §5.2.1. |
| member_count | integer | yes | Number of addresses in the cluster. |
| members | array | optional | CAIP-10 addresses. Omitted when exchanging cluster-level attribution without member exposure. |
| provenance | object | yes | Provenance record (§6). |

5.2.1 Clustering heuristic names

This document defines an initial set of canonical names. Implementations SHOULD use these names when applicable and MAY use vendor-specific names prefixed x-.

| Name | Description |
| --- | --- |
| co-spending | Common-input ownership heuristic (UTXO chains). |
| change-address | Change-address heuristic (UTXO chains). |
| behavioural | Behavioural pattern matching (timing, amounts, fees). |
| address-reuse | Identification by repeated address reuse across services. |
| external-disclosure | Membership established by external disclosure (court order, voluntary report). |
| composite | Two or more of the above combined; the constituents SHOULD be enumerated in a composite_of field. |
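The naming rules for heuristics can be illustrated with a short Python sketch. It assumes that composite_of sits directly on the cluster record and holds a list of heuristic names; this document does not fix its placement or shape, and the function names are hypothetical.

```python
from typing import List

# Canonical names defined in this section.
CANONICAL_HEURISTICS = frozenset({
    "co-spending",
    "change-address",
    "behavioural",
    "address-reuse",
    "external-disclosure",
    "composite",
})


def is_recognised_heuristic(name: str) -> bool:
    """True for a canonical name or a vendor-specific name prefixed x-."""
    return name in CANONICAL_HEURISTICS or (name.startswith("x-") and len(name) > 2)


def heuristic_notes(cluster: dict) -> List[str]:
    """Return SHOULD-level notes about a cluster's heuristic naming."""
    notes: List[str] = []
    name = cluster.get("heuristic", "")
    if not is_recognised_heuristic(name):
        notes.append(f"heuristic {name!r} is neither canonical nor x- prefixed")
    # Assumption: composite_of is a list of names on the cluster record.
    if name == "composite" and not cluster.get("composite_of"):
        notes.append("composite heuristic SHOULD enumerate composite_of")
    return notes


print(heuristic_notes({"heuristic": "composite"}))
print(heuristic_notes({"heuristic": "x-vendor-graph"}))
```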

5.3 Evidence

An evidence record contains the following fields. Fields marked yes in the Required column MUST be present:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | enum | yes | One of: public_url, court_filing, regulatory_designation, voluntary_disclosure, subpoena_response, osint, internal_record. |
| reference | string | yes | URL for public evidence, or opaque attributor-namespaced identifier for non-public evidence. |
| description | string | recommended | Short human-readable description. |
| observed_at | datetime | recommended | When the evidence was collected. |

Where evidence is non-public, the attributor MUST retain the underlying record under their own retention policy and MUST be able to produce it on request from another attributor exchanging the data, subject to applicable law.

5.4 Attributor cross-reference

Where one attributor wishes to assert that its entity is the same as an entity in another attributor’s namespace, it does so by issuing an attribution whose subject and entity are both entity URIs, and whose confidence reflects the basis for the cross-reference. This mechanism replaces a global entity registry.
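A cross-reference of this kind might be constructed as in the Python sketch below. The text above does not say which of the two entity URIs goes in subject and which in entity; the sketch assumes the issuing attributor's own entity is the subject. All URIs, the function name, and the choice of manual_review as the method are illustrative assumptions.

```python
import json


def make_cross_reference(
    attribution_id: str,
    own_entity: str,
    other_entity: str,
    confidence: str,
    attributor: str,
    created_at: str,
) -> dict:
    """Build an attribution whose subject and entity are both entity URIs."""
    return {
        "id": attribution_id,
        "subject": own_entity,    # assumption: the issuer's own entity
        "entity": other_entity,   # assumption: the other attributor's entity
        "confidence": confidence,
        "provenance": {
            "attributor": attributor,
            "created_at": created_at,
            "method": "manual_review",
        },
    }


# Hypothetical identifiers in two attributor namespaces.
xref = make_cross_reference(
    attribution_id="https://attributor.example/attributions/0003",
    own_entity="https://attributor.example/entities/mixer-x",
    other_entity="https://other-attributor.example/entities/m-42",
    confidence="medium",
    attributor="https://attributor.example/",
    created_at="2026-04-12T10:24:00Z",
)
print(json.dumps(xref, indent=2))
```

Because the cross-reference is an ordinary attribution, the evidence requirements attached to each confidence level apply to it as well.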

6. Provenance

Provenance records align with the W3C PROV Data Model. Every attribution, cluster, and evidence record MUST carry a provenance object with the following fields.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| attributor | URI | yes | Identifier of the organisation or agent asserting the record. |
| created_at | datetime | yes | Time at which the record was first asserted. |
| updated_at | datetime | optional | Time of last substantive update. |
| method | enum | yes | One of: heuristic, manual_review, osint, disclosure, regulatory_designation, court_order, subpoena, voluntary_report, mixed. |
| software | object | optional | Tool name, version, and configuration identifier where applicable. |
| derived_from | array | optional | URIs of upstream attributions, clusters, or evidence records the present record was derived from. |

The derived_from field carries provenance lineage when records are re-published, merged, or refined across organisations.
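To illustrate how lineage might be followed, the Python sketch below walks derived_from links transitively through a local store of records. The store layout (a dict keyed by record identifier) and the function name are assumptions made for this example; upstream URIs that the local party does not hold are still reported but cannot be followed further.

```python
from typing import Dict, List


def lineage(record_id: str, records: Dict[str, dict]) -> List[str]:
    """Return every upstream URI reachable from record_id via derived_from."""
    found: List[str] = []
    visited = {record_id}
    stack = [record_id]
    while stack:
        current = stack.pop()
        provenance = records.get(current, {}).get("provenance", {})
        for upstream in provenance.get("derived_from", []):
            if upstream not in visited:  # guards against cycles and repeats
                visited.add(upstream)
                found.append(upstream)
                stack.append(upstream)
    return found


# Hypothetical store: attribution a2 refines a1, which rests on cluster c1.
store = {
    "urn:example:a2": {"provenance": {"derived_from": ["urn:example:a1"]}},
    "urn:example:a1": {"provenance": {"derived_from": ["urn:example:c1"]}},
    "urn:example:c1": {"provenance": {}},
}
print(lineage("urn:example:a2", store))
```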

7. Confidence levels

The confidence field takes a value from the following closed enumeration. Implementations MUST NOT invent additional levels.

| Level | Definition |
| --- | --- |
| vetted | The attribution is supported by an authoritative public source (regulatory designation, criminal indictment, voluntary disclosure by the entity itself, or equivalent) and has been independently verified by the attributor. |
| high | The attribution is supported by direct evidence (court filing, voluntary disclosure, regulatory action, or attribution previously vetted by another credible attributor). |
| medium | The attribution is supported by published OSINT, behavioural pattern matching with corroborating signals, or an attributor’s internal investigation; a reasonable analyst could verify the basis. |
| low | The attribution is supported by limited or single-source signals (e.g. an unverified social-media post, an isolated behavioural match). |
| unverified | The attribution is recorded for internal lineage but the attributor does not assert it as a finding. |

The vetted and high levels MUST carry at least one evidence record (§5.3). The medium level SHOULD carry evidence.
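The closed enumeration and the evidence rules per level can be sketched in Python as follows. The class and function names are hypothetical; the sketch treats the MUST rule as an error and the SHOULD rule as a warning.

```python
from enum import Enum
from typing import List, Tuple


class Confidence(str, Enum):
    """The five levels defined in this section; no others are accepted."""
    VETTED = "vetted"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    UNVERIFIED = "unverified"


def check_confidence(record: dict) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the confidence and evidence fields."""
    # Enum lookup raises ValueError for any value outside the enumeration.
    level = Confidence(record["confidence"])
    has_evidence = bool(record.get("evidence"))
    errors: List[str] = []
    warnings: List[str] = []
    if level in (Confidence.VETTED, Confidence.HIGH) and not has_evidence:
        errors.append(f"{level.value} MUST carry at least one evidence record")
    elif level is Confidence.MEDIUM and not has_evidence:
        warnings.append("medium SHOULD carry evidence")
    return errors, warnings


print(check_confidence({"confidence": "vetted"}))
print(check_confidence({"confidence": "medium"}))
```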

8. Serialization

The canonical serialization is JSON. The following example illustrates an attribution of a Bitcoin cluster to a sanctioned mixer.

{
  "id": "https://attributor.example/attributions/0001",
  "subject": "https://attributor.example/clusters/c-9e2a",
  "entity": "https://attributor.example/entities/mixer-x",
  "confidence": "vetted",
  "valid_from": "2024-08-01",
  "provenance": {
    "attributor": "https://attributor.example/",
    "created_at": "2026-04-12T10:24:00Z",
    "method": "regulatory_designation",
    "software": {"name": "example-suite", "version": "3.2.1"}
  },
  "evidence": [
    {
      "type": "regulatory_designation",
      "reference": "https://home.treasury.gov/news/press-releases/example",
      "observed_at": "2026-04-12T10:00:00Z"
    }
  ]
}

A YAML-equivalent representation MAY be used internally; on the wire between organisations the JSON form is normative.

Future revisions of this document MAY define a JSON-LD context for full RDF interoperability. Implementations SHOULD NOT depend on the JSON-LD form until it is specified.

9. Versioning and revocation

Attributions, clusters, and evidence records are immutable once exchanged. Updates are made by issuing a new record that supersedes the prior via derived_from, or by issuing a revocation that sets revoked_at on the prior. Revoked records MUST NOT be deleted from any party’s holdings; they remain available with their revocation marked, in order to preserve the lineage of any downstream attributions that relied on them.
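One possible reading of these rules is sketched below in Python: a superseding record is a fresh copy that points back to its predecessor through derived_from, and a revocation marks the prior record in the holder's store without deleting it. The store layout and all function names are assumptions for illustration only.

```python
import copy
from typing import Dict, List, Optional


def supersede(prior: dict, new_id: str, changes: dict, created_at: str) -> dict:
    """Return a new record derived from prior; prior itself is left untouched."""
    new = copy.deepcopy(prior)
    new.update(changes)
    new["id"] = new_id
    # A superseding record starts unrevoked, whatever the prior's state.
    new.pop("revoked_at", None)
    new.pop("revocation_reason", None)
    new["provenance"]["created_at"] = created_at
    new["provenance"]["derived_from"] = [prior["id"]]
    return new


def mark_revoked(
    holdings: Dict[str, dict],
    record_id: str,
    revoked_at: str,
    reason: Optional[str] = None,
) -> None:
    """Mark a held record as revoked; the record stays in the holdings."""
    record = holdings[record_id]
    record["revoked_at"] = revoked_at
    if reason is not None:
        record["revocation_reason"] = reason


def active(holdings: Dict[str, dict]) -> List[dict]:
    """Records that have not been revoked."""
    return [r for r in holdings.values() if "revoked_at" not in r]


# Hypothetical records, used only to exercise the functions.
old = {
    "id": "urn:example:attr-1",
    "confidence": "low",
    "provenance": {"attributor": "urn:example:org", "created_at": "2026-01-01T00:00:00Z"},
}
holdings = {old["id"]: old}
new = supersede(old, "urn:example:attr-2", {"confidence": "medium"}, "2026-02-01T00:00:00Z")
holdings[new["id"]] = new
mark_revoked(holdings, old["id"], "2026-02-01T00:00:00Z", "superseded by attr-2")
print([r["id"] for r in active(holdings)])
```

Keeping the revoked record in the store lets downstream parties that relied on it trace where their own attributions came from.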

10. Privacy and security considerations

Attribution data is sensitive personal data in many jurisdictions. Where this document does not otherwise constrain implementations, they MUST apply at least the following:

  1. Attribution of an address or cluster to an individual entity is subject to applicable data protection law. Implementations MUST restrict storage, exchange, and onward disclosure of individual attributions to recipients with a lawful basis.
  2. Implementations SHOULD apply purpose limitation: attribution data collected for investigations is exchanged only with parties whose intended use is consistent with that purpose.
  3. Evidence references MUST NOT disclose information that would identify individual data subjects beyond what the attribution itself reveals (for example, a public_url to a leaked document containing additional personal data is not appropriate evidence).
  4. The revoked_at mechanism (§9) MUST be honoured by all parties; an attribution shown to be erroneous is not silently propagated.

11. Conformance

An implementation is conformant with this document if, when exchanging attribution data with another party:

  1. it produces records using the field names, types, and value enumerations defined here;
  2. it preserves CAIP-10 identifiers (§3.1) for addresses;
  3. it carries provenance (§6) on every attribution, cluster, and evidence record;
  4. it honours revocation (§9); and
  5. it applies the privacy provisions of §10 in respect of individual attributions.

Conformance does not require an implementation to produce attributions; a read-only consumer of attribution data is conformant if it preserves the above when re-emitting received records.

12. Open issues

The following items are open and will be addressed in subsequent draft revisions:

  • JSON-LD context. A normative @context mapping to RDF terms is desirable for semantic interoperability but is not specified in this revision.
  • Cluster equivalence. A formal mechanism for asserting that two clusters from different attributors represent the same underlying actor is not yet specified beyond §5.4.
  • Confidence calibration. Methods for empirically comparing confidence levels across attributors are out of scope for this revision; a companion document may follow.
  • Behavioural heuristic interoperability. Implementations naming a heuristic behavioural may use materially different methods; finer-grained naming will be considered.

References

  • IETF RFC 2119, Key words for use in RFCs to Indicate Requirement Levels.
  • ChainAgnostic Standards Alliance, CAIP-2, Blockchain ID Specification.
  • ChainAgnostic Standards Alliance, CAIP-10, Account ID Specification.
  • W3C, PROV Data Model.
  • OBIS-0001, OBIS Document Lifecycle.