OBIS-0003: Attribution Tag Data Model and Exchange Format
Status: Draft · Version: v1-draft · Date: 2026-05-19 · Editor: Bernhard Haslhofer · Focus area: Investigations and forensics
Abstract
This document specifies a portable data model and exchange format for attribution tags: claims that bind blockchain addresses or clusters to entities, with structured provenance, a closed confidence vocabulary, and a first-class revocation mechanism. The design draws on the GraphSense TagPack approach of bundling tags with shared header metadata, but commits to several concrete improvements: addresses are identified using the CAIP-10 scheme rather than chain-shorthand strings; category and abuse values are drawn from the controlled vocabularies of OBIS-0002; provenance is structured and W3C PROV-aligned rather than a single free-form source string; confidence is a closed five-level enumeration; and revocation is a first-class operation.
1. Introduction
Attribution, the binding of a pseudonymous blockchain address or cluster to a real-world actor, is the core analytical activity in investigations, supervision, and research on blockchain data. Each platform working in this space today maintains its own representation of the same underlying claim. Identical concepts carry different names, and attributions produced in one ecosystem are difficult to verify or reuse in another.
The GraphSense TagPack format is the most widely used open format for sharing attribution data in the investigations community. It represents attribution as a list of tags, each binding an address to a label with a source and, optionally, a category and abuse type. OBIS-0003 builds on the same idea (a bundle of attribution claims with shared metadata) and tightens the parts of the model that matter most for cross-organisation exchange.
This document does not prescribe how attributions are produced. It specifies how they are represented when exchanged.
2. Scope
OBIS-0003 covers:
- the structure of an individual attribution claim (an “attribution tag”);
- the structure of a bundle of tags with shared header metadata;
- provenance, confidence, and revocation metadata associated with tags;
- the canonical wire serialization.
OBIS-0003 does not cover:
- how clusters are computed from on-chain data (a separate document will address clustering heuristics and cluster-equivalence semantics);
- the controlled vocabularies for entity and abuse types (specified in OBIS-0002);
- transmittal protocols (push, pull, query); only the data format is normative.
3. Terminology
The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as described in RFC 2119.
- Address. A unit of value receipt or control on a blockchain ledger, identified per §6.1.
- Cluster. A set of addresses that, by application of a stated heuristic, are treated as controlled by a single entity.
- Entity. A real-world actor or service to which addresses or clusters may be attributed.
- Attribution Tag (or “tag”). A single claim binding an address or cluster to an entity, with provenance and confidence.
- Tag Bundle. A set of tags exchanged together, carrying shared header metadata.
- Attributor. The organisation or individual asserting the tag.
- Category. A value from the OBIS Entity Type Vocabulary (OBIS-0002 §6).
- Abuse. A value from the OBIS Abuse Type Vocabulary (OBIS-0002 §7).
4. Related work
4.1 GraphSense TagPacks
GraphSense TagPacks are the primary inspiration. A TagPack is a YAML file containing a header (title, creator, default currency, default source, etc.) and a list of tag records (address, label, source, optional category, abuse, confidence, is_cluster_definer). Header fields cascade as defaults to contained tags. TagPacks are identified by a Git URI; individual tags are identified by the triple (address, label, source). OBIS-0003 reuses the bundle-with-header idea and the field-cascading convention, and diverges on identifiers, provenance structure, confidence vocabulary, and revocation.
4.2 INTERPOL DW-VA-Taxonomy
The category and abuse fields in GraphSense TagPacks draw from the INTERPOL DW-VA-Taxonomy. OBIS-0003 references this work via OBIS-0002, which provides the controlled vocabularies that OBIS attribution tags use.
4.3 W3C PROV Data Model
The W3C PROV Data Model is the established framework for representing provenance. OBIS-0003 aligns its provenance block with PROV vocabulary (Entity, Agent, Activity, derivation), without requiring full RDF serialization. A future revision may define a normative PROV-O mapping.
4.4 FATF Travel Rule and IVMS101
FATF Recommendation 16 (the “Travel Rule”) and the associated IVMS101 data standard specify how originator and beneficiary identity information accompanies VASP-to-VASP transfers. IVMS101 is a transmittal standard, not an attribution standard: it carries identity disclosed by counterparties at transfer time, rather than third-party-asserted bindings of addresses to entities. The two formats are complementary and operate on disjoint segments of the regulatory workflow. OBIS-0003 neither subsumes IVMS101 nor depends on it.
4.5 Closed commercial formats
The major commercial blockchain analytics vendors (Chainalysis, TRM Labs, Elliptic) each maintain proprietary attribution-tagging formats. These are not publicly redistributable and vary in field structure. They are noted because operational interoperability with these vendors requires bidirectional mapping. They cannot serve as the basis for an open standard.
4.6 Academic prior art
The academic literature on attribution and clustering provides foundational concepts that this document draws on, including the common-input ownership and change-address heuristics (Meiklejohn et al., 2013), behavioural attribution methods (Möser and Narayanan, 2022), and confidence calibration for tagging pipelines (Béres et al., 2021).
5. Design principles
OBIS-0003 commits to the following principles, which together distinguish it from the TagPack format it builds on:
- CAIP-10 addresses on the wire. Chain-shorthand fields (e.g., GraphSense `currency: BTC`) are acceptable internally, but exchanged records use the CAIP-10 fully-qualified address form. Disambiguation across chains is a first-class concern, not a convention.
- Controlled vocabularies for categorisation. `category` and `abuse` values are drawn from the OBIS-0002 vocabularies, not free-form text. Extensions follow the `x-` prefix rule (OBIS-0002 §8).
- Structured provenance. Provenance is a typed block (`attributor`, `method`, `software`, `derived_from`) aligned with W3C PROV, rather than a single `source` URL. The TagPack `source` field is preserved as a separate human-readable back-link, distinct from provenance.
- Closed confidence enumeration. A five-level enum (`vetted`, `high`, `medium`, `low`, `unverified`) is normative. Implementations MUST NOT invent additional levels.
- First-class revocation. Revocation is a typed operation that produces a record. Receivers MUST honour revocations and MUST NOT silently propagate revoked tags.
- Cluster as a first-class object. Cluster-level attribution carries an explicit cluster identifier, rather than the implicit “this tag applies to the whole cluster” semantics of GraphSense’s `is_cluster_definer: true`.
6. Identifiers
6.1 Chains and addresses
Chains use CAIP-2: `<namespace>:<reference>`. Addresses use CAIP-10: `<chain_id>:<account>`, where `<chain_id>` is a CAIP-2 identifier. Examples: `eip155:1` (the CAIP-2 identifier for Ethereum mainnet) and `bip122:000000000019d6689c085ae165831e93:1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa` (the CAIP-10 identifier for a Bitcoin mainnet address).
Implementations MAY accept chain-shorthand on input (e.g., BTC, ETH); they MUST emit CAIP-10 on exchange.
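The input/output rule above can be sketched as a small normalisation step. This is a minimal illustration, not part of the specification: the shorthand mapping, the function names `to_caip10` and `parse_caip10`, and the choice to reject unknown shorthand are all assumptions. The regular expression follows the character classes given in the CAIP-2 and CAIP-10 specifications.

```python
import re

# Hypothetical shorthand table; OBIS-0003 does not define one.
SHORTHAND_TO_CAIP2 = {
    "BTC": "bip122:000000000019d6689c085ae165831e93",
    "ETH": "eip155:1",
}

# CAIP-2 chain id (namespace:reference) followed by a CAIP-10 account address.
CAIP10_RE = re.compile(
    r"^(?P<namespace>[-a-z0-9]{3,8}):(?P<reference>[-_a-zA-Z0-9]{1,32})"
    r":(?P<account>[-.%a-zA-Z0-9]{1,128})$"
)


def to_caip10(address: str, chain: str) -> str:
    """Build a CAIP-10 identifier from an address and a shorthand or CAIP-2 chain."""
    # Known shorthand is translated; anything else is assumed to be CAIP-2 already.
    chain_id = SHORTHAND_TO_CAIP2.get(chain.upper(), chain)
    candidate = f"{chain_id}:{address}"
    if not CAIP10_RE.match(candidate):
        raise ValueError(f"not a valid CAIP-10 identifier: {candidate!r}")
    return candidate


def parse_caip10(identifier: str) -> tuple[str, str]:
    """Split a CAIP-10 identifier into (CAIP-2 chain id, account address)."""
    match = CAIP10_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a valid CAIP-10 identifier: {identifier!r}")
    return f"{match['namespace']}:{match['reference']}", match["account"]
```

An exporter would call `to_caip10` on every internally stored address before writing a bundle, so that shorthand never reaches the wire.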
6.2 Clusters
A cluster carries an identifier in the attributor’s namespace, as a URI: https://attributor.example/clusters/<id>. The identifier is stable across exchanges: a cluster identifier does not change when its membership changes; a new identifier is minted only when the cluster represents a structurally different actor.
6.3 Tags and bundles
A tag bundle is identified by a URI, typically a Git URL of the canonical source file (e.g., https://github.com/<org>/<repo>/blob/main/bundles/<name>.json). A tag within a bundle is identified by the triple (subject, entity_label, attributor).
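As a rough sketch of how the identifying triple might be used, the helper below derives a key from a tag and reports keys that occur more than once in a bundle. It assumes the attributor is read from `provenance.attributor` (§8) and that bundle defaults have already been resolved; the function names are illustrative only.

```python
from collections import Counter


def tag_key(tag: dict) -> tuple[str, str, str]:
    """Return the (subject, entity_label, attributor) triple identifying a tag."""
    # Assumption: the attributor component is taken from the provenance block.
    return (tag["subject"], tag["entity_label"], tag["provenance"]["attributor"])


def duplicate_keys(tags: list[dict]) -> list[tuple[str, str, str]]:
    """List every identifying triple that appears more than once."""
    counts = Counter(tag_key(tag) for tag in tags)
    return [key for key, count in counts.items() if count > 1]
```

The same key is what a revocation record (§10.2) would name when withdrawing a tag.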
7. Data model
7.1 Attribution Tag
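A tag carries the following fields. Fields marked “yes” under Required MUST be present, either on the tag itself or inherited from the bundle defaults block (§7.3):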
| Field | Type | Required | Description |
|---|---|---|---|
| `subject` | string | yes | A CAIP-10 address identifier (§6.1) or a cluster URI (§6.2). |
| `entity_label` | string | yes | Human-readable label binding to the entity (e.g., `binance-hot-wallet-3`). |
| `category` | string | recommended | Term from the OBIS Entity Type Vocabulary (OBIS-0002 §6). |
| `abuse` | string | optional | Term from the OBIS Abuse Type Vocabulary (OBIS-0002 §7). |
| `confidence` | enum | yes | Value from §9. |
| `provenance` | object | yes | Provenance block, §8. |
| `source` | string | recommended | Back-link to a public source supporting the claim. URL preferred. |
| `valid_from` | date | recommended | First date on which the attribution is asserted to hold. |
| `valid_to` | date | optional | Last date on which the attribution is asserted to hold, if known. |
| `revoked_at` | datetime | optional | Timestamp of revocation, if revoked (§10). |
| `revocation_reason` | string | optional | Human-readable reason for revocation. |
| `context` | object | optional | Free-form JSON object for additional metadata (e.g., case identifier, jurisdictional context). |
Tags whose subject is a CAIP-10 address apply to that single address. Tags whose subject is a cluster URI apply to every address in that cluster.
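The two subject forms can be told apart and resolved to concrete addresses along the following lines. The sketch assumes that cluster URIs always use an `http` or `https` scheme, as in the §6.2 example, and that `clusters_by_id` is a lookup built by the receiver from the bundle's cluster records; neither is mandated by this document.

```python
from typing import Optional


def addresses_for(tag: dict, clusters_by_id: dict[str, dict]) -> Optional[list[str]]:
    """Return the addresses a tag applies to, or None if membership is unknown."""
    subject = tag["subject"]
    if subject.startswith(("http://", "https://")):
        # Cluster-level tag: it covers every member of the referenced cluster.
        cluster = clusters_by_id.get(subject)
        if cluster is None or "members" not in cluster:
            # Cluster record missing, or exchanged without member exposure.
            return None
        return list(cluster["members"])
    # Address-level tag: it covers exactly one CAIP-10 address.
    return [subject]
```

Returning `None` rather than an empty list keeps "membership not disclosed" distinct from "cluster has no members".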
7.2 Cluster
When a tag references a cluster URI, the bundle SHOULD include a cluster record describing that cluster:
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | URI | yes | The cluster URI (§6.2). |
| `chain` | string | yes | CAIP-2 chain identifier. |
| `heuristic` | string | yes | Named clustering heuristic (e.g., `co-spending`, `change-address`, `behavioural`, `external-disclosure`). |
| `member_count` | integer | yes | Number of addresses in the cluster. |
| `members` | array | optional | CAIP-10 addresses. MAY be omitted when exchanging cluster-level attribution without member exposure. |
| `provenance` | object | yes | Provenance block, §8. |
7.3 Tag Bundle
A tag bundle groups tags and, optionally, clusters under shared header defaults.
| Field | Type | Required | Description |
|---|---|---|---|
| `title` | string | yes | Short title of the bundle. |
| `creator` | string | yes | URI of the organisation or individual that produced the bundle. |
| `description` | string | optional | Longer description. |
| `created` | datetime | yes | When the bundle was produced. |
| `defaults` | object | optional | Default values that contained tags inherit unless overridden. May include `category`, `abuse`, `confidence`, `source`, `provenance.attributor`. |
| `tags` | array | yes | Array of Attribution Tag records (§7.1). |
| `clusters` | array | optional | Array of Cluster records (§7.2), if referenced by any tag. |
| `revocations` | array | optional | Array of revocation records (§10). |
Header inheritance is explicit: the defaults block applies only to fields a tag does not specify. There is no implicit per-file inheritance beyond the defaults block.
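The inheritance rule can be illustrated with a short resolver that fills in only the fields a tag leaves unset. One assumption is made about the wire shape: a default for `provenance.attributor` is represented as a nested `provenance` object inside `defaults`. The function name is hypothetical.

```python
import copy


def apply_defaults(tag: dict, defaults: dict) -> dict:
    """Return a copy of `tag` with unset fields filled from the bundle defaults."""
    resolved = copy.deepcopy(tag)
    for field, value in defaults.items():
        if field == "provenance" and isinstance(value, dict):
            # Merge one level down so a default attributor never replaces
            # the provenance fields the tag already carries.
            provenance = resolved.setdefault("provenance", {})
            for key, sub_value in value.items():
                provenance.setdefault(key, copy.deepcopy(sub_value))
        else:
            # A value set on the tag always wins over the default.
            resolved.setdefault(field, copy.deepcopy(value))
    return resolved
```

Because the input tag is copied, the bundle as received stays untouched and can be re-emitted byte-for-byte if needed.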
8. Provenance
Every tag and cluster carries a provenance block aligned with the W3C PROV Data Model. The block is JSON-native and does not require RDF serialization.
| Field | Type | Required | Description |
|---|---|---|---|
| `attributor` | URI | yes | Identifier of the organisation or agent asserting the record (the PROV Agent). |
| `created_at` | datetime | yes | Time at which the record was first asserted. |
| `updated_at` | datetime | optional | Time of last substantive update. |
| `method` | enum | yes | One of: `heuristic`, `manual_review`, `osint`, `disclosure`, `regulatory_designation`, `court_order`, `subpoena`, `voluntary_report`, `mixed`. |
| `software` | object | optional | Tool name, version, and configuration identifier where applicable. |
| `derived_from` | array | optional | URIs of upstream tags, clusters, or bundles from which the present record was derived (the PROV wasDerivedFrom relation). |
The derived_from field carries lineage when records are merged or refined across organisations. Implementations SHOULD populate derived_from whenever a tag’s content depends on another exchanged record.
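A structural check of the provenance block might look like the sketch below. It covers only what the table above states (required fields, the `method` enumeration, and `derived_from` being an array of URI strings); it does not validate datetime or URI syntax, and the function name is an assumption.

```python
# Method values as enumerated in the provenance table.
METHODS = {
    "heuristic", "manual_review", "osint", "disclosure", "regulatory_designation",
    "court_order", "subpoena", "voluntary_report", "mixed",
}
REQUIRED_FIELDS = ("attributor", "created_at", "method")


def validate_provenance(block: dict) -> list[str]:
    """Return a list of problems found in a provenance block."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in block
    ]
    method = block.get("method")
    if method is not None and method not in METHODS:
        problems.append(f"unknown method: {method!r}")
    # derived_from is optional, but when present it must be a list of strings.
    derived = block.get("derived_from", [])
    if not isinstance(derived, list) or not all(isinstance(uri, str) for uri in derived):
        problems.append("derived_from must be an array of URIs")
    return problems
```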
9. Confidence
The confidence field takes a value from the following closed enumeration. Implementations MUST NOT invent additional levels.
| Level | Definition |
|---|---|
| `vetted` | Supported by an authoritative public source (regulatory designation, criminal indictment, voluntary disclosure by the entity itself) and independently verified by the attributor. |
| `high` | Supported by direct evidence (court filing, voluntary disclosure, regulatory action, or attribution previously vetted by another credible attributor). |
| `medium` | Supported by published OSINT, behavioural pattern matching with corroborating signals, or an attributor’s internal investigation. |
| `low` | Supported by limited or single-source signals (e.g., an unverified social-media post, an isolated behavioural match). |
| `unverified` | Recorded for internal lineage; not asserted as a finding. |
Tags with confidence `vetted` or `high` MUST carry a `source` URL. Tags with confidence `medium` SHOULD carry a `source` URL or an opaque attributor-namespaced reference.
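The closed enumeration and the source requirement can be enforced with a check along these lines. It is a sketch only: the class and function names are illustrative, the MUST rule is reported as an error and the SHOULD rule as a warning, and the tag is assumed to have had bundle defaults applied already. Because this document does not say where an opaque attributor-namespaced reference is carried, the sketch looks only at the `source` field.

```python
from enum import Enum


class Confidence(str, Enum):
    VETTED = "vetted"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    UNVERIFIED = "unverified"


def check_confidence(tag: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a tag's confidence and source fields."""
    errors: list[str] = []
    warnings: list[str] = []
    try:
        level = Confidence(tag.get("confidence"))
    except ValueError:
        # Missing values and invented levels are both rejected.
        errors.append(f"confidence is not a defined level: {tag.get('confidence')!r}")
        return errors, warnings
    has_source = bool(tag.get("source"))
    if level in (Confidence.VETTED, Confidence.HIGH) and not has_source:
        errors.append(f"confidence {level.value!r} requires a source URL")
    elif level is Confidence.MEDIUM and not has_source:
        warnings.append("confidence 'medium' should carry a source or reference")
    return errors, warnings
```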
10. Versioning and revocation
10.1 Versioning
Tag bundles are versioned by re-publication. A new bundle MAY supersede a prior bundle by carrying a supersedes field in its header pointing to the prior bundle’s URI. Receivers SHOULD treat the most recent bundle from a given creator as authoritative for the tags it contains.
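One way a receiver could work out which of its held bundles are current is to discard every bundle that some other bundle names in `supersedes`. The sketch assumes the receiver keeps bundles in a dictionary keyed by bundle URI (§6.3) and that `supersedes` holds a single URI string; both are assumptions about local storage, not requirements of this document.

```python
def current_bundles(bundles_by_uri: dict[str, dict]) -> set[str]:
    """Return the URIs of bundles that no other held bundle supersedes."""
    superseded = {
        bundle["supersedes"]
        for bundle in bundles_by_uri.values()
        if "supersedes" in bundle
    }
    return set(bundles_by_uri) - superseded
```

Superseded bundles stay in the dictionary; they are only excluded from the "current" view, which keeps their lineage available.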
10.2 Revocation
A revocation is a first-class operation that produces a record. A revocation record contains:
| Field | Type | Required | Description |
|---|---|---|---|
| `revokes` | array | yes | Array of (subject, entity_label, attributor) triples being revoked. |
| `revoked_at` | datetime | yes | Time of revocation. |
| `reason` | string | recommended | Human-readable reason. |
| `provenance` | object | yes | Provenance block (§8). |
Revocations MAY be published within a bundle (in the revocations array of the header) or as a standalone bundle whose tags array is empty.
Recipients of a revocation MUST honour it: the named tags MUST be marked as revoked and MUST NOT be silently propagated to downstream consumers without the revocation marker.
Revoked records MUST NOT be deleted from any party’s holdings. They remain available with their revocation marked, in order to preserve the lineage of any downstream tags that relied on them.
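The honour-and-retain behaviour can be sketched as a function that marks matching tags and never drops any. Two assumptions are made: each entry in `revokes` is a JSON object with `subject`, `entity_label`, and `attributor` keys (this document does not fix the wire shape of the triple), and tags have had bundle defaults resolved so that `provenance.attributor` is present.

```python
import copy


def apply_revocations(tags: list[dict], revocations: list[dict]) -> list[dict]:
    """Return copies of all tags, with revoked ones marked rather than removed."""
    # Index revocation records by the triple they name.
    index: dict[tuple[str, str, str], dict] = {}
    for record in revocations:
        for ref in record["revokes"]:
            key = (ref["subject"], ref["entity_label"], ref["attributor"])
            index.setdefault(key, record)  # keep the first record seen for a triple

    result = []
    for tag in tags:
        marked = copy.deepcopy(tag)
        key = (tag["subject"], tag["entity_label"], tag["provenance"]["attributor"])
        record = index.get(key)
        if record is not None:
            marked["revoked_at"] = record["revoked_at"]
            if "reason" in record:
                marked["revocation_reason"] = record["reason"]
        result.append(marked)  # every tag is retained, revoked or not
    return result
```

Downstream exports can then carry `revoked_at` forward, so that a revoked tag is never re-emitted without its marker.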
11. Serialization
The canonical exchange serialization is JSON. YAML MAY be used internally (for parity with GraphSense TagPack files); on the wire between organisations, JSON is normative.
Example bundle:
```json
{
  "title": "Example sanctioned-mixer bundle",
  "creator": "https://attributor.example/",
  "created": "2026-04-12T10:24:00Z",
  "defaults": {
    "category": "mixer",
    "abuse": "sanctions-evasion",
    "confidence": "vetted"
  },
  "clusters": [
    {
      "id": "https://attributor.example/clusters/c-9e2a",
      "chain": "bip122:000000000019d6689c085ae165831e93",
      "heuristic": "co-spending",
      "member_count": 412,
      "provenance": {
        "attributor": "https://attributor.example/",
        "created_at": "2026-03-28T08:00:00Z",
        "method": "heuristic",
        "software": {"name": "example-suite", "version": "3.2.1"}
      }
    }
  ],
  "tags": [
    {
      "subject": "https://attributor.example/clusters/c-9e2a",
      "entity_label": "mixer-x",
      "source": "https://home.treasury.gov/news/press-releases/example",
      "valid_from": "2024-08-01",
      "provenance": {
        "attributor": "https://attributor.example/",
        "created_at": "2026-04-12T10:24:00Z",
        "method": "regulatory_designation",
        "software": {"name": "example-suite", "version": "3.2.1"}
      }
    }
  ]
}
```

The defaults block specifies the category, abuse, and confidence values that the contained tag inherits; the tag carries only the fields specific to it.
12. Privacy and security considerations
Attribution data is sensitive personal data in many jurisdictions. Implementations MUST apply at least the following:
- Attribution of an address or cluster to an entity of category `individual` (OBIS-0002) is subject to applicable data protection law. Implementations MUST restrict storage, exchange, and onward disclosure of such attributions to recipients with a lawful basis.
- Purpose limitation: attribution data collected for investigations is exchanged only with parties whose intended use is consistent with that purpose.
- The `source` field MUST NOT point to a resource disclosing personal data beyond what the attribution itself reveals (e.g., a link to a leaked document containing additional personal data is not appropriate evidence).
- Tags carrying abuse values `csam` or `terrorism-financing` (OBIS-0002 §7) carry elevated handling obligations and SHOULD be exchanged only with appropriately accredited recipients.
- The revocation mechanism (§10.2) MUST be honoured. An attribution shown to be erroneous is not silently propagated.
13. Conformance
An implementation is conformant with this document if, when exchanging attribution data with another party:
- it produces records using the field names, types, and value enumerations defined here;
- it emits CAIP-10 identifiers for addresses on exchange;
- it draws `category` and `abuse` values from the OBIS-0002 vocabularies;
- it carries provenance (§8) on every tag and cluster record;
- it honours revocation (§10.2); and
- it applies the privacy provisions of §12 to `individual`-category attributions.
Conformance does not require an implementation to produce tags. A read-only consumer is conformant if it preserves the above when re-emitting received records.
14. Open issues
- PROV-O / JSON-LD context. A normative `@context` mapping to RDF terms is desirable for full PROV interoperability and is deferred to a future revision.
- Cluster-equivalence. When two attributors assign different identifiers to clusters representing the same underlying actor, no global equivalence mechanism is yet defined.
- Bundle signing. Cryptographic signing of bundles and revocations is desirable for non-repudiation and is deferred.
- Behavioural heuristic interoperability. Implementations naming a heuristic `behavioural` may use materially different methods; finer-grained naming is deferred.
References
- IETF RFC 2119, Key words for use in RFCs to Indicate Requirement Levels.
- ChainAgnostic Standards Alliance, CAIP-2, Blockchain ID Specification.
- ChainAgnostic Standards Alliance, CAIP-10, Account ID Specification.
- W3C, PROV Data Model.
- GraphSense, TagPacks Wiki.
- INTERPOL Innovation Centre, Darkweb and Virtual Assets Taxonomy.
- FATF, Recommendation 16 (Travel Rule).
- InterVASP, IVMS101 Data Standard.
- S. Meiklejohn, M. Pomarole, G. Jordan, K. Levchenko, D. McCoy, G. Voelker, S. Savage, A fistful of bitcoins: characterizing payments among men with no names, IMC 2013.
- M. Möser, A. Narayanan, Resurrecting address clustering in Bitcoin, FC 2022.
- F. Béres, I. András Seres, A. A. Benczúr, M. Quintyne-Collins, Blockchain is watching you: profiling and deanonymizing Ethereum users, ICBC 2021.
- OBIS-0001, OBIS Document Lifecycle.
- OBIS-0002, Shared Taxonomies for Blockchain Intelligence.