The Matcher Node is the component of the Data Republic Matcher Node network that stores hashed slices of PII during the tokenization process. No single Matcher Node ever holds an entire hashed field value for PII, so even if a Matcher Node is compromised, only a fragment of a hash could be extracted, significantly reducing the risk of exposure. When a matching request is made, the Matcher Node compares the hash slices for each token and returns token pairs to an Aggregator Node.
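The comparison step can be illustrated with a minimal sketch. It assumes each Contributor's data for one field slice is held as a mapping from an opaque (encrypted) token to an integer slice value. The function name `match_slices` and the data layout are hypothetical and are not taken from the Senate Matching implementation.

```python
from collections import defaultdict


def match_slices(store_a, store_b):
    """Return (token_a, token_b) pairs whose stored slice values are equal.

    store_a / store_b: dict mapping an opaque encrypted token to the integer
    slice value held for ONE field slice (hypothetical layout, one dict per
    Contributor). The tokens are never decrypted here.
    """
    # Index the second store by slice value so each lookup is a dict access.
    by_slice = defaultdict(list)
    for token_b, slice_b in store_b.items():
        by_slice[slice_b].append(token_b)

    # Every token in store_a pairs with every token in store_b that
    # carries the same slice value.
    pairs = []
    for token_a, slice_a in store_a.items():
        for token_b in by_slice.get(slice_a, []):
            pairs.append((token_a, token_b))
    return pairs


store_a = {"a1": 0x1A2B, "a2": 0x3C4D}
store_b = {"b1": 0x3C4D, "b2": 0x1A2B, "b3": 0x3C4D}
candidate_pairs = match_slices(store_a, store_b)
```

Because a single slice is short, one slice value can pair a token with more than one counterpart (here `a2` pairs with both `b1` and `b3`), so the pairs returned by one node are only candidates.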
Matcher Node process for managing hashed PII
Once the PII has been prepared and hashed using the Contributor Node, each hash is split into a number of slices. Some of these slices are distributed to Matcher Nodes, at which point the Contributor Node discards the original hash values.
- In Senate Matching, each PII field has a hash value 512 bits long. Each of these hashes is individually split into sixteen 32-bit slices. A pre-determined subset of these slices is distributed to, and stored across, Matcher Nodes hosted by Data Republic. The full hashes are discarded once the slices have been distributed.
- A Matcher Node holds only an encrypted token that relates to a particular individual within a particular dataset, and it does not have the key needed to decrypt that token. The encryption key for the token is unique to each combination of Contributor Node, token database, field and slice.
- The slices selected for distribution are sent to their pre-allocated Matcher Nodes. For example, all of the "part one" slices of email addresses from all Contributors will be sent to "Matcher Node 1". Unallocated slices are permanently discarded.
- A Matcher Node may receive multiple slices, but never more than one from the same field. So “Matcher Node 1” might receive slice 1 of all email addresses and slice 3 of all phone numbers. These slices are stored separately; they are not concatenated or combined. The associated tokens are encrypted with a different key for each field slice, so a Matcher Node cannot tell which slices from different fields belong to the same token.
- The slices sent to Matcher Nodes may be shorter than 32 bits, because Senate Matching is designed to guarantee a non-zero false positive rate inside each Matcher Node. As a result, no slice held by a Matcher Node is unique enough to identify an individual on its own.
- Senate Matching uses "variable length slicing", which means that different Contributors use slices of different sizes, depending on the size of the dataset each one uploads (i.e. the number of unique customers). The Matcher Node is still able to match slices of different lengths (the process for which is described in Section 3.2 below), which guarantees false positives and further reduces re-identification risk.
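The slicing steps above can be sketched as follows. This is an illustration under stated assumptions, not the Senate Matching implementation: the source specifies a 512-bit hash split into sixteen 32-bit slices but does not name the hash function, so SHA-512 with a prepended salt is assumed here. The function names, the `selected` slice indices, the `keep_bits` parameter and the prefix comparison used for slices of different lengths are all hypothetical.

```python
import hashlib

HASH_BITS = 512
SLICE_BITS = 32
NUM_SLICES = HASH_BITS // SLICE_BITS  # sixteen 32-bit slices


def slice_hash(value, salt):
    """Hash one PII field value and split the digest into 16 integers.

    Assumption: SHA-512 over salt + value; the source only states that
    the hash is 512 bits long.
    """
    digest = hashlib.sha512((salt + value).encode("utf-8")).digest()
    step = SLICE_BITS // 8  # 4 bytes per slice
    return [
        int.from_bytes(digest[i * step:(i + 1) * step], "big")
        for i in range(NUM_SLICES)
    ]


def truncate(slice_value, keep_bits):
    """Keep only the leading keep_bits bits of a 32-bit slice."""
    return slice_value >> (SLICE_BITS - keep_bits)


def prepare_slices(value, salt, selected, keep_bits):
    """Return {slice_index: truncated_slice} for the selected slices only.

    Slices that are not selected, and the full hash, are simply dropped.
    """
    slices = slice_hash(value, salt)
    return {i: truncate(slices[i], keep_bits) for i in selected}


def slices_match(a, a_bits, b, b_bits):
    """Compare two truncated slices on their shared leading bits.

    Assumption: slices of different lengths are compared as prefixes.
    """
    common = min(a_bits, b_bits)
    return (a >> (a_bits - common)) == (b >> (b_bits - common))


# Two hypothetical Contributors holding the same email address but using
# different slice lengths (20 bits and 12 bits).
large = prepare_slices("alice@example.com", "demo-salt", [0, 5], 20)
small = prepare_slices("alice@example.com", "demo-salt", [0, 5], 12)
same_person = slices_match(large[0], 20, small[0], 12)
```

Under this prefix assumption, the comparison is only as precise as the shorter slice, so shorter slices produce more candidate matches that other slices must later rule out.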
Critically, the steps above are carried out entirely on the Data Custodian’s systems, and at no time does Data Republic receive or process any PII from the raw dataset that the Custodian uploaded. In the case of the Matcher Nodes, no party (not even Data Republic) is able to reconstruct the full hash of the original PII, even if an attacker knows the salt value.
Download Senate Matching Security Whitepaper