What is the Contributor Node?
The Contributor Node acts as a personal data firewall that ensures personally identifiable information (PII) never leaves your organization. Data files are uploaded to it to be tokenized and prepared for data matching on Senate.
The Contributor Node is a virtual machine image, provided by Data Republic, that the contributor runs inside their own IT environment.
Because the Contributor Node runs entirely on the custodian’s systems and hardware (or, for example, on a cloud-based storage solution operated by the custodian), the raw dataset is never transferred outside the custodian’s environment. Data Republic has no access to the Contributor Node, because it sits behind the custodian’s firewall.
Contributor Node process
The first step in using Senate Matching is to upload data containing PII into the Contributor Node. The Data Custodian (the authorized person from the contributing organization) extracts the customer data from a database or CRM and uploads it into their organization’s own Contributor Node.
The raw dataset uploaded to the Contributor Node by the custodian will include PII and may include the following details:
- name (given name, and family name);
- date of birth;
- phone number (mobile, home and/or work);
- email address;
- gender; and
- a “natural key”, which is the customer identifier from the source database and known only to the custodian’s organization.
Salting and hashing of PII fields
In practice, all PII fields are salted and hashed before they are sent to the Contributor Node:
- If using the provided Web UI, the hashing is performed in the browser, prior to the browser application calling the Contributor Node API.
- Otherwise, the contributor performs the hashing step on their own systems, following the specifications provided by the Contributor Node API.
Salt values are distributed by the Consul, the Senate Matching configuration service. Each salt is a randomly generated 128-bit value, unique to its field name. The hash algorithm is SHA-512, which is preferred over SHA-256 because of its greater resistance to certain kinds of advanced attacks (see Comparison of SHA functions). Each field also has a simple normalization function, which ensures that the same PII value still matches after hashing even if Custodians represent it slightly differently (e.g. email addresses with upper- or lower-case letters).
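The normalize, salt and hash sequence described above can be sketched in Python as follows. This is an illustrative sketch only, not the Contributor Node API specification: the field names, the normalization rules, the locally generated salts (the real ones are distributed by the configuration service) and the choice to prepend the salt to the value are all assumptions made for the example.

```python
import hashlib
import secrets

# ASSUMPTION: salts are generated locally here for illustration only.
# In Senate Matching they are distributed by the configuration service.
SALTS = {
    "email": secrets.token_bytes(16),  # 16 bytes = 128 bits, one salt per field
    "phone": secrets.token_bytes(16),
}


def normalize_email(value: str) -> str:
    """Hypothetical normalization: trim whitespace and lower-case."""
    return value.strip().lower()


def normalize_phone(value: str) -> str:
    """Hypothetical normalization: keep digits only."""
    return "".join(ch for ch in value if ch.isdigit())


NORMALIZERS = {"email": normalize_email, "phone": normalize_phone}


def hash_field(field: str, value: str, salts: dict = SALTS) -> str:
    """Normalize a PII value, then return its salted SHA-512 hex digest.

    ASSUMPTION: the salt is prepended to the UTF-8 encoded value; the
    actual combination rule is defined by the Contributor Node API.
    """
    normalized = NORMALIZERS[field](value)
    return hashlib.sha512(salts[field] + normalized.encode("utf-8")).hexdigest()


# Two representations of the same email address hash to the same value.
a = hash_field("email", "Jane.Doe@Example.com")
b = hash_field("email", " jane.doe@example.com ")
print(a == b)
```

Because every custodian applies the same salt and the same normalization to a given field, equal PII values produce equal hashes and can be matched without the raw value being shared.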
Variable bit-length slicing
The Contributor Node slices each PII hash into 32 pre-allocated slices. Each slice in a predetermined selection is mapped to a different Matcher Node, and any slice not mapped to a Matcher Node is discarded. The hash slices may be further shortened so that there is some probability of “collisions” within an individual Matcher Node. A collision occurs when two different inputs (e.g. two different email addresses) have the same value for a particular slice.
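The slicing and routing step can be sketched as follows. This is a simplified illustration under stated assumptions, not the Contributor Node's actual scheme: it assumes 32 equal 16-bit slices of a 512-bit hash (the real slices may have variable bit lengths), and the slice-to-node mapping, the node names and the 12-bit shortened length are hypothetical.

```python
import hashlib
from typing import Dict, List

# ASSUMPTION: hypothetical mapping of slice index to Matcher Node name.
SLICE_TO_MATCHER: Dict[int, str] = {0: "matcher-a", 5: "matcher-b", 17: "matcher-c"}


def slice_hash(digest: bytes, n_slices: int = 32) -> List[int]:
    """Split a digest into n_slices equal-width integers.

    ASSUMPTION: equal widths (16 bits each for SHA-512) for simplicity.
    """
    if len(digest) % n_slices != 0:
        raise ValueError("digest length must be divisible by n_slices")
    width = len(digest) // n_slices
    return [
        int.from_bytes(digest[i * width:(i + 1) * width], "big")
        for i in range(n_slices)
    ]


def shorten(slice_value: int, slice_bits: int, keep_bits: int) -> int:
    """Keep only the top keep_bits of a slice, making collisions more likely."""
    if not 0 < keep_bits <= slice_bits:
        raise ValueError("keep_bits must be between 1 and slice_bits")
    return slice_value >> (slice_bits - keep_bits)


def route_slices(digest: bytes, mapping: Dict[int, str], keep_bits: int = 12) -> Dict[str, int]:
    """Return the shortened slice for each mapped Matcher Node.

    Slices whose index is not in the mapping are discarded.
    """
    slices = slice_hash(digest)
    slice_bits = len(digest) * 8 // len(slices)
    return {
        node: shorten(slices[index], slice_bits, keep_bits)
        for index, node in mapping.items()
    }


# Stand-in for a salted PII hash (see the salting and hashing section).
digest = hashlib.sha512(b"example salted value").digest()
print(route_slices(digest, SLICE_TO_MATCHER))
```

In this sketch each Matcher Node receives only a short fragment of the hash, and shortening a slice to 12 bits leaves just 4,096 possible values, so different inputs will sometimes share a slice value within one node.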