What is a Contributor Node?

A Contributor Node is a tool that tokenizes your PII data, enabling you to match your customer base with other organisations. See here for more information about Contributor Nodes.

In this article, you will learn about:

  • Tokenizing your data with the Contributor Node browser UI
  • Tokenizing PII using the API
  • Using your own tokens

Prerequisites:

  • You have access to a Contributor Node and login credentials for it (these are created by your organisation when the node is configured).

Tokenize your data with the Contributor Node

1. Type or copy the address of your Contributor Node into your browser (https://[host name of your Contributor Node]/).

Your login credentials are created by your organisation when your Contributor Node is configured.

a. If you are using local authentication, the username is "api" and the password is the one set in contributor.sh (see Contributor Script - HTTP_BASICAUTHPASSWORD setting; an illustrative snippet follows this list).

b. Alternatively, if your IT department has configured SSO support, you will be directed to your organisation's single sign-on system.
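For reference, the local-authentication password is typically defined as a variable inside contributor.sh. The exact format depends on your installation, so treat the line below as an illustrative assumption rather than the definitive syntax:

# In contributor.sh (illustrative only - confirm against the Contributor Script documentation)
HTTP_BASICAUTHPASSWORD="choose-a-strong-password"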

2. When you log in you will see the Dashboard with a list of databases and the Matching system status. Click on the database you would like to use.

Note: By default, there will be two databases (one for production, and one for testing).

3. Drag and drop your PII CSV file into the middle panel, or use the browse button to select it from your machine. Leave the CSV format details at their defaults (currently, only the defaults are supported).

Click Upload data file.

A green progress bar on the right will show the file being uploaded and processed, and the token count will start to increase.

4. Once the tokenization process is complete, the Database summary panel will update to show today’s date, and the total number of tokens in the database. You will now be able to download the tokens by clicking the 'Download tokens' button on the left of the screen.

Your tokenized data is now ready for either adding to attribute data (optional), or loading to the Data Republic platform.


Tokenize PII using the API

Instead of using the Contributor Node's browser UI to upload PII for tokenization, you can use the API.

Your Contributor Node supports uploading both “pre-hashed” and “plain-text” data. By default, the plain-text API endpoints are disabled, because our recommended approach is to hash the data before sending it to the Contributor Node. The browser UI always hashes data in the browser before uploading it to the node.

Although our recommendation is to hash the data before uploading it to the Contributor Node, it may be acceptable to data custodians to upload plain-text data and let the Contributor Node itself salt and hash the PII fields. This will depend on your organisation's policies and how your node has been deployed (for example, whether it is on-premises or hosted by a cloud provider). Instructions for enabling the plain-text API endpoints are given below.

Your node also supports synchronous and asynchronous uploads:

  • Synchronous upload means the API request will remain open until all the data has been hashed, sliced, and distributed to matcher nodes.
  • Asynchronous upload means the API request will return immediately, and the data upload will proceed in the background. There is an API call to fetch the job status so that you can monitor progress. Alternatively, the browser UI shows the job status on the database page.

Accessing the Swagger (OpenAPI) Specifications

The first step in tokenizing your PII through the API is to access the Swagger specifications.

1. Log into your Contributor Node browser UI. At the bottom of the page, click View Swagger Specifications.

Note: We recommend you open this in a new tab - it can be helpful to be able to switch back to the UI occasionally.

2. You can download the Swagger Specification as a JSON file for loading into any API tools you want to use.

Alternatively, you can try out the API from the browser window. To do so, click Authorize to set the API credentials (most API calls will require authentication).

3. Type in your Contributor Node credentials and click Authorize. For API access, the username is always “api” and the password is set at installation.

Close the pop-up by clicking the “X” in the top right corner (do not click “logout”).

4. You can now call specific API endpoints to try them out and see the required parameters and returned results.

For example, the DatabaseStatistics endpoint will give you a token count and last-updated date for a specific database.
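You can also call this endpoint from a command line with curl, using HTTP basic authentication (username "api" and your API password). The host name and DBUUID below are placeholders:

# Fetch the token count and last-updated date for one token database
# (replace the host name, password and DBUUID with your own values)
curl -u api:YOUR_API_PASSWORD \
  "https://<contributor-hostname>/api/Contributor/v1/DatabaseStatistics/<DBUUID>"

The response is a small JSON payload containing fields such as TokenCount and LastUpdatedTime.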

5. Swagger will also generate sample Curl commands. You can usually copy-and-paste these into a command line to call the API.

Note: Take care with these commands. The sample Curl command will include the authorization credentials. We’ve obscured these in the example on the right. Be sure not to share these commands with unauthorised users.

From here, you can either use the API for uploading pre-hashed data, or plain-text data.

Uploading pre-hashed data via API

Prerequisite: You have authorised your API credentials in the Swagger specifications.

1. Look up the DBUUID for the token database you want to upload data to. Use the GlobalConfig API call to get a list of DBUUIDs on this system.

2. The GlobalConfig API call will return the list of token databases on this Contributor Node. For each database it will tell you:

  • The human readable name of the database (e.g. “Production”)
  • The list of supported fields for each database (these are listed by Field ID).

3. The rest of the GlobalConfig payload contains the list of fields on this Matching network. For each field, it will specify:

  • The human readable field name (e.g. “email”)
  • The field ID (e.g. 8012)
  • The normalisation method to use for this field before hashing (see Contributor Node Schema for the definition of these rules).
  • The default salt value (HashSalt) to append to your field before hashing. This is the value used when uploading via the browser.

Note that you can substitute your own salt values; however, any matching parties must be given those same salt values for matching to be possible on that field. Data Republic does not need to know these custom salt values.

(The salt values in the screenshot at right are not the real salt values for any production Matching network)
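If you prefer the command line, you can fetch the same GlobalConfig payload with curl. The exact path is listed in your Swagger specification; the URL below is an assumption that follows the same pattern as the other endpoints, so confirm it before use:

# List token databases, field definitions, normalisation methods and HashSalt values
# (endpoint path assumed from the pattern of the other calls - confirm it in Swagger)
curl -u api:YOUR_API_PASSWORD \
  "https://<contributor-hostname>/api/Contributor/v1/GlobalConfig"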

4. Now that you have the field names, normalisation method, and salt value, you can prepare your data (a worked sketch follows the list below). For each PII value in each field:

  1. Normalize the plain-text value.
  2. Append the salt value.
  3. Hash using SHA-512 (SHA-2).
  4. Encode the hash value using Base64.

Save the results of this process as a CSV file with the same column headers as the “plain-text” CSV.
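As a rough guide, here is what steps 2-4 look like for a single value using standard command-line tools. This is a minimal sketch: it assumes the value has already been normalised according to the field's rules, and the salt shown is a placeholder rather than a real HashSalt.

# Hash one already-normalised value: append the salt, SHA-512 hash it, then Base64-encode the digest
SALT="example-hashsalt-value"      # placeholder - use the HashSalt (or your agreed custom salt) for this field
VALUE="alison@example.com"         # already normalised per the field's normalisation rules
printf '%s%s' "$VALUE" "$SALT" | openssl dgst -sha512 -binary | openssl base64 -A

Repeat this for every PII value in every field, writing the Base64 output into the corresponding CSV cell.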

5. Scroll down and open the LoadHashedRecords API call. Click Try it out.

6. Enter the DBUUID for the token database, and select the PII file you want to upload.

The CSV file has the same format as the plain-text file, including the same column names; however, the row values should be salted and hashed.

Click Execute when ready.
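Equivalently, you can run the upload from a command line with curl. The endpoint path below is an assumption based on the pattern of the other calls, so confirm the exact URL in your Swagger specification:

# Upload a pre-hashed CSV file to a token database
# (endpoint path assumed - confirm it against the Swagger specification)
curl -u api:YOUR_API_PASSWORD \
  -F "file=@hashed-records.csv" \
  "https://<contributor-hostname>/api/Contributor/v1/LoadHashedRecords/<DBUUID>"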

Your data is now tokenized. If you now go to the token database in the Contributor Node UI, you will be able to download the tokens by clicking the 'Download tokens' button on the left of the screen.

Your tokenized data is now ready for either adding to attribute data (optional), or loading to the Data Republic platform.

Uploading plain-text data via API

Prerequisite: You have authorised your API credentials in the Swagger specifications.

1. Scroll down to LoadRecords. Click Try it out.

2. Type in the DBUUID for the database you want to upload to (you can obtain this from the URL when you log into the Contributor Node: https://<contributor-hostname>/dashboard/<DBUUID>).

Select a small CSV sample file to upload and test. An appropriate format can be obtained by clicking Download data template on the database screen in the browser UI.

Click Execute.

3. Your curl command is created; this is the API call. Copy the curl command.

4. Paste the command into a terminal to run it.

You can adjust the file name portion of the API call; you can see this towards the end of the curl command, in the text beginning “file=@”. The file path should be relative to the current working directory in your terminal.
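As a rough guide, the pasted command usually has a shape like the one below. The credentials and exact endpoint path come from your own Swagger output; the URL here is an assumption that follows the pattern of the other calls:

# Upload a plain-text CSV for the node to normalise, salt, hash and tokenize
# (endpoint path assumed - use the command generated by your own Swagger page)
curl -u api:YOUR_API_PASSWORD \
  -F "file=@sample-records.csv" \
  "https://<contributor-hostname>/api/Contributor/v1/LoadRecords/<DBUUID>"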

5. Curl will report the status of the request. Expected response codes are documented in the Swagger file.

Note - Response codes:

  • 200: File processed successfully
  • 404: DBUUID not found on this Contributor Node
  • 500: Server error (usually indicates that plain text PII is not enabled on this node)

Your data is now tokenized. If you now go to the token database in the Contributor Node UI, you will be able to download the tokens by clicking the 'Download tokens' button on the left of the screen.

Your tokenized data is now ready for either adding to attribute data (optional), or loading to the Data Republic platform.

If you have any questions, you can reach out to our support team for assistance at support@datarepublic.com.

Using your own Tokens

This feature is designed for customers who want to handle the token generation and mapping process in their own system, rather than using the Contributor Node to maintain the mapping between a token and a PersonID. You may want to consider this option if:

  • You have a very large volume of data (> 20M rows) and would rather manage the process within your own data warehouse
  • You do not have an appropriate field to use as PersonID, or
  • You have a PersonID but it meets the definition of “personal information” and is not permitted to leave the data warehouse.

This feature is available from version 1.8.0 and is currently in alpha trials with customers.

If a data contributor prefers to generate their own tokens and maintain the mapping between token and PersonID within their existing data warehouse, Data Republic has developed a tool called the Matching Tokenizer to help them generate random, unique tokens easily within their own IT environment.

The Matching Tokenizer is a command line interface (CLI) tool. We provide binaries for macOS, Linux and Windows.

Note: the Privacy-Preserving Matching Tokenizer supports 64-bit CPU architectures only.

Prerequisites:

  • You have contacted Data Republic about requesting to use your own tokens. This is so that the proper database in your Contributor Node can be set up for you.

1. Use your PersonID as an input to the Matching Tokenizer.

This tool takes both input and output files as parameters, or you can use standard input/output by omitting the -i or -o parameters (a sample invocation follows the examples below).

The input is an IDs file with each line containing an ID to be tokenized.

For example:

1001
1002
1003
1004
1005
1006
1007
1008
1009
1010

The output is a file containing the ID and its associated token on each line, separated by a comma. The order of input records is not preserved.

For example:

1001,0x5f466bca382b
1002,0xe50e0f8c0b84
1003,0x4822369799c6
1004,0x9b0fd3f4a6bc
1005,0xa604f0e8e216
1006,0xcb32fb0fb804
1007,0x4d4a62129d38
1008,0xaf44468bbbfe
1009,0xcb74556636b7
1010,0x97a0ff1e24c

The Tokenizer does not check the content of the output file; it simply appends new content to the end.
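Putting this together, a typical invocation looks something like the line below. The binary name is illustrative (use the binary supplied for your platform); -i and -o are the tool's input and output file parameters:

# Tokenize a file of PersonIDs into an ID,token mapping file
# (binary name is illustrative - use the binary provided for your platform)
./matching-tokenizer -i ids.txt -o tokens.csv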

2. Add the tokens to the table with the PII fields, and remove the personid field (a command-line sketch of this join follows the example tables below).

For example, the table below:

personid | email              | phone
1001     | alison@example.com | (555) 623-2565
1002     | james@example.com  | (555) 710-1092
1003     | john@example.com   | (555) 877-9905

becomes this table:

token          | email              | phone
0x5f466bca382b | alison@example.com | (555) 623-2565
0xe50e0f8c0b84 | james@example.com  | (555) 710-1092
0x4822369799c6 | john@example.com   | (555) 877-9905

If you are using attribute data, you should add the tokens to that data as well.
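As a minimal sketch of this step using standard Unix tools (assuming bash, that both CSV files are sorted on personid, and that neither file has a header row or embedded commas):

# Join the ID,token mapping onto the PII table by personid, then drop the personid column
# tokens.csv: personid,token    pii.csv: personid,email,phone    result: token,email,phone
join -t, <(sort -t, -k1,1 tokens.csv) <(sort -t, -k1,1 pii.csv) | cut -d, -f2- > tokenized.csv

In practice, most contributors perform this join inside their own data warehouse; the command above only illustrates the transformation.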

3. You can now log into the Contributor Node, select the database that Data Republic has set up for adding your own tokens, and drop in the file with your tokens and PII fields.

Note: There are some differences in the UI.

In the Database Summary panel, you will see the number of token operations (added, updated or removed) instead of the total number of tokens.

Similarly, when you call the /api/Contributor/v1/DatabaseStatistics/{DBUUID} API, you will see both TokenCount and TokenOpCount.

TokenCount will always be 0 in this mode, as the Contributor Node does not save the tokens. TokenOpCount records how many token operations (added, updated or removed) have been performed on this database. Operations are counted regardless of whether the same token has been operated on before.

{
  "TokenCount": "0",
  "LastUpdatedTime": "2020-06-26T01:18:57Z",
  "TokenOpCount": "10"
}

Your tokenized data is now ready for either adding to attribute data (optional), or loading to the Data Republic platform.
