All data sharing and matching on the Data Republic platform is governed by approval workflows in projects.


This article provides an overview of the end-to-end process for getting your data onto the Data Republic platform through to making it available for matching and analysis in a project.

Click on any of the links below for an in-depth overview of each topic and relevant how-to guides.

Prerequisites for Data Sharing

Legal requirements

Organizations involved in a data sharing project must have signed software-use terms and have a legal framework in place to govern the rights and responsibilities of each organization participating in a data share.

Technical requirements

Your organization may be required to install additional applications and perform some setup configuration to access the platform.

Team setup requirements

Projects require tasks to be delegated to one or more users who will prepare data, manage the project, and approve data license terms on behalf of your organization.

Role required: Analyst

Activities:

1. Upload transaction/attribute data and prepare a data package (on the Data Republic platform)
2. Analyze datasets on the Data Republic platform (if your organization will be performing the data analysis)
3. Prepare data for PII upload and tokenization (using your Contributor Node), and attach tokens to transaction/attribute data (this only applies to projects involving Matching)

Role required: Project Manager

Activities:

1. Create or invite People to a project
2. Delegate responsibility and oversee all tasks for your organization in order to execute a data sharing project on the Data Republic platform

Role required: Data license approver

Activities:

1. Approve the data license (project specific terms) on behalf of your organization

Workflows

There are 3 main workstreams for data sharing on the Data Republic platform:

  1. Tokenization: where PII is prepared and tokens are generated to replace PII (using the Contributor Node)

  2. Data preparation: where data is uploaded and prepared for projects (using the Manage Data menu)

  3. Project creation (and management): where organizations can collaborate and manage their data sharing (using the Project menu)

Note:

  • Tokenization must be completed before Data Preparation.

  • Data Preparation can be started by each organization independently of Project Creation.

  • Project creation (and management) is facilitated by the organization that is providing the Workspace.

Tokenize your PII data

In this workstream, analysts prepare their PII data internally.

0. Data Preparation (completed in each contributing organization's environment):

  • Using a Contributor Node, Data Custodians can upload PII to receive tokens so that each unique individual will be represented by a random token.

  • Since data matching can be performed on a range of available PI fields, organizations must first agree on data matching rules before PII can be prepared for upload and tokenization by the Contributor Node.

  • Once tokens are received, data preparation on the Data Republic platform typically involves uploading tokens and attribute data to the Data Republic platform, creating database tables so that your data can be queried later, and packaging files and/or tables together for projects.

1. Load PI into the Contributor Node (CN): We recommend using the API for upload.

2. Download tokens from the CN: Click the button to download the tokens and Person IDs into your environment.

3. Prepare your tokens for the Data Republic platform: Remember to remove personid from the table you upload to the Data Republic platform.

Note: If you have attribute data for your project, first append your tokens to your attribute table before upload to allow for deeper analysis later (join the two tables on personid).
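
The join described in the note above can be sketched in plain Python. This is an illustrative sketch only, not a Data Republic tool: the column name token, the sample values and the helper attach_tokens are assumptions made for the example. Only personid comes from this article.

```python
def attach_tokens(token_rows, attribute_rows, key="personid"):
    """Join tokens onto attribute rows by `key`, then drop `key` from the output.

    The "token" column name is an assumption for this sketch.
    """
    token_by_person = {row[key]: row["token"] for row in token_rows}
    joined = []
    for row in attribute_rows:
        token = token_by_person.get(row[key])
        if token is None:
            continue  # no token was downloaded for this person, so skip the row
        out = {"token": token}
        # Copy every attribute except the join key, so personid is never uploaded.
        out.update({k: v for k, v in row.items() if k != key})
        joined.append(out)
    return joined


# Hypothetical sample data standing in for the CN download and your attribute table.
tokens = [
    {"personid": "p1", "token": "tok_a"},
    {"personid": "p2", "token": "tok_b"},
]
attributes = [
    {"personid": "p1", "plan": "gold", "spend": "120.50"},
    {"personid": "p2", "plan": "basic", "spend": "15.00"},
    {"personid": "p3", "plan": "basic", "spend": "9.99"},
]

upload_rows = attach_tokens(tokens, attributes)
```

In this sketch, rows without a matching token are dropped rather than uploaded without one. Whether that is the right behavior depends on what your organizations have agreed.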

Preparing your data for matching

Data Republic does not allow any personally identifiable information (PII) on the Data Republic platform. Therefore, any PII belonging to an individual must first be replaced with a random token.

0. Data Preparation (completed in your organization's environment):

  • Make sure data is cleaned (e.g. no duplicate rows, no rows with null values, standardized formatting for fields) so that analysts can begin analysis as soon as possible.

  • Append your tokens to your attribute table before upload to allow for deeper analysis later.

  • Data should be prepared according to what has been agreed between organizations.

1. Upload Files: Drag and drop files less than 100MB or use SFTP for larger files.

2. Create Database: If you would like to query structured data (using View Builder) or allow approved analysts to query your data in a Redshift database, you will need to create a database. Once you have a database ready, you can create tables for your data, using the databases to group similar tables together.

4. Create a View of your table using View Builder (optional): Use this when you would like to share only a subset of your data for specific projects.

5. Create a Package: Specify the tables and files that you would like to contribute to a Project. A package contains metadata only (i.e. file/table names).
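
The cleaning described in step 0 above (removing duplicate rows, dropping rows with null values, standardizing field formatting) can be sketched as follows. This is a minimal illustration that runs in your own environment before upload. The function clean_rows, the field names and the choice to standardize by trimming whitespace are all assumptions for the example, not platform requirements.

```python
def clean_rows(rows, required_fields):
    """Trim string values, drop rows missing a required value, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Standardize formatting: strip surrounding whitespace from string values.
        normalized = {
            k: (v.strip() if isinstance(v, str) else v) for k, v in row.items()
        }
        # Drop rows where any required field is null or empty.
        if any(normalized.get(f) in (None, "") for f in required_fields):
            continue
        # Drop exact duplicates (compared after normalization).
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Hypothetical rows: one duplicate (after trimming) and one with an empty value.
rows = [
    {"token": "tok_a", "plan": " gold "},
    {"token": "tok_a", "plan": "gold"},
    {"token": "tok_b", "plan": ""},
    {"token": "tok_c", "plan": "basic"},
]

cleaned = clean_rows(rows, required_fields=["token", "plan"])
```

Normalizing before de-duplicating matters here: the first two rows differ only by whitespace and would otherwise both survive.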

Requesting a Data Match

All data sharing and Matching on the Data Republic platform is managed via Projects, where one or more organizations may be invited to collaborate.

0. Create a Project: Start a new project and add a description (i.e. a short overview of the project's purpose).

1. Add People: Add people who are needed for the Project (who have Data Republic platform user accounts).

2. Have Conversations: Use Conversations to communicate with everyone in the Project, or create subgroups (e.g. your organization and Data Republic), so that all communications are in one place.

3. Select a Legal Framework: Participating organizations can use Data Republic's Common Legal Framework for data exchange (if this has been executed by parties involved) or create and select their own legal framework to use in Projects (if an organization has signed the Third Party Module).

4. Add Packages (from the Data Preparation work stream): If you will be contributing data to a Project, you will need to add Packages to the Project (to specify which files and/or tables you will be sharing).

5. Submit the Data License: A Data License should be drafted while the data is being prepared, as organizations will need to allocate enough time to discuss, negotiate, and agree on terms for the permitted use of data, and to approve the Data License.

6. Request a Workspace and Request a Match: Once the Data License is approved, a Workspace can be created, people can be added, and the match can be triggered.

  • Once matching is complete, a token-pair match table is available. The table will display which two tokens have matched, and which fields they have matched on. For example, token A and token B matched on full name and mobile number, whereas token C and token D matched only on full name.

7. Begin your analysis: Only authorized users from organizations permitted in the Data License can access data for analysis.

8. Request outputs from your Workspace: When you have finished with your analysis, submit a request to extract any outputs permitted by the data license in your project. All output requests are subject to approval by Data Republic.
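
To picture the token-pair match table described in step 6, the sketch below models each row as a pair of tokens plus the set of fields they matched on, using the example from that step. The class TokenPairMatch, its field names and the helper matches_on are hypothetical and chosen for illustration. The actual table layout on the platform may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPairMatch:
    """One row of a token-pair match table (hypothetical layout)."""
    token_left: str
    token_right: str
    matched_fields: frozenset


# The example from step 6: A/B matched on two fields, C/D on one.
match_table = [
    TokenPairMatch("token_A", "token_B", frozenset({"full_name", "mobile_number"})),
    TokenPairMatch("token_C", "token_D", frozenset({"full_name"})),
]


def matches_on(table, required_fields):
    """Return the pairs that matched on at least all of `required_fields`."""
    required = frozenset(required_fields)
    return [m for m in table if required <= m.matched_fields]


# Keep only pairs that matched on both full name and mobile number.
strong = matches_on(match_table, ["full_name", "mobile_number"])
```

Filtering on the matched fields lets an analyst apply a stricter rule than the broadest match, for example treating name-only matches as lower confidence.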

Project Completion

Once data analysis is underway in the Project, the analyst will explore the data and create any outputs if the Data License allows for this. Upon project completion, the Workspace will be terminated.

1. Terminate Workspace: Once the Project is complete and permitted outputs have been extracted, the Workspace can be terminated and all of the data inside it deleted.

Additional Information

What is Privacy-Preserving Matching?

What PI fields can be matched?

Data Republic and Personally Identifiable Information (PII)

