All data sharing and matching on the Data Republic platform is governed by approval workflows in projects.
This article provides an overview of the end-to-end process for getting your data onto the Data Republic platform through to making it available for matching and analysis in a project.
- Prerequisites for Data Sharing
- Tokenize your PII
- Preparing your data for matching
- Requesting a data match
- Project Completion
- Additional Information
Click on any of the links below for an in-depth overview of each topic and relevant how-to guides.
Prerequisites for Data Sharing
Organizations involved in a data sharing project must have signed software-use terms and have a legal framework in place to govern the rights and responsibilities of each organization participating in a data share.
- Your organization has signed our Software Agreement or Guest Agreement.
- You have signed the Privacy-Preserving Matching Module.
- A legal framework for data sharing has been agreed between your organization and the organization you are working with. This could be the Data Republic Common Legal Framework (CLF) or your own pre-existing legal agreement.
Your organization may be required to install additional applications and perform some setup configuration to access the platform.
- You have installed your Contributor Node
- Ensure your Contributor Node is installed in an environment with at least the minimum specifications for the number of records you will be working with.
- Ensure you have worked with your IT team to whitelist the recommended domains and ports for your region.
Team setup requirements
Projects require tasks to be delegated to one or more users, who prepare data, manage the project, and approve data license terms on behalf of your organization.
1. Upload transaction/attribute data and prepare a data package (on the Data Republic platform)
3. Prepare data for PII upload and tokenization (using your Contributor Node), and attach tokens to transaction/attribute data (this only applies to projects involving Matching).
1. Create or invite People to a project
Data license approver
1. Approve the data license (project specific terms) on behalf of your organization
There are three main workstreams for data sharing on the Data Republic platform:
- Tokenization: where PII is prepared and tokens are generated to replace PII (using the Contributor Node)
- Data preparation: where data is uploaded and prepared for projects (using the Manage Data menu)
- Project creation (and management): where organizations can collaborate and manage their data sharing (using the Project menu)
- Tokenization must be completed before Data Preparation.
- Data Preparation can be started by each organization independently of Project Creation.
- Project creation (and management) is facilitated by the organization that is providing the Workspace.
Tokenize your PII data
In this workstream, analysts prepare their PII data internally.
0. Data Preparation (completed in each contributing organization's environment):
- Using a Contributor Node, Data Custodians can upload PII to receive tokens so that each unique individual will be represented by a random token.
- Because data matching can be performed on a range of PI fields, organizations must first agree on the data matching rules before PII can be prepared for upload and tokenization by the Contributor Node.
- Once tokens are received, data preparation on the Data Republic platform typically involves uploading tokens and attribute data to the Data Republic platform, creating database tables so that your data can be queried later, and packaging files and/or tables together for projects.
1. Load PI into the Contributor Node (CN): We recommend using the API for upload.
2. Download tokens from the CN: Click the button to download the tokens and Person IDs into your environment.
3. Prepare your tokens for the Data Republic platform: Remember to remove personid from the table you upload to the Data Republic platform.
Note: If you have attribute data for your project, you will need to first append your tokens to your attribute table for upload to allow for deeper analysis later (use personid as the primary key to join).
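The join described in the note above can be sketched in a few lines of Python. This is a minimal illustration only, not the Contributor Node's actual export format: the column names (`personid`, `token`, `postcode`, `total_spend`) and the sample values are assumptions made for the example. It joins the attribute table to the token file on `personid`, then drops `personid` so that only tokens and attributes remain in the table destined for the platform.

```python
import csv
import io
from typing import Dict, List

# Hypothetical token export downloaded from the Contributor Node.
TOKENS_CSV = """personid,token
1001,tok_a1
1002,tok_b2
1003,tok_c3
"""

# Hypothetical attribute table held in your own environment.
ATTRIBUTES_CSV = """personid,postcode,total_spend
1001,2000,150.00
1002,3000,75.50
1004,4000,20.00
"""


def attach_tokens(tokens_csv: str, attributes_csv: str) -> List[Dict[str, str]]:
    """Join attribute rows to tokens on personid, then drop personid."""
    token_by_person = {
        row["personid"]: row["token"]
        for row in csv.DictReader(io.StringIO(tokens_csv))
    }
    prepared = []
    for row in csv.DictReader(io.StringIO(attributes_csv)):
        token = token_by_person.get(row["personid"])
        if token is None:
            # No token exists for this person, so the row cannot be matched.
            continue
        out = {"token": token}
        # Copy every attribute except personid, which must not be uploaded.
        out.update({k: v for k, v in row.items() if k != "personid"})
        prepared.append(out)
    return prepared


rows = attach_tokens(TOKENS_CSV, ATTRIBUTES_CSV)
for r in rows:
    print(r)
```

Rows without a token are skipped here for simplicity; in practice you may prefer to log them so that missing tokens can be investigated before upload.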
Preparing your data for matching
Data Republic does not allow any personally identifiable information (PII) on the Data Republic platform. Therefore, any PII belonging to an individual must first be replaced with a random token.
0. Data Preparation (completed in your organization's environment):
- Make sure the data is clean (e.g. no duplicate rows, no rows with null values, standardized formatting for fields) so that analysts can start analyzing it as soon as possible.
- You will need to first append your tokens to your attribute table for upload to allow for deeper analysis later.
- Data should be prepared according to what has been agreed between organizations.
1. Upload Files: Drag and drop files smaller than 100 MB, or use SFTP for larger files.
2. Create Database: If you would like to query structured data (using View Builder) or allow approved analysts to query your data in a Redshift database, you will need to create a database. Once you have a database ready, you can create tables for your data, using the databases to group similar tables together.
4. Create a View of your table using View Builder (optional): Use this when you would like to share only a subset of your data for specific projects.
5. Create a Package: Specify the tables and files that you would like to contribute to a Project. A package contains metadata only (i.e. file/table names).
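The cleaning checks listed under step 0 (no duplicate rows, no rows with null values, standardized formatting) can be sketched as follows. This is an illustrative example to run in your own environment before upload, not a platform feature: the column names, the sample rows, and the choice to upper-case the `state` field are all assumptions.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def clean_rows(rows: List[Row], upper_fields: Iterable[str] = ()) -> List[Row]:
    """Trim whitespace, drop rows with null values, and drop duplicate rows."""
    upper = set(upper_fields)
    seen = set()
    cleaned = []
    for row in rows:
        normalized = {}
        for key, value in row.items():
            text = value.strip() if isinstance(value, str) else value
            # Treat empty strings as nulls.
            normalized[key] = text if text else None
        if any(v is None for v in normalized.values()):
            continue  # remove rows containing null values
        for key in upper:
            normalized[key] = normalized[key].upper()  # standardize formatting
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue  # remove duplicate rows
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Hypothetical tokenized attribute rows.
raw = [
    {"token": "tok_a1", "state": "nsw ", "total_spend": "150.00"},
    {"token": "tok_a1", "state": "NSW", "total_spend": "150.00"},
    {"token": "tok_b2", "state": "vic", "total_spend": ""},
    {"token": "tok_c3", "state": "Qld", "total_spend": "20.00"},
]

cleaned = clean_rows(raw, upper_fields=["state"])
for row in cleaned:
    print(row)
```

Duplicates are detected after normalization, so rows that differ only in whitespace or letter case in the standardized fields are treated as the same row.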
Requesting a Data Match
All data sharing and Matching on the Data Republic platform is managed via Projects, where one or more organizations may be invited to collaborate.
0. Create a Project: Start a new project and add a description (i.e. a short overview of the project's purpose).
1. Add People: Add people who are needed for the Project (who have Data Republic platform user accounts).
2. Have Conversations: Use Conversations to communicate with everyone in the Project, or create subgroups (e.g. your organization and DR), so that all communications are in one place.
3. Select a Legal Framework: Participating organizations can use Data Republic's Common Legal Framework for data exchange (if this has been executed by parties involved) or create and select their own legal framework to use in Projects (if an organization has signed the Third Party Module).
4. Add Packages (from the Data Preparation work stream): If you will be contributing data to a Project, you will need to add Packages to the Project (to specify which files and/or tables you will be sharing).
5. Submit the Data License: A Data License should be drafted while the data is being prepared, as organizations will need to allocate enough time to discuss, negotiate, and agree on terms for the permitted use of data, and to approve the Data License.
- Once matching is complete, a token-pair match table is available. The table will display which two tokens have matched, and which fields they have matched on. For example, token A and token B matched on full name and mobile number, whereas token C and token D matched only on full name.
7. Begin your analysis: Only authorized users from organizations permitted in the Data License can access data for analysis.
8. Request outputs from your Workspace: When you have finished with your analysis, submit a request to extract any outputs permitted by the data license in your project. All output requests are subject to approval by Data Republic.
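To make the token-pair match table described above more concrete, here is a small sketch that mirrors the example of tokens A/B and C/D. The table layout and the column names (`token_1`, `token_2`, `matched_fields`) are assumptions for illustration; the actual table in your Workspace may be structured differently.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of a token-pair match table.
match_table: List[Dict] = [
    {"token_1": "A", "token_2": "B", "matched_fields": ["full_name", "mobile_number"]},
    {"token_1": "C", "token_2": "D", "matched_fields": ["full_name"]},
]


def count_by_matched_fields(table: List[Dict]) -> Counter:
    """Count token pairs for each combination of matched fields."""
    return Counter(tuple(sorted(row["matched_fields"])) for row in table)


def pairs_matching_on(table: List[Dict], required: Iterable[str]) -> List[Tuple[str, str]]:
    """Return token pairs that matched on at least all of the required fields."""
    required_set = set(required)
    return [
        (row["token_1"], row["token_2"])
        for row in table
        if required_set <= set(row["matched_fields"])
    ]


counts = count_by_matched_fields(match_table)
strong = pairs_matching_on(match_table, ["full_name", "mobile_number"])
print(counts)
print(strong)
```

Filtering on the matched fields in this way lets an analyst separate pairs that matched on several fields from pairs that matched on a single field, such as full name only.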
Once data analysis is underway in the Project, the analyst will explore the data and create any outputs if the Data License allows for this. Upon project completion, the Workspace will be terminated.
1. Terminate Workspace: Once the Project is complete and permitted outputs have been extracted, the Workspace can be terminated and all of the data inside it deleted.