This article provides a high-level overview of the following:
- Data Republic Architecture
- Privacy-Preserving Matching Architecture
- What hosting platform does Data Republic use?
- Related articles
Data Republic’s platform enables organizations to confidently govern data collaboration projects with external parties by delivering comprehensive online licensing workflows, collaborative project spaces and secure analytics environments.
The diagram below provides a high-level logical view of the Data Republic ecosystem:
1. Data Republic Platform
Data Republic manages the governed access workflows required to safely exchange data between organizations. A view of the Data Republic Architecture is below:
Data Republic allows organizations to upload datasets into a fully managed cloud environment. Datasets are transferred via SFTP and stored on Hadoop. After tables and views are created and copied to Redshift, you will be able to query a copy of the approved dataset by connecting to Redshift within the workspace.
What is the underlying database Data Republic Platform sits on?
Data Republic stores datasets in encrypted storage, not in active databases. We use Amazon S3, HDFS and Redshift to provide API access to the subsets of data that a Data Contributor has approved for analysis. This data can be made available to Data Analysts via databases within their quarantined Workspaces.
For more information on Data Republic Security Architecture, please see the Data Republic Security White Paper.
2. Privacy-Preserving Matching
Privacy-Preserving Matching exists as a separate component of the platform, and is solely concerned with protecting customer PII while enabling anonymized dataset linkage. A view of how Privacy-Preserving Matching interacts with the Data Republic Platform is below:
Data Republic's Privacy-Preserving Matching service employs a 'Private by Design' approach by separating anonymized data from the systems that handle PII.
In Privacy-Preserving Matching, the Data Custodian installs a Contributor Node in their organization's environment. This ensures that PII never has to leave the organization. The node assigns a randomized token to each record and returns that token to the contributor. The PII is normalized, salted, hashed, and distributed to the Matcher Nodes. As soon as a match request is authorized in Data Republic, an Aggregator Node communicates with the Matcher Nodes to generate lists of token pairs that may match.
Finally, the Aggregator Node filters out false positives and provides a final match table to Data Republic. A masked version of the token pair table is loaded into a Workspace for analysis.
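The normalize, salt, hash and token-pairing steps described above can be illustrated with a minimal Python sketch. It is not Data Republic's implementation. The function names, the normalization rules, the use of SHA-256, and the idea of a salt shared between contributors are all assumptions made for illustration. It also runs in a single process, whereas the real service splits this work across Contributor, Matcher and Aggregator Nodes.

```python
import hashlib
import secrets
import unicodedata
from typing import Dict, List, Tuple


def normalize(value: str) -> str:
    """Normalize a PII field so trivially different spellings hash identically.

    Assumed rules: Unicode NFKC, trim, lowercase, collapse internal whitespace.
    """
    text = unicodedata.normalize("NFKC", value).strip().lower()
    return " ".join(text.split())


def salted_hash(value: str, salt: bytes) -> str:
    """Return the SHA-256 hex digest of salt + normalized value (illustrative)."""
    return hashlib.sha256(salt + normalize(value).encode("utf-8")).hexdigest()


def tokenize(records: List[str], salt: bytes) -> Dict[str, str]:
    """Map each record's salted hash to a fresh random token.

    The token is random, so it reveals nothing about the underlying PII.
    """
    return {salted_hash(r, salt): secrets.token_hex(16) for r in records}


def candidate_pairs(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair up the tokens of two contributors wherever their hashes agree."""
    return [(a[h], b[h]) for h in sorted(a.keys() & b.keys())]


if __name__ == "__main__":
    salt = b"demo-salt"  # hypothetical shared salt, for illustration only
    contributor_a = tokenize(["Jane.Doe@Example.com ", "bob@example.com"], salt)
    contributor_b = tokenize(["jane.doe@example.com", "carol@example.com"], salt)
    # Only token pairs are produced; no raw PII appears in the result.
    print(candidate_pairs(contributor_a, contributor_b))
```

In this sketch only hashes and random tokens are compared, so the resulting table of pairs contains no PII. Filtering out false positives, which the Aggregator Node performs, is not modelled here.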
What hosting platform does Data Republic use?
Data Republic's platform is hosted on Amazon Web Services (AWS). AWS offers leading-edge security and uptime, helping us ensure that your data is safe and available for you and your users. We host in Amazon's Sydney (Australia), Singapore and Northern Virginia (USA) data centers and deploy additional security features on top of the standard AWS platform. You can read more about Amazon's security policies and practices here: AWS Whitepaper.
Is it safe to store data on Amazon Web Services?
Using AWS (Amazon Web Services) to store data is safe. AWS provides storage for many of the leading web services today.
AWS's security is audited by independent third-party assessors for compliance with the US National Institute of Standards and Technology (NIST) Special Publication 800-171 guidelines, which were first published in June 2015.