In this article, you will learn about:
- Overview of how the platform works
- Your responsibilities as an Analyst
- Best practices for data analysis and auditing
Overview of how the Data Republic platform works
On the Data Republic platform, analysts can upload datasets and prepare data packages for exchange. Analysts may also be invited to participate in projects and analyze approved datasets in Workspaces, using the default or optional analytical tools to create models.
Your responsibilities as an Analyst
Prepare data for exchange
If you have the appropriate permissions on the Data Republic platform, you can upload and manage data:
- Upload data via SFTP or the Data Republic platform
- Create a database (to enable approved datasets to be queried inside analytical workspaces)
- Add tables to your database (define the table schema)
- Load data into the tables
- Confirm that the data loaded correctly
- Create data views using the view builder. The view builder helps you analyze data that has been aggregated and stored in the database. You can create, alter, or drop (delete) views. You can also run SQL queries and build views that combine one or more tables.
- Create data packages for exchange. A data package references all files associated with the data contribution, not just data tables and views.
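The table, load, verification, and view steps above can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in database, and the table, column, and view names are hypothetical. The SQL dialect and tooling available on the Data Republic platform may differ.

```python
import sqlite3

# In-memory SQLite stands in for the platform database (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Add a table by defining its schema (hypothetical table and column names).
conn.execute(
    "CREATE TABLE transactions (customer_id TEXT, region TEXT, amount REAL)"
)

# Load data into the table.
rows = [
    ("c1", "north", 10.0),
    ("c2", "north", 25.5),
    ("c3", "south", 7.25),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
conn.commit()

# Confirm that the data loaded correctly: compare the row count with the source.
loaded_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
assert loaded_count == len(rows), "row count does not match the source"

# Create a view that aggregates the table.
conn.execute(
    """
    CREATE VIEW spend_by_region AS
    SELECT region, COUNT(*) AS customers, SUM(amount) AS total_amount
    FROM transactions
    GROUP BY region
    """
)

# Query the view like any other table.
view_rows = conn.execute(
    "SELECT region, customers, total_amount FROM spend_by_region ORDER BY region"
).fetchall()
print(view_rows)
```

A row-count comparison is only the simplest load check. Depending on the dataset, you may also want to compare column totals or spot-check individual records.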
Prepare data for Privacy Preserving Matching Projects
- Tokenize the PII for a matching project using your organization's Contributor Node.
- Follow the steps in “Prepare data for exchange” to ensure your data is ready for a matching project on the Data Republic platform.
- Once the steps are complete, you can create the data packages and configure them for Matching.
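To illustrate the general idea behind tokenizing PII, the sketch below replaces an email address with a keyed hash so that the same input always yields the same token while the raw value is not exposed. This is a generic technique (HMAC-SHA256) shown under assumptions: the function name, key, and normalization rules are hypothetical, and it is not a description of how the Contributor Node works internally.

```python
import hashlib
import hmac


def tokenize(value: str, secret_key: bytes) -> str:
    """Return a keyed, non-reversible token for a PII value (illustrative only)."""
    # Normalize first so trivially different spellings map to the same token.
    normalized = value.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for the example; a real key would be managed securely.
secret_key = b"example-key-not-for-production"

token_a = tokenize("Jane.Doe@example.com ", secret_key)
token_b = tokenize("jane.doe@example.com", secret_key)

# Both spellings normalize to the same value, so the tokens match.
print(token_a == token_b)
```

Consistent tokens are what allow records from different contributors to be matched without exchanging the underlying PII.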
Analyze data on the Data Republic platform
Access data on the Data Republic platform using a Workspace.
- A Workspace is a segregated analytical environment created within a project after your data license has been approved by all parties to the license. A Workspace is a Windows or Linux virtual machine loaded with approved datasets.
- After the data license is approved, request a Workspace.
- Request access to the Workspace by adding yourself as a user.
- Complete data analysis using your preferred analytical software.
- Ensure you apply any special conditions in your analysis required by the data license.
Best practices for data analysis and auditing
As the Analyst, you are responsible for ensuring that the output you create aligns with the agreed-upon permitted use. If required by the data license, Data Republic will perform a manual audit to confirm that the output matches what was outlined in the permitted use.
When your analysis is complete, request an output extract from the Workspace.
Once the analysis is complete and is not recurring, shut down the Workspace by creating a request through the UI for the relevant Workspace.
Optimize your SQL queries to avoid long-running queries where possible. The following recommendations can help your queries run efficiently:
- Optimize all base tables to reduce query time when using these tables.
- Split large queries into smaller ones (for example, put subqueries into their own optimized tables).
- Test your queries on sample data to confirm that they work as expected on smaller datasets before running them in full.
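The recommendations above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in database. The table and index names are hypothetical, and "optimizing" a table is interpreted here as adding an index, which is an assumption. The options available in a Workspace database may differ.

```python
import sqlite3

# In-memory SQLite stands in for the Workspace database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("c1", 10.0), ("c1", 5.0), ("c2", 20.0), ("c3", 1.0)],
)

# 1. Optimize the base table: index the column used for grouping and filtering.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# 2. Test the aggregation on a small sample before running it on the full table.
conn.execute(
    "CREATE TABLE orders_sample AS SELECT * FROM orders ORDER BY rowid LIMIT 2"
)
sample_totals = conn.execute(
    "SELECT customer_id, SUM(amount) FROM orders_sample GROUP BY customer_id"
).fetchall()

# 3. Split the query: materialize the subquery into its own indexed table
#    instead of nesting it inside a larger statement.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    """
)
conn.execute("CREATE INDEX idx_totals_customer ON customer_totals (customer_id)")

# The final query now reads from the small intermediate table.
big_spenders = conn.execute(
    "SELECT customer_id FROM customer_totals WHERE total > 10 ORDER BY customer_id"
).fetchall()
print(sample_totals, big_spenders)
```

Materializing an intermediate result trades some storage for shorter, easier-to-debug queries. Whether it helps in practice depends on the data volume and on how often the intermediate table is reused.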