This article shows you how to create and modify data packages for exchange.
Once created, data packages can be added to:
- a project, for approval and exchange; or
- a data listing, to make the package discoverable and its metadata visible.
User access to a data package is always controlled and must be approved by a data custodian of the organization (i.e. the data license approver).
What is a data package?
A data package references all files associated with a data exchange, not just data tables and views. You can reference images and documents in packages, as well as SQL code and your end-user agreement.
Note: If you change any files or tables referenced by a package (for example, by renaming, deleting or moving them to another directory) before adding the package to a project or using it, you must update the file/table references in the package to avoid referencing errors later. To change the metadata or references in a package, create a new version.
In this article you will learn about:
- Creating a package
- Adding files to a package
- Adding tables/views to a package
- Configuring a package for a Privacy Preserving Matching project
- Modifying packages
- Data package FAQs
Before you begin, you should have:
- Uploaded data onto the Data Republic platform;
- Created a database and tables; and/or
- Loaded data into the tables; and/or
- Created a data view
Creating a package
From the left navigation menu, click Manage Data.
On the Manage Data screen, click the Packages tab, then click Create new package.
On the Create new package screen, provide an overview (metadata description) of the dataset.
Note: Once a package has been added to a project, the metadata you provide will be visible to any user who has access to the project; however, those users will not have access to the data itself unless access is approved. Do not include any sensitive information you do not want to reveal outside your organization. Complete the form as follows:
- Give your package a descriptive name.
- Provide a description of what your package will contain.
- Provide a description of the expected data characteristics.
- Click the Create new data package button.
The package is created in draft status and a message displays to indicate the new package has been created.
Click the package name to edit the contents.
Should I add a table or file?
This depends on whether the package will be listed in the Catalog, how the data recipient will want to work with the data in a project, and what the data license allows.
- To enable direct download of package contents once a license is approved in a project, add data as a file.
- To enable analysis of data in a Workspace later, it may be better to add the data to the package as a table. In a Workspace, the user will be able to connect to a Redshift database and query the table using SQL Workbench, Python, R, Tableau, etc. If the user prefers to work with applications such as Excel, you can also include the data in the package as a CSV file. You can also include supporting documentation in other file formats, such as PDF or DOC.
- If you want to add the package to a data listing, we recommend adding a table to the package. If you do not, the metadata (table schema) will not be visible in an approved data listing. Analysts browsing a data listing will want to view the metadata to decide whether the dataset may be useful for the problem they are trying to solve.
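The guidance above can be summarized as a small decision helper. This is a minimal Python sketch for illustration only: `IntendedUse` and `recommend_inclusions` are hypothetical names, not part of the Data Republic platform, and the rules are a simplification of the three points above.

```python
from dataclasses import dataclass


@dataclass
class IntendedUse:
    """Hypothetical flags describing how a package will be used."""
    listed_in_catalog: bool = False   # package will be added to a data listing
    direct_download: bool = False     # license will allow downloading the contents
    workspace_analysis: bool = False  # recipient will query the data in a Workspace
    spreadsheet_users: bool = False   # recipient prefers tools such as Excel


def recommend_inclusions(use: IntendedUse) -> set[str]:
    """Return the kinds of inclusion ("table", "file") the guidance suggests."""
    kinds: set[str] = set()
    if use.direct_download:
        kinds.add("file")   # direct download is enabled by adding data as a file
    if use.workspace_analysis:
        kinds.add("table")  # tables can be queried from the workspace database
    if use.spreadsheet_users:
        kinds.add("file")   # e.g. a CSV copy alongside the table
    if use.listed_in_catalog:
        kinds.add("table")  # a listing only shows a schema if a table is included
    return kinds
```

Supporting documents such as PDFs are always added as files, so they are left out of the helper.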
Adding tables, views and files to a package
1. Add the tables, views and files the package needs.
To add a table or view:
- Click Tables & Views.
- Click the arrow next to a database to expand it.
- Select tables and views as required. As you select them, they are listed lower down the screen under 'Package inclusions'.
To add files:
- Click Files.
- Click the arrow next to a folder to expand it, then select a file. Selected files are listed lower down the screen.
- Alternatively, select a folder to add all the files in that folder.
2. When you have added everything you need for the package, click Save Package.
Note: any tables/files added to the package are listed under 'Package inclusions'.
3. If you need to configure your package for a Privacy Preserving Matching project, continue to the next section. Otherwise, if no further changes are needed, click Submit. After you submit, any further changes require a new version of the package.
4. The status of the data package will automatically change from 'draft' to 'for review' and then 'approved'. The package can now be added to a project or data listing for exchange.
Configuring a package for a Privacy Preserving Matching project
1. On the package screen, click Link token database. Refer to 'Configuring data packages for a Matching Project' for the next steps.
Why can't I edit the package? Configuration changes can only be made while a package is in draft status. If your package is not in draft status, click 'Edit as new version' on the package screen to make the 'Edit token link' button available.
2. Click Submit when you are ready to approve this version of the package. The status of the data package will automatically change from 'draft' to 'for review' and then 'approved' within a few minutes. The approved package can now be added to a project or data listing for exchange.
Modifying an approved package
To update data in a table only:
- Load a new data file into the table from the Databases tab on the Manage Data screen. The new data replaces the data in the table rather than being appended to it. If you need to work with the updated table in a workspace, reload the package to your workspace. Note: you do not need to create a new version of the package to update data in a table within the package.
To modify file, table or view references (i.e. metadata):
- You must create a new version of the package. You also need to create a new version if the name or location of a referenced file, table or view has changed, to avoid referencing errors later.
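To see why a renamed or moved object needs a new version, it can help to think of a package version as a fixed set of saved references. The Python sketch below is hypothetical: the `PackageVersion` class, the reference strings and the helper functions are illustrative assumptions, not a platform API. Any saved reference that no longer matches an existing file, table or view is stale, which is the case where a new version is required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageVersion:
    """Hypothetical snapshot of the references saved in one package version."""
    version: int
    references: frozenset[str]  # e.g. "sales_db.transactions", "docs/eula.pdf"


def stale_references(pkg: PackageVersion, existing: set[str]) -> set[str]:
    """Saved references that no longer match any existing file, table or view.

    A file or table that was renamed, moved or deleted after the version was
    saved shows up here.
    """
    return set(pkg.references) - existing


def needs_new_version(pkg: PackageVersion, existing: set[str]) -> bool:
    """True if at least one reference is stale, so a new version is needed."""
    return bool(stale_references(pkg, existing))


# Example: the table was renamed after version 1 was approved.
v1 = PackageVersion(1, frozenset({"sales_db.transactions", "docs/eula.pdf"}))
current = {"sales_db.transactions_2023", "docs/eula.pdf"}
print(stale_references(v1, current))  # {'sales_db.transactions'}
```

Updating the rows of a table under an unchanged name leaves every reference valid, which matches the note above that data-only updates do not need a new version.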
To create a new version of a package (i.e. to modify file/table/view references):
- From the 'Manage Data' screen, click the 'Packages' tab and open the package you need to modify.
- Note: the option to edit as a new version is only available when you navigate via the Manage Data > Packages tab; it is not available if you open the package screen from a project. Only the latest version can be edited; the option to create a new version is not available for older versions that have been superseded.
- On the package screen, click 'Edit as new version' to create a new version to edit.
- Modify the package as required by selecting or deselecting files, tables or views, then click 'Save package'.
- When you are ready to lock down the changes, click 'Submit'. The data package status will automatically change to 'Approved' within a few minutes.
- Note: If the package does not automatically change to approved status, please contact Data Republic Support. Once the package is approved, you can add it to a project or listing.
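The version and status rules described in this article can be modelled as a simple state machine. This is a hypothetical Python sketch, not the platform's implementation: the `Status` and `PackageHistory` names are assumptions, and the `approve` step stands in for the automatic approval the platform performs after you click Submit. It encodes three rules from above: inclusions can only be saved on a draft, submitting moves a draft to 'for review' and then 'approved', and only the latest version can be edited as a new version.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    FOR_REVIEW = "for review"
    APPROVED = "approved"


class PackageHistory:
    """Hypothetical model of one package's versions (index 0 is version 1)."""

    def __init__(self) -> None:
        self.versions: list[Status] = [Status.DRAFT]  # a new package starts as a draft
        self.contents: list[set[str]] = [set()]       # inclusions saved per version

    @property
    def latest(self) -> int:
        return len(self.versions) - 1

    def save(self, version: int, inclusions: set[str]) -> None:
        # 'Save package': inclusions can only change while the version is a draft.
        if self.versions[version] is not Status.DRAFT:
            raise PermissionError("only draft versions can be edited")
        self.contents[version] = set(inclusions)

    def submit(self, version: int) -> None:
        # 'Submit': locks the draft and sends it for review.
        if self.versions[version] is not Status.DRAFT:
            raise ValueError("only a draft can be submitted")
        self.versions[version] = Status.FOR_REVIEW

    def approve(self, version: int) -> None:
        # Stand-in for the automatic approval that follows a submit.
        if self.versions[version] is not Status.FOR_REVIEW:
            raise ValueError("only a version under review can be approved")
        self.versions[version] = Status.APPROVED

    def edit_as_new_version(self, version: int) -> int:
        # 'Edit as new version': only the latest, non-draft version can be copied.
        if version != self.latest:
            raise PermissionError("superseded versions cannot be edited")
        if self.versions[version] is Status.DRAFT:
            raise ValueError("the latest version is still a draft; edit it directly")
        self.versions.append(Status.DRAFT)
        self.contents.append(set(self.contents[version]))  # copy the references
        return self.latest
```

For example, once version 1 is approved, calling `save` on it raises an error, while `edit_as_new_version` returns a new draft that starts with a copy of version 1's inclusions and leaves version 1 unchanged.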
Data package FAQs
How are data packages used in the Data Republic platform?
Data packages are created for data exchange on the Data Republic platform.
- They can be added to projects so that organizations participating in an exchange can negotiate the permitted use of data for their project. Within a project, a data license must be requested and approved by the participating organizations before access to data can be granted.
- Data packages can also be listed in the Catalog to make them discoverable by other organizations or by authorized users within your own organization. Authorized users can request access to the data package via its listing and negotiate a data license for permitted use of the data package in their project.
How do users access data in a package?
Users can only access the data package if a license for permitted use of data has been approved in a project. Access to data can be provided in two ways:
- If a data license permits direct download of the package contents (e.g. a data file), Data Republic will provide the data licensee (data recipient) with a link to download the dataset.
- If a data license permits access to the data package via a Workspace only, then a copy of the file, table or view referenced in the package is created and loaded to a Workspace for analysis.
Within a Workspace:
- Users can connect to a Redshift database (using SQL Workbench, R, Python, Tableau, Power BI, etc.) to query data tables.
- Any files referenced in the package can be found in a local directory in the workspace.
Note: Any data output requested from a workspace must align with the permitted use and extraction rules agreed to in the data license.