What is Automated Insights?
Automated Insights lets a user automate delivery from a pre-approved data pipeline by running the code that creates the data output on a timed schedule. Because the data pipeline is pre-approved and locked down, Data Republic does not need to audit and approve each subsequent extraction and delivery.
When is it used?
Automated Insights is used in scenarios where the data pipeline needs to be re-run with refreshed data to produce new insights on a regular basis.
How does it work?
Data Republic's system automatically runs the pre-approved code, developed by the Data Licensee in the Jupyter Notebook, on a scheduled basis. When the data output is ready for download, a link is provided to the Data Licensee within the Project.
Note: For the output to reflect the latest dataset, the Data Contributor is required to update the data table prior to the scheduled run date for each delivery.
This article covers:
- Prerequisites for using Automated Insights
- Data License requirements to allow use of Automated Insights
- How to enable Automated Insights
- Jupyter Notebook requirements
To use Automated Insights:
- Your Data License must allow use of Automated Insights. In the Data License, answer 'yes' to "Do you require automated insights?".
- We recommend that you build your data product first, so you can test it manually in the Workspace. Once you can confirm that the output is as expected, you can transfer your code to a preconfigured Jupyter Notebook to schedule the run and delivery. Data Republic will review and approve the Notebook prior to approving the schedule.
- The timing of the data refresh and output must be agreed and confirmed with the Data Contributor. Note: The data refresh must happen before the scheduled delivery date; otherwise the data output will not reflect the latest dataset.
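The timing constraint in the last prerequisite can be sketched as a simple date check. This is only an illustration for planning a schedule with the Data Contributor: the function name `stale_deliveries` and the sample dates are hypothetical and are not part of the Data Republic platform. It assumes a delivery is up to date only if a refresh falls strictly before it and no earlier than the previous delivery.

```python
from datetime import date

def stale_deliveries(refresh_dates, delivery_dates):
    """Return the delivery dates that would go out without a fresh data refresh.

    Hypothetical helper: a delivery counts as fresh only if at least one
    refresh happens on or after the previous delivery and strictly before
    the delivery itself.
    """
    stale = []
    previous = date.min
    for delivery in sorted(delivery_dates):
        if not any(previous <= refresh < delivery for refresh in refresh_dates):
            stale.append(delivery)
        previous = delivery
    return stale

# Example plan (made-up dates): monthly deliveries, with the March refresh
# scheduled one day too late.
refreshes = [date(2024, 1, 28), date(2024, 3, 2)]
deliveries = [date(2024, 2, 1), date(2024, 3, 1), date(2024, 4, 1)]
late = stale_deliveries(refreshes, deliveries)
print(late)
```

Any date returned by the check marks a delivery whose output would repeat the previous dataset, so the refresh or the delivery date should be renegotiated before the schedule is submitted.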
Data License requirements
If you have an existing Data License for your data product:
- Edit the Data License to allow use of Automated Insights: answer 'yes' to "Do you require automated insights?" in the Data License.
- Submit the Data License for approval.
If you do not have an existing Data License to create a data product:
- Create a Data License to request access to the data you intend to use to create your data product.
- Within the Data License, ensure you answer 'yes' to "Do you require automated insights?".
How to enable Automated Insights
- You must have an existing Workspace containing the code to create your data product. Your code must first be reviewed and approved by Data Republic. Contact Data Republic via the Project Conversation.
- This code will then need to be copied into a Jupyter Notebook file that has been configured to request and schedule automatic delivery of your data output. The 'Automated Insights' Notebook will contain all the instructions you need to submit your request.
How to request 'Automated Insights' Jupyter Notebook:
- On the Workspace request screen, ensure ‘Automated Insights’ is selected under the list of available software, along with any other software required to create your data product.
If you already have a Workspace that is running, submit a Workspace change request to add the 'Automated Insights' Jupyter Notebook to your Workspace.
How to use 'Automated Insights' Jupyter Notebook
- Double click start_jupyter.bat on your desktop to open Jupyter and access your preconfigured Automated Insights Notebook.
- Follow the instructions in the Notebook to create and run your data pipeline. Ensure your pipeline produces the correct output if the data tables are updated.
- At the bottom of the Notebook, click ‘Submit’ when you are ready for your pipeline to be reviewed and locked down by Data Republic. Also, please notify Data Republic via Conversations in the Project that you have submitted your Notebook.
- Data Republic will review the data pipeline in the Notebook and reject or approve the request to schedule delivery.
- When the data product is ready for download, you will receive an S3 link in the Project Conversation to download it. The link will be valid for one hour. Please contact Data Republic via the Conversation if you have any issues accessing the link.
- If you require a copy of the code for your data pipeline, this can be requested via the ‘Request Output’ button on the Workspace screen. This should be done before you ask for your Workspace to be shut down.
Jupyter Notebook requirements for Automated Insights:
- All code must be written in Python 3, or SQL executed by Python.
- Only standard library Python packages are available, unless otherwise requested (additional charges may apply).
- Only Python is supported. R is not supported in this release, and Python 2.x will not be supported.
- Any table referenced in your code in the Jupyter Notebook should be prefixed with ‘results_’. For example, ‘results_tablename’.
- Only tables or views stored in the Redshift database can be analyzed; files within packages cannot be queried for analysis.
- All code to be run must be included within the Notebook; no external code will run.
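As a rough illustration of the table-naming requirement, the sketch below scans a SQL string for table references that lack the ‘results_’ prefix before the Notebook is submitted. It uses only the Python 3 standard library. The function name `unprefixed_tables` and the sample queries are hypothetical, and the regular expression is a simple heuristic rather than a SQL parser: it only looks at the identifier following FROM or JOIN, so expressions such as EXTRACT(... FROM ...) may be flagged incorrectly.

```python
import re

# Heuristic: capture the identifier that follows FROM or JOIN.
# This is not a full SQL parser; subqueries and quoted names are ignored.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def unprefixed_tables(sql, prefix="results_"):
    """Return referenced table names that do not start with the prefix.

    Hypothetical helper. For schema-qualified names (schema.table), only
    the final component is compared against the prefix.
    """
    names = _TABLE_REF.findall(sql)
    return [name for name in names if not name.split(".")[-1].startswith(prefix)]

# Made-up queries for illustration only.
good_sql = "SELECT region, SUM(amount) FROM results_sales GROUP BY region"
bad_sql = (
    "SELECT s.region, c.segment "
    "FROM results_sales s JOIN customers c ON s.customer_id = c.id"
)

good_issues = unprefixed_tables(good_sql)
bad_issues = unprefixed_tables(bad_sql)
print(good_issues, bad_issues)
```

Running a check like this in a Notebook cell before clicking ‘Submit’ is one way to catch a missed prefix early; it does not replace the review by Data Republic.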