Once a Workspace is approved and is in a Running state, connect to the Workspace to complete your analysis.
In this article you will learn how to:
- Connect to a Workspace
- Access files and tables in a workspace
- Connect to a rootless docker
- Collaborate between workspaces
- Make the best use of your database and your workspace's local resources
- Change your workspace setup
- Understand rules for data-use and extraction
Additionally, there is an FAQ if you have further questions.
Before you connect, make sure that:
- you have created a project
- you have added people to your project
- your data license has been approved by all parties
- your Workspace has been approved and is running
- you have loaded data to your workspace
Connect to the Workspace
1. Navigate to your project’s Workspaces tab and click the Workspace name.
2. Take note of the Workspace reference number (DSW-xxx) and click Connect to this workspace.
3. The next screen lists all the Workspaces that you have access to.
4. Click on the Workspace reference number to connect.
Note: If the workspace reference number is not listed, you may not have been added to the workspace; click the 'Change workspace set up' button in the Workspace Summary screen to edit the list of users who can access the workspace, and submit your request for approval.
5. The workspace will launch as a new tab in your browser. No special firewall rules should be required, because access is over HTTPS.
How do I change the screen resolution in my Workspace? See Workspaces FAQ.
- Please note that the screen in which you launch will adapt to your window if you have multiple monitors.
- If you move the Workspace window to a larger or smaller monitor please refresh your browser for the Workspace image to dynamically adapt, or snap to the edges if in fullscreen.
Accessing files and tables
Once data packages have been added to the Workspace:
- Files can be accessed in C:\Users\Public\Public Documents\ for Windows, or Desktop/Workspace/package_files/ for Linux; a new folder will be created if there are any files - the folder name will contain the date and time that the upload occurred.
- Tables can be accessed by connecting to the Redshift database using SQL Workbench, Python or R. Your Redshift database connection profile will be configured automatically. To view your Redshift connection profile, open SQL Workbench > File > Connect window. You will only be able to access the database from within your secure Workspace(s).
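As a rough illustration of the file location described above, the following Python sketch lists the contents of the most recently modified package folder on a Linux Workspace. The helper name `newest_package_folder` is hypothetical, and resolving `Desktop/Workspace/package_files/` relative to the home directory is an assumption. Because the exact date-and-time format of the folder names is not stated here, the sketch picks the newest folder by modification time rather than by parsing the name.

```python
from pathlib import Path
from typing import Optional


def newest_package_folder(root: Path) -> Optional[Path]:
    """Return the most recently modified sub-folder of root, or None if there is none."""
    if not root.is_dir():
        return None
    folders = [p for p in root.iterdir() if p.is_dir()]
    # Compare modification times, since the folder-name format is not documented here.
    return max(folders, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Linux location from this article, assumed to sit under the home directory.
    root = Path.home() / "Desktop" / "Workspace" / "package_files"
    folder = newest_package_folder(root)
    if folder is None:
        print(f"No package folders found under {root}")
    else:
        for item in sorted(folder.iterdir()):
            print(item.name)
```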
Connect a rootless docker
Preparing a Container in your local environment
Once you have a container that you wish to use in the workspace on your local machine, open the Command Prompt/Terminal and type:
docker save <image name>:<tag> > <name>.tar
Loading a Container in your Workspace
Once the image file is in the workspace, open a shell and load your new image by running:
docker load < <name>.tar
Now you can use this container as you would have on your local environment.
If you want to access the internet via the proxy, or Redshift, from within the container, you can append the --env-file /home/LinuxDswUser/docker.env parameter to your docker run command.
Building Containers in the Workspace
You can build Docker images as you normally would. The only caveat is that images cannot be built in the home folder because of permissions issues; please create a subdirectory to hold your Dockerfile and build from there.
Keep in mind that any Docker images you build must be based on images that are already on the workspace, because Docker Hub is not available.
Collaboration between Workspaces
Set up one or more Workspaces in a project to allow team members to collaborate in real time. (Note: Check with your project lead before requesting additional Workspaces as additional costs may be incurred).
Within a Project, users in different Workspaces can:
1. Share files between Workspaces by adding them to the Project Drive (total drive storage is 2GB).
- The Project Drive can be accessed via a shortcut on the desktop or File Explorer. Any user in a Workspace can read, modify or delete any files added to the Project Drive.
- If there is only one Workspace in your Project, the Project Drive will function like any other local drive in your Workspace as there will be no other Workspaces to share files with.
- If you have workspaces of different types (e.g. Windows and Linux) in the same project, they can collaborate using the Project Drive.
- I don't have a Project Drive, how do I request one? Save what you are working on in the Workspace, then submit a request to add the Project Drive: click 'Change Workspace setup', do not make any changes, and check the boxes to agree to the terms before clicking Submit. Your request will be processed and approved automatically (the status of your request is displayed at the top right of the Workspace details screen). Your Project Drive should be available on the same business day. Connect to the Workspace, log out of Windows and log back in; the Project Drive should then be visible. You will need to repeat these steps for any other Workspaces in your project that do not have the Project Drive.
2. Query and write to the same Redshift database from any Workspace within your Project, using languages such as SQL, Python or R. Workspaces created in the same Project will automatically connect to the same Redshift database. Open SQL Workbench to view your Redshift database connection details, and use those credentials to connect to your Redshift database from other analytics software.
To learn more about Workspaces, refer to the Workspace FAQ.
How to best use your database and your workspace's local resources
In your Workspace you have access to the compute resources of your connected Redshift database and the compute resources of the workspace itself. Different types of analytics suit different types of compute resources. Our suggestions are as follows:
- The Redshift database's compute resources dwarf those of the workspace itself, so it should always be your first port of call when conducting any data processing.
- That said, the Redshift database only responds to SQL commands. Whilst you can connect to the database from various analytics software, that software and its database libraries can still only run analytics that are possible in SQL. For example, you typically cannot create and train a machine learning model using SQL, but you can quickly and easily aggregate data or create new features.
- Where you cannot use SQL for your analysis, you must use your workspace's local resources by reading the data into workspace memory or onto disk (for example, into a Python pandas or R data frame). This does not mean your Redshift database is now useless!
- For optimal results (and where your analytics allow), best practice is to do all pre-processing of your data in Redshift before you move to the workspace's local resources (e.g. to train your model). From there, if your model or analytics allows incremental learning, read incremental batches of data from the Redshift database into memory, clearing each batch from memory once its training step is complete. This should be the most memory-efficient way of using the workspace.
- If the above is not possible, or you require further local compute resources (e.g. you are running out of memory), you may need to upgrade your workspace by changing your workspace setup. If you are a guest of another organisation, please reach out to them to suggest and approve an appropriate upgrade.
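The pattern suggested above (pre-process in the database, then read batches into memory for incremental training) can be sketched in Python roughly as follows. This is a minimal illustration, not a recommended implementation: an in-memory SQLite database stands in for the Redshift connection so that the example runs anywhere, the `transactions` table and its columns are invented, and the `RunningMean` class is a toy stand-in for a real model that supports incremental learning.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(cursor, batch_size: int) -> Iterator[List[Tuple]]:
    """Yield successive lists of rows from an already-executed DB-API cursor."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


class RunningMean:
    """Toy incremental 'model': maintains a running mean, one batch at a time."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def partial_fit(self, values: List[float]) -> None:
        self.count += len(values)
        self.total += sum(values)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def train_incrementally(conn, batch_size: int = 2) -> RunningMean:
    cur = conn.cursor()
    # Pre-processing (here, a per-customer aggregate) happens in SQL, inside the database.
    cur.execute(
        "SELECT customer_id, SUM(amount) AS total_spend "
        "FROM transactions GROUP BY customer_id ORDER BY customer_id"
    )
    model = RunningMean()
    for batch in iter_batches(cur, batch_size):
        # `batch` is rebound on each iteration, so only one batch is held at a time.
        model.partial_fit([row[1] for row in batch])
    return model


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the Redshift connection
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 20.0), (3, 1.0), (3, 2.0), (3, 3.0)],
    )
    model = train_incrementally(conn)
    print(model.count, model.mean)
```

Whether `fetchmany` actually limits memory use depends on the database driver: some drivers transfer the whole result set to the client when the query is executed, in which case a server-side cursor or explicit paging in the SQL may be needed. Check the documentation of the driver you use to connect to Redshift.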
Change your Workspace setup
Request a change to your workspace setup if you need to:
- Add users
- Change the workspace size / spec
Please refer to Custom configuration of an existing Workspace for more information.
To change your Workspace setup:
- From the Workspace summary page, click Workspace setup
- Follow the prompts to edit the workspace setup and submit your request for approval. Remember to read and agree to the terms and conditions before submitting your request. Requests are generally approved within 1 business day.
Rules for data-use and extraction
Prior to running your analysis, it is important to read the rules for data-use and extraction in the license (found in the license tab of your project) to ensure:
- You understand the conditions of permitted use of the data, including how the data should be prepared for extraction (if permitted) and whether there are any rules you need to apply to your queries (e.g. the data custodian may ask you to apply rules of aggregation to mitigate any risk of re-identification).
- You include comments in your queries to demonstrate that you have applied the requirements of the permitted use in your data license. This will help Data Republic to quickly approve your data extract request.
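To make the two points above concrete, here is a hedged sketch of a commented query that applies an aggregation rule. Everything specific in it is hypothetical: the minimum group size of 5, the `transactions` table, and its columns are invented for illustration, and the real rules are whatever your data license states. An in-memory SQLite database is used only so that the example is runnable outside a Workspace.

```python
import sqlite3

# Hypothetical rule for illustration: suppress groups with fewer than 5 individuals.
MIN_GROUP_SIZE = 5

QUERY = """
-- Permitted use: aggregated output only (hypothetical license condition).
-- Rule applied: groups below the minimum number of distinct customers are suppressed.
SELECT postcode,
       COUNT(DISTINCT customer_id) AS customers,
       AVG(amount) AS avg_amount
FROM transactions
GROUP BY postcode
HAVING COUNT(DISTINCT customer_id) >= ?
ORDER BY postcode
"""


def aggregated_extract(conn):
    """Run the commented query and return only the groups that satisfy the rule."""
    return conn.execute(QUERY, (MIN_GROUP_SIZE,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the Redshift connection
    conn.execute(
        "CREATE TABLE transactions (postcode TEXT, customer_id INTEGER, amount REAL)"
    )
    rows = [("2000", i, 10.0) for i in range(5)]
    rows += [("3000", 100, 99.0), ("3000", 101, 1.0)]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    for row in aggregated_extract(conn):
        print(row)
```

The `?` placeholder is SQLite's parameter style; the driver you use for Redshift may expect a different one. The SQL comments travel with the query text, so a reviewer can see which rule was applied without needing a separate explanation.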
How do I change the screen resolution in my Workspace?
Workspace resolution is primarily controlled by the amount of zoom you have enabled on your web browser and the size of the browser itself.
- Check that the zoom % you have on your browser is set at 100%.
- Refresh the browser tab displaying the Workspace. Refreshing the browser resets the workspace display resolution to the new size & zoom of your browser.
- If you resize your browser window, please refresh the browser again for the workspace to display correctly. To refresh your browser:
• Mac: Command + R
• Windows: Control + R, or
• From the browser menu: View > ‘Reload this page’