If you already have an Amazon Web Services (AWS) account and use S3 buckets to store and manage your data files, you can use your existing buckets and folder paths when loading files and database tables into your Data Vault drive.

Loading data from an S3 bucket is performed in two steps:

  • Step 1: Stage the data files in an S3 bucket. If they haven’t been staged yet, use the upload interfaces/utilities provided by AWS to stage the files.

  • Step 2: Use the AWS S3 copy command to load the contents of the staged file(s) into a Data Vault drive file folder or database table. You can load directly from the bucket, but we recommend creating an external stage that references the bucket and using the external stage instead.
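As a rough illustration of Step 1, the sketch below stages a local folder in a source bucket with the AWS CLI and then lists what was uploaded. The bucket name `my-source-bucket`, the folder `sales`, and the local path `./exports` are hypothetical placeholders, and the `run` helper is only a convenience introduced here: it prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute your own bucket, folder and local path.
SOURCE_BUCKET="my-source-bucket"
FOLDER="sales"
LOCAL_DIR="./exports"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upload every file under LOCAL_DIR to the staging folder.
run aws s3 cp "$LOCAL_DIR" "s3://$SOURCE_BUCKET/$FOLDER" --recursive

# List the staged objects to confirm the upload.
run aws s3 ls "s3://$SOURCE_BUCKET/$FOLDER/" --recursive
```

Running the script as is only prints the two commands for review; running it with `DRY_RUN=0` executes them with your configured AWS credentials.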

This article describes how to configure secure access to Data Vault so that you can use an S3-to-S3 copy to load data from your Amazon S3 bucket into your organisation’s Data Vault drive. You can then reference the loaded files and tables in data packages within the Data Vault UI.

Configuring Secure Access to your organisation’s Data Vault drive

To write data from an AWS S3 bucket into a Data Vault drive, the security and access management policies must allow you to access the Data Vault drive’s bucket.

To configure secure access to the Data Vault drive’s bucket, you need to:

  • Configure an AWS IAM role with the required policies and permissions to read and write to the Data Vault drive’s bucket. This approach allows individual users to avoid providing and managing security credentials and access keys.

Configuring an AWS IAM Role to access Data Vault drive’s bucket

This section describes how to configure an IAM role in your AWS account so that it can access the Data Vault drive’s bucket securely.

As a best practice, limit S3 bucket access to a specific IAM role with the minimum required permissions. The IAM role is created in your AWS account along with the permissions to access your S3 bucket and the trust policy to copy into the Data Vault drive’s bucket.

Note: Completing the instructions in this article requires administrative access to AWS. If you are not an AWS administrator, ask your AWS administrator to perform these tasks.

Step 1: Configure S3 Bucket Access Permissions

AWS Access Control Requirements

Data Vault requires the following permissions on the Data Vault drive’s bucket and folder to be able to access files in the folder (and any sub-folders):

  • s3:GetObject

  • s3:GetObjectVersion

  • s3:ListBucket

  • s3:PutObject

  • s3:PutObjectAcl

  • s3:DeleteObject

  • s3:DeleteObjectVersion

  • kms:Decrypt

  • kms:Encrypt

  • kms:GenerateDataKey

  • kms:DescribeKey
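One way to sanity-check a policy document against the list above is a plain text search, sketched here. The function name `missing_actions` and the file name `policy.json` are hypothetical. The check only looks for each action as a quoted string, so a policy that grants an action through a wildcard such as "s3:*" would still be reported as missing it.

```shell
#!/bin/sh
# Actions listed in this article as required on the Data Vault drive's bucket.
REQUIRED_ACTIONS="s3:GetObject s3:GetObjectVersion s3:ListBucket s3:PutObject
s3:PutObjectAcl s3:DeleteObject s3:DeleteObjectVersion
kms:Decrypt kms:Encrypt kms:GenerateDataKey kms:DescribeKey"

# Print each required action that does not appear as a quoted string in the
# given policy file. Prints nothing when every action is present.
missing_actions() {
  policy_file="$1"
  for action in $REQUIRED_ACTIONS; do
    if ! grep -qF "\"$action\"" "$policy_file"; then
      echo "$action"
    fi
  done
}

# Example (hypothetical file name):
#   missing_actions policy.json
```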

As a best practice, we recommend creating an IAM policy to access the Data Vault drive’s bucket. You can then attach the policy to the role and use the security credentials generated by AWS for the role to access files in the bucket.

Creating an IAM Policy

The following step-by-step instructions describe how to configure access permissions in your AWS Management Console so that you can use the Data Vault drive’s bucket to load and unload data:

  1. Log in to the AWS Management Console.

  2. From the home dashboard, choose Identity & Access Management (IAM).

  3. Choose Account settings from the left-hand navigation pane.

  4. Expand the Security Token Service Regions list, find the AWS region corresponding to the region where your account is located, and choose Activate if the status is Inactive.

  5. Choose Policies from the left-hand navigation pane.

  6. Click Create Policy.

  7. Click the JSON tab.

  8. Add a policy document that will allow Data Vault to access the S3 bucket and folder.

    The following policy (in JSON format) provides Data Vault with the required permissions to load data using a single bucket and folder path. You can also purge data files using the PURGE copy option.

    Copy and paste the text into the policy editor:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<data-vault organisation bucket>",
        "arn:aws:kms:<kms key for data-vault organisation bucket>"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::<data-vault organisation bucket>",
        "arn:aws:s3:::<data-vault organisation bucket>/files/*",
        "arn:aws:s3:::<data-vault organisation bucket>/databases/*"
      ]
    },
    {
      "Sid": "VisualEditor3",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:DeleteObjectVersion",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<data-vault organisation bucket>",
        "arn:aws:s3:::<data-vault organisation bucket>/files/*",
        "arn:aws:s3:::<data-vault organisation bucket>/databases/o_*/*"
      ]
    },
    {
      "Sid": "VisualEditor4",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::<source bucket>/*",
        "arn:aws:s3:::<source bucket>"
      ]
    }
  ]
}

Note:

  • Make sure to replace

    • <source bucket> with your actual bucket name.

    • <data-vault organisation bucket> with your actual Data Vault drive’s bucket name that you can find on a database details page.

    • <kms key for data-vault organisation bucket> with your actual KMS key that you can find on the Data Vault S3 connector page.

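  • The policy above does not include an "s3:prefix" condition. If you add one to the s3:ListBucket statement, setting it to ["*"] grants access to all prefixes in the bucket, and setting it to ["<path>/*"] grants access only to the specified path in the bucket.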

Note that AWS policies support a variety of different security use cases.

  9. Click Review policy.

  10. Enter the policy name (e.g. sandbox_data_vault_drive_access) and an optional description. Then, click Create policy to create the policy.
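If you prefer the AWS CLI to the console, the placeholder substitution and policy creation might be scripted roughly as follows. This is a sketch under assumptions: the bucket names and KMS key value are hypothetical, the policy text is assumed to be saved as `policy.template.json`, and the KMS value is assumed to be whatever follows `arn:aws:kms:` in the key ARN shown on the Data Vault S3 connector page.

```shell
#!/bin/sh
# Hypothetical values: replace with your own bucket names and KMS key.
SOURCE_BUCKET="my-source-bucket"
DRIVE_BUCKET="my-org-data-vault-drive"
KMS_KEY="eu-west-1:111122223333:key/EXAMPLE-KEY-ID"

# Replace the three placeholders used in the policy above.
# Reads the template on stdin and writes the filled-in policy to stdout.
# The values must not contain the "|" character used as the sed delimiter.
fill_policy() {
  sed -e "s|<source bucket>|$SOURCE_BUCKET|g" \
      -e "s|<data-vault organisation bucket>|$DRIVE_BUCKET|g" \
      -e "s|<kms key for data-vault organisation bucket>|$KMS_KEY|g"
}

# Usage, assuming the policy text was saved as policy.template.json:
#   fill_policy < policy.template.json > policy.json
#   aws iam create-policy \
#     --policy-name sandbox_data_vault_drive_access \
#     --policy-document file://policy.json
```

The `create-policy` call prints the new policy's ARN, which is needed if you later attach the policy to a role from the command line.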

Step 2: Create an AWS IAM Role

In the AWS Management Console, create an AWS IAM role that grants privileges on the Data Vault drive’s bucket containing your data.

  1. Log in to the AWS Management Console.

  2. From the home dashboard, choose Identity & Access Management (IAM).

  3. Choose Roles from the left-hand navigation pane.

  4. Click the Create role button.

  5. Select AWS service as the trusted entity type.

  6. Select S3 as the use case.

  7. Click the Next button.

  8. Locate the policy you created in Step 1: Configure S3 Bucket Access Permissions (in this article), and select this policy.

  9. Click the Next button.

  10. Enter a name and description for the role, and click the Create role button.

    You have now created an IAM policy for the Data Vault drive’s bucket, created an IAM role, and attached the policy to the role.

  11. Record the Role ARN value located on the role summary page.

Note: This AWS IAM Role ARN needs to be saved against the Data Vault S3 connector.
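The console steps above have a rough command-line equivalent, sketched below. The role name and account ID are hypothetical, and the trust policy is an assumption about what the console generates when AWS service and S3 are selected; check it against the role the console creates before relying on it. The functions only print the policy and the commands so that they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical names: adjust the role name and policy ARN to your account.
ROLE_NAME="data_vault_drive_role"
POLICY_ARN="arn:aws:iam::111122223333:policy/sandbox_data_vault_drive_access"

# Trust policy that lets the S3 service assume the role (assumed to match
# the console's "AWS service" / "S3" selection).
trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Print the three CLI calls: create the role, attach the policy, read the ARN.
role_commands() {
  echo "aws iam create-role --role-name $ROLE_NAME --assume-role-policy-document file://trust.json"
  echo "aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn $POLICY_ARN"
  echo "aws iam get-role --role-name $ROLE_NAME --query Role.Arn --output text"
}

# Usage:
#   trust_policy > trust.json
#   role_commands
role_commands
```

The last printed command returns the Role ARN as plain text, which is the value to record for the Data Vault S3 connector.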

Copying Data from an S3 Stage

Load data from your staged files into the target folder or table.

Copying unstructured data (files)

Run the AWS S3 copy command to load your files into the target folder:

aws s3 cp s3://<source-bucket>/<folder> s3://<organisation-data-vault-drive>/files/<folder> --recursive --request-payer
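Before copying a large folder, it can help to preview the operation. The sketch below builds the copy command from its parts and optionally appends the AWS CLI's `--dryrun` flag, which lists the operations without performing them. The function name `files_copy_cmd` and all bucket and folder names are hypothetical.

```shell
#!/bin/sh
# Build the copy command for a folder of files.
# Arguments: source bucket, source folder, Data Vault drive bucket, target folder.
# An optional fifth argument "preview" appends --dryrun so the AWS CLI only
# lists what it would copy.
files_copy_cmd() {
  cmd="aws s3 cp s3://$1/$2 s3://$3/files/$4 --recursive --request-payer"
  if [ "${5:-}" = "preview" ]; then
    cmd="$cmd --dryrun"
  fi
  echo "$cmd"
}

# Hypothetical bucket and folder names.
files_copy_cmd my-source-bucket reports my-org-data-vault-drive reports preview
```

Calling the function without `preview` prints the same command as shown above, with the placeholders filled in.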

Copying structured or semi-structured data (database tables)

Note: A Data Vault database needs to be created from the Data Vault UI first.

Run the AWS S3 copy command to load your table data into the target table:

aws s3 cp s3://<source-bucket>/<source-database>/<source-table> s3://<organisation-data-vault-drive>/databases/<data-vault-database>/<data-vault-table> --recursive --request-payer
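When several tables need to be loaded, the same command can be generated once per table. The following sketch assumes each source table maps to a Data Vault table of the same name; the bucket, database, and table names are hypothetical.

```shell
#!/bin/sh
# Hypothetical names: replace with your own buckets, databases and tables.
SOURCE_BUCKET="my-source-bucket"
SOURCE_DB="sales_db"
DRIVE_BUCKET="my-org-data-vault-drive"
TARGET_DB="my_database"
TABLES="orders customers"

# Print one copy command per table, keeping the table name on both sides.
table_copy_cmds() {
  for table in $TABLES; do
    echo "aws s3 cp s3://$SOURCE_BUCKET/$SOURCE_DB/$table s3://$DRIVE_BUCKET/databases/$TARGET_DB/$table --recursive --request-payer"
  done
}

table_copy_cmds
```

The printed commands can be reviewed and then run one at a time, which makes it easier to see which table a failed copy belongs to.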
