Snowflake GCP Storage Integration Setup Guide
    • Dark
      Light

    Snowflake GCP Storage Integration Setup Guide

    • Dark
      Light

    Article Summary

    Overview

    This topic covers how to set up a storage integration for use in Snowflake instances on the Google Cloud Platform (GCP). Matillion ETL for Snowflake on GCP requires users to select a storage integration when configuring data staging components.

    A storage integration is a named Snowflake object that removes the need to pass explicit cloud provider credentials such as secret keys or access tokens. An integration object references a Google Cloud Storage service account.

    Note
    • Completing the instructions below requires access to your Google Cloud Storage project as a project editor. If you aren't a project editor, please contact your Cloud Storage administrator for assistance.
    • Only users with the Snowflake role ACCOUNTADMIN, or a role with the global CREATE INTEGRATION privilege, can execute the SQL commands mentioned in this topic.
    • Any identifiers not enclosed in double quotes ( " " ) aren't case sensitive. The example below includes this for demonstrative purposes; the created integration DocsTeam is passed by Snowflake as DOCSTEAM.

    Create a storage integration in Snowflake

    1. Log in to your Snowflake account.
    2. In either an existing Snowflake worksheet or a new worksheet, use the CREATE STORAGE INTEGRATION command.

    The form of the command is as follows:

    CREATE STORAGE INTEGRATION <integration_name>
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = GCS
    ENABLED = TRUE
    STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
    [ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ]
    

    This uses the following variables:

    • <integration_name> The name of the new integration.
    • <bucket> The name of a Google Cloud Storage bucket that stores your data files.
    • <path> An optional file path used to provide granular control over objects in the Cloud Storage bucket.

    The parameters STORAGE_ALLOWED_LOCATIONS and STORAGE_BLOCKED_LOCATIONS let you specify any buckets that you wish to restrict access to, or block access from, respectively. As shown in the above template, locations are enclosed in single quotation marks, and multiple locations separated by commas. The STORAGE_BLOCKED_LOCATIONS parameter is optional.

    The following example shows that the command for the integration DocsTeam has been run successfully and the storage integration DOCSTEAM has been created.

    Storage integration command result

    This command will create a storage integration and explicitly limit external stages that use this integration to reference the specified bucket (the bucket name is blurred for security purposes).

    Additional stages that also use this integration can reference the permitted bucket.

    1. In your Snowflake worksheet, execute the GRANT command to set the usage of this particular integration to PUBLIC.
    GRANT usage on integration <integration_name> to ROLE Public
    
    1. In your Snowflake worksheet, execute the Describe Integration command to retrieve the ID for the Cloud Storage service account that was automatically created for your Snowflake account.

    You can abbreviate DESCRIBE to DESC when writing this command, and the term STORAGE is optional:

    DESC [STORAGE] INTEGRATION <integration_name>
    

    After executing the command, the results will look like this:

    Describe storage integration command result

    1. After you run the DESC command, navigate to the row labelled STORAGE_GCP_SERVICE_ACCOUNT in the property column. From the property_value column of this row, retrieve the ID (the string the red arrow in our image points at) for this Cloud Storage service. Copy this ID string, as you will require it later.

    As Snowflake clarifies in their documentation, they provision a single Cloud Storage service account across your entire Snowflake account. All Cloud Storage integrations use that service account.

    The next section focuses on granting permissions within the GCP console.


    Granting service account permissions to access bucket objects

    Follow the steps below to configure IAM access permissions in your Google GCP Console to use a Cloud Storage bucket to load and unload data:

    1. Log in to the Google Cloud Platform console.
    2. Click IAM & admin.
    3. Click Roles from the drop-down menu.
    4. Click + CREATE ROLE.
    5. Complete the Title, Description, and ID fields.
    6. Click + ADD PERMISSIONS.
    7. Add the following permissions:
      • storage.buckets.get
      • storage.objects.create
      • storage.objects.delete
      • storage.objects.get
      • storage.objects.list

    The image below provides an example.

    Example permissions

    1. Click Create (or Update if you are editing a Role).

    The next section focuses on assigning a custom role to the Cloud Storage service account.


    Assigning the custom role to the Cloud Storage service account

    1. Click the menu button in the top left of the GCP Console.
    2. Click Storage.
    3. Click Browser.
    4. Navigate to your Cloud Storage bucket or create a new bucket.
    5. Inside your bucket's details page, click the Permissions tab.
    6. Click the blue Add members button.
    7. In the Add members to form, enter your STORAGE_GCP_SERVICE_ACCOUNT ID obtained during Create a storage integration in Snowflake, earlier.
    8. Click the Role drop-down and select Storage > Storage Admin.
    9. Click Save.

    Using the integration in Matillion ETL

    1. In Matillion ETL, create a new orchestration job, and drag your chosen data staging component onto the canvas.
    2. Click on the data staging component, then click its Properties tab and set the following properties:
      • Staging: Select Existing Google Cloud Storage Location. The alternative, Snowflake Managed, is the default setting and will configure sensible defaults automatically.
      • GCS Staging Area: Select your Cloud Storage bucket.
      • Storage Integration: Select your newly created integration.

    Contact support

    If you require assistance creating a storage integration in Snowflake, or with any other aspect of Matillion ETL, please visit our Getting Support page.

    Please consult the Snowflake documentation to read deeper into these topics, for example, to learn how to modify an existing stage to use a storage integration.