Salesforce Incremental Load

    Article Summary

    This article is part of the series Incremental Load Tools and Shared Jobs.

    Overview

    The Salesforce Incremental Load component lets users configure a pre-existing (read-only) Shared Job that will perform an incremental load from Salesforce. When the component is added to the Matillion ETL canvas, the Salesforce Incremental Load wizard is activated. Once this wizard is finalized, a unique component is added to the canvas, configured according to the choices made in the wizard.

    Users should schedule their incremental load jobs to run periodically so that the job continually updates the created tables. To learn more about scheduling, read Manage Schedules.
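    Scheduling matters because each run only needs to pick up what changed since the previous one. The sketch below illustrates that general high-water-mark idea. It is an assumption, not a description of the shared job's internals: it supposes each record exposes a modification timestamp named SystemModstamp (a standard Salesforce audit field) and that the watermark from the last successful run is kept between runs. The sample records are made up.

```python
from datetime import datetime
from typing import Iterable


def incremental_batch(
    records: Iterable[dict], last_watermark: datetime
) -> tuple[list[dict], datetime]:
    """Return the records modified after last_watermark, plus the new watermark.

    Assumption: every record has a 'SystemModstamp' datetime value.
    """
    changed = [r for r in records if r["SystemModstamp"] > last_watermark]
    # Advance the watermark to the newest change seen; keep the old value if nothing changed.
    new_watermark = max((r["SystemModstamp"] for r in changed), default=last_watermark)
    return changed, new_watermark


# Made-up sample data: only the second record changed after the last run.
rows = [
    {"Id": "001A", "SystemModstamp": datetime(2024, 1, 1, 9, 0)},
    {"Id": "001B", "SystemModstamp": datetime(2024, 1, 2, 9, 0)},
]
changed, watermark = incremental_batch(rows, datetime(2024, 1, 1, 12, 0))
print([r["Id"] for r in changed], watermark)
```

    Storing the returned watermark and passing it to the next scheduled run is what keeps each run small.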

    In the Components panel, type "salesforce incremental load" to locate the component and add it to the canvas.


    Salesforce Incremental Load Setup (Snowflake)

    1. Connection Details

    Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.

    Salesforce OAuth

    Here, users can select a configured Salesforce OAuth entry from the dropdown list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.

    In this example, an OAuth entry named "Salesforce" has been selected.

    Parameter | Value

    Next, specify any connection options. These exist as parameter=value pairs. To add a parameter, click the add button. Users can consult the Salesforce Data Model for more information about connection options.

    Sync Deleted Records

    Check this box if you wish to synchronize record deletions from source data to the target table. By default, this box is not checked.
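    The effect of this option can be pictured with a small in-memory model. This is a hypothetical sketch: the target table is a dictionary keyed by record Id, and each changed source record carries Salesforce's standard IsDeleted flag. How the shared job actually detects and applies deletions is not described in this article.

```python
def apply_changes(
    target: dict[str, dict], changes: list[dict], sync_deleted: bool = False
) -> dict[str, dict]:
    """Merge a batch of changed source records into a target keyed by Id (hypothetical model)."""
    result = dict(target)  # leave the caller's target untouched
    for record in changes:
        if record.get("IsDeleted", False):
            if sync_deleted:
                # Option checked: remove the row that was deleted in Salesforce.
                result.pop(record["Id"], None)
            # Option unchecked: the deletion is ignored and the old row stays.
            continue
        # New or modified record: insert or overwrite (upsert).
        result[record["Id"]] = record
    return result


target = {"1": {"Id": "1", "Name": "Acme"}, "2": {"Id": "2", "Name": "Globex"}}
changes = [{"Id": "2", "IsDeleted": True}, {"Id": "3", "Name": "Initech", "IsDeleted": False}]
print(sorted(apply_changes(target, changes, sync_deleted=True)))   # ['1', '3']
print(sorted(apply_changes(target, changes, sync_deleted=False)))  # ['1', '2', '3']
```

    With the box left unchecked, rows deleted in Salesforce simply remain in the target in this model.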

    Click Next.

    2. Data Sources

    Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will display a success message.

    Data Sources

    Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load.

    By default, the right column is empty, and users must manually add the data sources they wish to load.

    In this example, the data sources "Account" and "AcceptedEventRelation" have been selected for the incremental load.

    Use the text fields above the columns to refine your searches.

    Click Next.

    3. Columns

    Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

    Select Columns

    Click the icon next to a data source to open a similar multiple-select dialog for that data source's columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

    Click Next.

    4. Staging Configuration

    Page 4 of the wizard requires users to specify details for data staging.

    Property | Setting | Description
    Staging Table Prefix | String | Specify a prefix to be added to all tables that are staged.
    Staging Warehouse | Select | Select the staging warehouse.
    Staging Database | Select | Select the staging database.
    Staging Schema | Select | Select the staging schema.
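    As an illustration of how a prefix is applied, the sketch below builds staged table names from the Staging Database, Staging Schema, and Staging Table Prefix values. It assumes the prefix is prepended to the data source name with no added separator and that names take Snowflake's database.schema.table form; the database, schema, and prefix values shown are hypothetical. Under the same assumption, the Target Table Prefix on the next page would be applied the same way.

```python
def staged_table_names(
    database: str, schema: str, prefix: str, data_sources: list[str]
) -> dict[str, str]:
    """Map each selected data source to a fully qualified, prefixed table name (hypothetical rule)."""
    return {source: f"{database}.{schema}.{prefix}{source}" for source in data_sources}


# Hypothetical staging settings and the two data sources from the example above.
names = staged_table_names("STAGING_DB", "PUBLIC", "stg_", ["Account", "AcceptedEventRelation"])
for source, table in names.items():
    print(source, "->", table)
```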

    Click Next.

    5. Target Configuration

    Page 5 of the wizard requires users to specify target data warehouse details.

    Property | Setting | Description
    Target Table Prefix | String | Specify a prefix to be added to all tables in the load.
    Target Warehouse | Select | Select the target warehouse.
    Target Database | Select | Select the target database.
    Target Schema | Select | Select the target schema.
    Concurrency | Select | Select whether to load data in a concurrent or sequential method.
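    The Concurrency setting chooses between loading the selected data sources one after another or several at once. A minimal sketch of the difference, assuming a hypothetical fake_load function that stands in for loading a single table; the worker count is also an assumption, since this article does not say how many loads run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_loads(
    tables: list[str], load_one: Callable[[str], str], concurrent: bool, max_workers: int = 4
) -> list[str]:
    """Load every table sequentially or concurrently; results keep the input order."""
    if concurrent:
        # Several loads in flight at once; Executor.map still yields results in input order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(load_one, tables))
    # Sequential: each load starts only after the previous one has finished.
    return [load_one(table) for table in tables]


def fake_load(table: str) -> str:
    """Hypothetical stand-in for loading one table."""
    return f"loaded {table}"


print(run_loads(["Account", "AcceptedEventRelation"], fake_load, concurrent=True))
```

    Concurrent loading usually finishes sooner but places more simultaneous load on the source and the warehouse; sequential loading is easier to follow when troubleshooting.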

    Click Create & Run to finish the setup, or click Back to return to earlier pages of the wizard and make any changes.


    Salesforce Incremental Load Setup (Redshift)

    1. Connection Details

    Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.

    Salesforce OAuth

    Here, users can select a configured Salesforce OAuth entry from the dropdown list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.

    In this example, an OAuth entry named "Salesforce" has been selected.

    Parameter | Value

    Next, specify any connection options. These exist as parameter=value pairs. To add a parameter, click the add button. Users can consult the Salesforce Data Model for more information about connection options.

    Sync Deleted Records

    Check this box if you wish to synchronize record deletions from source data to the target table. By default, this box is not checked.

    Click Next.

    2. Data Sources

    Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will display a success message.

    Data Sources

    Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load.

    By default, the right column is empty, and users must manually add the data sources they wish to load.

    In this example, the data sources "Account" and "AcceptedEventRelation" have been selected for the incremental load.

    Use the text fields above the columns to refine your searches.

    Click Next.

    3. Columns

    Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

    Select Columns

    Click the icon next to a data source to open a similar multiple-select dialog for that data source's columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

    Click Next.

    4. Configuration

    Page 4 of the wizard requires users to specify data warehouse details.

    Property | Setting | Description
    Staging Bucket | Select | Select the S3 bucket from the dropdown list for data staging. The available buckets depend on the selected Redshift cluster.
    Staging Table Prefix | String | Specify a prefix to be added to all tables that are staged.
    Stage Schema | Select | Select the Redshift schema in which tables will be staged.
    Target Table Prefix | String | Specify a prefix to be added to all tables in the load.
    Target Schema | Select | Select the Redshift schema into which tables will be loaded.
    Target Distribution Style | Select | Select the distribution style. All: copy rows to all nodes in the Redshift cluster. Even: distribute rows evenly around the Redshift cluster. This is the default setting.
    Concurrency | Select | Select whether to load data in a concurrent or sequential method.
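    For reference, the two distribution styles correspond to Redshift's DISTSTYLE ALL and DISTSTYLE EVEN table attributes. The sketch below composes a CREATE TABLE statement with either style. It is only an illustration, not the DDL the shared job generates, and the schema, table, and column definitions in the example are hypothetical.

```python
def create_table_ddl(
    schema: str, table: str, columns: dict[str, str], dist_style: str = "EVEN"
) -> str:
    """Build a Redshift CREATE TABLE statement using the chosen distribution style."""
    style = dist_style.upper()
    if style not in {"ALL", "EVEN"}:
        # Only the two styles offered by the wizard are handled in this sketch.
        raise ValueError(f"unsupported distribution style: {dist_style}")
    column_list = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
    return f'CREATE TABLE "{schema}"."{table}" ({column_list}) DISTSTYLE {style};'


print(create_table_ddl("sf_target", "sf_Account", {"Id": "VARCHAR(18)", "Name": "VARCHAR(255)"}, "all"))
```

    ALL generally suits small, frequently joined tables, since every node holds a full copy; EVEN is the general-purpose choice for larger tables.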

    Click Create & Run to finish the setup, or click Back to return to earlier pages of the wizard.


    Salesforce Incremental Load Setup (BigQuery)

    1. Connection Details

    Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.

    Salesforce OAuth

    Here, users can select a configured Salesforce OAuth entry from the dropdown list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.

    In this example, an OAuth entry named "Salesforce" has been selected.

    Parameter | Value

    Next, specify any connection options. These exist as parameter=value pairs. To add a parameter, click the add button. Users can consult the Salesforce Data Model for more information about connection options.

    Sync Deleted Records

    Check this box if you wish to synchronize record deletions from source data to the target table. By default, this box is not checked.

    Click Next.

    2. Data Sources

    Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will display a success message.

    Data Sources

    Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load.

    By default, the right column is empty, and users must manually add the data sources they wish to load.

    In this example, the data sources "Account" and "AcceptedEventRelation" have been selected for the incremental load.

    Use the text fields above the columns to refine your searches.

    Click Next.

    3. Columns

    Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

    Select Columns

    Click the icon next to a data source to open a similar multiple-select dialog for that data source's columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

    Click Next.

    4. Staging Configuration

    Page 4 of the wizard requires users to specify details for data staging.

    Property | Setting | Description
    Cloud Storage Area | Select | Select the Cloud Storage bucket to stage the data.
    Staging Table Prefix | String | Specify a prefix to be added to all tables that are staged.
    Staging Project | Select | Select the staging project.
    Staging Dataset | Select | Select the staging dataset.

    Click Next.

    5. Target Configuration

    Page 5 of the wizard requires users to specify target data warehouse details.

    Property | Setting | Description
    Target Table Prefix | String | Specify a prefix to be added to all tables in the load.
    Target Project | Select | Select the target BigQuery project.
    Target Dataset | Select | Select the target dataset.
    Concurrency | Select | Select whether to load data in a concurrent or sequential method.

    Click Create & Run to finish the setup, or click Back to return to earlier pages of the wizard.



    Completion

    Upon completion of the wizard, a Salesforce Incremental Load component will be present on the canvas. This component is not the same as the Salesforce Query component, although it uses the same approach to perform an incremental load from Salesforce.

    To make changes to the component, click it and edit any of its properties, exactly as you would with any other Matillion ETL component. Avoid making changes to the component while a job that uses it is running.



    Enable Schema Drift

    Schema drift support accommodates changes made to the source data, such as:

    • Missing columns as a result of changes in the Salesforce schema. Missing columns are loaded as NULL.
    • Data type changes for specified columns in the shared job configuration. For further information, see below.
    • Tables no longer present in the Salesforce schema. Missing tables are no longer loaded, and the shared job will fail, although all other tables specified in the shared job configuration are still loaded. If this occurs, edit the shared job's Table and Columns grid variable to remove the missing table.
    • Manual addition of new tables or columns via the Table and Columns grid variable within the existing shared job configuration. A table or column newly added to your source is not added to the shared job configuration by default, but it can be added manually.
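    The first and third cases above can be sketched as follows. This is a hypothetical model rather than the shared job's code: a source row is reshaped to the target's column list, with absent columns filled with None (loaded as NULL), and configured tables are split into those still present in Salesforce and those that are missing, where any missing table marks the run as failed while the rest still load. The table name OldCustomObject__c is made up.

```python
def align_row(row: dict, target_columns: list[str]) -> dict:
    """Reshape a source row to the target columns.

    Columns missing from the source become None (NULL); source columns not in
    target_columns are dropped.
    """
    return {column: row.get(column) for column in target_columns}


def plan_load(configured: list[str], available: set[str]) -> tuple[list[str], list[str], bool]:
    """Split configured tables into loadable and missing ones; any missing table fails the run."""
    loadable = [t for t in configured if t in available]
    missing = [t for t in configured if t not in available]
    succeeded = not missing  # the other tables are still loaded, but the run reports a failure
    return loadable, missing, succeeded


print(align_row({"Id": "001A"}, ["Id", "Name"]))
print(plan_load(["Account", "OldCustomObject__c"], {"Account", "Contact"}))
```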

    Data type changes are also accommodated. If a change is not compatible with the target cloud platform, the existing column is renamed to <column_name>_<datetime> and a column with the original name is re-purposed with the new data type. The datetime suffix has the format _yyyymmddhhmmss, for example _20210113110334, and is the same for all columns in the same table in the same shared job configuration. The new column will be NULL for all data loaded before the change, so consider this for downstream dependencies such as views and reports.
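    The renaming rule for incompatible type changes can be expressed in a few lines. This sketch assumes the suffix is the change time formatted as yyyymmddhhmmss and that one timestamp is shared by every affected column in the table, as described above; the column names are hypothetical.

```python
from datetime import datetime


def rename_for_type_change(columns: list[str], changed_at: datetime) -> dict[str, str]:
    """Map each column with an incompatible type change to its renamed, timestamped name."""
    # One suffix for the whole table, so every renamed column shares the same timestamp.
    suffix = changed_at.strftime("%Y%m%d%H%M%S")
    return {column: f"{column}_{suffix}" for column in columns}


renames = rename_for_type_change(["Amount", "CloseDate"], datetime(2021, 1, 13, 11, 3, 34))
print(renames)
```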

    Upon completion of the wizard, open the component's properties and click the button next to Automatically Update Target Metadata. Replace "No" with "Yes" in the text field provided, then click OK to save the change and enable schema drift support.



    Blacklisted Tables

    The following tables have been blacklisted:

    • AccountHistory
    • Announcement
    • ContactHistory
    • ContentDocumentLink
    • ContentFolderItem
    • ContentFolderMember
    • ContentHubItem
    • DataStatistics
    • DatacloudAddress
    • EntityParticle
    • FeedAttachment
    • FeedItem
    • FeedRevision
    • FieldDefinition
    • FlexQueueItem
    • LeadHistory
    • LeadShare
    • ListViewChartInstance
    • Note
    • OutgoingEmail
    • OutgoingEmailRelation
    • OwnerChangeOptionInfo
    • PicklistValueInfo
    • PlatformAction
    • RelationshipDomain
    • RelationshipInfo
    • SearchLayout
    • SiteDetail
    • TaskFeed
    • TaskRelation
    • UserEntityAccess
    • UserFieldAccess
    • Vote
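    If tables are ever added by hand, for instance through the Table and Columns grid variable mentioned earlier, a quick check against this list can catch blacklisted names first. A minimal sketch, assuming exact, case-sensitive name matches; whether the shared job itself rejects these names in the same way is not covered here.

```python
# The blacklisted tables listed above.
BLACKLISTED_TABLES = frozenset({
    "AccountHistory", "Announcement", "ContactHistory", "ContentDocumentLink",
    "ContentFolderItem", "ContentFolderMember", "ContentHubItem", "DataStatistics",
    "DatacloudAddress", "EntityParticle", "FeedAttachment", "FeedItem",
    "FeedRevision", "FieldDefinition", "FlexQueueItem", "LeadHistory",
    "LeadShare", "ListViewChartInstance", "Note", "OutgoingEmail",
    "OutgoingEmailRelation", "OwnerChangeOptionInfo", "PicklistValueInfo", "PlatformAction",
    "RelationshipDomain", "RelationshipInfo", "SearchLayout", "SiteDetail",
    "TaskFeed", "TaskRelation", "UserEntityAccess", "UserFieldAccess",
    "Vote",
})


def split_by_blacklist(selected: list[str]) -> tuple[list[str], list[str]]:
    """Return (allowed, blacklisted) data source names, preserving the input order."""
    allowed = [name for name in selected if name not in BLACKLISTED_TABLES]
    blocked = [name for name in selected if name in BLACKLISTED_TABLES]
    return allowed, blocked


print(split_by_blacklist(["Account", "FeedItem", "Contact", "Note"]))
```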

