Vacuum Tables
    • Dark
      Light

    Vacuum Tables

    • Dark
      Light

    Article Summary

    This article is specific to the following platforms - Redshift - Delta Lake.

    Vacuum

    Perform a vacuum operation on a list of tables. Vacuum is a housekeeping task that physically reorganizes table data according to its sort-key, and reclaims space left over from deleted rows.

    Vacuum can be a very expensive operation. Depending on your use-case, vacuum may be unnecessary. For more information on the situations where a vacuum may be necessary, please refer to the Amazon Redshift Documentation.

    For more information about the vacuum process, read Vacuum.

    When in doubt, vacuum is almost always used at the end of an Orchestration Job.


    Properties

    Redshift Properties

    PropertySettingDescription
    NameTextA human-readable name for the component.
    SchemaSelectSelect the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
    Tables to VacuumSelectOne or more tables to Vacuum.
    Note: Only one vacuum may be running at any one time across an entire Amazon Redshift cluster. Therefore, vacuums may fail due to concurrent workloads. This is usually harmless if the same tables will be vacuumed again on the next run of the job. If this is the case, consider joining the "Failure" link of the component to an "End Success" to prevent vacuum failure from failing the whole job. See the example for more details.
    Vacuum OptionsSelectReclaims disk space occupied by deleted rows in a table. Additional options are available to the user (see AWS Documentation for detailed information).
    DELETE ONLY: Will not sort tables and is consequently quicker than other methods.
    FULL: Is equivalent to DELETE ONLY if the target table is >95% sorted, otherwise will perform a full sort.
    None: A default Vacuum operation. This is analogous to "FULL" in the current AWS implementation.
    REINDEX: Analyzes interleaved sort keys and performs a FULL sort.
    SORT ONLY: Sorts the table but does not reclaim disk space. Is quick at the expense of unclaimed memory.

    Delta Lake Properties

    PropertySettingDescription
    NameStringA human-readable name for the component.
    CatalogSelectSelect a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.
    DatabaseSelectSelect the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.
    Tables to VacuumTable SelectSelect which tables to vacuum.
    Retention PeriodIntegerThe retention threshold. The default is 7.
    Retention UnitSelectSelect the unit of retention. Choose from Day, Hour, Week. The default is Day.