API Profile Example - Jira Cloud

    Overview

    This page contains a worked example of the RSD format information found in API Profiles - RSDs. This example uses the Jira Cloud REST API.

    Note

    Your Matillion ETL instance must be able to access the API endpoint https://yourcompany.atlassian.net over the network. You can test the connectivity using the Network Connectivity Test shared job, which you can download from Data Stagers - Support. If Matillion ETL does not have network connectivity to the API endpoint, you must get this resolved with your network administrator before continuing.


    Step 1 - Find the API documentation

    Atlassian publish this online guide to their Jira REST APIs.

    The Atlassian documentation answers the following questions:

    • "How do you authenticate?" — Using a username and token, described in this article.
    • "How do you supply parameters?" — GET and POST methods are both supported. In this example we'll use GET parameters in the URL.
    • "Does the API provide data at the necessary granularity?" — The Jira Cloud API is large, and this example demonstrates querying issues, and the return format is JSON with embedded arrays, as documented here.
    • "Does the API use paging, and if so, how does it provide the information needed to page through the result set?" — Yes, this is an example of an 'offset' pager and is documented here. You can use the startAt, maxResults, and total parameters to extract the full set of data.

    Step 2 - Load the first page

    For convenience it's often best to use the Matillion ETL server itself to extract and load the first page of data.

    To get started, SSH into your Matillion ETL server. It will save some typing later if you set up environment variables containing your company name and security credentials.

    export mycompany=YourCompanyNameHere
    export mycredentials=YourName@YourCompany:YourJiraToken
    

    You can verify network connectivity to the API endpoint using the nc command:

    nc -w 5 -v "${mycompany}.atlassian.net" 443
    

    You should see a success message.

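    With the OpenBSD netcat found on many Linux distributions, success looks something like this (the exact wording varies between nc versions):

    Connection to yourcompany.atlassian.net 443 port [tcp/https] succeeded!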

    If you see errors such as nc: getaddrinfo: Name or service not known or (tcp) timed out: Operation now in progress, you should contact your network administrator to make sure Matillion ETL has been granted network access to Jira Cloud.

    If connectivity was successful, use the following command to extract the first page:

    curl -k -o page1-raw.json "https://${mycompany}.atlassian.net/rest/api/2/search?maxResults=100" --user "${mycredentials}"
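
    If the downloaded file doesn't look right, it can help to check the HTTP status code first (200 indicates success). For example:

    curl -k -s -o /dev/null -w "%{http_code}\n" "https://${mycompany}.atlassian.net/rest/api/2/search?maxResults=100" --user "${mycredentials}"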
    

    You can then check that the output file has been created:

    ls -l page1-raw.json
    

    It will be helpful to prettify the JSON so it's more easily human-readable, using this command:

    jq "." < page1-raw.json > page1.json
    

    If you run an ls command, you should now be able to see two files: the raw (page1-raw.json) and the prettified (page1.json) JSON.

    If you need to copy the prettified JSON file onto your local workstation, you can do so with an scp command such as this:

    scp -i YourKey.pem YourUser@YourIP:YourFolder/page1.json .
    

    Open the page1.json file in your preferred editor to verify that the data you have extracted looks like the example from the Jira Cloud API documentation.
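
    Before uploading, you can sanity-check the repeating element path with jq. For example (assuming, as in the Jira documentation, that each entry in the issues array has a key field):

    jq '.issues[0].key' page1.json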


    Now you are ready to upload this as a JSON file into a new API Profile. Follow the steps in Manage Query Profile to do this. Generate a new RSD script as described in that article, using page1.json as the sample file, with a table named t1 and /issues as the repeating element.

    Once the new RSD file has been created, remove the rsb:script sections associated with INSERTS, UPDATES, and DELETES, then click the Test button. A table called t1 should appear at bottom left. Click t1 and the data from that table should appear at bottom right.

    You should now take the opportunity to refine which fields you want to see. Jira issues have a lot of fields, so you can remove any that you don't intend to use. Do this by editing the elements within the rsb:info near the beginning of the file.
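
    As a sketch, a trimmed rsb:info block might look like the following. The attribute names, types, and xPath values here are illustrative; yours will depend on what the generator produced from the sample file:

    <rsb:info title="t1" desc="Jira issues">
      <attr name="id"  xs:type="string" other:xPath="id"  />
      <attr name="key" xs:type="string" other:xPath="key" />
    </rsb:info>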


    Switch the RSD to the real URI

    Once the first page of data is appearing correctly, you need to switch to using the actual API instead of the sample file, by editing your API Profile as described in Manage Query Profile.

    In the RSD definition, replace the "uri" value with the actual API endpoint:

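    For this example, that means a line such as the following (substituting your own Atlassian domain):

    <rsb:set attr="uri" value="https://yourcompany.atlassian.net/rest/api/2/search?maxResults=100" />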

    Now that the API query will use the real endpoint, you will need to add authentication. The Atlassian API documentation indicates that Jira Cloud's token method uses HTTP basic authentication, so you can add the two necessary HTTP parameters: User and Password.

    After saving the changes, test the API again and verify that it still retrieves the correct data, at the required granularity.


    Step 3 - Establish Paging

    The first two steps are common to all paging implementations.

    1. Add a new input named Rows@Next to the list of columns inside the existing rsb:info block, after all the other declarations.
    <input name="Rows@Next" desc="Identifier for the next page of results" />
    
    2. Add the following to the rsb:script section just after the rsb:info block:
    <rsb:set attr="EnablePaging" value="TRUE" />
    

    According to the Jira Cloud API documentation, pagination is performed by the client using the startAt parameter for all pages after the first.

    The documentation also warns that the consumer needs to be careful with the row counts:

    Note

    The actual number of items returned is an implementation detail and this can be changed over time. You can ask for more than you are given.

    In other words, the actual number of rows returned may differ from the number requested, even between pages. So you need to implement a client-side record counter.

    To declare the counter, add this line just after the EnablePaging setting you added earlier:

    <rsb:set item="userns" attr="rowcounter" value="0" />
    

    To make it count the records as they are returned, add the following two lines together between the jsonproviderGet call and the rsb:push element:

    <rsb:set item="userns" attr="rowcounter" value="[userns.rowcounter|add(1)]"/>
    
    <rsb:set attr="Rows@Next" value="[Rows@Next|add([userns.rowcounter])]" />
    

    The URL will need to change from page to page, so instead of hard-coding it as you did previously, you must replace the fixed <rsb:set attr="uri" ...> with a conditional block like this:

    <rsb:check attr="Rows@Next">
      <rsb:set attr="uri" value="https://matillion.atlassian.net/rest/api/2/search?startAt=[_input.Rows@Next]&amp;maxResults=100" />
    <rsb:else>
      <rsb:set attr="uri" value="https://matillion.atlassian.net/rest/api/2/search?maxResults=100" />
      <rsb:set attr="Rows@Next" value="1" />
    </rsb:else>
    </rsb:check>
    

    Note that special characters in the URI must be replaced by an appropriate XML character reference. These include:

    Character    Replacement
    &            &amp;
    <            &lt;
    >            &gt;
    Space        &#32;
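
    Putting the pieces together, the paging-related parts of the RSD script now follow the outline below. This is an abridged sketch: the namespace, the jsonproviderGet call wrapper, and the overall layout follow a typical generated script, and your file may differ in detail.

    <rsb:script xmlns:rsb="http://www.rssbus.com/ns/rsbscript/2">
      <rsb:info title="t1" desc="Jira issues">
        <!-- attr declarations for the columns you kept ... -->
        <input name="Rows@Next" desc="Identifier for the next page of results" />
      </rsb:info>

      <rsb:set attr="EnablePaging" value="TRUE" />
      <rsb:set item="userns" attr="rowcounter" value="0" />

      <rsb:script method="GET">
        <rsb:check attr="Rows@Next">
          <rsb:set attr="uri" value="https://matillion.atlassian.net/rest/api/2/search?startAt=[_input.Rows@Next]&amp;maxResults=100" />
        <rsb:else>
          <rsb:set attr="uri" value="https://matillion.atlassian.net/rest/api/2/search?maxResults=100" />
          <rsb:set attr="Rows@Next" value="1" />
        </rsb:else>
        </rsb:check>

        <rsb:call op="jsonproviderGet">
          <rsb:set item="userns" attr="rowcounter" value="[userns.rowcounter|add(1)]"/>
          <rsb:set attr="Rows@Next" value="[Rows@Next|add([userns.rowcounter])]" />
          <rsb:push/>
        </rsb:call>
      </rsb:script>
    </rsb:script>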

    Test the API inside the Manage API Profiles editor again, to make sure it's still working. It's not possible to verify the paging in this way because it only shows the first 50 records. But assuming you still see some data, you can move on to using an API Query component in an Orchestration job.

    Switching to an API Query component

    Add an API Query component to a new, empty orchestration job, and fill in your profile name, data source, and so on.

    In the connection options you will need to add your authentication details again, by adding settings for User, Password and AuthScheme. It's fine to hard-code these for now: they will be parameterized later as part of productionization.
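
    For example, the connection options might contain entries like these. The values are illustrative, and AuthScheme=Basic is an assumption based on the API's HTTP basic authentication; check your driver's documentation for the exact value:

    User        YourName@YourCompany
    Password    YourJiraToken
    AuthScheme  Basic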

    You may find a default limit of 100 on the component. If so, remove it, leaving that setting blank.

    When you run the component, it should automatically page through all the Jira Issues, and create a database table containing the columns you chose in the API profile. Depending on your data volumes, this step may take several seconds to complete.


    Step 4 - Productionize

    By this step of the process you should have a fully working API profile, but using hard-coded authentication parameters.

    It's bad practice to leave hard-coded authentication parameters in place, for two reasons:

    1. It's insecure. Anyone logged into Matillion ETL can read the credentials in plain text, so they need to be properly protected.
    2. Authentication parameters usually expire at some point, so the existing API profile has a limited shelf life.

    To proceed, follow the directions in Using Parameters With API Profiles, which describes how to replace the hard-coded values.

    Matillion ETL's password manager function is not available for use with API profiles, so you may choose to follow Using KMS encrypted passwords in Python, which describes how to implement your own password management functionality to protect your private API password.
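
    As a minimal sketch of that approach: assuming boto3 is available on the Matillion ETL instance, and that the token was previously encrypted with an existing KMS key, a Python Script component could decrypt it at runtime along these lines (the variable names and ciphertext are hypothetical):

    import base64
    import boto3

    # Base64-encoded ciphertext produced earlier by `aws kms encrypt`
    # (illustrative value; in practice, store it in a job variable).
    ENCRYPTED_TOKEN_B64 = "AQICAHh...example..."

    kms = boto3.client("kms", region_name="eu-west-1")

    # KMS identifies the key from the ciphertext, so only the blob is needed.
    response = kms.decrypt(CiphertextBlob=base64.b64decode(ENCRYPTED_TOKEN_B64))
    jira_token = response["Plaintext"].decode("utf-8")

    # Hand the decrypted token to the job through a Matillion variable.
    context.updateVariable("jira_token", jira_token)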