Matillion customers looking to optimise credit consumption are eager to cut unnecessary costs by minimising the uptime of their instances. One particularly tricky aspect of this optimisation is shutting down the instance after a routine schedule has completed, whether the run succeeded or failed. Unfortunately, Matillion doesn’t offer a built-in feature to switch off instances automatically as part of a pipeline. Furthermore, the execution duration of these schedules varies with factors like data volumes and the day of the week, which makes a fixed-time shutdown impractical. Consequently, a flexible alternative is required. The configuration process for enabling this functionality differs slightly between AWS and Azure.
This blog covers the steps for AWS; the steps for Azure are covered in a separate post, “How to Automatically Shut Down an Azure Matillion Instance After a Schedule Finishes”.
The “Death Loop” issue discussed below is relevant to any instance: AWS, Azure or other.
The “Death Loop” Issue
Before delving into the steps, it is crucial to address an issue that arises when a VM is deallocated while a job is still running. Consider this scenario: your nightly schedule is running, all jobs complete (regardless of success or failure), and you want the last component in your pipeline to deallocate the VM (we’ll cover how to create a deallocate component in the following sections). Matillion expects the deallocation component to return a success or failure response, like any other component, before it can mark the running task as complete. However, the Matillion task manager never sees the deallocation component complete, because the server shuts down at that instant. Consequently, when the VM is switched back on, the task scheduler detects that the job didn’t fully complete and automatically resumes it from where it left off, which was at the “Deallocate Server” component.
As a result, the VM enters what I like to call a “death loop”, where it switches itself off every time it’s turned on. Breaking this loop is challenging, but the approach in this blog avoids the problem by decoupling the deallocation from the scheduled job. The key is to put the deallocation command in a standalone bash script on the VM and have Matillion launch that script in the background, rather than embedding the command directly in a Bash Script component. Below are the steps to achieve this.
Step 1: Assigning a Role to the Instance
Firstly, an AWS role with the ability to turn off the instance needs to be created and assigned to the Matillion EC2 instance.
- Create a policy in AWS via IAM by selecting ‘Create policy’ in the Policies page.
- Select ‘EC2’ as the Service.
- Search for and select the ‘StopInstances’ Action.
- We will want to restrict this to only work for the specific Matillion instance so select ‘Add ARNs’. In the pop-up choose the appropriate account radio box and enter the resource’s region and ID.
- Feel free to add request conditions such as the requester’s IP address being the Matillion IP. Click ‘Next’.
- Provide a Policy name, then create the policy.
- Next, we need to create a role to assign the policy to. Select ‘Create role’ in the Roles page.
- Select the ‘AWS service’ Trusted entity type and ‘EC2’ as the Use case. Click ‘Next’.
- Search for and select the Policy created in the previous steps. In my case, this is ‘EC2StopInstancePolicy’. Click ‘Next’.
- Provide a Role name, then create the role.
- Lastly, we need to assign the newly created role to the Matillion EC2 instance. Head to the EC2 Dashboard, and then to the Instances page.
- Select the Matillion instance, in the top right click ‘Actions’ > ‘Security’ > ‘Modify IAM role’.
- Select the Role created in the previous steps and click ‘Update IAM role’.
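For readers who prefer to review the policy as text, the console steps above roughly correspond to a policy document like the one generated below. This is a minimal sketch rather than the exact policy the console produces: make_stop_policy is a hypothetical helper name, and the region, account ID and instance ID are placeholders you would supply yourself.

```shell
# Hypothetical helper: prints an IAM policy document that allows
# ec2:StopInstances on a single instance only.
# Arguments: region, AWS account ID, instance ID (all supplied by you).
make_stop_policy() {
  region="$1"
  account_id="$2"
  instance_id="$3"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:StopInstances",
      "Resource": "arn:aws:ec2:${region}:${account_id}:instance/${instance_id}"
    }
  ]
}
EOF
}

# Example with made-up values:
#   make_stop_policy eu-west-1 123456789012 i-0123456789abcdef0 > stop_policy.json
```

The resulting file could then be passed to aws iam create-policy with --policy-document file://stop_policy.json if you would rather script the policy creation than click through the console.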
Step 2: Installing the AWS CLI
The AWS CLI is a powerful tool for interacting with the AWS Cloud Platform in various ways. Here, we will use a simple CLI command to deallocate an EC2 instance. You will need to install the AWS CLI on the Matillion VM, which can be done by following this installation guide.
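Once the CLI is installed, it can be worth confirming that the role from Step 1 is actually picked up before relying on it overnight. The sketch below is one possible check, not part of the original setup: check_stop_permission is a hypothetical function name, and it relies on the --dry-run flag of stop-instances, which reports DryRunOperation when the call would have been allowed and UnauthorizedOperation when it would not, without stopping anything.

```shell
# Hypothetical helper: checks whether this VM is allowed to stop the given
# instance, without actually stopping it.
check_stop_permission() {
  instance_id="$1"
  # A dry run always exits non-zero, so the exit status is ignored and the
  # error text is inspected instead.
  out=$(aws ec2 stop-instances --instance-ids "$instance_id" --dry-run 2>&1) || true
  case "$out" in
    *DryRunOperation*)       echo "allowed" ;;
    *UnauthorizedOperation*) echo "denied"; return 1 ;;
    *)                       echo "unknown: $out"; return 2 ;;
  esac
}

# Example with a made-up instance ID:
#   check_stop_permission i-0123456789abcdef0
```

An "unknown" result usually points at something other than permissions, such as the CLI not being on the PATH or no default region being configured.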
Step 3: Creating a Deallocate Bash Script
SSH into the VM and create a file containing the script below, ensuring that the centos user owns the file. Step 4 calls this file as /home/custom_scripts/deallocate_server.
sleep 30
aws ec2 stop-instances --instance-ids <Your Instance ID>
The first command sleeps for 30 seconds to ensure that the Matillion schedule has enough time to complete safely before the VM is deallocated. The second command executes the VM deallocation using the AWS CLI. It is worth mentioning that if you have a separate production Matillion instance in a different AWS account, the above steps will need to be redone in that account, and the new instance ID will need to be used in the deallocate_server script.
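If you want the log file to be more useful for auditing, the same two commands can be wrapped with timestamps and an exit status. The following is an optional variant under my own assumptions, not the script used in this blog: stop_instance_after_delay is a hypothetical function name, and the delay is made configurable purely for convenience.

```shell
# Hypothetical variant of the deallocate script with timestamped logging.
# Arguments: instance ID, optional delay in seconds (defaults to 30).
stop_instance_after_delay() {
  instance_id="$1"
  delay="${2:-30}"
  echo "$(date '+%Y-%m-%d %H:%M:%S') waiting ${delay}s before stopping ${instance_id}"
  # Give the Matillion schedule time to be marked as complete.
  sleep "$delay"
  status=0
  aws ec2 stop-instances --instance-ids "$instance_id" || status=$?
  echo "$(date '+%Y-%m-%d %H:%M:%S') stop-instances exited with status ${status}"
  return "$status"
}

# Example with a made-up instance ID:
#   stop_instance_after_delay i-0123456789abcdef0
```

Because everything is written to standard output, the redirection to a log file in Step 4 would capture these lines as well.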
Step 4: Implementing in Matillion
From here, we will use a Bash Script component to execute the above deallocate_server script. A wrapper job will be needed around your main pipeline where you can attach a Bash Script component to the end of the pipeline (this wrapper job will be the one run by your Matillion schedule). Important: the flow from the main pipeline (in this case e2e_nightly) will need to be unconditional (grey) so that the server is turned off regardless of whether the pipeline was successful. Otherwise, your VM will stay on in the event of a pipeline failure if the Bash Script is only set to execute when the main pipeline is successful (unless you have perfect pipelines… 🤔).
Within the Bash Script component, place the command below, which executes the deallocate_server script that we created on the VM in Step 3.
sh /home/custom_scripts/deallocate_server >/tmp/deallocate_server.log &
Crucially, the ampersand symbol (&) at the end of the command runs the script in the background, so the command returns without waiting for the script to finish. This allows the Bash Script component to be flagged as completed immediately in the eyes of the Matillion task scheduler, and therefore the schedule is marked as complete. This avoids the aforementioned “death loop”, as the Matillion schedule no longer depends on the deallocation commands completing before it can finish. Additionally, the command redirects the output of the deallocate script to a log file for auditing purposes.
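The effect of the ampersand can be seen in isolation with a small, AWS-free experiment. In this sketch, slow_task is a hypothetical stand-in for the deallocate script and the temporary log file replaces /tmp/deallocate_server.log; the point is only that control comes back to the caller before the background work has finished.

```shell
# Stand-in for the deallocate script: waits, then writes one line.
slow_task() {
  sleep 2
  echo "stopped"
}

log=$(mktemp)
start=$(date +%s)

# Launch in the background with output redirected to the log file.
slow_task >"$log" 2>&1 &
bg_pid=$!

# Control returns here straight away, well before slow_task finishes.
elapsed=$(( $(date +%s) - start ))
echo "returned to caller after ${elapsed}s"

# Only for this demo: wait for the background task, then show its log.
wait "$bg_pid"
cat "$log"
```

In the real setup there is no wait: the component finishes, Matillion marks the schedule as complete, and the instance is stopped shortly afterwards by the background script.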
The solution proposed in this blog uses the AWS CLI to deallocate your Matillion VM by simply running a Bash Script component. It should be noted that there are a number of alternative ways to achieve this, such as using message queues to trigger a cloud function that shuts down the VM, which are equally valid.
Once you have this deallocation functionality configured, you can rest assured that your Matillion VM will dynamically shut down once your schedule completes. Please feel free to reach out to me on LinkedIn or drop a comment on this blog if you have any further questions.