
What is AWS Fault Injection Simulator?

Published on August 29, 2024

Quick Definition: The AWS Fault Injection Simulator (FIS) is a managed service that lets you run chaos engineering tests on AWS infrastructure. It finds weaknesses by simulating real-world failures, such as network outages and instance crashes.

Anyone with cloud experience knows one thing: complex systems can yield complex problems. Often, we can prevent these issues with unit and integration tests. However, large systems fail in ways that are not always easy to replicate in a test suite.

Suppose you wanted to see what would happen if a cluster of EC2 Instances experienced latency. Doing so will allow you to gather a plethora of statistics you wouldn’t otherwise discover. The best way to do that would be with the AWS Fault Injection Simulator (FIS). 

FIS is a managed service used to test the resilience of an AWS ecosystem and find its strengths and weaknesses. There are several templates you can use to conduct experiments. Soon, you'll see how it can harden your AWS environment against failure and improve its reliability. Let's first highlight some of the most important FIS features.

Exploring AWS Fault Injection Simulator Features

FIS provides scenarios that let you push your AWS configuration to its limits. You carry out controlled experiments on your infrastructure and watch how it behaves when something breaks.

FIS simulates the most common scenarios like: 

  • Network Disruption

  • Instance Failures

  • DNS Errors

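You can simulate each one of these using a pre-existing template. An FIS experiment template is a JSON document (you can also declare one in YAML through CloudFormation) that describes an experiment: the resources to target, the actions to run against them, and the stop conditions that end the experiment early.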
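To give a sense of the shape, here is a minimal sketch of an experiment template written as a Python dictionary that mirrors the request structure. The role ARN, the `env: test` tag, and the names `web_servers` and `stop_web_servers` are placeholders chosen for illustration, not values from a real account. The small helper only checks that every action points at a target that is actually defined.

```python
import json

# Hypothetical values: swap in your own role ARN and tag.
template = {
    "description": "Stop one tagged EC2 instance and restart it after 5 minutes",
    "roleArn": "arn:aws:iam::account-id:role/fis-role",
    "targets": {
        "web_servers": {
            "resourceType": "aws:ec2:instance",
            "resourceTags": {"env": "test"},
            "selectionMode": "COUNT(1)",  # pick a single matching instance
        }
    },
    "actions": {
        "stop_web_servers": {
            "actionId": "aws:ec2:stop-instances",
            # ISO 8601 duration: restart the instance after five minutes
            "parameters": {"startInstancesAfterDuration": "PT5M"},
            "targets": {"Instances": "web_servers"},
        }
    },
    # 'none' means the experiment has no alarm-based stop condition
    "stopConditions": [{"source": "none"}],
}


def undefined_targets(tpl):
    """Return target names that actions reference but 'targets' does not define."""
    defined = set(tpl["targets"])
    missing = set()
    for action in tpl["actions"].values():
        for target_name in action.get("targets", {}).values():
            if target_name not in defined:
                missing.add(target_name)
    return sorted(missing)


print(json.dumps(template, indent=2))
print("undefined targets:", undefined_targets(template))
```

Running the check before you submit a template is a cheap way to catch a typo in a target name.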

Simulate Real-Time Observability

AWS FIS does not just run its experiments; it lets you observe them in real time. Amazon CloudWatch is central to that. You define the metrics and alarms you care about, and an alarm can also act as a stop condition that halts the experiment when it fires. So, you'll set up alarms and metrics, run the experiment, and then observe the metrics.
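One way to wire this up is to create the alarm first and then hand its ARN to the experiment template as a stop condition. The sketch below assumes a CPU alarm on a single instance; the alarm name, metric, period, and 90% threshold are illustrative choices, and the helper names are hypothetical. boto3 is imported inside the function that talks to AWS, so the request builders can be inspected without credentials.

```python
def build_alarm_request(instance_id, threshold_percent=90.0):
    """Keyword arguments for CloudWatch put_metric_alarm (illustrative values)."""
    return {
        "AlarmName": f"fis-guardrail-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint
        "EvaluationPeriods": 2,    # two breaching datapoints trip the alarm
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def stop_condition_for(alarm_arn):
    """Stop-condition entry for an FIS experiment template."""
    return {"source": "aws:cloudwatch:alarm", "value": alarm_arn}


def create_guardrail(instance_id):
    """Create the alarm and return a stop condition that references it."""
    import boto3  # imported lazily; needs AWS credentials and a region

    cloudwatch = boto3.client("cloudwatch")
    request = build_alarm_request(instance_id)
    cloudwatch.put_metric_alarm(**request)
    alarms = cloudwatch.describe_alarms(AlarmNames=[request["AlarmName"]])
    return stop_condition_for(alarms["MetricAlarms"][0]["AlarmArn"])
```

The dictionary returned by `create_guardrail` would go into the template's `stopConditions` list in place of `{'source': 'none'}`.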

Automated Testing

In cloud computing, automation is always the name of the game. Luckily, there are several ways to do that.

  • Amazon EventBridge (formerly Amazon CloudWatch Events): Lets you create scheduling rules that fire at any time of day. The rule invokes an AWS Lambda function that kicks off the experiment. IT professionals often schedule this for weekends or other off-peak hours.

  • AWS SDK: You can also kick off an experiment with an SDK (Software Development Kit). AWS publishes SDKs for Python, Java, JavaScript, and most other popular languages. As with EventBridge, you can schedule the script, for example with a cron job, to start the experiment at any time.

  • Continuous Integration: If you want your experiments to run on a regular basis, you can make them part of your build pipeline. Jenkins, Bamboo, GitLab, and AWS CodePipeline can all start an experiment from a pipeline step by calling the AWS CLI or an SDK. This lets you run experiments automatically as part of your deployment process and test your infrastructure for resilience before it goes live.

These are just three ways of automating your tests. Always remember: the less human interaction, the better.
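As a rough sketch of the first and third options, the block below shows a Lambda handler that starts an experiment and a polling helper a pipeline step could use to wait for the result. The `FIS_TEMPLATE_ID` environment variable, the function names, and the polling intervals are assumptions made for this example. The polling helper takes a callable so it can be exercised without touching AWS.

```python
import os
import time

# Experiment states after which FIS will not change the status again.
TERMINAL_STATES = {"completed", "stopped", "failed", "cancelled"}


def lambda_handler(event, context):
    """Entry point for a Lambda function invoked by an EventBridge rule."""
    import boto3  # imported lazily; available by default in the Lambda runtime

    fis = boto3.client("fis")
    template_id = os.environ["FIS_TEMPLATE_ID"]  # hypothetical variable name
    response = fis.start_experiment(experimentTemplateId=template_id)
    return {"experimentId": response["experiment"]["id"]}


def wait_for_experiment(get_status, poll_seconds=15, timeout_seconds=1800,
                        sleep=time.sleep):
    """Poll get_status() until the experiment reaches a terminal state."""
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"experiment still '{status}' after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

In a pipeline step, `get_status` could be something like `lambda: fis.get_experiment(id=experiment_id)["experiment"]["state"]["status"]`, with the step failing the build unless the returned status is `completed`.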

AWS Service Integration

We've already discussed several integrations such as CodePipeline, EventBridge, and CloudWatch. Those aren't the only ones; FIS also acts directly on infrastructure such as EC2 instances. To target a specific EC2 instance, you'll need to declare it in your template. That would look like this:

import boto3

# Initialize the FIS client
fis_client = boto3.client('fis')

# Define the experiment template
response = fis_client.create_experiment_template(
    description='Test EC2 instance resilience',
    roleArn='arn:aws:iam::account-id:role/fis-role',
    # stopConditions is required; 'none' means no alarm guards this experiment
    stopConditions=[{'source': 'none'}],
    targets={
        'instanceTargets': {
            'resourceType': 'aws:ec2:instance',
            'resourceArns': ['arn:aws:ec2:region:account-id:instance/instance-id'],
            # selectionMode is required: ALL, COUNT(n), or PERCENT(n)
            'selectionMode': 'ALL',
            # Only act on the instance if it is currently running
            'filters': [
                {
                    'path': 'State.Name',
                    'values': ['running']
                }
            ]
        }
    },
    actions={
        'stopInstances': {
            'actionId': 'aws:ec2:stop-instances',
            'parameters': {},
            # Map the action's 'Instances' slot to the target defined above
            'targets': {
                'Instances': 'instanceTargets'
            }
        }
    }
)

# Start the experiment
experiment_template_id = response['experimentTemplate']['id']
fis_client.start_experiment(experimentTemplateId=experiment_template_id)

You'll need the boto3 SDK to use this properly. This script creates an FIS experiment template that stops an EC2 instance, then starts an experiment from it. This simple test will help you check the resilience and recovery of your EC2 environment.

To run this script, you'll need a few AWS details: your IAM credentials, the instance ID, and the AWS region. The role in roleArn must be one that FIS can assume and that has permission to stop the instance. Remember, you can automate this script anywhere. You can use EventBridge, a cron scheduler, or AWS Lambda; the cloud's the limit.




How to Utilize AWS Fault Injection Simulator: Best Practices

To get the most out of FIS, follow a few best practices. Let's go over each of them.

Start Small and Scale Gradually

Start small. You may be tempted to write the exact experiment you want right away. Instead, begin with something simple, such as an experiment that checks the latency between two EC2 instances, and then gradually increase the complexity.

The problem is that templates become more complex as the ecosystem under test grows. So start by testing something very easy, and gradually add more to the experiment. As the adage goes, haste makes waste.
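One knob that helps here is the target's selectionMode, which accepts ALL, COUNT(n), or PERCENT(n). A minimal sketch of widening the blast radius in stages might look like this; the `env: test` tag, the four-stage rollout, and the helper name are assumptions made for illustration.

```python
def ec2_target(tag_key, tag_value, selection_mode):
    """Build an FIS target that selects running EC2 instances by tag."""
    return {
        "resourceType": "aws:ec2:instance",
        "resourceTags": {tag_key: tag_value},
        "filters": [{"path": "State.Name", "values": ["running"]}],
        "selectionMode": selection_mode,
    }


# Hypothetical rollout: one instance first, then a growing share of the fleet.
ROLLOUT = ["COUNT(1)", "PERCENT(25)", "PERCENT(50)", "ALL"]

for stage, mode in enumerate(ROLLOUT, start=1):
    target = ec2_target("env", "test", mode)
    print(f"stage {stage}: selectionMode={target['selectionMode']}")
```

Each stage would get its own template, or an updated one, and you would only move to the next stage once the previous run behaved as expected.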

Define Clear Objectives

Know exactly what you want to test. Often, developers code first and ask questions later, producing a vague and inefficient experiment. Identify the key metrics that you want to see (or not see). Make sure CloudWatch is collecting every one of them and that EventBridge is scheduling the experiment when you expect it to run.

Always Integrate Monitoring and Alerting

Make sure you have some way of measuring the data. Use CloudWatch alarms and real-time tools like Splunk. Make sure everything is set up and prepared before running the experiment. Otherwise, you're flying blind.

Document and Analyze Results

Keep detailed logs of each experiment so you can determine which experiments are beneficial and which ones aren't. Conduct a post-mortem analysis to determine the key problems with your system. Never run an experiment and then abandon the results.
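A lightweight way to keep those logs is to flatten what FIS reports about each run into one record per experiment. The sketch below assumes the input has the shape of the `experiment` object that boto3's `get_experiment` returns; the record fields, the file name, and the sample data are made up for the example.

```python
import json
from datetime import datetime, timedelta


def summarize_experiment(experiment):
    """Flatten an FIS 'experiment' object into a small log record."""
    start, end = experiment.get("startTime"), experiment.get("endTime")
    return {
        "experiment_id": experiment["id"],
        "template_id": experiment["experimentTemplateId"],
        "status": experiment["state"]["status"],
        "reason": experiment["state"].get("reason", ""),
        "duration_seconds": (end - start).total_seconds() if start and end else None,
        "actions": {
            name: action["state"]["status"]
            for name, action in experiment.get("actions", {}).items()
        },
    }


def append_to_log(record, path="fis-experiments.jsonl"):
    """Append one record per line so the log is easy to grep or load later."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")


# Made-up sample standing in for get_experiment(id=...)["experiment"].
started = datetime(2024, 8, 29, 12, 0, 0)
sample = {
    "id": "EXP-example",
    "experimentTemplateId": "EXT-example",
    "state": {"status": "completed", "reason": "Experiment completed."},
    "startTime": started,
    "endTime": started + timedelta(minutes=5),
    "actions": {"stopInstances": {"state": {"status": "completed"}}},
}
record = summarize_experiment(sample)
print(json.dumps(record, indent=2))
```

Pairing each record with a short note on what you expected to happen makes the post-mortem much easier to write.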

Pricing and Availability of AWS Fault Injection Simulator

FIS itself is inexpensive at $0.10 per action-minute. That works out to a fraction of a penny for each second an action runs. On top of that, you are charged for the resources the experiment consumes. FIS is not currently included in the AWS Free Tier.

Let's say you are running an experiment. For the experiment, you need a cluster of EC2 instances, CloudWatch, AWS Lambda, and all the supporting services. AWS bills each of those at its normal rate. Use tags to identify each resource used in the experiment so you can trace the costs back to it.
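To make the arithmetic concrete, here is a small sketch that estimates the FIS portion of the bill from the per-action-minute rate quoted above. It covers only the FIS charge, not the EC2, CloudWatch, or Lambda usage around it, and the rate is hard-coded from this article, so check the current pricing page before relying on it. The function name and the sample durations are made up.

```python
from decimal import Decimal

# USD per action-minute, as quoted in this article (verify before budgeting).
RATE_PER_ACTION_MINUTE = Decimal("0.10")


def fis_cost(action_minutes):
    """Estimate the FIS charge given the runtime of each action in minutes."""
    return sum(
        (Decimal(str(minutes)) * RATE_PER_ACTION_MINUTE for minutes in action_minutes),
        Decimal("0"),
    )


# One action that runs for 5 minutes and another that runs for 10 minutes.
cost = fis_cost([5, 10])
print(f"Estimated FIS charge: ${cost:.2f}")  # Estimated FIS charge: $1.50
```

Fifteen action-minutes at this rate comes to $1.50, which is why the surrounding resources usually dominate the cost of an experiment.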

Final Thoughts

AWS Fault Injection Simulator (FIS) is a must-have tool to improve your cloud infrastructure's resilience. FIS simulates real-world failures in a controlled environment so you can find and fix weaknesses before they affect your production systems. FIS can run both simple latency tests and complex, multi-service experiments to ensure your systems withstand and recover from unexpected disruptions.  

By following the best practices we shared, you can use FIS to vastly improve your AWS environment. Start small, set clear goals, and add monitoring. This will make your environment more robust and reliable. As cloud systems grow more complex, tools like FIS are vital to keeping infrastructure stable and resilient.

Want to become a Cloud Engineer? The AWS Cloud Practitioner is the perfect place to start.

Learn how to use EC2 Instances like a pro with this course.



© 2025 CBT Nuggets. All rights reserved.Terms | Privacy Policy | Accessibility | Sitemap | 2850 Crescent Avenue, Eugene, OR 97408 | 541-284-5522