Hema has worked at Cognizant for more than 4.5 years as a performance test engineer. She has experience in Operational Acceptance Testing and has conducted live DR exercises, integration resilience tests, and other aspects of non-functional system testing. She is interested in High Availability and in how companies are moving from large traditional servers to serverless cloud computing. She implemented an observability framework using a combination of tools such as Telegraf, InfluxDB, Grafana, and Zipkin.

She has also experimented with chaos concepts on microservices and Kubernetes using open-source chaos testing tools.

Test Your Infra Resilience the Boring Way or “The Best Way”

A huge amount of effort is required to test your infrastructure resilience.

Is it possible to shrink both the timeline and the resource cost of your testing?

The traditional way of resilience testing requires a lot of resources and support from the respective teams. When so many resources are needed to perform a test, both cost and dependencies increase. We all know how important timelines are when it comes to deliverables. We constantly have to coordinate with the respective teams to stay in sync; otherwise, our ship will sink. In other words, testing might not give you the best results in the stipulated time.

How it has been tested all these days: the traditional way

We prepare the test plan, check resource availability, explain the testing procedures to the people involved, and assign them tasks to execute as per the test plan. Finally, we do our analysis. Test completion depends entirely on the availability of those resources: if one person is unavailable, the whole pipeline of actions is blocked, which affects timelines and deliverables.

Have you ever thought of trying it with open-source tools in an environment you already have access to?

Yes, our solution is to achieve it without depending on those resources, saving cost and reducing time and effort.

A one-time installation and configuration of the agents or drivers on the infrastructure (hosts, Kubernetes, containers, bare metal) is required. Execute the test cases using commands or dashboards, and get the report for the conducted test from any monitoring tool. You can add the flavor of chaos engineering to your infrastructure only if required. Most of us run a system resilience test as a system or service shutdown. But what if our system crashes because of CPU, disk, or network problems? Using this framework, we can inject a variety of scenarios with a combination of attacks and prove resiliency without any outside support.
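
To make the idea of combining attacks more concrete, here is a minimal sketch in Python. It is not the API of any particular chaos tool: the names Attack, burn_cpu, and run_scenario are hypothetical and chosen only for illustration. It assumes a scenario is an ordered list of attacks, that only a CPU injector is available locally, and that attacks with no injector are reported as skipped.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Attack:
    """One fault to inject (hypothetical structure, not a real tool's schema)."""
    kind: str          # e.g. "cpu", "disk", "network"
    duration_s: float  # how long the fault should last, in seconds


def burn_cpu(duration_s: float) -> int:
    """Keep one core busy for duration_s seconds and return the loop count."""
    deadline = time.monotonic() + duration_s
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def run_scenario(
    attacks: List[Attack],
    injectors: Dict[str, Callable[[float], object]],
) -> List[dict]:
    """Run each attack in order and record whether it was injected or skipped."""
    report = []
    for attack in attacks:
        injector = injectors.get(attack.kind)
        if injector is None:
            # No injector registered for this kind: record it, do not fail.
            report.append(
                {"kind": attack.kind, "status": "skipped", "elapsed_s": 0.0}
            )
            continue
        start = time.monotonic()
        injector(attack.duration_s)
        report.append(
            {
                "kind": attack.kind,
                "status": "injected",
                "elapsed_s": time.monotonic() - start,
            }
        )
    return report


if __name__ == "__main__":
    # Assumed scenario: a short CPU burn followed by a network attack
    # that has no injector in this sketch and is therefore skipped.
    scenario = [Attack("cpu", 0.2), Attack("network", 0.2)]
    for entry in run_scenario(scenario, {"cpu": burn_cpu}):
        print(entry)
```

In a real setup the injectors would be the agents or drivers installed on the hosts, and the report would be read from the monitoring tool rather than printed. The sketch only shows how a scenario can be executed end to end without another team stepping in.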