Ramya Moorthy

Speaker

Ramya is a seasoned & dynamic QA professional with 18+ years of experience, specializing in Performance Testing/Engineering & Security Testing. She provides technical consulting to clients across business domains, analysing and assuring web systems for performance, scalability, high availability, resiliency, server sizing, capacity planning & security.

She is a consistently top-rated performer and an inspirational leader known for her systematic & strategic thought process and problem-solving capabilities. She is an EC-Council certified Penetration Tester (with CEH & ECSA certifications) with experience in Web Application & Network VAPT assessments.

Title: The Know-Hows in Resilience & Reliability Testing for building an anti-fragile & highly scalable system

Abstract:

As we continue to experience multiple pandemic waves, we quickly learn to adapt to and sustain new ways of living while building on the opportunities presented by the crisis. Today, users expect even IT systems to be ‘Always-on’, to make chaos a part of Business-As-Usual (BAU), and to stay healthy enough to overcome failures.

This shift in user expectations demands a rigorous move towards resilience engineering practices to build anti-fragile applications. A smart balance of proactive (shift-left) and post-production (shift-right) testing strategies is required to ensure the application is designed and built resiliency-first, instead of treating reactive chaos testing as a quality gate ritual before production release.

To facilitate an early understanding of system recovery characteristics, we need to develop a resilience culture that experiments and fails fast, early in the life cycle, and implements chaos testing/engineering activities as part of the delivery pipeline. By provisioning a robust observability platform with intelligence (AI/ML) to trace, correlate and report the impact of failures on critical business processes, the product engineering team gets the opportunity to proactively detect, diagnose, and improve system resiliency.

Our approach of early chaos testing with a low blast radius, combined with ongoing failure attacks (across application, network, and infrastructure) in the production environment, has helped systems recover from real-time failures while reducing MTTR by more than 50%. Resiliency improves further over time as more intensive failure experiments are carried out in both production and test environments and self-healing automation is employed.

Managing stringent SLOs to meet ever-increasing user demand through continuous monitoring of service level indicators, meaningful management of error budgets & toil automation, all powered by Site Reliability Engineering (SRE) principles, helps maintain customer trust and preserve the promise of high availability despite a complex, distributed multi-cloud ecosystem.

In this session, the audience will learn about the practical challenges faced and how a successful Resilience & Reliability Testing strategy helped a Fintech company. Join the session