Chaos Engineering by Nikhil Barthwal #AgileIndia2018

Modern software-based services are implemented as large-scale, highly distributed systems running in the cloud or in data centers. Disruptive real-world events such as hardware failures or software bugs can create turbulent conditions in the environments where these systems run, leading to unpredictable outcomes. Chaos Engineering is the study of a system’s ability to withstand such turbulent conditions. It works by deliberately injecting failures that mirror actual failure modes into the production environment and monitoring how the system recovers.

Chaos Engineering uses experimentation to study the effects of such disruptions. These experiments typically start by defining the “steady state” of the system and identifying metrics that can be used to measure it. Then events that mirror the failure modes possible in the production environment, such as a server crash (collectively, “chaos”), are systematically injected into the system under controlled conditions.
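As an illustration of the first step, a steady state might be expressed as upper bounds on a few metrics. The sketch below assumes two hypothetical metrics, error rate and 99th-percentile latency, computed from (latency_ms, ok) request samples; the names `SteadyState`, `measure`, and `in_steady_state` are illustrative and not from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SteadyState:
    """Hypothetical steady-state definition: upper bounds the metrics must stay within."""
    max_error_rate: float      # fraction of failed requests, e.g. 0.01 means 1%
    max_p99_latency_ms: float  # 99th-percentile latency in milliseconds


def measure(samples):
    """Return (error_rate, p99_latency_ms) from a list of (latency_ms, ok) samples."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    error_rate = sum(1 for _, ok in samples if not ok) / n
    latencies = sorted(lat for lat, _ in samples)
    # Nearest-rank p99: the value at rank ceil(0.99 * n), computed with integer math.
    rank = (99 * n + 99) // 100
    p99 = latencies[rank - 1]
    return error_rate, p99


def in_steady_state(samples, steady):
    """True if the measured metrics are within the steady-state bounds."""
    error_rate, p99 = measure(samples)
    return error_rate <= steady.max_error_rate and p99 <= steady.max_p99_latency_ms


if __name__ == "__main__":
    steady = SteadyState(max_error_rate=0.01, max_p99_latency_ms=100.0)
    healthy_traffic = [(20.0, True)] * 100
    print(in_steady_state(healthy_traffic, steady))  # True
```

Which metrics and thresholds define "normal" depends entirely on the service; the point is to fix them before injecting anything, so the experiment has an objective pass/fail criterion.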

The effect of the injected “chaos” is observed by collecting and analyzing the metrics identified above. If the system recovers successfully, this builds confidence in its ability to handle an actual unplanned outage.
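A minimal, self-contained sketch of a single experiment follows. It assumes a hypothetical in-memory `ToyCluster` of replicas behind a round-robin load balancer, uses the fraction of served requests as the steady-state metric, injects a simulated server crash, measures, and then always rolls the failure back. All names and the 0.99 threshold are illustrative assumptions.

```python
import random


class ToyCluster:
    """Hypothetical stand-in for a replicated service behind a round-robin load balancer."""

    def __init__(self, replicas, failover=True):
        self.healthy = [True] * replicas  # health flag per replica
        self.failover = failover          # retry on another replica if the chosen one is down
        self._next = 0                    # round-robin cursor

    def handle_request(self):
        """Route one request; return True if some replica served it."""
        n = len(self.healthy)
        chosen = self._next
        self._next = (self._next + 1) % n
        if self.healthy[chosen]:
            return True
        if not self.failover:
            return False
        # Fail over: try the remaining replicas in order.
        return any(self.healthy[(chosen + k) % n] for k in range(1, n))


def success_rate(cluster, requests=1000):
    """Steady-state metric for this sketch: fraction of requests served."""
    served = sum(cluster.handle_request() for _ in range(requests))
    return served / requests


def run_crash_experiment(cluster, min_success_rate=0.99, rng=None):
    """Crash one random replica, measure the metric, then restore the replica."""
    rng = rng or random.Random()
    baseline = success_rate(cluster)
    victim = rng.randrange(len(cluster.healthy))
    cluster.healthy[victim] = False  # inject chaos: simulated server crash
    try:
        during = success_rate(cluster)
    finally:
        cluster.healthy[victim] = True  # always roll back the injected failure
    return {
        "baseline": baseline,
        "during_chaos": during,
        "steady_state_held": during >= min_success_rate,
    }


if __name__ == "__main__":
    # With failover the crash is absorbed; without it about a third of requests fail.
    print(run_crash_experiment(ToyCluster(3, failover=True), rng=random.Random(1)))
    print(run_crash_experiment(ToyCluster(3, failover=False), rng=random.Random(1)))
```

Comparing the metric during the injected failure against the baseline is what turns "the system survived" into an observable result; the `finally` clause keeps the blast radius limited even if measurement raises.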

If the system fails to recover, that failure becomes a target for improvement before the same behavior manifests in the system at large. By continuing to run these experiments, it is possible to identify several such vulnerabilities, and fixing them strengthens the system over time. Extensive monitoring and logging are essential for Chaos Engineering to achieve its goal of improving the system’s resiliency.
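The iterative loop described above could be sketched as a small "campaign" runner. This assumes a hypothetical interface in which each experiment is a zero-argument callable returning True when the steady state held; every run is logged with Python's standard `logging` module, and failures are tallied so the most frequent ones can be prioritized as improvement targets.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s %(message)s")
log = logging.getLogger("chaos")


def run_campaign(experiments, rounds=3):
    """Run every experiment `rounds` times and count steady-state violations.

    `experiments` maps a name to a zero-argument callable returning True when
    the system stayed within its steady state (a hypothetical interface).
    """
    failures = Counter()
    for round_no in range(1, rounds + 1):
        for name, experiment in experiments.items():
            held = experiment()
            # Log every run so regressions and flaky behaviour can be traced later.
            log.info("round=%d experiment=%s steady_state_held=%s", round_no, name, held)
            if not held:
                failures[name] += 1
    return failures


def improvement_targets(failures):
    """Experiments that failed at least once, most frequent first."""
    return [name for name, _ in failures.most_common()]


if __name__ == "__main__":
    experiments = {
        "kill-one-replica": lambda: True,                       # always recovers
        "drop-db-connection": lambda: False,                    # never recovers
        "add-network-latency": iter([True, False, True]).__next__,  # flaky
    }
    failures = run_campaign(experiments, rounds=3)
    print(improvement_targets(failures))  # ['drop-db-connection', 'add-network-latency']
```

Repeating each experiment across rounds is what surfaces intermittent weaknesses, like the flaky latency experiment here, that a single run could miss.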

Slides and other details: https://confengine.com/agile-india-2018/proposal/5791
Conference: https://2018.agileindia.org
Agile India 2018