Tester's Digest

A weekly source of software testing news


ISSUE #32 - September 17, 2017

Chaos Engineering tells us to experiment in production to validate the fault tolerance of our systems. This is not for the faint-hearted! This issue collects practical writeups from companies that have run Chaos-inspired GameDay exercises.
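
For readers new to the idea, here is a minimal sketch of random fault injection, run against a stand-in function rather than a real production system. Every name in it (`with_faults`, `fetch_profile`, `run_experiment`, the 20% failure rate) is made up for illustration and is not taken from Chaos Monkey or any other tool mentioned in this issue.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real dependency failure."""


def with_faults(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls raise InjectedFault."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_profile(user_id):
    # Hypothetical dependency call; a real one would hit a service.
    return {"id": user_id, "name": "example"}


def fetch_profile_with_fallback(fetch, user_id):
    """Behavior under test: degrade gracefully when the dependency fails."""
    try:
        return fetch(user_id)
    except InjectedFault:
        return {"id": user_id, "name": None, "degraded": True}


def run_experiment(calls=1000, failure_rate=0.2, seed=32):
    """Return (total responses, degraded responses) for one experiment run."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    flaky = with_faults(fetch_profile, failure_rate, rng)
    results = [fetch_profile_with_fallback(flaky, i) for i in range(calls)]
    degraded = sum(1 for r in results if r.get("degraded"))
    return len(results), degraded


if __name__ == "__main__":
    total, degraded = run_experiment()
    print(f"{total} calls answered, {degraded} served in degraded mode")
```

The point of the experiment is the hypothesis being checked: every call still gets an answer even when the dependency fails. A real exercise would check the same kind of hypothesis against live traffic and monitoring instead of a list of dictionaries.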

Topic: Chaos Engineering and GameDay Exercises

The Principles of Chaos Engineering are outlined here as a living, community-maintained document:


The above originated at Netflix, the makers of the Chaos Monkey tool for random fault injection. This post describes their Chaos Kong exercise and another experiment with the Subscriber service in 2015:


What is Chaos Engineering, from the team at Gremlin, who promise to provide fault injection as a service:


The argument for, and practice of, running GameDay exercises in production at Etsy (back in 2012!):


Datadog ran a Game Day event on their Elasticsearch cluster and describes the results of different faults here:


PagerDuty included a Chaos-inspired Breakathon event in their recent summit:


There are really good arguments in favor of testing your system for fault resistance directly in production:


But what if you are a medium-size startup with high-value customers, and the idea of injecting failures into a production environment doesn’t sit well with you? Or perhaps you want to intentionally go beyond the tolerance limits of your system to train the team on incident response? That was us at Quid, so we ran a GameDay in staging and described how we organized the event:



Great overview of learning resources for today’s test engineer:


If you received this email directly then you’re already signed up, thanks! If this newsletter issue was forwarded to you and you’d like to get one weekly, you can subscribe at http://testersdigest.mehras.net

If you come across content worth sharing, please send me a link at testersdigest@mehras.net