Human Factors and PostMortems
presented at Continuous Lifecycle 2014, Mannheim on Nov 11, 2014

Abstract

Our daily work takes place in a myriad of systems. They are composed of software, hardware, and humans. And everybody who has worked with complex systems at any scale knows: failure is not an option, it’s inevitable. At Etsy we embrace the fact that failures happen and that the only way to understand how an accident happened is to investigate it without blaming the humans involved. This is why we hold a blameless postmortem for every outage that occurs. It is an open meeting, and everybody is invited to join and find out what happened and how we can make the system safer. This talk will explain how postmortems at Etsy are conducted and how we maintain and scale the process as the team grows and new people join. It will go over the tools we built and use to make postmortems efficient, and to share the lessons from each one with everyone in the company.

Slides