Statistics for Engineers

Tuesday, 25 October, 2022 - 11:00–11:40 CEST

Heinrich Hartmann, Zalando


As an SRE, you are constantly confronted with a wealth of telemetry data collected from your systems. Interpreting this data to extract operational information is a key part of your job. Statistics is here to help! Statistics is the art of extracting information from data. In this talk, we will discuss the statistical methods that are most relevant to your daily work as an SRE. You will get up to speed with the basics and see how they apply to the operational domain. We will also discuss statistical pitfalls that are commonly found in telemetry systems. Specifically, we will cover the following subjects:

  • Summarizing and Visualizing data with Mean values, Percentiles, and Histograms
  • Implementing Latency-SLOs
  • Impact of Sampling on Rate, Error, and Duration (RED) metrics

Heinrich Hartmann leads the Site Reliability department at Zalando and is responsible for all telemetry tooling at the company. Before joining Zalando, he designed and built telemetry analysis systems at the monitoring vendor Circonus. Heinrich holds a Ph.D. in mathematics and has spoken frequently about statistical analysis of telemetry data over the past 8 years.
