Customer Reliability Engineering

You can read generally about working at TableCheck here: Site Reliability Engineering - Public

Why does TableCheck need a Customer Reliability Engineering team?

What’s the current strategy?

The IT team is currently made up of squads, each of which is focused on a particular product or feature category. These teams often have extremely tight deadlines for major feature delivery, but their focus can be (and often is) skewed by customer requests. We have to deliver these relatively minor improvements within a short period of time, and doing so often diverts cross-team resources. Of course, our product teams try to minimise the impact of a client’s demands on the delivery of major features, but a lot of the time it is unavoidable.

Why is this bad?

In Site Reliability Engineering, we call the above “toil”. Toil is any tedious or repetitive task associated with running a production environment. For Site Reliability Engineering (SRE) teams, the aim is to reduce or even eliminate toil in order to maximise the time spent on engineering and innovation. This is where Customer Reliability Engineering comes in. For development teams, toil slows the velocity of product delivery. The tasks are often quite small (up to a maximum of a week of work) and have high impact for a particular client, but when toil goes on too long it costs both team morale and customer satisfaction.

What is the solution?

Customer Reliability Engineering, a.k.a. SRE for Enterprise Customers! Google came up with this to help their Enterprise Customers on Google Cloud.

In other organisations, CREs may also be called “Tier 3 support”, but I think this is not the right focus, as CREs are primarily developers.

What structure does the CRE team take?

The CRE team is an organisation-level shared resource, just like the SRE team.

What does Customer Reliability Engineering handle?

The CRE flower-like responsibility model.

CREs:

  • Take tooling from SRE

  • Learn about product and feature issues from Sales / Consulting / QA / etc.

  • Help support teams (Payments / Backend / Integration / etc)

  • Build needed features with code

CREs handle code and clients where SREs cannot. Sometimes they are given titles such as “Technical Account Manager”, though these can often be more of a sales-oriented or upselling role.

  • Our CREs focus on Customer Satisfaction by delivering short-term work across systems.

  • Our CREs handle reliability for integrations, using the same tooling stack as our SREs.

  • Our CREs handle customer-facing issues through code, such as by developing features or fixing issues.

  • Our CREs train and support our internal Support, Consulting and Implementation teams.

  • Our CREs constantly look out for issues and build tools to discover them, and have a high degree of familiarity with our products in order to deliver solutions.