Articles


This is an article from DZone’s 2022 Performance and Site Reliability Trend Report.

Distributed tracing, as the name suggests, is a method of tracking requests as they flow through distributed applications. Along with logs and metrics, distributed tracing is one of the three pillars of observability. While all three signals are important for determining the health of the overall system, distributed tracing has seen significant growth and adoption in recent years.

Source of the article on DZONE

Modern systems and applications span numerous architectures and technologies, and they are becoming increasingly dynamic, distributed, and modular. To support the availability and performance of their systems, IT operations and SRE teams need advanced monitoring capabilities. This Refcard reviews the four distinct levels of observability maturity, key functionality at each stage, and the next steps organizations should take to enhance their monitoring practices.

Source of the article on DZONE

The application development landscape has fundamentally changed in recent years. In a recent interview with Ambassador Labs, Mario Loria from CartaX said he believes this is still uncharted territory, particularly for developers in the cloud-native space. As he sees it, site reliability engineers (SREs) play a key role in guiding developers through the learning curve toward comprehensive self-service of the supporting platforms and ecosystem, and ultimately to service ownership. This requires a major shift in company and management culture and in developer (and SRE) mindset, along with the tooling and insight needed to make the journey to full lifecycle ownership not just smoother and more transparent but also technically feasible.

Two Worlds Colliding: The Monolith and Service-Oriented Architecture

The traditional monolith continues to exist in parallel with cloud-native application development. According to Mario, the operations side understands that cloud-native development has caused a big shift in how applications are deployed, released, and operated, and the role of SREs is now to help developers understand and own this shift. Developers know how to code, but building in the necessary understanding (and ownership) of the “ship” and “run” aspects of the lifecycle introduces a steep learning curve. For developers, this means taking on new responsibilities with the support of SREs.

Source of the article on DZONE

In the software industry’s recent past, the biggest disruptive wave was Agile methodologies. While Site Reliability Engineering is still early in its adoption, those of us who experienced the disruptive transformation of Agile see the writing on the wall: SRE will impact everyone.

Any kind of major transformation like this requires a change in culture, which is a catch-all term for changing people’s principles and behaviors. As your organization grows, this will extend beyond product and engineering. At some point you also need to convince the key power-holders in your organization to invest in this transformation.

Source of the article on DZONE

We live in an era of reliability where users depend on having consistent access to services. When choosing between competing services, no feature is more important to users than reliability. But what does reliability mean?

To answer this question, we’ll break down reliability in terms of other metrics within reliability engineering: availability and maintainability. Distinguishing these terms isn’t a matter of semantics. Understanding the differences can help you better prioritize development efforts towards customer happiness.

Source of the article on DZONE

On-call: you may see it as a necessary evil. When fast incident response can make or break your reputation, designating people across the team to be ready to react at all hours of the day is a necessity. But this often creates immense stress while eating into personal lives. It isn’t a surprise that many engineers have horror stories about the difficulty of carrying a pager.

But does on-call have to be so dreadful? No way. Here are five best practices to help your team respond quicker and build more resilient systems.

Source of the article on DZONE

If you’ve spent any time in tech circles lately, there are three letters you’ve surely heard: SRE. Site Reliability Engineering is the defining movement in tech today. Giants like Google and Amazon market their ability to provide reliable service, and startups are now investing in reliability as an early priority.

But what makes reliability engineering so important? In this blog, we’ll look at three big benefits of investing in reliability and explain how you can get started on your journey to reliability excellence.

Source of the article on DZONE