DevOps is a loose set of practices, guidelines, and cultural norms designed to break down silos between IT development, operations, networking, and security. The acronym CA(L)MS, coined by John Willis, Damon Edwards, and Jez Humble, captures the key tenets of DevOps: Culture, Automation, Lean (drawing on Lean management and continuous delivery), Measurement, and Sharing. With its emphasis on sharing and collaboration, DevOps is about improving processes through automation, measuring the results, and sharing those results with colleagues so the whole organization improves. A supportive culture underpins all of the CALMS principles.
More broadly, DevOps, Agile, and various business and software reengineering methodologies together express a comprehensive view of how a modern business should operate. The elements of the DevOps philosophy are deeply interconnected, by design.
Still, several key DevOps concepts can be discussed in relative isolation:
Reduce Organizational Silos: Break down barriers between teams to improve collaboration and increase throughput.
Accept Failure as Normal: Computers are inherently unreliable, and humans make mistakes. Treat failure as a natural part of the process and as an opportunity to learn.
Implement Gradual Change: Make changes in small increments, which are easier to review. If a change causes problems, a small change is also easier to roll back, which shortens recovery and minimizes downtime.
Leverage Tooling and Automation: Use tools and automation to reduce manual effort and improve efficiency, accuracy, and repeatability.
Measure Everything: Use metrics to determine whether a change actually succeeded. Measurement provides a quantitative basis for evaluating the impact of organizational improvements.
By prioritizing collaboration, embracing failure as a learning opportunity, implementing changes gradually, leveraging automation, and measuring outcomes, organizations can evolve in a controlled and effective manner, fostering a culture of continuous learning and innovation.
If DevOps is a philosophy, Site Reliability Engineering (SRE) can be viewed as a prescriptive way of putting that philosophy into practice. If DevOps were an interface in a programming language, SRE would be a concrete class that implements it.
In this analogy, SRE is a practical, structured implementation of the broader DevOps philosophy: it provides specific guidelines and approaches for achieving the overall goals of collaboration, automation, and improved reliability in operations.
When it comes to reducing organizational silos, the first practice that comes to mind is sharing ownership of production with our developers. Using the same tooling across teams keeps everyone working from a shared perspective.
Accepting failure as normal maps to blameless postmortems, an approach we share with many DevOps practitioners: we learn from production failures and use postmortems to prevent similar issues from recurring. We also build failure into our planning through an error budget, which sets a threshold for how far a system may deviate from its reliability target.
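To make the error budget arithmetic concrete, here is a minimal sketch assuming a request-based availability target. The 99.9% target, the request counts, and the `ErrorBudget` name are hypothetical illustrations, not a description of any particular team's tooling. The budget is the fraction of requests allowed to fail, 1 - SLO, multiplied by the total number of requests in the measurement window.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for a hypothetical availability target."""

    slo: float            # target success ratio, e.g. 0.999 for "three nines"
    total_requests: int   # requests served in the measurement window
    failed_requests: int  # requests that failed in that window

    @property
    def allowed_failures(self) -> float:
        # The budget is the fraction of requests allowed to fail: (1 - SLO) * total.
        return (1.0 - self.slo) * self.total_requests

    @property
    def remaining(self) -> float:
        # Budget left before the service misses its target (negative means overspent).
        return self.allowed_failures - self.failed_requests

    @property
    def exhausted(self) -> bool:
        # True once failures have used up the entire budget.
        return self.remaining <= 0


if __name__ == "__main__":
    budget = ErrorBudget(slo=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"allowed={budget.allowed_failures:.0f} remaining={budget.remaining:.0f}")
```

With a 99.9% target over 1,000,000 requests, the budget allows about 1,000 failed requests; after 250 failures, about 750 remain and the budget is not exhausted.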
Gradual change corresponds to our practice of rolling out updates to a small percentage of our fleet before releasing them to all users.
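One common way to pick "a small percentage of the fleet" is deterministic hash bucketing. The sketch below is an assumption for illustration, not necessarily how our rollouts work: the host names, the per-release `salt` value, and the helper names are hypothetical. Each host hashes to a stable bucket from 0 to 99, and it receives the new version when its bucket falls below the rollout percentage.

```python
import hashlib


def rollout_bucket(host_id: str, salt: str = "release-hypothetical") -> int:
    # Salt the hash per release so each rollout starts on a different subset of hosts.
    digest = hashlib.sha256(f"{salt}:{host_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a stable bucket in [0, 100).
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(host_id: str, percent: int, salt: str = "release-hypothetical") -> bool:
    # A host gets the new version if its bucket is below the rollout percentage.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(host_id, salt) < percent


def select_hosts(fleet: list[str], percent: int, salt: str = "release-hypothetical") -> list[str]:
    # Hosts selected at a lower percentage stay selected as the percentage grows.
    return [host for host in fleet if in_rollout(host, percent, salt)]


if __name__ == "__main__":
    fleet = [f"host-{i:04d}" for i in range(1000)]
    for stage in (1, 10, 50, 100):
        print(stage, len(select_hosts(fleet, stage)))
```

Because a host's bucket never changes for a given salt, widening the rollout from 1% to 10% only adds hosts, and rolling back is a matter of setting the percentage to 0.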
For tooling and automation, our focus is on minimizing manual intervention. We measure how much manual work we do and aim to automate repetitive tasks, reducing toil.
For us, measuring everything means quantifying toil levels as well as the overall reliability and health of our systems.
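As a rough sketch of what quantifying toil can look like, assume engineers log their hours and tag each entry as toil or not. The `WorkLog` record, the names, and the hour values are hypothetical. The default cap of 50% follows the guideline in Google's SRE book; treat it as a tunable parameter rather than a fixed rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkLog:
    engineer: str
    hours: float
    is_toil: bool  # manual, repetitive work that could be automated


def toil_fraction(logs: list[WorkLog]) -> dict[str, float]:
    """Return each engineer's share of logged hours spent on toil."""
    totals: defaultdict[str, float] = defaultdict(float)
    toil: defaultdict[str, float] = defaultdict(float)
    for entry in logs:
        totals[entry.engineer] += entry.hours
        if entry.is_toil:
            toil[entry.engineer] += entry.hours
    # Skip engineers with no logged hours to avoid dividing by zero.
    return {name: toil[name] / total for name, total in totals.items() if total > 0}


def over_cap(logs: list[WorkLog], cap: float = 0.5) -> list[str]:
    """List engineers whose toil share exceeds the cap, sorted by name."""
    return sorted(name for name, share in toil_fraction(logs).items() if share > cap)


if __name__ == "__main__":
    week = [
        WorkLog("ana", 30, True),
        WorkLog("ana", 10, False),
        WorkLog("ben", 8, True),
        WorkLog("ben", 32, False),
    ]
    print(toil_fraction(week), over_cap(week))
```

Tracking the share over time, rather than a single snapshot, shows whether automation work is actually reducing toil.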
The idea that "class SRE implements DevOps" resonates strongly; it might even belong on a T-shirt. However, just as a class in a programming language can define methods beyond those its interface requires, SRE may include practices that don't map precisely onto the DevOps interface. SRE may also implement more than one interface, or diverge in certain practices from other implementations of DevOps, given how flexible and adaptable the role is.
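The analogy can be taken literally. In this minimal Python sketch, an abstract base class stands in for the DevOps "interface", and `SRE` is one concrete implementation whose methods return the practices described above. The method names, the second `OnCall` interface, and the extra `error_budget` method are hypothetical, chosen only to show that an implementation can satisfy several interfaces and add behavior of its own.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The 'interface': five DevOps concepts with no prescribed implementation."""

    @abstractmethod
    def reduce_silos(self) -> str: ...

    @abstractmethod
    def accept_failure(self) -> str: ...

    @abstractmethod
    def gradual_change(self) -> str: ...

    @abstractmethod
    def automate(self) -> str: ...

    @abstractmethod
    def measure(self) -> str: ...


class OnCall(ABC):
    """Hypothetical second interface, showing that SRE can implement more than one."""

    @abstractmethod
    def respond_to_incident(self) -> str: ...


class SRE(DevOps, OnCall):
    """One concrete implementation of DevOps."""

    def reduce_silos(self) -> str:
        return "share production ownership with developers, using the same tooling"

    def accept_failure(self) -> str:
        return "blameless postmortems and an error budget"

    def gradual_change(self) -> str:
        return "roll out to a small percentage of the fleet first"

    def automate(self) -> str:
        return "measure manual work and automate repetitive tasks"

    def measure(self) -> str:
        return "quantify toil, reliability, and system health"

    def respond_to_incident(self) -> str:
        return "handle production incidents (hypothetical OnCall method)"

    # An extra method that is not part of the DevOps interface.
    def error_budget(self, slo: float) -> float:
        return 1.0 - slo


if __name__ == "__main__":
    sre = SRE()
    print(sre.accept_failure(), sre.error_budget(0.999))
```

Calling `DevOps()` directly raises `TypeError`, because Python will not instantiate a class with unimplemented abstract methods; only a concrete implementation such as `SRE` can be created.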
In short, SRE may implement DevOps in its own way, reflecting the context and specific requirements of the organization.
DevOps and Site Reliability Engineering (SRE) are not adversaries. They are close allies, working together to break down organizational barriers and deliver high-quality software faster, and to foster efficiency, collaboration, and continuous improvement across software development and operations.