Platform engineering: it’s not a tool stack; it’s a set of capabilities

by Philippe Tiede on Mar 24, 2023

Platform engineering is the hottest topic in 2023. While some say DevOps is dead and platform engineering is replacing it, others focus more on how platform engineering enables developers. For me, platform engineering is the logical next step in the DevOps evolution, especially for large enterprises. But what exactly is a platform, what is platform engineering, and how can platform engineering be smarter?

The 2023 State of DevOps Report stated that using a platform is a key differentiator in increasing a company's DevOps maturity. The common thread of a platform is to enable developers by abstracting away the complexities of modern infrastructure: keeping systems reliable and maintainable without compromising the developer experience, so that developers can focus on software delivery. Setting up a platform, however, is not the same as providing tools. To close the gap between Dev and Ops, a platform, along with the platform team, should acquire various capabilities that go far beyond pure technology.

The curse of shift left!

Benefiting from cloud native, development teams can get infrastructure more easily. With the ease of setup, pay-as-you-go consumption, and on-demand provisioning of infrastructure, development teams can focus on effectiveness and a faster time to market. Startups can bring their ideas to fruition faster, and in existing enterprises, Dev no longer relies on Ops to develop new services. However, the rise of cloud native also leads to the challenge of shifting more and more infrastructure overhead to dev teams.

      • SHIFT LEFT! While setting up the infrastructure is a seemingly simple task, maintaining it is time-consuming. In the long run, cognitive overload increases due to context switching between Dev and Ops tasks within the development team. 
      • Cloud native complexity. Added to the cognitive overload is the complexity of the cloud native environment. A steep learning curve must be climbed before one can speak of real team productivity (or even product maturity). Then there is the volatility of the market with a constant well of new tools and trends. FOMO hits hard. 
      • Cultural challenges. While digital native companies are unlikely to have challenges with this (they'll have them elsewhere), using cloud native in other companies, especially centrally structured ones, leads to cultural challenges. How often do you have discussions with the CISO about specific policies that prevent you from effectively spinning up a cluster?
      • Same same but different. 10 autonomous teams, 10 different CI/CD pipelines. Ultimately, Shift Left leads to thousands of solutions being created for the same problem. Built solutions are not reused by other teams and the 'not invented here syndrome' prevents cross-team reuse.

In the long run, cognitive overload, complexity, cultural challenges, and lack of synergy make maintaining a specific solution a burden. This prevents development teams from focusing on adding value and kills developer productivity.

Developer enablement!

So, what needs to change? To deliver software faster and more independently, development teams need to be relieved of infrastructure tasks by consuming provided capabilities. Infrastructure solutions must be deployed in a centralized manner to enable consistent and predictable use of cloud native solutions. Solve a problem once and share it across teams. Reduce cloud native complexity by delivering infrastructure solutions that increase development efficiency and productivity.

But, wait, stop! Doesn't that sound like ITIL? Since 1989, ITIL has been concerned with the question of how to develop a framework to standardize the selection, planning, delivery, maintenance, and entire lifecycle of IT services. Entire companies have been aligned with it for years, and only very few have managed to realign themselves in the direction of DevOps so far. And now everything is to be turned back again? No, it’s the next step toward delivering software faster, more often, and in higher quality, which is the overarching goal of DevOps. Even though DevOps has commonly been reduced to pure automation, a DevOps transformation that aims for highly evolved DevOps practices always means adapting organizational structures and processes. With platform engineering, you can reach the next step in your DevOps transformation, bring developer teams of different maturities onto the same level, and enable the overall development process.

The key to platform engineering is to treat your platform as a product (what?! Not as a service?). By product, I don't mean a collection of tools that achieves feature completeness. In the end, that approach only leads to a platform that is technically very sophisticated but ultimately delivers questionable benefits for developers. In delivering a product, platform engineering should always fulfill its purpose: bundle the desired value for the consumer and the developer, in the form of capabilities. The primary benefit is to minimize cognitive load for developers and to enable fast-flow software delivery. This purpose must be continuously proven within the company’s setup. Strategically, I have to work out which product I build, which capabilities are useful for my consumers, how I continuously refine my capabilities, and how I optimize their delivery in order to increase the long-term value for my consumers and my company. Only if the developers gain value by consuming the capabilities provided have I achieved my goal as a platform team.

Platform engineering should provide an abstraction for developers. How a platform capability is delivered should be abstracted away from why the capability is provided and what capabilities the platform team provides in general. It is not enough to simply throw a tool like Velero over the fence according to defined company standards, because the developer then first has to figure out what to do with it and how exactly to use it (throwing tools over a fence, take that devs!). For a developer, it is of course important to know that the platform team is able to provide a specific capability like backup and disaster recovery, and why this capability is important for developing sustainable services. How it is ultimately realized technically (or even process-wise) should not matter to them.
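
To make the separation of the what from the how a bit more tangible, here is a minimal Python sketch. Every name in it (BackupCapability, BackupBackend, VeleroBackend, AdvisoryBackend, request_backup, retention_days) is hypothetical and chosen only for illustration. It does not call any real Velero API; it only models the idea that a developer asks for a capability while the platform team stays free to swap the implementation behind it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupRequest:
    """What the developer asks for: the 'what', never the 'how'."""
    service: str
    retention_days: int


class BackupBackend(ABC):
    """The 'how': owned and replaceable by the platform team."""

    @abstractmethod
    def schedule(self, request: BackupRequest) -> str:
        ...


class VeleroBackend(BackupBackend):
    # Hypothetical stand-in: a real backend would create backup resources here.
    def schedule(self, request: BackupRequest) -> str:
        return f"velero:{request.service}:{request.retention_days}d"


class AdvisoryBackend(BackupBackend):
    # Non-automated variant: opens a ticket for a joint process instead.
    def schedule(self, request: BackupRequest) -> str:
        return f"ticket:{request.service}:{request.retention_days}d"


class BackupCapability:
    """The capability developers consume; the backend is an internal detail."""

    def __init__(self, backend: BackupBackend) -> None:
        self._backend = backend

    def request_backup(self, service: str, retention_days: int = 30) -> str:
        if retention_days <= 0:
            raise ValueError("retention_days must be positive")
        return self._backend.schedule(BackupRequest(service, retention_days))


# The developer-facing call is identical regardless of the backend in use.
capability = BackupCapability(VeleroBackend())
reference = capability.request_backup("orders")
```

In this sketch, replacing VeleroBackend with AdvisoryBackend (or any other backend) changes nothing for the developer, who only ever sees request_backup.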

Note that I am speaking of the capabilities a platform team provides via platform engineering. The reason for this is that platform capabilities can have different expansion stages. The implementation of platform capabilities can be, in the first instance, a simple advisory service or joint process.

With extensive understanding, the delivery of a capability can be optimized over time, for example:

      • Using self-service to reduce development waiting times and accelerate development cycle times through on-demand consumption.
      • Consolidating infrastructure by providing shared infrastructure to save costs, leverage synergies and deliver consistency.
      • Improving capability delivery technically and making it more efficient.

Only when I manage to abstract the how from the why and the what for my developers, can I empower them in their work. Simply providing tools has never been part of value creation. However, tools can be valuable to developers when combined with the right capabilities to help them in their daily work.

The platform vision

Ironically, the more I abstract away from developers, especially on a technical level, the closer I have to work with my developers. The value of the platform is determined by the context of use. Within the value stream, only the developer creates value by using the platform’s capabilities!

This requires close customer engagement. The key concepts of Team Topologies can provide deeper insights into that. Platform engineering means not only virtual interaction between developers and infrastructure via self-services, but also physical collaboration in joint processes that deliver structured support for interacting with the platform. That includes self-service, physical exchange between development and operations, joint processes, and a dedicated platform team that not only supports developers in using platform capabilities but also manages the platform as a product.

As a dedicated platform team, starting with a clear product vision, I need to take full product ownership and continuously develop, build, maintain, and support platform capabilities. This requires a product mindset and the implementation of product management principles and processes. As a platform team, I need to keep close feedback loops with my developers as part of my platform engineering, and always be looking for the right product with the right capabilities.

A dedicated platform team. A clear platform vision. Full product ownership. And a Platform that is managed as a Product: a new understanding of operations. The role of the infrastructure person is shifting from the techy in the engine room to a dev enabler who configures and provides development-enabling solutions based on existing building blocks and ensures their use. Building blocks are pre-existing infrastructure solutions that can be assembled and customized based on needs, requirements, and use cases.

The overarching goal of platform engineering is to: 

      • Enable developers and teams in your organization to deploy and run their code without having to worry about the complexities of modern infrastructure while providing full flexibility and access (never limiting the capabilities of the developer).
      • Provide a powerful platform built from standard building blocks, continuously evolving and operated by experts to best leverage the benefits of cloud native in a lean, cost-efficient, and secure way.

To achieve this goal, platform teams will act as experts and advisors in the future. This means that the available resources will focus more on dialogue with developers and on advising them. The technical setup of the platform itself is only a tool for implementing the knowledge gained in those dialogues. An important factor in making this possible is to build on existing building blocks available in the market and not reinvent the wheel every time. Remember, the primary goal is to enable dev teams, not to build the cloud native stack. For missing building blocks, situational strategies can be developed to obtain them in a resource-efficient way. Strategies can include, for example, co-creation with developers, the use of existing open source components, the use of different cloud models, the use of managed services, or the outsourcing of certain operational tasks.

Side note: Wardley mapping can help here to define appropriate building blocks based on the developer's need and to make make-or-buy decisions. Where is my core business? Will I create a competitive advantage by building a building block myself? Has this problem already been solved by existing building blocks on the market?

But what is the “Platform”? 

A Cloud Native Developer Platform is the set of capabilities provided by the platform team that enables devs to develop software, consolidate infrastructure solutions, unify disparate building blocks and different infrastructure abstractions, and configure them in an enterprise-specific way. The capabilities of a platform include everything necessary to achieve the overarching platform goal. Platform engineering describes the activity of acquiring and continuously improving these capabilities as a platform team.

These capabilities are not limited to building a tool stack according to best practices, and the platform is more than a shiny cloud native infrastructure landscape based on Kubernetes. While such a landscape is one component of the necessary capabilities, smarter platform engineering must, to best enable devs and different use cases, combine capabilities in the areas of strategy, technology, processes, and the needs of the developer as customer to create value for developers in a targeted and demand-driven way.

Capabilities of a cloud native developer platform 

In different phases of the cloud native journey, different capabilities are relevant to each platform team. The figure shows an exemplary cloud native journey and attempts to map the different capabilities to the respective stages of the journey. This "roadmap" is not a linear construct as depicted, but capabilities can be refined and extended iteratively, horizontally, and vertically. In this post, I only provide a high-level description of the respective steps of the journey and their purpose. In subsequent posts, I will then go into more detail about the individual capabilities per stage. The map claims to be holistic, but I know that it will never be complete, as necessary capabilities change over time.

Platform capabilities based on the cloud native journey

Strategy

This stage describes the strategic direction of a platform. Every platform team should think about their strategic capabilities, especially at the beginning, but also continuously throughout their platform journey. The basic prerequisite is clear product ownership, which requires a dedicated team and a product owner. This team needs the appropriate mandate from management and a sufficient budget to take long-term responsibility for the product. In the context of product ownership, the team is empowered to know and decide at any time what the most valuable thing to work on is. Based on a clear vision, the individual platform strategy should focus on the needs of the developers, the stakeholders of the company, and the application landscape. Derived from this, a transparent roadmap should be developed and coordinated with all relevant stakeholders. Cost and revenue streams and sourcing strategies for the necessary building blocks should be part of strategic planning right from the start in order to ensure the long-term success of the platform. During this stage, the main focus should be on understanding the developers' needs and tasks. It is important to note that the platform strategy and the resulting capabilities are not immutable, but must continuously adapt depending on internal and external influencing factors.

Design

The high-level design of the platform capabilities should be aligned with individual strategic requirements. Based on the design and strategy, platform teams can derive the necessary capabilities. With the help of value stream management, the focus can be continuously placed on the service delivery performance of the developers, corresponding bottlenecks identified, gaps highlighted, necessary building blocks derived, and the impact of platform engineering demonstrated. Ultimately, the aim is to generate a valid design of the platform capabilities for the corporate environment and determine which capabilities are currently crucial in order to ensure the best possible success for developers.

Build

Within this stage, the platform design is technologically implemented and realized. The north star of implementation is a maximum level of configurability, security, feature completeness, and production readiness. At this stage, 100% can never be achieved and platform teams must continuously build and evolve the platform.

At the beginning, platform teams should start with an initial MVP in order to demonstrate the added value of the platform, collect feedback, and gain insights as early as possible. The decisive factor is not only what is to be built (in the form of features and technical capabilities), but also how (process-related capabilities). “Make it work, make it right, make it fast” is the motto here. A CNCF working group provides a good starting point for what to build on a technical level. At the beginning, a platform capability can also be realized as a simple advisory service or joint process, but it should be implemented as a self-service in the long run.

Enablement

At this stage, it is determined whether the platform provided generates the required added value for developers and whether developers are enabled by it in the best way possible. It is essential here to provide the road to production, to enable undisturbed and independent experimentation and releases between dev teams, and to provide the appropriate capabilities to enable smooth management of the application. The success of this stage can be measured, for example, by the DORA metrics, the number of active devs on the platform, and the overall satisfaction of the developers with the platform. Deploying end-to-end golden paths further accelerates dev teams, especially for recurring use cases.
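
As a rough illustration of how the DORA metrics mentioned above could be tracked, the following Python sketch computes the four commonly cited metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) from a list of deployment records. The Deployment record, its field names, and the dora_metrics function are assumptions made for this example. The sketch also simplifies by measuring restore time from the moment of the failed deployment, which a real setup would likely take from incident data instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute simplified versions of the four DORA metrics for one period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore_hours": median(restore_times) if restore_times else 0.0,
    }


# Made-up sample data: two deployments in a two-day period, one of which failed.
start = datetime(2023, 3, 1, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(
        start,
        start + timedelta(hours=8),
        failed=True,
        restored_at=start + timedelta(hours=10),
    ),
]
metrics = dora_metrics(sample, period_days=2)
```

Numbers like these are only one part of the picture; the number of active devs on the platform and developer satisfaction still have to be collected separately, for example through usage data and surveys.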

Manage

Platform teams must ensure long-term secure and stable operations. This includes rapid troubleshooting (preferably proactive), preventative maintenance and lifecycle management, root cause remediation, and ensuring that the platform is fail-safe, thereby reducing risks to the business. This also includes standardization. Deploying a platform once is one challenge; deploying the platform in a standardized, repeatable way is another. Platform teams must enable consistent, efficient, and predictable use of cloud native. Configure Once Deploy Everywhere (C.O.D.E), for example, helps unify platform configurations and relieves dev teams of the burden of individually configuring their own environments. 
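
One way to picture the idea of configuring once and deploying everywhere is a shared base configuration that is defined a single time, with each environment declaring only its differences. The Python sketch below is an illustration under that assumption, not the implementation of any particular product. All configuration keys, environment names, and the deep_merge and render helpers are hypothetical.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win, recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Configured once by the platform team (all keys are hypothetical).
BASE_CONFIG = {
    "logging": {"level": "info", "retention_days": 14},
    "network_policy": "deny-by-default",
    "replicas": 2,
}

# Each environment declares only what differs from the base.
ENVIRONMENT_OVERRIDES = {
    "dev": {"logging": {"level": "debug"}, "replicas": 1},
    "prod": {"logging": {"retention_days": 90}, "replicas": 3},
}


def render(environment: str) -> dict[str, Any]:
    """Produce the full configuration for one environment."""
    return deep_merge(BASE_CONFIG, ENVIRONMENT_OVERRIDES[environment])


rendered = {env: render(env) for env in ENVIRONMENT_OVERRIDES}
```

Because the base is never mutated and every environment is rendered from it, a change to the shared defaults reaches all environments in the same, repeatable way, while dev teams do not have to maintain full configurations of their own.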

Advisory

The platform has been built and standardized; now the platform team must enable all stakeholders to use it. This can include advising developers on the use of the platform, special cloud native training, or even the onboarding of applications and use cases. A direct line between dev and platform teams (e.g. via a shared Slack channel) makes it possible to react quickly to questions and to provide support. Advisory services also include platform marketing to promote the added value of the platform to other business units, building use case-specific capabilities, and general benchmarking.

Adopt

As the number of users and use cases increases, it is critical for platform teams to continue to expand their platform capabilities and manage the platform at scale. It is essential to enable more and more specific use cases, the use of different hyperscalers, and teams of different maturity levels. This enables platform teams to gain common insights and make them available to all users of the platform in the form of platform capabilities. This stage is closely linked to the capabilities gained in the build stage and to how well a platform team implements them: it is all about how a platform team delivers its capabilities. Building platform capabilities is time- and cost-intensive, so it is extremely important to continuously optimize how these capabilities are provided. This can be done through pure cost optimization or better utilization of cloud resources, in order to reduce cloud costs and make the delivery of capabilities more affordable. It can also be done through multi-cloud adoption or service arbitrage, that is, by repeatedly exchanging and replacing the building blocks behind existing dev-enabling services with ones from different providers. It is crucial that the delivery performance of the capabilities is measured and optimized in terms of costs, provisioning, and benefits.

Conclusion

A cloud native developer platform can close the gap between Dev and Ops. However, for this to happen, the platform itself must be understood as a product, and the platform team must take ownership of it. Building platform capabilities is not limited to the technical capabilities of the platform; it also requires that platform teams understand their role as enablers and align their process, support, and advisory capabilities more closely with the developers. Platform engineering describes the activity of acquiring and continuously improving these capabilities as a platform team. Smarter platform engineering must combine capabilities in the areas of strategy, technology, processes, and the needs of the developer as customer to create value for developers in a targeted and demand-driven way. Only then will platform engineering become smarter and deliver added value to developers.