
Willie Wheeler is a Principal Applications Engineer with Expedia, working on continuous delivery, including build automation, test automation, configuration management and application performance management. He's also the lead author of the book Spring in Practice (Manning).

A Fatal Impedance Mismatch for Continuous Delivery

11.15.2012

Most of the time, when organizations pursue a continuous delivery capability, they’re doing so in pursuit of increased agility. They want to be able to release software at will, with as little delay as possible between the decision to implement a feature and that feature’s availability to end users.

I’m a big fan of agility, and I agree that agility and continuous delivery go hand in hand. Unfortunately, though, there are ill-conceived approaches to implementing agility that can prove fatal to a continuous delivery program. In this post we’re going to look at one that occurs in larger organizations, and we’ll see one reason why implementing continuous delivery in such environments can be challenging.

Software development in the enterprise

One fairly common configuration in large enterprises is for there to be a shared production environment and multiple development groups creating software to be released into that environment. Sometimes the development groups have the ability to push their own changes into production. But often there’s some central release team, whether on the software side of the house or on the infrastructure/operations (I’ll call them IT in this post) side, that controls the change going into the production environment. Here’s what the red-flag—but common—configuration looks like:

Let’s see what tends to happen in such enterprises.

The quest for agility leads to development siloing

When there are multiple development groups, they usually want to be able to do things their own way. They set up their own source repos, configuration repos, continuous integration infrastructure, artifact repos, test infrastructure (tools, environments) and deployment infrastructure. They have their own approaches to using source control (including branching strategies), architectural standards, software versioning schemes and so on. They see themselves as being third-party software development shops, or at least analogous to them. Releasing and operationalizing the software is largely somebody else’s concern. They certainly don’t want some central team telling them how to do their jobs.

There’s a reason the development groups want things this way: agility. The central release team is either seen to be a barrier to agility, or in many cases, actually is a barrier to agility. There are tons of reasons for both the perception and the reality here. If the central team lives in the IT organization instead of living in a software organization, the chance for misalignment is very high. Common challenges include:

  • IT doesn’t understand best practices around software development (e.g., continuous integration, unit testing, etc.).
  • IT takes on a broad ITIL/ITSM scope when the development groups would prefer that it focus on infrastructure, the way IaaS providers do.
  • IT chooses big enterprise toolsets that aren’t designed around continuous delivery, integration with development-centric tools and so forth.
  • IT prioritizes concerns differently than the development groups do. In many cases IT is trying to throttle change whereas development is trying to increase change velocity.

But even if the central team manages to escape the challenges above, shared services fundamentally balance competing concerns across multiple customers, and they’re therefore usually suboptimal for any one customer. All it takes is for one or two developers to say, “I can do better” (a pretty common refrain from developers), and suddenly we end up with a bunch of development teams doing things their own way.

This is really bad for continuous delivery. Let’s see why.

Development siloing creates a fatal impedance mismatch

Let’s start with a little background.

Because continuous delivery aims to support increased deployment rates into production, it becomes especially important to test the deployment mechanism itself (including rollbacks). The challenge, though, is that any given production deployment is a one-time, high-stakes activity. So we need a way to know that the deployment is going to work.

Continuous delivery solves this through something called the deployment pipeline. This is a metaphorical pipeline carrying work from the developer’s machine all the way through to the production environment. The key insight from a deployment testing perspective is that earlier stages of the pipeline (development, continuous integration) involve high-volume, low-risk deployment activity, whereas later stages (systems integration testing, production) involve low-volume, high-risk deployment activity. If we make earlier stages as production-like as possible and we use the same deployment automation throughout the pipeline, then we have a pretty good way to ensure that production deployments will work. Here’s an example of such a pipeline; your environments may be different depending on the needs of your organization:

The area of any given environment in the pyramid represents the volume of deployment activity occurring in that environment. For development, the volume is large indeed since it happens across entire development teams. But notice that everything other than the production deployment itself helps test the production deployment.

As you can see, even the developer’s local development environment (e.g., the developer’s workstation or laptop) should be a part of the pipeline if feasible, since that’s where the greatest deployment volume occurs. One way to do this, for instance, is to run local production-like VMs (say via Vagrant and VirtualBox), and then use configuration management tools like Chef or Puppet along with common app deployment tools or scripts throughout the pipeline.
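
To make the “same deployment automation throughout the pipeline” idea concrete, here is a minimal sketch in Python. It is not the author’s actual tooling: the environment names, host lists and the install-app command are illustrative assumptions. The point is simply that every stage, from the local Vagrant VM to production, runs exactly the same deployment code, varying only the target hosts and the artifact version.

    import subprocess
    import sys

    # Hypothetical inventory; in a real setup this would come from your
    # configuration management or provisioning tooling.
    ENVIRONMENTS = {
        "local": ["127.0.0.1"],  # e.g. a local Vagrant/VirtualBox VM
        "ci":    ["ci-app-01"],
        "sit":   ["sit-app-01", "sit-app-02"],
        "prod":  ["prod-app-%02d" % i for i in range(1, 11)],
    }

    def deploy(environment, version):
        """Run the same deployment steps against every host in an environment."""
        for host in ENVIRONMENTS[environment]:
            # The same configuration management run everywhere (Chef shown here;
            # Puppet would work the same way). The exact invocation is illustrative.
            subprocess.check_call(["ssh", host, "sudo", "chef-client"])
            # The same app install step everywhere, always from a versioned artifact.
            # install-app is a hypothetical wrapper script, not a real command.
            subprocess.check_call(["ssh", host, "sudo", "install-app", version])

    if __name__ == "__main__":
        deploy(sys.argv[1], sys.argv[2])  # e.g. python deploy.py local 1.4.2

The design choice that matters is that “local” and “prod” differ only in data, never in code path, so every developer deployment is also a rehearsal of the production deployment.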

With that background in place, we’re now in position to understand why development siloing is bad for continuous delivery. When development teams see themselves as wholly separate from the operations side of the house, two major problems arise.

Problem 1: Production deployments aren’t sufficiently tested

This happens because the siloed development and operations teams use different deployment systems. In one example with which I’m personally familiar, the development team wrote its own deployment automation and stored its app configuration and versionless binaries in Amazon S3. The ops team, on the other hand, used a hybrid of Chef and custom deployment automation, sourced app configuration from Subversion and pulled versioned binaries from Artifactory.

Generically, here’s what we have:

The earlier pipeline stages use a completely different configuration management scheme than the later stages. Because of the siloing, only a small region of the pyramid tests the production deployment. So when it’s time to deploy to SIT, there’s a good chance that things won’t work. And that’s true of production too.
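
As a rough sketch of that split (the names here are illustrative, not taken from the teams in the example), the disjointed pipeline amounts to something like this:

    def deploy_dev_style(host, version):
        """Development's home-grown path: custom automation, with app config
        and versionless binaries pulled from an object store such as S3."""
        ...

    def deploy_ops_style(host, version):
        """Operations' path: hybrid Chef/custom automation, app config from
        Subversion, versioned binaries from Artifactory."""
        ...

    # Only the last two stages exercise the mechanism production actually uses,
    # and those are precisely the low-volume, high-risk stages.
    PIPELINE = {
        "local": deploy_dev_style,
        "ci":    deploy_dev_style,
        "sit":   deploy_ops_style,
        "prod":  deploy_ops_style,
    }

All of the high-volume deployment activity runs through deploy_dev_style, so it tells you nothing about whether deploy_ops_style will work when it matters.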

The next problem is closely related, and even more serious.

Problem 2: Impedance mismatch between development and operations

Having two disjointed pipeline segments means that there’s an impedance mismatch that absolutely prevents continuous delivery from occurring in anything beyond the most trivial of deployments:

From personal experience I can tell you that this impedance mismatch is a continuous delivery killer. Keep in mind that the whole goal of continuous delivery is to minimize cycle time, which is the time between the decision to implement a change (feature, enhancement, bugfix, whatever) and its delivery to users. So if you have a gap in the pipeline, where people are having to rebuild packages because development’s packaging scheme doesn’t match operations’ packaging scheme, painstakingly copy configuration files from one repository to another, and so forth, cycle time goes out the window. Add to that the fact that we’re not even exercising the deployment system on the deployment in question until late in the development cycle, and cycle time takes another hit as the teams work through the inevitable problems and miscommunications.

Avoiding the impedance mismatch

In Continuous Delivery, Humble and Farley make the point that while they’re generally supportive of a “let-many-flowers-bloom” approach to software development, standardized configuration management and deployment automation is an exception to the rule. Of course, there will be differences between development and production for cost or efficiency reasons (e.g., we might provision VMs from scratch with every production release, but this would be too time-consuming in development), but the standard should be production, and deviations from that in earlier environments should be principled rather than gratuitous.
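
One way to keep deviations principled rather than gratuitous is to express them as explicit, per-environment settings inside a single shared deployment flow, instead of letting each team grow its own tooling. Here is a hedged sketch of that idea; the environment names and settings are assumptions for illustration only:

    from dataclasses import dataclass

    @dataclass
    class EnvSettings:
        provision_from_scratch: bool  # e.g. full VM rebuilds are too slow for development
        instance_count: int

    SETTINGS = {
        "local": EnvSettings(provision_from_scratch=False, instance_count=1),
        "ci":    EnvSettings(provision_from_scratch=False, instance_count=1),
        "sit":   EnvSettings(provision_from_scratch=True,  instance_count=2),
        "prod":  EnvSettings(provision_from_scratch=True,  instance_count=10),
    }

    def deploy(environment: str, version: str) -> None:
        settings = SETTINGS[environment]
        if settings.provision_from_scratch:
            print(f"rebuilding {settings.instance_count} instances in {environment}")
        # Everything from here down is identical in every environment: the same
        # configuration management code and the same versioned artifact.
        print(f"applying shared CM code and installing version {version} in {environment}")

The deviations are visible in one place, reviewable, and justified by cost or efficiency, while the deployment logic itself stays standardized on production.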

So to avoid the impedance mismatch, it’s important to help everybody understand why standardizing the pipeline across environments matters. If there’s a central release team, that means all the development teams have to use whatever configuration management infrastructure it uses, since otherwise we’ll end up with the disjointed pipeline segments and the impedance mismatch. But even if development teams can push their own production changes, it’s worth considering having all the teams use the same configuration management infrastructure anyway, since this creates economies of scale in deployment testing.

Some teams resist standardization instinctively, largely because they see it as stifling innovation or agility. Sometimes this is true, but for continuous delivery, standardization is required to deliver the desired agility. It can be useful to highlight cases where they accepted standardization for good reasons (e.g., standardized look and feel across teams for enhanced user experience, standardized development practices reflecting lessons learned, etc.), and then explain why continuous delivery is in fact another place where it’s required.

One sort of objection I’ve heard to a standardized pipeline came from the idea that the (internal) development team was essentially a third-party software vendor, and as such, ought not have to know anything about how the software is deployed into production. In particular it ought not have to adopt whatever standards are in place for production deployments.

This objection raises an interesting issue: it’s important to establish the big-picture model for how the development and operations teams will work together. If the development team really is going to be like a third-party vendor, independent of any given production environment, then it’s correct to decouple its development flow from any given flow into production. But then you’re not going to see continuous delivery any more than you would expect continuous delivery of software products from a vendor like Microsoft or Atlassian into your own production environment. Here leadership will have to choose between the external and internal development models. If the decision is to pursue the internal development model and continuous delivery, then pipeline standardization across environments is a must.


Comments

Jonathan Fisher replied on Fri, 2012/11/16 - 1:44pm

Honestly, I find facebook-style CD very obnoxious. They seem to test in production, and it's really ruined my appetite for CD. I'd rather have a shaken out product fully tested before it reaches the user along with release notes. Instead, every time you log in, something is moved around and something is tweaked so it never quite works right but is "better" than before in some way. Really really obnoxious user experience.

Willie Wheeler replied on Fri, 2012/11/16 - 2:20pm

Hi Jonathan. Thanks for the thoughts.

People sometimes use the terms "continuous delivery" and "continuous deployment" as a way to distinguish the deploy-on-demand capability from the decision to exercise it in an aggressive fashion. My post is agnostic with respect to the latter. Instead I am highlighting a problem that, unaddressed, makes the capability itself impossible.

Asher Sterkin replied on Tue, 2012/11/27 - 9:24am

Facebook does not test in production, at least not in the sense of debugging new changes in production. Actually, they take pre-release testing quite seriously (for more details see here: http://server.dzone.com/articles/release-engineering-facebook). Some other companies do test in production and it might work for them, but not Facebook. I'm pretty sure they embed a lot of testing and monitoring to be used in the production environment, but that's another story.

Willie Wheeler replied on Tue, 2012/11/27 - 12:38pm in response to: Asher Sterkin

The topic of testing in production isn't really directly tied to the point of the post, but I'm happy to go where the comments take me. :-)

Here's a video from Facebook's Girish Patangay:

http://www.youtube.com/watch?v=rSlLB_kI1mw

It sounds like they use a staged canary process (see the sketch after this list):

  1. Release to a small handful of internal-facing servers for engineers to test
  2. If OK, then release to 2% of the user population
  3. If OK, then release to 100% of the user population
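
For what it's worth, a rough sketch of that kind of staged rollout might look like the following; the host groupings, the 2% figure and the health check are illustrative placeholders, not Facebook's actual tooling.

    def deploy_to(hosts, version):
        print(f"deploying {version} to {len(hosts)} hosts")

    def healthy(hosts):
        # A real check would watch error rates, latency, key business metrics, etc.
        return True

    def staged_rollout(internal_hosts, user_facing_hosts, version):
        # 1. Internal-facing servers, for engineers to test against.
        deploy_to(internal_hosts, version)
        if not healthy(internal_hosts):
            return False
        # 2. Roughly 2% of the user-facing population.
        canary = user_facing_hosts[: max(1, len(user_facing_hosts) // 50)]
        deploy_to(canary, version)
        if not healthy(canary):
            return False
        # 3. Everyone else.
        deploy_to(user_facing_hosts[len(canary):], version)
        return True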

