In a perfect world, every time we rolled out code at the end of a sprint, it would work perfectly in production. There would never be any bugs, and there would never be any issues that forced us to roll back code that has already been deployed.
Of course, we don't live in a perfect world. That's one of the reasons why we have agile in the first place. Agile isn't about pretending that your world is perfect. It's about adapting to reality, and iterating to improve your processes and your flexibility so that when problems arise you're able to deal with them.
One of the problems that comes up frequently for teams is the discovery of a new bug in production right in the middle of a sprint. Your team has finished deploying, all the tests passed, and everything has been pushed out to production so customers can start using it.
But maybe an edge case that wasn't considered comes up. Maybe some aspect of the code that wasn't fully tested comes to the surface, and starts causing problems for users. How's your agile team supposed to respond to that?
There are many different approaches to dealing with bugs in production that come up during a sprint. Choosing the one that works best for your team depends on how your company is structured, how critical the bug is, and what matters most to your product owner and your customer.
The Minimal Impact Option
If a bug in production is the result of a previous sprint's work, and it's having a negative effect on users, the simplest thing to do, whenever possible, is to roll the production server back to the state it was in before the last sprint's release. At the very least, this will minimize the impact of the bug on new users.
Doing this requires having a production deployment system set up to support clean rollbacks. An agile team with the ability to push code into production should ideally be working in an environment that supports continuous deployment, or at the very least deployment tags that allow you to roll back your production servers to a previous state. It's times like this that you really appreciate having strong deployment or devops engineers on the team.
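To make the idea of deployment tags concrete, here is a minimal sketch of how a rollback target might be chosen. It assumes a hypothetical setup in which each sprint's release is recorded as a tag in an ordered list, oldest first. The tag names and the `previous_release` helper are illustrative only, and the actual redeploy step depends entirely on your own tooling.

```python
from typing import Optional, Sequence


def previous_release(tags: Sequence[str], current: str) -> Optional[str]:
    """Return the tag released immediately before `current`.

    `tags` is assumed to be ordered oldest first. Returns None when
    `current` is the very first release and there is nothing to roll back to.
    """
    if current not in tags:
        raise ValueError(f"unknown deployment tag: {current}")
    index = tags.index(current)
    return tags[index - 1] if index > 0 else None


# Hypothetical tags, one per sprint release.
release_tags = ["sprint-12", "sprint-13", "sprint-14"]

# The bug shipped with sprint-14, so find the release to roll back to.
rollback_target = previous_release(release_tags, "sprint-14")
print(rollback_target)
```

The point is only that a rollback is cheap when every release is labeled and ordered. Without that record, working out what "the previous state" was becomes its own investigation.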
If it's possible to solve the problem that simply, the product owner may choose to write a bug story to be worked on in the next sprint. That will prevent this current sprint from being interrupted, and reduce the impact on the team's velocity. Handling bugs this way also allows the team to consider more carefully the potential impact of the bug, and the best way to fix it.
The Deep Exploration Option
Sometimes fixing a bug in production isn't as simple as it sounds. For example, the bug could have had an effect on the data being entered into the application, or the bug may actually exist in the data layer. In this case, database recovery may be necessary, which introduces a whole range of other difficulties.
Recognizing the potential scope of a bug is the responsibility of the product owner in concert with the engineering team. When a bug is discovered, it may be necessary for the product owner to pull one or more engineers into meetings to discuss the depth of the impact and make a plan of action. Of course, the team's velocity in the sprint will likely be reduced merely because of the need to assess the extent of the damage and propose a viable solution.
If the bug is urgent enough and the prognosis is uncertain, it may be necessary to introduce a new spike within the current sprint, and have somebody on the engineering team start looking ahead toward what's going to be necessary to fix the bug in the next sprint. Bugs can be difficult to estimate because of their unknown nature, and it's usually a good idea not to assign points to a bug for that reason. However, having one engineer take away a little bit of effort from the current sprint can pay off in the long run, without holding back the whole team.
The Urgent Effort Option
It's not always possible to put off a bug fix until the next sprint. Sometimes a bug is so critical, and affects such an important aspect of the product, that it's necessary to implement a fix during the current sprint. Ideally, this effort won't require the entire development team. It's the product owner's responsibility to assess the scope of the damage, and decide whether it's worth introducing a new story in the middle of a sprint to address a critical bug.
Introducing new stories in the middle of a sprint is rarely a good idea. A good scrum master should work with the product owner to try to limit changes to a sprint that's in progress. But that doesn't mean that it's never necessary, and a good scrum master should also be able to communicate clearly to the team when and why it's important to adjust the backlog if that's the best option.
The goal in this case is to have as small an impact on the sprint as possible. Perhaps the developers who worked on the section of code that is causing the problems can be pulled off the stories they're working on, and temporarily assigned to fix the bug. Of course, any stories they're working on will suffer, and there won't be any points earned in the sprint for work done on a bug from a previous sprint.
The Nuclear Option
If a critical bug is discovered in production code, the presence of the bug is causing serious problems, and more than half of the development team is needed to work in concert to fix it, sometimes the only thing to do is to stop the sprint and start a new one.
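The four options can be read as a rough decision order. The sketch below is one interpretation of that order, not a rule from any agile framework. The parameter names are hypothetical, the only threshold taken from the text is "more than half of the development team", and treating every non-critical bug that a rollback can't resolve as a spike is a simplifying assumption.

```python
from enum import Enum


class Response(Enum):
    ROLL_BACK_AND_DEFER = "roll back production, write a bug story for the next sprint"
    SPIKE = "start a spike in the current sprint to scope the fix"
    MID_SPRINT_STORY = "add a bug-fix story to the current sprint"
    STOP_SPRINT = "stop the sprint and start a new one"


def triage(rollback_resolves_it: bool, critical: bool,
           team_fraction_needed: float) -> Response:
    """Pick a response to a production bug found mid-sprint.

    team_fraction_needed is the share of the development team
    (0.0 to 1.0) that would have to work on the fix.
    """
    if rollback_resolves_it:
        # Minimal impact: the current sprint is not interrupted.
        return Response.ROLL_BACK_AND_DEFER
    if critical and team_fraction_needed > 0.5:
        # Nuclear option: most of the team is needed for the fix.
        return Response.STOP_SPRINT
    if critical:
        # Urgent effort: a small group fixes the bug during this sprint.
        return Response.MID_SPRINT_STORY
    # Deep exploration (assumed default): one engineer scopes the fix.
    return Response.SPIKE


print(triage(rollback_resolves_it=False, critical=True,
             team_fraction_needed=0.25).value)
```

In practice the product owner and the engineering team make this call together, and the inputs are judgment calls rather than booleans. The sketch only shows how the options escalate.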
"4 Agile Ways to Handle Bugs in Production" by M. David Green via SitePoint