First, I do not think there is a universal best practice that suits everyone's needs. Depending on the product you deliver, how eager your customers are for new features, and how much tolerance they have for bugs, you will need to implement continuous delivery differently. For example, suppose your product is a banking system used internally by banks. The banks do want new features, but they do not need them the next day, and they want the system to be as stable as possible. In this case, you may want your continuous delivery practice to include more tests to reduce the chance of bugs introduced by new features. If, on the other hand, your product is an e-commerce website used by the public, you may want to deliver new features to users as quickly as possible to increase revenue. In this case, you could choose fewer tests.
Second, standardize. No matter how you practice continuous delivery, you must have standards in your organization: a coding style standard, a code review standard, a continuous integration standard, a testing standard, a deployment standard, a tracking standard, and so on. All of these have to be set up and agreed on in your organization before you start implementing continuous delivery. In my experience, continuous delivery hardly ever succeeds without standardization.
Third, choose the right tools. There are a lot of tools on the market for implementing continuous delivery; some are good and some are not that good. I do not want to recommend any tool here, as every organization has different requirements and it is your job to choose the right tools. If you cannot find a tool that suits you, write your own.
Now I will summarize how continuous delivery is implemented at my company.
Standardized
Coding style - We follow Google's Java style and use SonarQube to control code quality.
CI - We use Git flow. The develop and master branches do not allow direct pushes. All code changes require a feature branch or hotfix branch, and a pull request is needed to merge into the develop branch. Code reviews are performed in the pull request. Once a pull request is merged into develop, the develop branch is built on our CI tool and regression tests are triggered to make sure no existing functionality is broken. Once a release is pushed to production, the release branch is merged into master.
Testing - New code requires unit test coverage. Integration tests are optional. Acceptance tests cover critical functions that have a direct impact on user experience.
Deployment - Each team has its own team environment, where any code branch may be deployed. The integration environment only accepts deployments of the develop branch or a release branch (this keeps the integration environment as stable as possible). The release candidate environment only accepts release branches. The production environment only accepts branches from the release candidate environment. Every engineer can deploy to the team and integration environments. The release candidate environment does not allow manual deployment; deployment is performed automatically by our releasing tool when a release is created. The production environment allows manual deployment by the operations team and by a group of software engineers with the required permissions. Our releasing tool can also deploy to production automatically.
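The branch rules above can be expressed as a small policy check. This is only a minimal sketch, not the code of our releasing tool: the environment names, the `release/` branch prefix (the usual Git flow convention) and the `can_deploy` function are assumptions made for illustration.

```python
import re

# Assumed environment names and branch patterns, modelled on the rules above.
ALLOWED_BRANCHES = {
    "team": [r".*"],                             # any branch
    "integration": [r"develop", r"release/.+"],  # develop or a release branch
    "release-candidate": [r"release/.+"],        # release branches only
}


def can_deploy(environment, branch, rc_branches=frozenset()):
    """Return True if `branch` may be deployed to `environment`.

    `rc_branches` is the set of branches already deployed to the release
    candidate environment; production only accepts those.
    """
    if environment == "production":
        return branch in rc_branches
    patterns = ALLOWED_BRANCHES.get(environment, [])
    return any(re.fullmatch(pattern, branch) for pattern in patterns)
```

A check like this would sit in front of every deployment request, so a feature branch can reach a team environment but is rejected everywhere else.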
Tracking - Every commit to the code base requires a JIRA ticket. A standard release note is generated automatically when a release is created, and all JIRA tickets are collected into the release note for future reference. The names of the person who creates the release and of the person who pushes it to production (if manual deployment is involved) are recorded in the release note for tracking purposes.
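To illustrate the tracking rule, here is a rough sketch of how ticket keys could be pulled out of commit messages and collected into a release note. It assumes that commit messages contain JIRA-style keys such as `SHOP-12`; the project key, the function names and the release note layout are made up for the example and are not what our tool actually produces.

```python
import re

# Assumed key format: uppercase project key, a dash, then a number.
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def tickets_in(message):
    """Return all ticket keys mentioned in one commit message."""
    return TICKET_PATTERN.findall(message)


def commits_missing_ticket(messages):
    """Return the commit messages that reference no ticket at all."""
    return [m for m in messages if not tickets_in(m)]


def build_release_note(version, created_by, messages, deployed_by=None):
    """Build a plain-text release note listing each ticket once."""
    tickets = sorted({t for m in messages for t in tickets_in(m)})
    lines = [f"Release {version}", f"Created by: {created_by}"]
    if deployed_by:
        # Only recorded when a person pushed the release to production.
        lines.append(f"Deployed to production by: {deployed_by}")
    lines.append("Tickets:")
    lines.extend(f"- {ticket}" for ticket in tickets)
    return "\n".join(lines)
```

`commits_missing_ticket` could back a pre-merge check, while `build_release_note` would run at release creation time.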
Release flow
The flow chart below shows how a change goes to production at my company.
A change is initially committed to a feature branch and deployed to the team environment. The engineer tests the change there and creates a pull request to merge it into the develop branch.
Other engineers review the change in the pull request. Once it is approved, the feature branch is merged. The develop branch is then deployed to the integration environment and acceptance tests are triggered.
Once the acceptance tests pass, a release branch can be created from the develop branch and deployed to the release candidate environment. Acceptance tests are triggered again.
Once the acceptance tests pass on the release candidate environment, the release branch is deployed to the production environment.
Of the steps above, only merging the change into the develop branch and creating the release branch are manual; all other steps are automatic. With this automation, our engineers can ideally push their changes to our customers within one day, compared with at least one week in the old days.
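The flow above can be modelled as an ordered list of steps, two of which wait for a person. The sketch below is an illustration under assumptions, not our real pipeline: the `Step` type, the step names and the `run_flow` function are hypothetical, and `execute` stands in for whatever actually deploys or runs tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    manual: bool  # True if a person has to trigger this step


RELEASE_FLOW = [
    Step("deploy feature branch to team environment", manual=False),
    Step("merge pull request into develop", manual=True),
    Step("deploy develop to integration environment and run acceptance tests", manual=False),
    Step("create release branch", manual=True),
    Step("deploy release branch to release candidate environment and run acceptance tests", manual=False),
    Step("deploy release branch to production", manual=False),
]


def run_flow(steps, approvals, execute):
    """Run steps in order and return the names of the completed ones.

    The flow stops at the first manual step whose name is not in
    `approvals`, or at the first step for which `execute(step)` is falsy.
    """
    completed = []
    for step in steps:
        if step.manual and step.name not in approvals:
            break
        if not execute(step):
            break
        completed.append(step.name)
    return completed
```

With no approvals the flow stops right after the team environment deployment; with both approvals and passing tests it runs through to production.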
Tools
We use the tools listed below. All of them provide RESTful APIs, so we integrated them into our own releasing management tool. It gives our engineers a single portal for creating a release and pushing it to production.
- Bitbucket is our code repository
- JIRA is our issue tracking tool
- Confluence hosts our release notes
- Bamboo is our CI tool
- We developed our own releasing management tool
- Rundeck is used for our production deployment
When something goes wrong
You may ask: what if something goes wrong? Good question. Several steps in the release flow above may fail.
Acceptance tests fail on the integration environment - When this happens, release creation is not allowed. As I mentioned in the Tools section, we developed our own releasing management tool, and our engineers create releases from there. That makes it easy to block release creation by disabling the release creation button.
Release creation fails - This should rarely happen. But when it does, we revert any actions that have already been performed and display the error to the release creator. If the release branch has been created, we delete it. If the release note has been created, we delete it too.
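One common way to get this "undo whatever was already done" behaviour is to pair every action with its own undo and unwind in reverse order on failure. The following is a minimal sketch of that pattern; `create_release` and the shape of `steps` are assumptions for illustration, not the code of our tool.

```python
def create_release(steps):
    """Run (action, undo) pairs in order.

    If any action raises, the undo functions of the actions that already
    succeeded are called in reverse order, and the error is re-raised so
    it can be shown to the release creator.
    """
    undo_stack = []
    try:
        for action, undo in steps:
            action()
            undo_stack.append(undo)
    except Exception:
        for undo in reversed(undo_stack):
            undo()
        raise
```

For example, the pairs could be "create release branch / delete release branch" and "create release note / delete release note". If writing the release note fails, only the branch is deleted, because the note was never created.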
Acceptance tests fail on the release candidate environment - The first thing to do when this happens is to roll back to the previous version on the release candidate environment, and then inform the release creator so they can investigate the problem. If the failure is caused by the code change and can be fixed with minor effort, we allow a pull request to be merged into the release branch and the release branch to be rebuilt. Otherwise, the release has to be cancelled, and no new release is allowed until the failure has been fixed. If the failure is caused by a data issue or by the test itself, senior engineers with the required permissions are allowed to push the release to production manually. The data issue or the test should be fixed afterwards.
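The decision rules in this case can be written down as a small function. This is a sketch only: the `Cause` enum, the `plan_after_rc_failure` name and the action strings are invented for the example, and in practice the cause is determined by a person investigating the failure, not by code.

```python
from enum import Enum


class Cause(Enum):
    CODE_CHANGE = "code change"
    DATA_ISSUE = "data issue"
    TEST_ISSUE = "test issue"


def plan_after_rc_failure(cause, minor_fix=False):
    """Return the ordered follow-up actions after a failed acceptance run
    on the release candidate environment."""
    actions = [
        "roll back release candidate environment to previous version",
        "inform release creator",
    ]
    if cause is Cause.CODE_CHANGE:
        if minor_fix:
            actions.append("merge fix into release branch and rebuild")
        else:
            actions.append("cancel release and block new releases until fixed")
    else:
        # Data or test problems do not block the release itself.
        actions.append("allow manual push to production by permitted senior engineer")
        actions.append("fix data issue or test afterwards")
    return actions
```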
Issues found after a release is deployed to production - When this happens, it is bad. We have two options. One is to create a hotfix or patch release, if the issue can be fixed in a short time. The other is to roll back to the previous release version. If a rollback happens, everyone involved in the release has to do a postmortem to find out what we did wrong and how to prevent it from happening again.
More we can do
At the moment, we rarely deploy to production automatically. The reason is that we do not have a reliable alerting tool to inform us when something goes wrong in production, so we still rely on manual checks after a release is deployed. That is why we formed a new SRE team to take care of our production environment. One of the new team's first priorities is to create the alerting tool. When it is ready, we will be able to automate production deployment.
Once production deployment is automated, we can improve our continuous delivery practice further by automating release creation. The idea is that when a certain number of features have been merged into the develop branch (this can be tracked by JIRA tickets), a release is created automatically and follows the release flow to production. If this is implemented, we can free our engineers' time from releasing so they can focus on development.
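The trigger for automatic release creation could look roughly like the sketch below. Nothing here exists yet: the function names, the ticket keys and the default threshold of 5 are assumptions, and the two ticket lists would in practice come from JIRA and from earlier release notes.

```python
def pending_features(merged_tickets, released_tickets):
    """Return tickets merged into develop but not yet released,
    in merge order and without duplicates."""
    released = set(released_tickets)
    seen = set()
    pending = []
    for ticket in merged_tickets:
        if ticket not in released and ticket not in seen:
            seen.add(ticket)
            pending.append(ticket)
    return pending


def should_create_release(merged_tickets, released_tickets, threshold=5):
    """Return True once enough unreleased features have accumulated."""
    return len(pending_features(merged_tickets, released_tickets)) >= threshold
```

A scheduled job or a merge hook could call `should_create_release` after every merge into develop and start the release flow when it returns True.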