February 3, 2025
When software engineering is your core business, you should have values and beliefs about the development process. For example, we believe that developers should own their tasks from start to finish, which means that developers also deploy their own changes. That is not what most developers want to spend their time on, though: if the deployment process is not optimized, complicated plans, repetitive steps and a lot of waiting eat into time that could be spent developing software.
It’s up to us as the Developer Experience team to shape our tooling and processes so that, guided by our principles, development becomes something enjoyable and efficient. As mentioned before, we want developers to own their issues from start to finish. However, we also want developers to spend as little time deploying as possible. Deploying less is not the solution, because another principle of ours is to deploy often.
As if that’s not enough, we also want deploying to be safe, and not cause any downtime. All of this and a robust test suite should make developers comfortable and confident in deploying their own changes. In this post we will explain how deploying works at Channable, and how we let our beliefs about software development shape our workflow.
At Channable, we do upwards of 50 deployments per day. We believe iterating quickly creates a better product for our customers and more satisfaction for the developers. Small changes are easier to test, review, debug and roll back. However, iterating quickly with small changes requires a highly automated continuous integration and continuous delivery (CI/CD) workflow.
We have several git repositories for the different services that make up the Channable tool. We work according to the not rocket science rule, with a main branch that should always be ready for deployment (and is usually already deployed) and off which you base new branches. A branch only lives for a few days to a few weeks, and we try to keep changes small. This approach has worked really well for us over the years. Sticking to our git methodology was a manual process for years, requiring us to educate all contributors. As with all manual processes, mistakes do happen, and with git a mistake quickly becomes practically permanent. At the time we already strongly believed that any repetitive action should be automated, so we set our sights on finding a merge bot.
None of the existing merge bots suited our requirements, except Hoff, which one of our engineers had written based on his own convictions about what a merge bot should do.
Regarding deployment, we started out automating steps with scripts, then automating them with Ansible, and then automating Ansible itself. Eventually, reasons unrelated to CI/CD made us switch to mostly Nix and Nomad, allowing us to cut out Ansible. Luckily, because deployment was already abstracted behind a tool we built ourselves, we only had to change that tool to support the alternative flow, and the developers doing the actual deploying were none the wiser. We had architected our workflow around the idea that the deployment tool does deployment, rather than around what it initially was: a wrapper for Ansible.
Deploying was now entirely standardized, but it still required manual coordination, and manually coordinating deployments really doesn’t scale to a team of 100 developers, especially if some of them are working from home. To make deployment safe and reliable, we needed to automate one last thing: coordinating deployments. For this we built a queueing system that triggers the deployments rather than the developers. We will describe this queueing system in a separate blog post.
We could have leveraged our existing CI system to also build and deploy packages, moving steps from a command line interface to a web interface. But that is just unnecessary hassle if our belief is that every change on the main branch should be deployed! Since we had already automated merging, we gave developers the ability to also queue a deployment of the code they just merged, right at the moment of merging. Without spoiling too much about the design and implementation of the queueing system, this was the final piece of the automation puzzle.
To give you an idea of what this means in practice, let’s start with the steps to get a new feature to production (a command-level sketch of the first steps follows the list):
Create a new git branch in the applicable repository
Develop your feature
Create a PR and ask for review
CI passes and you have review approval? Tag the merge bot and tell it to merge and deploy
Wait for a bit
Your feature is live!
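In terms of commands, the first three steps are just everyday git and GitHub work. A minimal sketch, in which the branch name and commit message are made up and the pull request can of course also be opened through the GitHub UI:

```
git switch -c improve-feed-parsing        # step 1: a new branch off the main branch
# ...write code, run the tests locally...  # step 2: develop your feature
git commit -am "Improve feed parsing"
git push -u origin improve-feed-parsing
# step 3: open a pull request on GitHub and ask a colleague for a review
```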
It’s actually that simple. Sure, some deployments are more complicated, but in general, ceremony and manual steps have no place in the process. In the next section I will explain how the deployment process works at Channable. You will hopefully find that developers need very little knowledge of the underlying systems to get their changes live.
Every deployment starts with a change to deploy, and every change starts with a problem that needs to be solved. Once the problem to solve is established, you create a new branch in the relevant git repository.
Obviously, regardless of how much we automated the deployment process, the development process is still ‘manual’ (despite appearances, AI is not replacing software developers any time soon). Our stack contains mostly Python, Haskell and a little Rust, with a sophisticated Nix setup to manage dependencies and build artifacts. As is the norm in modern software development, we make small incremental changes. Once the change is done and works, it’s time to ask for a review.
We use GitHub as our git server and code review tool. All code changes must be reviewed by at least one other developer. As soon as code is pushed to a branch on GitHub, CI runs the required tests. Our CI setup is highly optimized for minimal latency - we could probably write a separate blog post about this, as over the years we have done several projects to greatly reduce the time between pushing code and the CI pipeline completing.
Whenever CI was not fast enough, we responded with new optimizations. Very aggressive caching played a large role, and we made sure we only tested code that actually changed. In the early days of our CI adventure we switched CI providers twice before settling on Semaphore for performance and ergonomics reasons.
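To illustrate what "only testing code that actually changed" can look like, here is a deliberately simplified sketch of the idea - this is not our actual CI code, and the directory names, test commands and diff base are made up:

```python
import subprocess

# Map top-level directories to the test suites that cover them (illustrative).
TEST_TARGETS = {
    "feeds": "pytest tests/feeds",
    "api": "pytest tests/api",
    "importer": "pytest tests/importer",
}

def changed_paths(base: str = "origin/master") -> list[str]:
    # Ask git which files differ between the main branch and this branch.
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

def targets_to_run() -> set[str]:
    targets = set()
    for path in changed_paths():
        top_level = path.split("/", 1)[0]
        if top_level in TEST_TARGETS:
            targets.add(TEST_TARGETS[top_level])
        else:
            # A change outside the known directories: play it safe and run everything.
            return set(TEST_TARGETS.values())
    return targets

if __name__ == "__main__":
    for command in sorted(targets_to_run()):
        subprocess.run(command.split(), check=True)
```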
For most users and repositories, merging to master directly is disabled. We really want to keep a neat git history, which means squashing fixup commits and rebasing before merging. This is tedious to do manually - although we do teach it to all developers as part of our onboarding process. The de facto way to merge is instead to tag a bot and tell it to merge. This is where the magic truly starts. Tagging the merge bot looks like this:
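```
@hoffbot merge
```

(The @hoffbot handle here and in the examples below is illustrative; in practice you tag whatever GitHub account your Hoff instance runs under.)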
There are more commands the merge bot takes, and we’ll get to those in the next section. The merge bot - Hoff, in our case - will rebase the pull request and merge it on a testing branch. The testing branch ensures that the tests still pass once the code is merged with the main branch.
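Conceptually, what the merge bot does for you boils down to something like the following - the branch names are simplified here, and Hoff’s real implementation differs in details such as merge trains and the exact merge commits:

```
git fetch origin
git checkout my-feature                 # the pull request branch (name illustrative)
git rebase origin/master                # replay the changes on top of the current main branch
git push --force origin HEAD:testing    # CI now tests the exact state main would end up in
# once CI passes on the testing branch, the main branch is forwarded to the tested commit
git push origin HEAD:master
```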
At some point, too many people were trying to merge at the same time, making merges slow, so we implemented merge trains - see the Hoff documentation on how they work. Similarly, too many people were trying to deploy at the same time, leading to a long queue, so we introduced separate queues for every service that can be deployed without impacting another service.
You can then ask Hoff to merge and create a tag, which triggers a build of a new version of the software you were working on. Usually, however, you might as well deploy your changes straight away:
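```
@hoffbot merge and deploy
```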
where Hoff will do all of the above, and also add a special message to the tag, thereby starting the Continuous Delivery process. The special message triggers our CI software (Semaphore) to start building a production release. After it has finished, it triggers our deployment coordination tool - Deploydocus.
Deploydocus adds the deployment to one of its queues and, when the deployment reaches the front of the queue, processes it. We will write another blog post about what happens then, but it comes down to deploying the release to a staging environment first. Once it passes checks on staging, the new package is deployed to the production environment.
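To make the idea concrete, here is a deliberately simplified sketch of what such a per-service deployment queue does. This is not Deploydocus itself; the service name, version and the staging/production steps are placeholders:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Deployment:
    service: str
    version: str  # the tag that CI built a release for

# Placeholder stages; the real checks and deploy steps are out of scope here.
def deploy_to_staging(d: Deployment) -> None:
    print(f"deploying {d.service} {d.version} to staging")

def staging_checks_pass(d: Deployment) -> bool:
    return True

def deploy_to_production(d: Deployment) -> None:
    print(f"deploying {d.service} {d.version} to production")

class DeploymentQueues:
    """One FIFO queue per service that can be deployed independently."""

    def __init__(self) -> None:
        self.queues: dict[str, deque[Deployment]] = {}

    def enqueue(self, deployment: Deployment) -> None:
        self.queues.setdefault(deployment.service, deque()).append(deployment)

    def process(self, service: str) -> None:
        queue = self.queues.get(service, deque())
        while queue:
            deployment = queue.popleft()
            deploy_to_staging(deployment)
            if not staging_checks_pass(deployment):
                # A failing staging check stops this release from reaching production.
                continue
            deploy_to_production(deployment)

queues = DeploymentQueues()
queues.enqueue(Deployment(service="feed-exporter", version="v2025.02.03-1"))
queues.process("feed-exporter")
```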
Despite our efforts to speed up CI, builds were initially not fast enough either - we use a combination of Nix and a multi-layered caching solution to speed up builds and improve performance.
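One layer of such caching is a Nix binary cache: build machines upload store paths, and CI and developer machines download pre-built artifacts instead of rebuilding them. A generic example of wiring one up in nix.conf - the private cache URL and its key below are placeholders, not our actual setup:

```
# nix.conf - the public cache.nixos.org entry plus a placeholder private cache
substituters = https://cache.nixos.org https://nix-cache.example.com
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= nix-cache.example.com:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
```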
While this is just one of the ways in which things can be deployed - every step can be manually triggered in case it’s necessary - it is by far the most common way. Note that only the first four steps required any user involvement, which is why we can deploy so many versions every day.
The central message we want to convey is that if you want to move fast without breaking things, there is only one solution: automation. And as you can probably tell, we did a lot of automation. Once you have automation, you need to invest time in making all the checks and processes you run for every release fast. In fact, as your codebase grows, you will have to keep investing in speeding up CI to prevent it from becoming slow again.
We built a system that automates deployment so well that our convictions about software development, like deploying often, are now the reality of our workflow. By having a vision of how we want to work and then building a process around it, we not only created a CI/CD system, but also a day-to-day workflow that is consistent with our engineering culture. To achieve this, we had to build a great deal ourselves, because no off-the-shelf solution seemed to fit. We feel this was worth the effort, because we think there is great value in making deploying an enjoyable experience.
Are you interested in working at Channable? Check out our vacancy page to see if we have an open position that suits you!