@Otonomo: An Innovative Approach to Software Delivery
Meet Nir
In our Behind the Scenes Otonomo series, we talk to people from across the Otonomo family and hear what makes their jobs unique and the innovative ways they approach their roles within the company. Today, we're talking to Nir, our software engineering team leader. Nir left Microsoft to join Otonomo in 2018, first as a software engineer, and now leads both the data-ingestion team and the authorization team. We spoke to Nir about his approach to software engineering and delivery metrics.
What would you say is a major challenge for your typical software manager?
One of the most common answers is likely to be helping engineers focus on the core of their job and removing any possible blockers. This answer might seem obvious, but engineers are often weighed down by extra obstacles to doing their job, and it's increasingly hard to spot exactly why, or how much time and how many resources those obstacles take up. This was a task I took upon myself, a challenge that I wanted to solve. I knew that if I could understand what obstacles arise at each step of the development cycle, from coding, review, and merging through to deployment in both pre-production and production, I could help engineers save the resources wasted at each stage. Even more importantly, I could start to understand why those efforts were being wasted, and stop it from happening in the future.
Sounds like a huge challenge! How did you go about it?
The first step was to gather all the pain points in the development cycle. I needed data for this, so I turned to retrospective meeting notes, logs, emails, and anything else that could show me the blind spots in our development process. I wanted to get all of this information in one place so that it could be structured into something usable. To that end, we split our development cycle into four logical stations:
1. Coding
2. Reviewing and merging
3. Pre-production
4. Production
How important was data to the process?
Oh, essential. Data is technology's most valuable natural resource. I always try to make data-driven decisions as much as possible, because when heterogeneous data from multiple sources and voices points to the same issue, it tells you that you're dealing with a common problem across teams. It also helps you get buy-in for what you're trying to do: with data in hand, you can show the executive team, and anyone else, that there's a real issue here. I wanted to approach this internal challenge the same way as I would any customer-facing feature, and that included defining from the start what we wanted to measure and the outcomes we wanted. We wanted to measure how much time is spent at each stage of the process, so we added metrics marking the beginning and the end of each station.
Of course, every company will have its own CI/CD process, with different levels of automation and manual effort, and different stages to go through from build to production. I can explain the process that we went through, but some of the stages may not work for all environments. As an example, let's dive into our CI/CD data source, which is based on Jenkins. Each stage in our development cycle uses a dedicated, complex Jenkins job. Failures in these jobs slow us down, so we collect data on them to pinpoint the most common issues. To keep things simple, we defined a general delivery metric with a shared structure: a status, a runtime, and a reason. This shared structure makes the data easier to analyze, both for measuring your current situation and for establishing baselines that uncover regressions, which could be the result of flaky tests or runtime problems, for example.
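To make the shared structure concrete, here is a minimal Python sketch of a general delivery metric (status, runtime, reason) with beginning and end timestamps per station, plus two analyses mentioned above: ranking the most common failure reasons and flagging runtime regressions against a baseline. The class and function names, the reason labels, the median-of-successful-runs baseline, and the 1.5x tolerance are all illustrative assumptions rather than Otonomo's actual implementation; in practice the records would be populated from Jenkins job results.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from statistics import median


class Station(Enum):
    """The four logical stations of the development cycle."""
    CODING = 1
    REVIEW_AND_MERGE = 2
    PRE_PRODUCTION = 3
    PRODUCTION = 4


@dataclass(frozen=True)
class DeliveryMetric:
    """Hypothetical shape of the shared metric: status, runtime, and reason."""
    station: Station
    status: str           # "success" or "failure"
    started_at: datetime  # the station's beginning marker
    ended_at: datetime    # the station's end marker
    reason: str           # illustrative labels such as "lint" or "flaky-test"

    @property
    def runtime(self) -> timedelta:
        return self.ended_at - self.started_at


def most_common_failures(metrics, n=3):
    """Rank failure reasons to pinpoint the most frequent blockers."""
    reasons = Counter(m.reason for m in metrics if m.status == "failure")
    return reasons.most_common(n)


def is_regression(metric, history, tolerance=1.5):
    """Flag a run slower than `tolerance` times the station's median successful runtime."""
    runtimes = [m.runtime for m in history
                if m.station == metric.station and m.status == "success"]
    if not runtimes:
        return False  # no baseline yet for this station
    return metric.runtime > median(runtimes) * tolerance


# Illustrative usage with made-up runs.
t0 = datetime(2021, 3, 1, 9, 0)
history = [
    DeliveryMetric(Station.PRE_PRODUCTION, "success", t0, t0 + timedelta(minutes=m), "ok")
    for m in (10, 11, 12)
] + [
    DeliveryMetric(Station.PRE_PRODUCTION, "failure", t0, t0 + timedelta(minutes=4), "flaky-test"),
    DeliveryMetric(Station.REVIEW_AND_MERGE, "failure", t0, t0 + timedelta(minutes=2), "flaky-test"),
    DeliveryMetric(Station.REVIEW_AND_MERGE, "failure", t0, t0 + timedelta(minutes=1), "lint"),
]
slow_run = DeliveryMetric(Station.PRE_PRODUCTION, "success", t0, t0 + timedelta(minutes=30), "ok")
print(most_common_failures(history))     # [('flaky-test', 2), ('lint', 1)]
print(is_regression(slow_run, history))  # True
```

Using only successful runs for the baseline keeps fast-failing jobs from dragging the median down and hiding real slowdowns.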
You started with the coding itself?
Yes, exactly. Everyone will need to start here. If you're looking to do something similar for your delivery model, I would suggest you start with your source control. For example, if you are working with GitHub, GitHub webhooks are the place to collect this station's data. They can give you all the information you need: timestamps, branch, and user data. You can then look at the amount of effort devoted to a specific task, whether that's a new feature or a bug fix, and get a sense of where you can add automation. Top candidates for improvement include overly complex local or cloud development environments and a lack of peer reviews.
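As a rough illustration of collecting the coding station's data, the Python sketch below pulls the branch, the pusher, and commit timestamps out of a GitHub `push` webhook payload (`ref`, `pusher.name`, and `commits[].timestamp` are fields of that event). Treating the span from first to last commit in a push as "coding time" is a simplifying assumption, the `CodingEvent` type and helper names are hypothetical, and receiving the HTTP request and verifying its `X-Hub-Signature-256` header are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def parse_github_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is normalized for Python < 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class CodingEvent:
    branch: str
    user: str
    first_commit_at: datetime
    last_commit_at: datetime

    @property
    def span(self) -> timedelta:
        """Rough proxy for coding time: first to last commit in this push."""
        return self.last_commit_at - self.first_commit_at


def coding_event_from_push(payload: dict) -> Optional[CodingEvent]:
    """Extract coding-station data from a GitHub `push` webhook payload."""
    ref = payload["ref"]  # e.g. "refs/heads/feature/login"
    prefix = "refs/heads/"
    branch = ref[len(prefix):] if ref.startswith(prefix) else ref
    times = [parse_github_time(c["timestamp"]) for c in payload.get("commits", [])]
    if not times:
        return None  # e.g. a branch deletion push carries no commits
    return CodingEvent(branch, payload["pusher"]["name"], min(times), max(times))


# Trimmed, hypothetical payload; real push events carry many more fields.
sample = {
    "ref": "refs/heads/feature/login",
    "pusher": {"name": "example-dev"},
    "commits": [
        {"id": "a1", "timestamp": "2021-03-01T09:00:00+02:00"},
        {"id": "b2", "timestamp": "2021-03-01T13:30:00+02:00"},
    ],
}
event = coding_event_from_push(sample)
print(event.branch, event.user, event.span)  # feature/login example-dev 4:30:00
```

Aggregating these events per branch or per task is what lets you compare effort across features and bug fixes.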
How about merging and reviewing?
Making your code available is a major step. Your merging time can be measured from the end time of the first station, coding. In some cases, your organization might allow merging into the code base with a simple action such as a git push. In other cases, you'll have to go through several validations. Each phase in your merging process (from lint validation to tests and review) has its own characteristics, and we reflect this in the 'reason' field of our general delivery metric.
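Here is a minimal Python sketch of the merging station's metric, assuming the clock starts where coding ended and that the first failing validation phase becomes the metric's reason. The phase names, the "merged" success reason, and the type and function names are hypothetical; an empty phase list models the direct `git push` case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class PhaseResult:
    phase: str            # illustrative names: "lint", "tests", "review"
    status: str           # "success" or "failure"
    finished_at: datetime


@dataclass
class MergeMetric:
    status: str
    runtime: timedelta
    reason: str


def merge_metric(coding_ended_at: datetime, phases: List[PhaseResult]) -> MergeMetric:
    """Build the merge station's metric, starting the clock where coding ended.

    Assumes `phases` is in chronological order. An empty list models a direct
    `git push` with no validations.
    """
    for phase in phases:
        if phase.status == "failure":
            # The first failing phase becomes the metric's reason.
            return MergeMetric("failure", phase.finished_at - coding_ended_at, phase.phase)
    end = phases[-1].finished_at if phases else coding_ended_at
    return MergeMetric("success", end - coding_ended_at, "merged")


coding_end = datetime(2021, 3, 1, 13, 30)
phases = [
    PhaseResult("lint", "success", coding_end + timedelta(minutes=2)),
    PhaseResult("tests", "failure", coding_end + timedelta(minutes=25)),
]
m = merge_metric(coding_end, phases)
print(m.status, m.reason, m.runtime)  # failure tests 0:25:00
```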
At this point, were you ready to look at the pre-production stage?
For us at least, the time spent here, between station #2 and station #3, is very meaningful. The ideal scenario is that your new feature's code is merged into the code base, then pushed out to your happy customers, and it's all a done deal. In reality, things are more complicated than that. Undeployed code sitting in your code base can cause misunderstandings and downtime, so you need to understand what slows deployments down. The trick at this station is being able to track the code that boarded the train at station #1. Possible solutions include annotating deployments with Git tags or comparing code versions, for example comparing the existing deployment against the new one.
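The version comparison can be sketched in Python as follows, assuming you have the main branch's linear, oldest-first history with merge times and know which commit each deployment shipped (for example from a Git tag per deployment; in a real repository, `git log <previous-tag>..<new-tag>` lists the same range). The type and function names are hypothetical. The sketch finds merged-but-undeployed code and measures how long each newly shipped commit waited.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class MergedCommit:
    sha: str
    merged_at: datetime


def undeployed(history: List[MergedCommit], deployed_sha: str) -> List[MergedCommit]:
    """Commits merged after the deployed one, i.e. code still waiting in the code base.

    Assumes `history` is the linear, oldest-first history of the main branch.
    """
    idx = [c.sha for c in history].index(deployed_sha)  # ValueError if unknown
    return history[idx + 1:]


def deployment_lead_times(history: List[MergedCommit], previous_sha: str,
                          new_sha: str, deployed_at: datetime) -> Dict[str, timedelta]:
    """Compare the existing deployment with the new one: how long each newly
    shipped commit waited between merge and deployment."""
    shas = [c.sha for c in history]
    start, end = shas.index(previous_sha) + 1, shas.index(new_sha) + 1
    return {c.sha: deployed_at - c.merged_at for c in history[start:end]}


t = datetime(2021, 3, 1, 9, 0)
history = [MergedCommit(s, t + timedelta(hours=h)) for s, h in (("a1", 0), ("b2", 2), ("c3", 5))]
print([c.sha for c in undeployed(history, "a1")])  # ['b2', 'c3']
waits = deployment_lead_times(history, "a1", "c3", t + timedelta(hours=6))
print(waits["b2"], waits["c3"])  # 4:00:00 1:00:00
```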
Lastly, it sounds like you were ready to look at production!
Yes, the end station! It's time to get off safely. Of course, even in production there could still be bugs, hopefully discovered by tests but sometimes by a customer, and then you need to correlate different tasks to arrive at a fix. This is important for product quality, so that the customer gets a great experience, and also to start an iterative process where we can plan and fix any new issues or updates that come into the pipeline. Since we stored all the data from previous attempts, correlating is much easier. Remember, production isn't a drop-it-and-forget-it kind of station; we need to stick around to make sure the code is functioning fully as planned.
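One way stored deployment data can help with that correlation is sketched below in Python: given a deployment history sorted by time, it finds the deployment that was live when a bug was reported, along with the commits it shipped. The `Deployment` record and `live_deployment` helper are hypothetical, and this is only one possible correlation step, not necessarily the one Otonomo uses.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Deployment:
    version: str
    deployed_at: datetime
    commits: Tuple[str, ...]  # SHAs shipped, e.g. recorded at the pre-production station


def live_deployment(deployments: List[Deployment], reported_at: datetime) -> Optional[Deployment]:
    """Return the deployment that was live when a bug was reported.

    Assumes `deployments` is sorted by `deployed_at`, oldest first.
    """
    times = [d.deployed_at for d in deployments]
    i = bisect_right(times, reported_at)  # number of deployments at or before the report
    return deployments[i - 1] if i else None


deploys = [
    Deployment("v1.4.0", datetime(2021, 3, 1, 10), ("a1",)),
    Deployment("v1.5.0", datetime(2021, 3, 3, 10), ("b2", "c3")),
]
suspect = live_deployment(deploys, datetime(2021, 3, 4, 8, 15))
print(suspect.version, suspect.commits)  # v1.5.0 ('b2', 'c3')
```

The commits attached to the suspect deployment narrow the search to the tasks that shipped with it.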
What impact would you say these delivery cycle metrics have had on the business?
Within a software organization, we are always talking about improving velocity, reducing work bandwidth, and shortening development cycles. But the barriers related to these goals are not always crystal clear. With this approach, we've been able to dig in and find the challenges that are slowing our engineers down. Once you've found them, you can look into internal and external tools to make improvements. For example, we realized that in some live cases you need more information than your current log data provides. So we started using Rookout to debug our various components' code remotely, across all the development cycle's stations, to save precious deployment time. I would tell anyone that collecting development cycle metrics is a great start toward a better understanding of those very important goals. Ready to do things differently? Check out our open positions and come work where innovation happens!