
Podcast: Kubernetes and GitLab

February 18, 2020, 7:51 am, by Tamir Gefen

gitlab kubernetes podcast

A few days ago I came across an interesting interview on the Kubernetes Podcast website about GitLab and Kubernetes.

The hosts interviewed Marin Jankovski, an engineering manager at GitLab and its very first employee (!).

The interview covers GitLab, Kubernetes, deployment, GitLab’s SaaS offering, transparency, and an anecdote about the GitLab logo.

We transcribed the interview and also embedded the original audio here.

Listen to the podcast:

The interview with Marin starts at 09:04.

Transcription:

Marin Jankovski is employee number one at GitLab and an engineering manager there. GitLab is a DevOps application available in open-source, enterprise, and SaaS versions.

Welcome to the show!

[Marin] Thanks for having me.

Q: What was the origin of GitLab? Where did it come from?

The project actually started in 2011.

The first commit was made in October 2011 by Dmitriy Zaporozhets from Ukraine. I joined the company at its conception back in 2012, when the current CEO saw the open-source project and thought, “This is the future, this is what we need to do.” Once I joined, my first task was to get GitLab running on GitLab.com in AWS. So my first two weeks of work went into trying to automate the installation of GitLab and its latest versions.

Q: All on VMs at the time?

All on VMs at the time, and everything installed from source, with no automation whatsoever. You clone the repository, then you try to set up Redis, try to set up NGINX, and all of that from scratch.

Q: What technology did you use to do that first automation?

I used Chef because I was very familiar with it at that time, and it actually stuck with us: until last year, basically, we were using Chef to deploy a lot of our stuff. It was one of those things where it was the most convenient option, I knew it at the time, and it worked best for me.

Q: Do you now use your own tools for it?

Right now we use a combination of our own tools plus some Ansible. That is basically the gist of it.

Q: GitLab started out as source code hosting but has evolved to include so much more. What are some of the features in GitLab today?

In addition to source code management, we also have CI/CD; package features, meaning various package hosting options; and security features such as security scanning, dependency scanning, and license management. More recently we added items like monitoring, logging, and so on. So we cover everything from the moment you push your code to the moment you deploy it and need to run and debug it.

Q: How do you decide which pieces to add to the platform?

It really depends on the needs of our users. Given that we are a completely transparent company, our direction and future development are completely out in the open. We do listen to our wider community, and we get a lot of requests like “Hey, it would be cool to get this feature in, because it’s useful to have it in this one tool,” so I think a lot of our direction comes from our users and our customers.

Q: You mentioned that you’re a transparent company. I also understand you’re a completely decentralized company.

Correct! We currently have almost eleven hundred employees, fully distributed across the world. Last time I checked, we had people in 50 locations and not a single office; day to day, everything happens through our laptops. So, very exciting stuff actually.

Q: When you say you’re a completely transparent company, what drove that decision and what challenges has that created for you?

I’m not the best person to answer what drove the decision, but I was there when some of those decisions were made. Take our database incident, when we deleted a database by accident. Because we were debugging and trying to restore the backup in public, live on YouTube, we gathered a lot of support. And not just support: a lot of folks recommended things we could try to make the restore faster, and the community rallied around us to help us out. I think we saw that very early on as a superpower, in my opinion.

I remember trying to recover a database when we were much, much smaller, back in 2013 or so. At 4 a.m. I published a document, and when folks woke up and saw it, they gave me a lot of directions on where to investigate next. I didn’t know much about databases at that time, so the suggestions people submitted actually helped me get the database back. I think that is one of the superpowers of being transparent.

One of the challenges is that not everyone likes what they read. Everyone has an opinion, but not everyone has all the background behind a decision. So that is definitely one of the challenges.

Q: You have a very similarly named competitor in GitHub. How would you describe the difference between their platform and yours?

Personally, I would say we have a fundamental difference in how we view workflows. While we did start as a GitHub clone and then a competitor, we expanded by adding our CI/CD offering, which enabled completely different workflows. GitHub only recently added CI tools, while we have had them for a while now, and by having those extra tools inside source code management we enabled folks to ship faster, not only to do the social coding that GitHub offered at that time.

Q: You mentioned that you’re a single application. Why is being a single application important?

I think that also comes from the early days, when I spent enormous amounts of time integrating the many services that GitLab consisted of. We also saw that companies invest a lot of time and money into integrating the tools they need for their work. If we offer them an integrated single application, so they don’t have to take care of a bunch of tools, they can actually focus on shipping their own stuff. They save the money and effort of maintaining those tools, and they can focus on actually delivering value for their company. From the moment we integrated CI into GitLab, we also realized that we can control how we ship things, meaning we can change how the different components interact with each other. That allows us to ship way faster to our customers, to change features and add new ones way quicker, and lets our customers start using them way quicker, because they don’t have to think about different versions or how they are going to transition: we handle all of that for them, and they can just focus on using the actual tool.

Q: GitLab has an open source edition and then there are Enterprise editions and hosted versions available. How do you decide which features go into the open source platform versus the commercial platforms?

This comes back to the workflow point I mentioned earlier: we want to enable all the basic workflows that the vast majority of users need, and we want to make sure that only the very advanced features needed by larger companies, with many users in a company, are part of the Enterprise Edition. That means we want to ensure that whoever uses source code management can also use CI and deploy whatever they are doing very quickly. But when they need additional controls, for example additional approvals, those items are very much a requirement of a more established company, so this is where we decide that those features belong in our paid versions. And more often than not, depending on how much interest there is in open-sourcing certain features, we do push items back into open source as well.

We made a promise very early on, as stewards of the GitLab open-source community, that we will not do it the other way around, right: we are not gonna make something that is open source …

Q: GitLab started before containerization and Kubernetes took off. What made you decide that you wanted to pivot what you were doing and start to adopt that kind of architecture?

First of all, I think this came from us observing that Kubernetes was on its way up. The first time I actually heard about Kubernetes was when it was open-sourced, and again I heard about it from the CEO. He read about it somewhere and said, “This is amazing, this is gonna change the game for a lot of folks, we need to focus on this,” and we started following the Kubernetes project. Then we realized that our gitlab.com SaaS offering was growing way faster than we could hire people to manage the growth, and that we also needed to change how we operate GitLab. So the primary driver for migrating gitlab.com to Kubernetes is the need to react quicker, ship quicker, and scale faster with the demand that we have.

Q: Did you first implement support in the product for deploying to Kubernetes, or did you first move the product itself to run on Kubernetes?

We did both in parallel: all of the development work on actually deploying GitLab on Kubernetes was happening at the same time as we were developing features for our users to deploy to Kubernetes.

Q: Did you make different choices because you are not only running a service yourself but also offering it to customers to run in their own environments, compared to if you were only running the SaaS platform?

We had to make different choices. This comes from the fact that we use the same platform that we ship to our customers, and in order to support our customers we need to know what we are running. So the complexity of the whole platform needs to be reduced as much as possible, because we want our customers to be able to run it themselves as well. Think of it this way: gitlab.com is the largest GitLab installation in the world, and we want to ensure that any customer who wants to become the biggest installation in the world can follow the same path we already took and use a tried-and-tested setup.

Q: What was the actual process of moving the code, which, as you said before, was runbooks and Chef and so on, to containers and Kubernetes?

We needed to ensure that we had no downtime. That is a very simple requirement, but a very complex one as well. So what we ended up looking at is how we could make sure that whatever we do in the background does not affect our customers, and that we direct traffic to the new setup as it becomes ready, tried, and tested. What we ended up doing for a long time, and actually still do, is running a hybrid of VMs and Kubernetes clusters. Depending on which workload you hit on gitlab.com, your traffic is served either from a VM or from a Kubernetes cluster. That way we ensured that our customers don’t notice the change behind the scenes, and that we can move our traffic part by part from the old VMs to the new Kubernetes workloads.
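As an illustration only, the per-workload, part-by-part cutover described in this answer can be sketched as a small Python router. This is a hypothetical sketch, not GitLab’s actual implementation: the class `HybridRouter`, its `shift` method, and the workload names `registry`, `web`, and `git` are invented for this example. Each workload carries a percentage of traffic already moved to Kubernetes, and a hash of the request key decides, deterministically, which backend serves it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class HybridRouter:
    """Route a request either to the legacy VM fleet or to Kubernetes.

    Hypothetical sketch: workload names and percentages are illustrative,
    not GitLab's real configuration.
    """

    # Percentage (0-100) of each workload's traffic already moved to Kubernetes.
    # Workloads that are not listed stay entirely on VMs.
    k8s_percent: dict[str, int] = field(default_factory=dict)

    def backend_for(self, workload: str, request_key: str) -> str:
        percent = self.k8s_percent.get(workload, 0)
        # Hash workload + key into a stable bucket in [0, 100), so the same key
        # keeps landing on the same backend while a migration is in progress.
        digest = hashlib.sha256(f"{workload}:{request_key}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return "kubernetes" if bucket < percent else "vm"

    def shift(self, workload: str, percent: int) -> None:
        """Set a workload's share of traffic on Kubernetes (one rollout step)."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.k8s_percent[workload] = percent


# Hypothetical state: registry fully migrated, web partly migrated, git untouched.
router = HybridRouter({"registry": 100, "web": 25})
for key in ("user-1", "user-2", "user-3"):
    print(key, router.backend_for("web", key), router.backend_for("git", key))
```

Raising a workload’s percentage step by step with `shift`, and dropping it back to 0 if errors appear, is one way to move traffic piece by piece without customers seeing the switch. Real deployments would usually do this in a load balancer or ingress layer rather than in application code.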

Q: Did anything catch you out along the way that you would offer as advice to someone else making this transition?

Yeah, there were a lot of surprises actually, and one of the biggest was how to do hands-free deployment on Kubernetes. You see kubectl here, there, and everywhere, but we are in the automation business, so one of the tasks I gave my team was: how will you ensure that when a developer merges something into the master branch, you as an SRE don’t have to do anything? It is actually surprisingly complex to connect multiple services, ensure that they work correctly, and also ensure that you don’t have to do any manual work. So I would definitely recommend that anyone migrating their workloads right now look into what they can do to make sure they no longer have to do anything by hand once their workloads are running. kubectl commands are nice, but how do you put them into your CI/CD platform so that everything is hands-free?

Q: And I assume I can make GitLab do all of that for me?

We are using GitLab to deploy GitLab, so yes.

Q: Would you consider yourself a monolithic or a microservices-based architecture?

I would definitely consider GitLab a monolithic architecture, for sure.

Q: A lot of the trend we’re seeing these days is focused on microservices and people breaking up monoliths. You are following a different trend there. What made you decide to move towards more of a monolithic design?

We didn’t necessarily move; we are still staying with it. What we are seeing is that even at large scale there is a benefit to running things in a monolithic architecture. The benefits are mostly about how quickly you can turn around a feature, how quickly you can ship, and how quickly your developers can do their job. We also saw that only once we start hitting bottlenecks does it make sense to start transitioning certain parts of the monolith into a microservice component. We have done that multiple times in the past, so GitLab now has a couple of satellite repositories, as we call them, which are actually the microservice part of our stack. So while I said we are definitely more of a monolith, we are really a hybrid when it comes to the architecture of GitLab.

Q: What parts of your infrastructure don’t run on Kubernetes and why?

On gitlab.com we run our databases and our Git storage outside of Kubernetes. I think the primary reason is that these are, at least in my opinion, the hardest things to get right in Kubernetes, and we are growing too fast to experiment with moving them over as well. We are still very much learning how to run our application at scale on Kubernetes, even with only the stateless part of our stack.

Q: Given that GitLab has continuous delivery components, you will know, through support volume or what people are saying, what customers want in terms of the targets they deploy to. Do you see a huge shift in where people are now deploying? For example, are people who used to deploy to VMs now moving to Kubernetes as a target?

I can absolutely say that we see that. Recently we had an issue with one of our features on gitlab.com that enables customers to use Kubernetes, and to my surprise, within an hour of seeing the first errors, we saw a lot of support pressure to fix the problem really, really quickly. With each hour that passed (total of eighteen six), the support workload was quite high, which meant that a lot of people were depending on their deployments to Kubernetes from inside GitLab.

Q: What we’ve seen at the most recent KubeCon, and in the year leading up to it, is a real shift towards people viewing Kubernetes as the environment they want to use for hybrid and multi-cloud deployments. What demand do you see from your customers for using GitLab in that kind of design?

Our customers run on various clouds, and they run various workloads on many different platforms. We are no different: we also run on different cloud providers, and the fact that it takes a lot of effort to migrate your workloads from one provider to another is actually encouraging this new path. I would say we are no longer seeing one cloud winning them all; it’s more that collaboration between different clouds is going to enable customers to ship their products faster.

Q: You offer an on-premises version, both open source and commercial, as well as a SaaS version. Do you see people using GitLab in multiple locations at the same time?

Yes, we have customers using a combination of our SaaS platform and their own self-managed instances, distributed across different platforms as well. We have features inside GitLab that allow geographical distribution of the application, and that kind of shows that you cannot prescribe one size for all of your customers. They have different needs for various reasons, such as compliance or their own internal requirements, and they will use whatever is easiest for them. If you, as a tool, enable them to use what they need, you can do a lot.

Q: What has your experience been with your SaaS platform as you’ve moved it across multiple clouds, and as you expand it further?

One of the biggest challenges was transferring a lot of data. A lot of data also means a lot of load on your platforms, and it was not simple at all to migrate different workloads to different clouds. So with the rise of multi-cloud, it is becoming imperative that the software we use to deploy to different stacks is robust enough that you can just lift and shift to another provider without taking a lot of downtime.

Q: A number of your developers will be looking to not worry about the infrastructure they deploy on, and there has been a rise, especially in the last 18 months, in serverless platforms, functions as a service, and so on. GitLab has adopted Knative and built a serverless product as well. Do you see developers looking to run serverless more?

Yes, there is a real need for that. For very simple workflows you get a lot of bang for your buck in iteration, meaning how quickly you can get your product to market. And if you don’t have to think about the infrastructure behind it, the story I mentioned earlier, about focusing on your product, becomes even more compelling. So absolutely.

Q: Do you think that is the logical end goal for enterprise software, for example?

I think so, yes. As long as you don’t have to deal with complex data structures, serverless seems to me to be a clear winner, and if you consider that a lot of the applications you see online don’t have complex data structures, there is no reason why enterprises wouldn’t be able to adopt the same approach. After all, people working in enterprises are also folks who contribute a lot to open source, have their own startups, and so on.

Q: GitLab has a very unique logo. It’s a tanuki, sometimes referred to as a Japanese raccoon dog. How did that become a GitLab thing?

That is a very interesting story, in my opinion. The original logo, created and contributed by an open-source contributor, had been with us since 2011. A lot of folks thought it looked pretty angry; we even received a bunch of tweets telling us that our logo was giving folks nightmares. So when we graduated from Y Combinator, one of the things we talked about was how to look less threatening. We reached out to the creator of the original logo, who, and this just shows how small the world is, turned out to be very close to us at that point in time: we were in Mountain View, and we met him there by sheer accident. He gave us the contact of the person who created our current logo. One thing we wanted to make sure of was that we would not be mistaken for another Fox-related company, and we also didn’t want to be a raccoon, so this is basically where the tanuki, a combo of the two, came from. It is always an interesting question we get from folks: “What are you, a fox or something else?”

Q: What’s coming next? What are you excited about on your roadmap?

The multi-cloud story that has come up multiple times is definitely high on my list. We have a long-running partnership with the Crossplane project around an integration, and in the latest version of GitLab we are releasing that integration for anyone who connects their Kubernetes cluster to a GitLab project. With one click you can install Crossplane inside your cluster, and that will allow you to create workloads in any of the clouds that Crossplane supports. So if you need an S3 bucket, you can select whichever account you want to run it on, all with a single click or a single command, which I think is very impressive and, I think, the future for us.

Another thing I’m excited about: similar to our GKE integration for GitLab projects, we are releasing the same integration for Amazon EKS clusters, so it’s going to be simpler for folks running on Amazon to deploy their projects to their clusters. With a couple of configuration options, they’ll be able to provision a new cluster with a single click.

Marin, it was wonderful having you on the show!

Thank you, Adam!

You can find Marin Jankovski at gitlab.com/marin.

Relevant Links:

GitLab, GitLab CI, Kubernetes, Interview, podcast