Overseas business trip triggered involvement in Kubernetes
Reduced OpenStack downtime
Site Operation Division, System Management Group
My first job was OpenStack operations
I’m an OpenStack operations engineer. In graduate school I was working on electrical research, but I fell in love with the open atmosphere of the web and joined Yahoo Japan Corporation.
From the start I’ve been interested in backend work. In grad school I really enjoyed it when people used the equipment I made to do their research. Infrastructure operations is behind-the-scenes work in the same way, and I enjoy it when people close to me use the things I’ve made.
The main task for OpenStack operations engineers is to fix any problems that come up in the OpenStack layers. When we find an OpenStack bug, we also track down its root cause.
I encountered OpenStack for the first time when I joined the company. At first I didn’t understand a thing, and I was on the verge of tears the whole time. (laughter) In the field of OpenStack, a wide range of knowledge is required. For example, OpenStack abstracts away lower-level components like servers and networks. We don’t touch the lower levels directly, but we need technical knowledge of that underlying environment.
There wasn’t time to study things methodically, so I acquired the necessary knowledge through event-driven learning: repeating the cycle of detection -> investigation -> resolution for the problems that came up on the job. It was a very tense time, knowing that a single mistake on my part could bring down the company’s internal services.
Overseas business trip triggered involvement in testing new technology
The turning point for me was the OpenStack Summit held in Austin, Texas, in April 2016. During the keynote at this conference, I learned that efforts were underway to deploy and manage OpenStack using Docker, the lightweight container framework, together with Kubernetes, the container cluster manager for Docker. At Yahoo! JAPAN, too, the OpenStack team was starting to work on Kubernetes. I stayed at the Yahoo! JAPAN office in the United States from April until the end of the summer, testing Kubernetes. I also worked overseas from October through December of that year.
My main goal was to communicate actively with the Yahoo! JAPAN teams in America. The conference stimulated me to absorb the American teams’ latest knowledge, and the next thing I knew I’d moved away from OpenStack operations to testing the new Kubernetes-based environment.
Naturally, if Docker + Kubernetes are used in addition to OpenStack, the system has more moving parts, but the advantages more than make up for that. Operational costs go down, and downtime is shorter.
Adopting Kubernetes shortened downtime
We used to talk about going from “pets to farm animals.” With previous infrastructures, we pampered servers as if they were adorable pets. Whenever one had a problem, we had to investigate and restore it immediately. This changed when we started using Docker and Kubernetes. Now we’re able to operate our services as if we’re handling a herd of farm animals.
OpenStack provides APIs with a variety of functions. In the conventional environment, if there was a problem with the servers behind those APIs, we needed to investigate and restore them immediately. The situation changes when we use Kubernetes to manage multiple servers. If a problem arises with one server, Kubernetes automatically restores its containers on another server. This way, we don’t spend time restoring the server; instead, we can devote our time to resolving the bug that caused the problem and to deciding on the approach going forward.
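To make the idea of restoring containers on another server concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: it does not use the Kubernetes API, and the names `Node`, `reschedule`, `node-a`, `api-1`, and so on are hypothetical. Real Kubernetes starts replacement containers on healthy servers rather than literally moving them, and its placement logic is far richer than the "least loaded server" rule assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical server that hosts a list of container names."""
    name: str
    healthy: bool = True
    containers: list[str] = field(default_factory=list)


def reschedule(nodes: list[Node]) -> list[tuple[str, str, str]]:
    """Reassign containers from unhealthy nodes to the least-loaded healthy node.

    Returns one (container, from_node, to_node) tuple per reassignment.
    """
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        raise RuntimeError("no healthy node left to host containers")

    moves = []
    for node in nodes:
        if node.healthy:
            continue
        while node.containers:
            container = node.containers.pop(0)
            # Assumed placement rule: pick the healthy node with the fewest containers.
            target = min(healthy, key=lambda n: len(n.containers))
            target.containers.append(container)
            moves.append((container, node.name, target.name))
    return moves


# Hypothetical three-server cluster running OpenStack API containers.
cluster = [
    Node("node-a", containers=["api-1"]),
    Node("node-b", containers=["api-2", "scheduler-1"]),
    Node("node-c"),
]

cluster[1].healthy = False   # simulate a failure of node-b
moves = reschedule(cluster)  # nobody has to repair node-b before service resumes
```

The point of the sketch is that the failed server is simply left empty: the services keep running elsewhere, and the operators can look into the cause of the failure afterwards.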
Kubernetes isn’t used by the whole company; at this stage, we on the infrastructure team are using it as a tool to run OpenStack. The departments that use the infrastructure probably don’t notice the difference. In the past, if an OpenStack controller node crashed, we sometimes couldn’t work in the cloud. By using Kubernetes, we were able to massively shorten the recovery time, and with it the downtime. This should raise the level of our SLAs (Service Level Agreements).
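The link between recovery time and service levels can be illustrated with the standard steady-state availability formula, MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to recover. The sketch below is only an illustration: every number in it is a hypothetical placeholder, not a measurement from Yahoo! JAPAN.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the share of each failure/repair cycle spent up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# All values below are hypothetical, chosen only to show the trend.
MTBF_HOURS = 720.0  # assume one failure every 30 days

manual = availability(MTBF_HOURS, mttr_hours=2.0)      # assumed manual restore: 2 hours
automatic = availability(MTBF_HOURS, mttr_hours=0.05)  # assumed automatic recovery: 3 minutes

print(f"manual recovery:    {manual:.5f}")
print(f"automatic recovery: {automatic:.5f}")
```

With the failure rate held constant, a shorter recovery time is the only thing that changes, and availability rises accordingly.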
Sometimes I even get phone calls in the middle of the night if there’s a serious problem with infrastructure operation. Once OpenStack is stable, this kind of scenario should happen less often, which makes our colleagues in the trenches happy. I’d like to make everyone happy.
I want to work with people who tell me “This setup is obsolete!”
Well, I’d like to work with the kind of engineers who can discuss things. It would be nice to see more people around who can have conversations about their ideal system, for example. I’d welcome someone who’d tell me “Your OpenStack setup is obsolete!” It’s good to have people who bring a breath of fresh air.
At the company, you can also find something to do and then volunteer to do it. For me as an engineer at Yahoo! JAPAN, having everything from the data center to the frontend services in-house leads to personal growth. There aren’t any other companies with such an enormous infrastructure, and we have lots of opportunities to study the infrastructure layers. I feel as if I’m in the midst of a constantly changing technology battlefield. In that situation, you can’t survive unless you take in new things; nor can you survive if you don’t take care of what’s right in front of you now. It’s a real test of one’s abilities.
Many colleagues who’ve been here longer have an abundance of knowledge, and a lot of people also have their own specialization. I’d like to find an area that I like and delve more deeply into it, too. I want to focus not only on OpenStack, but also on the new ideas and technologies that keep appearing one after another.
※ Information as of February 2017.