Managing physical servers with OpenStack
Catching up with hardware progress
Site Operation Division, System Management Group
Time to provision a physical server reduced from 1 month to 10 minutes
My first job after joining was building operating system (OS) images. I prepared an OS image to be used company-wide and provided support for it.
I joined Yahoo Japan Corporation just as OpenStack was being introduced into the infrastructure. Although virtualization was progressing, we still had a lot of physical servers that weren’t virtualized, and their numbers were growing. So efforts to manage them with OpenStack were starting, and I took part in that.
The conventional wisdom is that a physical server can’t be moved while it is running a service. Because of that, when a new physical server is launched, you have to find an empty physical server and shift to it gradually. As a result, you can end up in an awkward situation where servers that are in use and can’t be moved sit in the same rack as servers that aren’t being used at all. It’s like gears that don’t quite mesh, and it makes the servers hard to manage.
Under OpenStack, you have a pool of fungible physical servers that you can hand out to users. We use Ironic as the OpenStack component for handling physical servers. We start up a physical server and then write the OS image to it to make it available to users. From the user’s viewpoint, the time to provision a physical server shrinks from one month to about 10 minutes. The servers are also easier to manage.
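The pool model described above can be pictured with a small sketch. This is a toy in-memory illustration, not Ironic's API or Yahoo! JAPAN's actual tooling. The names `BareMetalPool`, `Node`, `provision`, and `release` are hypothetical, and the three states are only loosely modeled on the kind of provision states a bare-metal service tracks. A real service has more states and does real work (power control, image writing, disk cleaning) at each step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    AVAILABLE = "available"  # in the pool, ready to hand out
    DEPLOYING = "deploying"  # OS image is being written
    ACTIVE = "active"        # handed over to a user


@dataclass
class Node:
    name: str
    state: State = State.AVAILABLE
    image: Optional[str] = None
    owner: Optional[str] = None


class BareMetalPool:
    """Hypothetical pool of interchangeable physical servers."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def provision(self, owner, image):
        # Any available node will do: the servers are fungible.
        for node in self.nodes.values():
            if node.state is State.AVAILABLE:
                node.state = State.DEPLOYING
                node.image = image  # stands in for writing the OS image
                node.owner = owner
                node.state = State.ACTIVE
                return node
        raise RuntimeError("no available node in the pool")

    def release(self, name):
        # Return the node to the pool so another user can receive it.
        node = self.nodes[name]
        node.image = None
        node.owner = None
        node.state = State.AVAILABLE


pool = BareMetalPool(["node-1", "node-2"])
first = pool.provision("team-a", "company-os")
second = pool.provision("team-b", "company-os")
pool.release(first.name)
third = pool.provision("team-c", "company-os")
```

In this sketch the user never chooses a specific machine. A released server goes straight back into the pool and is handed to the next request, which is what avoids the used and unused servers getting stuck side by side.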
Running our own data centers lets us do this
Yahoo! JAPAN is mainly seen as a company that provides a lot of services, so I suppose many people aren’t aware that we operate our own data centers. Running them ourselves lets us do a great many things. At the size of Yahoo! JAPAN, it’s also more cost-effective to run them yourself.
From an engineer’s point of view, an in-house data center is very attractive because it exposes you to the kind of environment you don’t normally have access to. For example, you can handle hardware that’s not used in public IaaS clouds. Recent examples include NVMe, which connects SSDs over PCIe at higher speeds, machines with lots of GPUs, and beefed-up hardware with tens of terabytes of storage. We have the chance to work with servers with specs that the average user wouldn’t use.
When we talk about managing server infrastructure, outsiders might get the impression that it’s work where we only touch a few layers. But Yahoo! JAPAN has everything from data centers to services, so we deal with a wide range of layers, including hardware. That’s also a big attraction.
Virtualization is the trend these days, so you might ask why the use of physical servers is growing. It’s because virtual machines are weak at disk I/O. To get the full performance out of things like Elasticsearch and Hadoop, you want physical servers.
I want to work with people who have a continuing interest in new technology
I’d like to work with people who are interested in pursuing the latest technology. That goes for OpenStack, and hardware is also a rapidly changing field. As an infrastructure engineer, I’d like to keep providing new technology.
If new technology is successful, efficiency also improves, and we always need people who can handle new technology. Some people might think infrastructure engineers are stiff and uninteresting, but our infrastructure team is insatiable when it comes to new technology – we have a real “go for it!” attitude.
As new technology keeps being introduced, it may seem as if hurdles are being put in the way. But I think you’ll be OK if, before joining the company, you’ve had experience with Linux and one programming language. For example, I don’t think there are many environments where you can be exposed to OpenStack and study it. But if you have the basics, you can keep up by studying.
Automated management is also a challenge
One way of expressing the feeling of managing server infrastructure is that you feel like you yourself are a member of a cluster. Now we’re shifting from pampering one server at a time to lining up large numbers of servers and re-shuffling them. Even so, there are still times when we have trouble with one particular server, for instance because its hardware breaks.
If you ask when I feel my work is most rewarding, it’s when I see on the OpenStack dashboard that the OS image has been “burned” onto lots of servers. Then I really get the feeling that “I did it!” (laughter) It puts me in a good mood to see resources being automatically used up. My goal from now on is to keep working on automation around infrastructure in order to lighten the burden, and to offer a serviceable infrastructure that can be used quickly when it’s needed.
※ Information as of February 2017.