Search engines need science
Yahoo! JAPAN is the ideal environment for learning it
Data & Science Solutions Group
It all started with wanting to work on search engines; then I became immersed in study
When I joined Yahoo Japan Corporation, I spent my first three years working on business-related services. After that, I wanted to do work that I could only do here, so I asked to work on search and moved to the mobile search division. I’ve been involved in search ever since, although I’ve changed departments a couple of times.
About five years after joining, I started seriously studying search engines. At the time, I spent every day totally immersed in my studies. I even studied on weekends, and sometimes woke up startled in the middle of the night, having dreamt that my boss was scolding me. That went on for several years. It was a really tough time, I have to admit. (laughter)
Scientific papers helped me the most in that process of gaining knowledge. I’d read one paper, and then, to understand it, I’d systematically read several more. By repeating that process, I also got into the habit of learning. Of course, that doesn’t mean all you have to do is read papers. There’s inevitably a gap between the theory in scientific papers and the practice of designing real-world products, so I also had to acquire the practical knowledge to bridge that gap and to understand the codebase. Luckily there are loads of engineers at Yahoo! JAPAN who know a lot about search, so I’ve had a lot of help from the people around me.
Search is a treasure trove of issues for engineers
Search engines have their own special difficulties. For example, you can’t shut down the search service while updating the index that makes fast search possible. It’s pretty difficult to update indexes and respond to search requests at the same time.
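One common way to approach this, offered only as a minimal sketch and not as a description of how Yahoo! JAPAN’s engine works, is double buffering: build the new index separately while queries keep using the old one, then swap a single reference so every query sees either the complete old index or the complete new one. The class `SwappableIndex` and the helper `build_index` are hypothetical names used for illustration.

```python
import threading
from typing import Dict, List, Set


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Build a toy inverted index: term -> set of IDs of documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


class SwappableIndex:
    """Hypothetical double-buffered index. Readers query the current snapshot;
    a replacement is built off to the side and then swapped in whole."""

    def __init__(self, docs: Dict[int, str]) -> None:
        self._current = build_index(docs)
        self._swap_lock = threading.Lock()  # serializes writers; readers never wait

    def search(self, term: str) -> List[int]:
        # Grab one reference to the snapshot. A concurrent rebuild replaces the
        # reference, not the object this query is reading, and snapshots are
        # never mutated after they are built.
        snapshot = self._current
        return sorted(snapshot.get(term.lower(), set()))

    def rebuild(self, docs: Dict[int, str]) -> None:
        new_index = build_index(docs)  # expensive work happens while searches continue
        with self._swap_lock:
            self._current = new_index  # one reference assignment: no half-built state


idx = SwappableIndex({1: "search engine", 2: "ad matching"})
print(idx.search("search"))
idx.rebuild({1: "search engine", 3: "search ads"})
print(idx.search("search"))
```

The trade-off in this sketch is memory: two full copies of the index exist while the new one is being built. Large engines often reduce that cost with incremental schemes, such as adding and merging smaller index segments, but the principle of never exposing a half-updated index to queries is the same.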
Difficult problems like these can’t be resolved without going back to science. For instance, when using a machine learning model, how do you integrate it with the actual search engine? An overly complex model can be too computationally expensive to train, so we need to design models that can be trained within the limited time we have. There are lots of issues like that.
One challenge particular to Yahoo! JAPAN comes from its extremely high level of traffic. Because the traffic is so high, even an event with a very low probability is something we’re bound to encounter daily. To handle those rare cases, we need to grasp every detail of the specifications.
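The arithmetic behind this is simple. As an illustration with hypothetical numbers (not Yahoo! JAPAN’s actual figures), suppose a rare event occurs independently with probability p = 10^-9 per request, and the service handles N = 10^9 requests per day:

```latex
\mathbb{E}[\text{occurrences per day}] = Np = 10^{9}\cdot 10^{-9} = 1,
\qquad
P(\text{at least one per day}) = 1-(1-p)^{N} \approx 1-e^{-Np} = 1-e^{-1} \approx 0.63
```

With ten times that traffic, Np = 10 and the probability of seeing the event at least once in a day becomes 1 - e^-10, which is effectively certain. A one-in-a-billion case is therefore not an edge case at this scale; it is part of normal daily operation.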
There’s still a lot of work to be done in search, because search technology is being applied to an ever broader range of uses. I suppose when people hear the word “search” they think of keyword search, but it’s not just that. Many more areas require a “searching for something” function. I’m sure we’ll see a rapid succession of search engines very different from those that came before.
For example, search is required when displaying ads. We search for ads that are closely related to, and similar in content to, the page they’ll be displayed on. To do that, we need to study algorithms for high-speed feature vector matching.
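To make “feature vector matching” concrete, here is a minimal brute-force sketch, assuming that pages and ads have already been turned into numeric feature vectors and that cosine similarity is the matching score. Both are assumptions for illustration; the source doesn’t say which features or similarity measure are used, and the function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_ads(page_vec: Sequence[float],
              ad_vecs: Dict[str, Sequence[float]],
              k: int) -> List[Tuple[str, float]]:
    """Hypothetical exhaustive matcher: score every ad against the page and
    keep the k most similar, highest first. Cost is O(#ads x #dimensions)."""
    scored = ((ad_id, cosine(page_vec, vec)) for ad_id, vec in ad_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


page = [1.0, 0.0, 1.0]
ads = {
    "camera_ad": [1.0, 0.0, 0.9],
    "shoe_ad": [0.0, 1.0, 0.0],
    "lens_ad": [0.8, 0.1, 1.0],
}
for ad_id, score in top_k_ads(page, ads, 2):
    print(ad_id, round(score, 3))
```

Here `camera_ad` and `lens_ad` are returned, and `shoe_ad`, which shares no features with the page, is left out. Scoring every ad is exactly what becomes too slow with a very large ad inventory and high request rates, which is why “high speed” matching usually means approximate nearest-neighbor search: trading a small amount of accuracy for a large reduction in the number of vectors compared per request.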
I think Yahoo! JAPAN is the ideal environment for learning about search engines. There’s a lot of data, and a lot of requests, too. There are also many different kinds of services with a wide variety of needs, so we have plenty of chances to develop new search engines and new functions. There’s also the work of scaling a search engine: for instance, rethinking and reimplementing the autoscaling function. There are lots of problems like that which pose real challenges for engineers and need solving, so it’s a rewarding place to work. I think you can only find that kind of environment here at Yahoo Japan Corporation.
Efforts to build a platform that every service can use
These days I’m working more on the systems side of things. Yahoo! JAPAN has more than 100 services, which collect big data every day. My work is to analyze that data and build platforms to utilize it.
Data utilization is important, but I believe we need to put even more effort into utilizing data across multiple services. My goal is not just to analyze data but to create an environment where other services can use it. Imagine a common box from which any service can take data out, so that it is shared among all of them.
From now on, I’d like to concentrate on building platforms. We need platforms that cut across multiple fields and link them together to improve our services. As I go about my everyday work, I have a big dream: to combine back-end knowledge with science and create an architecture that could only be created at Yahoo! JAPAN.
※ Information as of February 2017.