June 5, 2012 at 1:28 PM ET
Twitter, and the news shared on it, moves faster than its iconic tweeting bird. But search on the microblogging service is not as efficient, because the most frequent search terms are always changing, as is the news on the site itself.
That makes it harder to get the algorithms for search queries right. In a blog post Tuesday, Twitter notes that its own recent study of terms and phrases in 140-character tweets and in real-time search queries shows that "the most frequent terms in one hour or day tend to be very different from those in the next — significantly more so than in other content on the web. Informally, we call this phenomenon churn."
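For readers who want a concrete picture of churn, here is a minimal Python sketch under stated assumptions. It is not the metric from Twitter's paper: the function names (`top_terms`, `churn`), the whitespace tokenization, the top-k cutoff and the toy queries are all hypothetical and chosen only for illustration. It measures what fraction of one hour's most frequent terms were absent from the previous hour's list.

```python
from collections import Counter


def top_terms(texts, k):
    """Return the set of the k most frequent whitespace-separated terms."""
    counts = Counter(term for text in texts for term in text.lower().split())
    return {term for term, _ in counts.most_common(k)}


def churn(previous_texts, current_texts, k):
    """Fraction of the current top-k terms that were not in the previous top-k.

    Illustrative definition only (an assumption, not the paper's measure):
    0.0 means the top terms did not change, 1.0 means they were all replaced.
    """
    prev_top = top_terms(previous_texts, k)
    curr_top = top_terms(current_texts, k)
    if not curr_top:
        return 0.0
    return len(curr_top - prev_top) / len(curr_top)


# Hypothetical toy queries for two consecutive hours.
hour_1 = ["nba finals tonight", "nba finals score", "nba trade"]
hour_2 = ["steve jobs", "steve jobs apple", "apple stay foolish"]

print(churn(hour_1, hour_1, k=2))  # identical hours: no churn
print(churn(hour_1, hour_2, k=2))  # breaking news: the top terms are all new
```

A real system would need proper tokenization and far larger windows; the sketch only shows why a ranking built on one hour's statistics can be stale an hour later.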
In a paper, "A Study of 'Churn' in Tweets and Real-Time Search Queries," to be presented at the International Conference on Weblogs and Social Media this week, Twitter staffers Jimmy Lin and Gilad Mishne share these findings:
During major news events, query frequencies on Twitter "spike dramatically," Lin wrote on the blog. On Oct. 5, 2011, immediately after news of Steve Jobs' death became known:
... the query "steve jobs" spiked from a negligible fraction of query volume to 15 percent of the query stream — almost one in six of all queries issued! Check it out: the query volume is literally off the charts! Notice that related queries such as "apple" and "stay foolish" spiked as well.
What it all means, he said, is that "When news breaks, Twitter users flock to the service to find out what's happening. Our goal is to instantly connect people everywhere to what's most meaningful to them; the speed at which our content (and the relevance signals stemming from it) evolves make this more technically challenging, and we are hard at work continuously refining our relevance algorithms to address this."
The "growing importance of real-time search" means some big challenges for Twitter, both men wrote in the paper. "In follow-up work we plan to evaluate techniques for handling the volatility of the real-time search stream and the limited collection statistics that exist for new queries."