Could a computer pick the next “American Idol”? The next Ludacris or Madonna?
New software by Israeli researchers promises to take much of the guesswork (and endless cover songs) out of figuring out which hip-hop or R&B artist will be the next "it" star in the United States. Using a mathematical formula to sort music requests logged by the massive Gnutella peer-to-peer file-sharing network, the researchers have boasted an enviable 15 percent to 30 percent success rate in automatically choosing artists or bands with breakout potential.
The solution, according to Tel Aviv University electrical engineer Yuval Shavitt and his colleagues, “is based on the observation that emerging artists, especially rappers, have a discernable stronghold of fans in their hometown area, where they are able to perform and market their music.”
Crucially, the researchers were able to extract geographic information from many of the 10 million to 40 million requests logged by Gnutella users every day. In some cases, the locations could be pinpointed to the level of a city borough, as in New York.
“What’s happening on the Internet is that it’s a whole collection of individual actions,” Shavitt said. “Such a huge collection of actions should have a very strong signal about many things, and in this case, it was so obvious that it should be a way to pick up trends — music, for example.”
But how do you mathematically model the concept of local popularity for a song or band? For up-and-coming artists, Shavitt said, the route to national success usually begins in their neighborhood or hometown.
“They’re performing in local clubs and attracting attention, and people will start looking for their music,” he said. “If your music is good, you will create local buzz. And this will be detected by a local rise in the number of queries in the system.”
For the research, presented in August at the Knowledge Discovery and Data Mining conference in Las Vegas, the team collected location-tagged Gnutella data from mid-October 2006 through July 2007. The computer algorithm monitored metropolitan regions around the country and checked for any query streams that were fast becoming local floods, but were relatively unnoticeable elsewhere.
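The kind of check described here can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual algorithm: the function name `find_local_surges`, the growth and share thresholds, and the made-up artists and counts are all assumptions chosen for the example.

```python
from collections import defaultdict

def find_local_surges(weekly_counts, min_growth=3.0, max_outside_share=0.2):
    """Flag (artist, region) pairs whose queries surge locally but stay quiet elsewhere.

    weekly_counts maps (artist, region) -> [count_last_week, count_this_week].
    Both thresholds are illustrative assumptions, not values from the study.
    """
    # Total queries per artist this week, summed over all regions.
    national = defaultdict(int)
    for (artist, _region), (_prev, curr) in weekly_counts.items():
        national[artist] += curr

    surges = []
    for (artist, region), (prev, curr) in weekly_counts.items():
        if curr == 0:
            continue
        growth = curr / max(prev, 1)                      # week-over-week growth in this region
        outside_share = (national[artist] - curr) / national[artist]  # share from everywhere else
        if growth >= min_growth and outside_share <= max_outside_share:
            surges.append((artist, region))
    return sorted(surges)

# Hypothetical data: one act surging at home, one established act steady everywhere.
counts = {
    ("Local Act", "St. Louis"): [40, 200],
    ("Local Act", "New York"): [5, 10],
    ("Big Star", "St. Louis"): [900, 950],
    ("Big Star", "New York"): [1000, 1000],
}
print(find_local_surges(counts))  # [('Local Act', 'St. Louis')]
```

The established act is never flagged because its numbers barely move, while the local act is flagged only in the one region where its queries are both growing quickly and concentrated.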
“With Shop Boyz, we managed to get a nine-week lead on the industry,” Shavitt said, even though the group had not cracked the 10,000 most popular queries nationwide at the time. With other artists, he said, the algorithm’s advance notice has been more on the order of six to eight weeks, still plenty of time to get a jump on the competition.
For St. Louis-based rapper Huey, Shavitt said the system spotted a local spike in requests for his “Pop, Lock & Drop It” before mid-January 2007. In early March, the song debuted on the Billboard Hot 100 at No. 98.
Requests for songs by an established artist like Madonna, on the other hand, would show a relatively uniform geographical distribution and steady numbers over time.
Rising above all the noise
As for the Israeli research trio, no major record labels have yet come courting them for their business acumen in identifying unsigned artists, though Shavitt said they have been in talks with a U.S.-based company that deals with music. In contrast to their success rate of up to 30 percent, the study suggests that the probability of correctly spotting successful new artists without the algorithm would be less than 0.1 percent.
Instead of relying on the actual number of Gnutella requests, the formula tracks how well a band’s queries rank relative to other groups.
“You have to rise to a certain level in the city where you are — in the top 500 or 1,000 queries — or else it’s just noise,” Shavitt said. To attract some attention, at least from the computer, he figures a band would need to garner at least 100 to 150 queries per day.
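A rank-based test of this sort might look like the following Python sketch. It is an assumption-laden illustration rather than the team's formula: the helper names `rank_of` and `is_local_breakout` are invented here, and the default cutoffs simply borrow the top-500 city figure and the 10,000 nationwide figure mentioned in the article.

```python
from collections import Counter

def rank_of(term, query_counts):
    """Return the 1-based rank of term by query count, or None if it never appears."""
    ordered = [name for name, _count in Counter(query_counts).most_common()]
    return ordered.index(term) + 1 if term in ordered else None

def is_local_breakout(term, city_counts, national_counts,
                      city_top=500, national_top=10_000):
    """True if term ranks high in one city while still ranking low nationally.

    The cutoffs are illustrative defaults, not parameters confirmed by the study.
    """
    city_rank = rank_of(term, city_counts)
    if city_rank is None or city_rank > city_top:
        return False  # below the local cutoff: treated as noise
    national_rank = rank_of(term, national_counts)
    return national_rank is None or national_rank > national_top

# Tiny hypothetical example with the cutoffs scaled down to fit the data.
city = {"a": 50, "b": 30, "c": 5}
national = {"x": 1000, "y": 900, "b": 800, "a": 60}
print(is_local_breakout("a", city, national, city_top=2, national_top=3))  # True
print(is_local_breakout("b", city, national, city_top=2, national_top=3))  # False
```

Comparing ranks rather than raw counts keeps a large city from drowning out a small one, since each region is judged against its own query traffic.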
So far, the database’s gigantic size has ruled out more nuanced algorithms that might increase the sensitivity further. After eliminating queries that originated outside the U.S., that could not be geographically pinpointed from their origin IP (Internet Protocol) addresses, that were blocked by firewalls or were otherwise ambiguous, the study still processed more than 300 million queries.
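The filtering step can be pictured with a short Python sketch. The `Query` record and its fields are hypothetical, invented for this example; the real Gnutella logs and the geolocation step the researchers used are not described in enough detail here to reproduce.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Query:
    # Hypothetical record layout, assumed for illustration only.
    text: str
    country: Optional[str]  # e.g. "US", or None if the IP address could not be located
    city: Optional[str]     # None if the location is ambiguous
    firewalled: bool        # True if the request came from behind a firewall

def keep_query(q):
    """Keep only U.S. queries with a known city that were not firewalled."""
    return q.country == "US" and q.city is not None and not q.firewalled

def filter_queries(queries):
    return [q for q in queries if keep_query(q)]

sample = [
    Query("some song", "US", "St. Louis", False),  # kept
    Query("some song", "IL", "Tel Aviv", False),   # outside the U.S.
    Query("some song", "US", None, False),         # cannot be pinpointed
    Query("some song", "US", "New York", True),    # firewalled
]
print(len(filter_queries(sample)))  # 1
```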
Recently, the team began tracking the Direct Connect network, which handles up to 1.5 million queries per day. Although only a fraction of Gnutella’s size, the smaller system may be amenable to more involved algorithms. The research suggests the same analysis could be applicable to other file sharing networks like BitTorrent and eDonkey or to Web sites like YouTube and MySpace.
Discovering a star
No matter what the database, the ability to pick a winner comes back to an idea attributed to China’s Mao Tse-tung: that revolution is like ink stains on paper. The stains begin small but eventually grow and merge into one large stain. Of course, breakthrough artists might chafe at the notion they’ve become a national stain, but the idea that successful bands begin as little more than droplets of local interest before eventually flooding the airwaves has been borne out repeatedly.
The reverse is true for contestants on “American Idol,” who effectively debut in front of a national audience on their way to stardom or infamy. Even so, Shavitt said, a similar algorithm can tap Internet queries for clues to a contender’s prospects.
“You can see if a certain competitor is gaining momentum,” he said. “It’s slightly different from what we are doing, but it can be done.”
Nancy Baym, an associate professor of communication studies at the University of Kansas who studies online music networks, said Shavitt’s study fits well “into a whole of the ways people are trying to mine all the music-related information that’s out there right now.”
And despite the public animosity between record labels and file-sharing networks, she said, many companies have privately recognized the value of such databases and are paying others to mine the information for them.
The real power of popular peer-to-peer networks like Gnutella, as suggested by Shavitt’s study, is that they not only predict trends, but also cause them, Baym said.
“My big-picture thought on all of this is that the labels would do well to release their material to file-sharing networks because it’s a publicity mechanism and ultimately rebounds to them in sales, although many of them vehemently dispute that,” she said.
Making sense of the growing mounds of data will remain a challenge, but Baym said the recent successes in spotting musical trends portend a race to figure out who can be the best predictor — with big money at stake.
For their ongoing research, Shavitt and his collaborators are testing out a new algorithm to see if they can predict how long a song will stay on the charts. And a separate project is examining cultural differences among countries based on music database searches.
“We talk about the world becoming a smaller village,” he said. “When you look at the tastes of people, there are still huge differences between nations.”
Shavitt can personally attest to that. Despite his research, he’s not liable to rush off to a hip-hop concert anytime soon.
“I’m a fan of the data, not the music,” he said, laughing. Well, maybe the Shop Boyz are OK. “I can actually listen to them for two minutes,” he said.
© 2013 msnbc.com