Just how wide is the World Wide Web? A statistical survey has measured the Web’s “diameter,” finding that there’s an average of 19 clicks separating random Internet sites. The findings have implications for the future of Web searching as the global network grows.
Although the Internet is 30 years old, the protocol that made the World Wide Web was created only in 1990. But even in that short time, the Web has taken on an organic life of its own, as evidenced by two studies appearing in the Sept. 9 issue of the journal Nature. Both studies found that the Web’s growth dynamics and its topology — that is, the way it’s put together — follow what’s known in physics as a power law.
“The Web doesn’t look anything like we expected it to be,” said Notre Dame physicist Albert-Laszlo Barabasi, who along with two colleagues studied the Web’s topology. A power-law distribution means that the Web doesn’t follow the usual mathematical models of random networks, but instead exhibits the type of physical order found in, say, magnetic fields, galaxies and plant growth.
Barabasi said that although the average Web page has seven links to other pages, “there is a very, very high number of Web pages that have a huge number of connections” — far higher than they anticipated based on traditional mathematical models.
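To see how an average of seven links can coexist with a few enormously connected pages, here is a toy sketch in Python. It is not the researchers' model: the function name `grow_network` and the "link to already popular pages" rule are assumptions made purely for illustration. It compares a network where each new page links at random with one where new pages favor pages that already have many incoming links.

```python
import random

def grow_network(n_pages, links_per_page, preferential, seed=0):
    """Grow a toy link network and return each page's incoming-link count.

    Hypothetical illustration only; not the method used in the Nature study.
    """
    rng = random.Random(seed)
    in_links = [0] * n_pages
    # 'targets' holds one entry per page plus one per link it has received,
    # so a uniform draw from it favors pages that are already well linked.
    targets = [0]
    for page in range(1, n_pages):
        for _ in range(links_per_page):
            if preferential:
                dest = rng.choice(targets)
            else:
                dest = rng.randrange(page)  # any earlier page, equally likely
            in_links[dest] += 1
            targets.append(dest)
        targets.append(page)
    return in_links

uniform = grow_network(5000, 7, preferential=False)
popular = grow_network(5000, 7, preferential=True)
for name, counts in (("uniform", uniform), ("preferential", popular)):
    mean = sum(counts) / len(counts)
    print(f"{name}: mean in-links {mean:.1f}, most-linked page {max(counts)}")
```

Both networks average roughly seven links per page, but the preferential version should produce a handful of hubs with far more incoming links than the uniform one.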
Shape of the Web
The power-law connection means it’s possible to figure out the shape of the World Wide Web, even if you can’t precisely map out every site and page on the network. Barabasi and his colleagues studied the distribution of links on a variety of sites — at Notre Dame, at South Korea’s Seoul National University, at the White House, at Yahoo — and found that there was a consistent relationship between size and connectedness.
That relationship can be used to determine the average shortest path between two points in a network; that is, the “diameter.” Thus, if you accept the estimate that there are 800 million documents on the Web, you come up with an average “distance” of 19 links between two randomly selected points.
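The article does not spell out the relationship, but the published Nature paper reports a logarithmic fit for the average separation, roughly 0.35 + 2.06 log10(N) for a Web of N documents. Assuming that fit (the function name below is made up for illustration), the 19-click figure can be reproduced in a few lines of Python:

```python
import math

def average_separation(n_documents):
    """Average clicks between two pages.

    Assumes the fit reported in the Nature paper:
    <d> = 0.35 + 2.06 * log10(N). The constants come from that paper,
    not from this article.
    """
    return 0.35 + 2.06 * math.log10(n_documents)

estimate = average_separation(800_000_000)
print(f"{estimate:.2f} clicks, or about {round(estimate)}")
```

Because the estimate depends on the logarithm of N, even a large error in the 800 million figure moves the answer by only a click or two.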
It’s something like the movie “Six Degrees of Separation” (or the Hollywood name game known as “Six Degrees of Kevin Bacon”), built on the idea that any two people on Earth are connected through no more than six intermediate steps.
If you picked out any two random Web pages, they might be linked directly to each other, or it might take hundreds of intermediate clicks to get from one page to the other. But if you went through that exercise thousands of times and tallied all those clicks, the findings indicate that the average would be roughly 19. With a chuckle, Barabasi says it would be just fine to think of it as “19 Clicks of Web Separation.”
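That sampling exercise can be sketched on a small made-up network. The Python below is only an illustration under stated assumptions: the toy Web of 2,000 pages with seven random links each, and the helper names `build_toy_web`, `clicks_between` and `average_clicks`, are all hypothetical. It uses a breadth-first search to count the fewest clicks between randomly chosen pairs, then averages the results.

```python
import random
from collections import deque

def build_toy_web(n_pages, links_per_page, seed=0):
    """Hypothetical toy Web: every page links to a few random pages."""
    rng = random.Random(seed)
    return {p: rng.sample(range(n_pages), links_per_page) for p in range(n_pages)}

def clicks_between(links, start, goal):
    """Fewest clicks from start to goal (breadth-first search), or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links[page]:
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal cannot be reached by following links

def average_clicks(links, trials, seed=0):
    """Average the click distance over randomly chosen reachable pairs."""
    rng = random.Random(seed)
    pages = list(links)
    total = found = 0
    for _ in range(trials):
        a, b = rng.sample(pages, 2)
        d = clicks_between(links, a, b)
        if d is not None:
            total += d
            found += 1
    return total / found if found else None

toy_web = build_toy_web(2000, 7, seed=1)
print(average_clicks(toy_web, trials=500))
```

A toy network this small gives a much lower average than 19; the point is the procedure, not the number.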
More than a game
But the power-law findings aren’t just a game: The researchers say that studying online topology is crucial “in developing search algorithms or designing strategies for making information widely accessible on the Web.” Even if the Web mushrooms to 10 times its current size, the degrees of separation would rise only slightly, from 19 to 21. That is likely to increase the reliance on intelligent search techniques that can adroitly skip from site to site, seeking out the most relevant or most popular sites within the Web behemoth.
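The slow rise follows from the logarithmic form of the relationship. Assuming the fit reported in the Nature paper (the constants below come from that paper, not from this article), each tenfold increase in the number of documents N adds a fixed amount of about two clicks:

```latex
\begin{align*}
\langle d \rangle(N) &= 0.35 + 2.06 \log_{10} N \\
\langle d \rangle(10N) - \langle d \rangle(N) &= 2.06 \log_{10} 10 = 2.06 \\
\langle d \rangle(8 \times 10^{8}) &\approx 18.7 \approx 19,
\qquad
\langle d \rangle(8 \times 10^{9}) \approx 20.8 \approx 21
\end{align*}
```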
Search-engine companies already are relying increasingly on such techniques, said Danny Sullivan, editor of Search Engine Watch. He said that’s the rationale behind search sites such as Google, which ranks its results by link “importance” ... DirectHit, which bases its analysis on “what people are clicking on” ... and Inktomi, which is “looking at what people are actually viewing.”
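The article does not describe how any of these companies actually compute their rankings. As a rough illustration of what scoring pages by link "importance" can mean, here is a simplified PageRank-style calculation in Python. The function name, the damping value and the four-page example are all assumptions, and this should not be read as Google's method.

```python
def link_importance(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting links from important pages more.

    Simplified PageRank-style power iteration; an illustrative sketch only.
    Every link destination must also appear as a key in 'links'.
    """
    pages = list(links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its score evenly among the pages it links to.
                share = damping * score[page] / len(outgoing)
                for dest in outgoing:
                    new[dest] += share
            else:
                # A page with no links spreads its score over all pages.
                share = damping * score[page] / n
                for dest in pages:
                    new[dest] += share
        score = new
    return score

# Hypothetical four-page Web: three pages point at "hub".
example = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub", "a"]}
scores = link_importance(example)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example the page that attracts the most links ends up with the highest score, which is the intuition behind ranking by links rather than by page content alone.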
Yahoo’s success with a hand-picked link database is “actually another example of why we can’t use (just) brute force,” Sullivan said. “One of the reasons why Yahoo is so popular is because human beings actually do a pretty good job of picking out the best sites.
“If I give you a whole bunch of needles (in the proverbial haystack) and you want to pick out the best one, you need to go on something more than the fact that they’re all needles,” he said.
NEC researcher Steve Lawrence, who has done his own statistical surveys of the Web, said the new findings could help the designers of future search tools.
“There’s an opportunity for intelligent agents that take starting points from search engines and follow the links to go find what the user is after,” he said. The “19-click” finding could also provide a ballpark estimate for how deeply a Web crawler needs to dig, he said.
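A depth-limited crawl of the kind Lawrence describes might look something like the following Python sketch. It works on an in-memory dictionary of links rather than fetching real pages, and the names `crawl` and `BALLPARK_DEPTH` are hypothetical; using 19 as the cutoff is simply the article's ballpark figure, not a recommendation from the researchers.

```python
from collections import deque

BALLPARK_DEPTH = 19  # the article's "19-click" figure, used here as a cutoff

def crawl(links, seeds, max_depth=BALLPARK_DEPTH):
    """Return {page: depth} for every page within max_depth clicks of a seed."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(page, ()):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

# Hypothetical starting point from a search engine, followed two clicks deep.
toy_links = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
print(crawl(toy_links, ["a"], max_depth=2))
```

Because the search is breadth-first, each page is recorded at the smallest number of clicks needed to reach it from any starting point.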
Barabasi’s colleagues in the topology study are Reka Albert and Hawoong Jeong of Notre Dame.
The Web’s growth dynamics
Another study in Nature, looking at the growth dynamics of the global network, confirms the idea that the World Wide Web follows natural laws and can be studied as “an ecology of knowledge.”
“There is order hidden in the Web,” said Bernardo Huberman and Lada Adamic of the Xerox Palo Alto Research Center.
They found that a power law was at work in the distribution of Web pages: the larger the page count, the smaller the proportion of sites that reach it. The proportions appeared to hold steady over various samples of the Web. For example, if data were collected from 250,000 Web sites, the probability of finding a site with a million pages would be 1 in 10,000, the researchers said.
The latest study by Huberman and Adamic is part of a series showing that site size, like site traffic, is distributed unequally: A small number of sites are responsible for a disproportionately large part of the Web’s volume and activity.
Huberman said the growth of the Web was subject to two dynamics: the fact that the total number of sites is growing exponentially, and the fact that the fluctuations in the size of a particular site are proportional to the size of the site.
“The more pages a site has, the more likely it is that more pages will be added to it,” he said. “It’s just like the growth of a tree.”
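The two dynamics Huberman describes can be tried out in a small simulation. The Python below is a rough sketch, not the researchers' model: the function name, the 10 percent rate of new sites, the size of the random fluctuations and the one-page minimum are all assumptions chosen for illustration.

```python
import math
import random

def simulate_site_sizes(steps=60, growth_rate=0.1, volatility=0.3, seed=0):
    """Simulate page counts for a growing population of sites.

    Illustrative assumptions: sites start with one page, never drop below
    one page, and the number of sites grows by a fixed fraction per step.
    """
    rng = random.Random(seed)
    sizes = [1.0]
    for _ in range(steps):
        # Dynamic 1: each site's change is proportional to its current size,
        # so it is modeled as a random multiplicative factor.
        sizes = [max(1.0, s * math.exp(rng.gauss(0.0, volatility))) for s in sizes]
        # Dynamic 2: the number of sites grows exponentially over time.
        newcomers = max(1, int(len(sizes) * growth_rate))
        sizes.extend([1.0] * newcomers)
    return sizes

sizes = simulate_site_sizes()
ranked = sorted(sizes)
print("sites:", len(ranked))
print("median size:", round(ranked[len(ranked) // 2], 1))
print("largest size:", round(ranked[-1], 1))
```

Most simulated sites stay small while a few old, lucky ones grow far larger than the median, which is the kind of lopsided distribution the study describes.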
Like a tree, the total size of the Web will eventually become subject to resource limitations, Huberman acknowledged. But in his view, the current Web is still just a sapling, with plenty of potential for continued exponential growth.
“I think that we might end up in an era where, just as people today have their own e-mail addresses, people will have their own Web sites,” he said. “But eventually it will taper off. Eventually it has to be self-limiting.”