and the web is a very, very big place: Google is indexing 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!
To keep up with this volume of information, our systems have come a
long way since the first set of web data Google processed to answer
queries. Back then, we did everything in batches: a single workstation
could compute PageRank over a graph of 26 million pages in a couple of
hours, and that set of pages would serve as Google’s index for a fixed
period of time. Today, Google downloads the web continuously,
collecting updated page information and re-processing the entire
web-link graph several times per day. This graph of one trillion URLs
is similar to a map made up of one trillion intersections. So multiple
times every day, we do the computational equivalent of fully exploring
every intersection of every road in the United States. Except it’d be a
map about 50,000 times as big as the U.S., with 50,000 times as many
roads and intersections.
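To get a feel for what "processing the web-link graph" involves, here is a minimal sketch of PageRank-style power iteration on a toy link graph. The graph, damping factor, and iteration count are illustrative choices for this example, not Google's actual parameters or production pipeline, which operates at vastly larger scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank (ranks sum to 1).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets a baseline share from random jumps.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, outlinks in links.items():
            if outlinks:
                # A page passes its rank evenly along its outgoing links.
                share = damping * rank[p] / len(outlinks)
                for q in outlinks:
                    new_rank[q] += share
            else:
                # Dangling page (no outlinks): spread its rank evenly.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank

# Toy graph: three pages linking in a cycle.
toy = {"a": ["b"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy)
```

In a symmetric cycle like this toy graph, every page ends up with an equal share of the rank; the interesting behavior appears on irregular graphs, where heavily linked-to pages accumulate more. Doing this over a trillion-node graph, several times a day, is what the "exploring every intersection" analogy is describing.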