Category: Technology

eBay’s enormous data warehouses

Curt Monash meets with eBay’s Oliver Ratzesberger and gets us numbers on two of the largest data warehouses in the world. Look at these eBay stats!

Metrics on eBay’s main Teradata data warehouse include:

  • >2 petabytes of user data
  • 10s of 1000s of users
  • Millions of queries per day
  • 72 nodes
  • >140 GB/sec of I/O, or 2 GB/node/sec, or maybe that’s a peak when the workload is scan-heavy
  • 100s of production databases being fed in
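A quick sanity check on the per-node figure, using only the numbers quoted in the list above. This is just back-of-the-envelope arithmetic; the ">140 GB/sec" is treated as exactly 140 for the sake of the estimate.

```python
# Figures quoted in the list above; ">140" is rounded down to 140.
total_io_gb_per_sec = 140
nodes = 72

# Aggregate I/O divided evenly across nodes.
per_node_gb_per_sec = total_io_gb_per_sec / nodes
print(f"{per_node_gb_per_sec:.2f} GB/node/sec")
```

That comes out just under 2 GB/node/sec, which matches the rounded number in the list.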

Metrics on eBay’s Greenplum data warehouse (or, if you like, data mart) include:

  • 6 1/2 petabytes of user data
  • 17 trillion records
  • 150 billion new records/day, which seems to suggest an ingest rate well over 50 terabytes/day
  • 96 nodes
  • 200 MB/node/sec of I/O (that’s the order of magnitude difference that triggered my post on disk drives)
  • 4.5 petabytes of storage
  • 70% compression
  • A small number of concurrent users
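The "well over 50 terabytes/day" ingest estimate can be roughly reproduced from the other numbers in the list. The sketch below assumes decimal units (1 PB = 10^15 bytes), that the 6 1/2 petabytes is the size of the 17 trillion records, and that new records are about the same size as existing ones. None of those assumptions are confirmed by the source.

```python
PB = 10**15  # assumption: decimal petabytes
TB = 10**12  # assumption: decimal terabytes

user_data_bytes = 6.5 * PB
total_records = 17 * 10**12
new_records_per_day = 150 * 10**9

# Average record size implied by the totals (assumption: user data == all records).
avg_record_bytes = user_data_bytes / total_records

# Daily ingest if new records are the same average size as existing ones.
ingest_tb_per_day = new_records_per_day * avg_record_bytes / TB

# Aggregate I/O implied by the per-node figure.
nodes = 96
io_mb_per_node_per_sec = 200
aggregate_io_gb_per_sec = nodes * io_mb_per_node_per_sec / 1000

print(f"average record: {avg_record_bytes:.0f} bytes")
print(f"ingest: {ingest_tb_per_day:.1f} TB/day")
print(f"aggregate I/O: {aggregate_io_gb_per_sec:.1f} GB/sec")
```

Under those assumptions the average record is a bit under 400 bytes and the ingest rate lands in the high 50s of terabytes per day, so the "well over 50" figure looks plausible. The aggregate I/O works out to roughly 19 GB/sec across the 96 nodes.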

More details.

It is personal

That would be the user generated content conference in February 2009 in California. I will be speaking at the conference and I am looking forward to it. I am surrounded by user generated content: open source software in the software space, Wikipedia, Flickr and Creative Commons, ThinkBig, just to name a few. I am looking forward to meeting some of the other speakers and putting together a kick-ass presentation.

I am obsessed these days with search, particularly searching user generated content. You can’t use what you can’t find. I know that everyone else is more concerned about trust and accuracy: as in, can you trust what you are reading, and is it accurate? Search will find the accuracies and inaccuracies all the same! Searching photographs is particularly painful since we typically rely on keywords associated with images to find them. It is a starting point but it is limiting. Large scale image searching on the web is still in its infancy. We have seen a lot of development in the search space this year, with the introduction of visual search features, colour searching, and keywords + geotags, but we still have a ways to go. What I have been thinking a lot about is the creation of a world visual repository: imagine (soon) being able to take a photograph of anything and getting back (useful) information about what you photographed, preferably via mobile devices.

How far are we from this dream? I say: not very far. Not very far. Start work on stealth project now!

Multicolr Fun

Every once in a while we get an email from someone who has played with our lab technologies and built something fun and exciting. Last month I received an email from my good friend Patrick and since then it has been sitting in my inbox begging for attention! I am such a delinquent when it comes to emails! But that said: Patrick has an awesome career: he spends his time building exciting projects, exploring how technology can enhance people’s lives and experiences. You may have seen some of his projects around Toronto. Just recently he built the TXTris wall version 2, which was showcased during HoHoTo on Monday. It was awesome, and I know a few of us spent time sending Tweets just to see them scroll down the screens (some of the tweets were not fit to print so we won’t be repeating those here!).

Patrick was inspired by our Multicolr search to build a little prototype: this prototype basically takes a stream from a webcam and picks out the dominant colours and passes them through our Multicolr Search to find photographs that match those colours. Very, very neat and you can view it here:


Colrfindr from Patrick Dinnen on Vimeo.
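For the curious, here is a minimal sketch of the "pick out the dominant colours" step. This is not Patrick's code and it does not call Multicolr; it is one simple way to do it, assuming a frame has already been grabbed from the webcam as a list of (r, g, b) tuples. The function names and the `levels` parameter are made up for illustration.

```python
from collections import Counter


def dominant_colours(pixels, top_n=3, levels=4):
    """Return the top_n most common colours after coarse quantisation.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    levels: buckets per channel (hypothetical tuning knob).
    """
    step = 256 // levels
    counts = Counter()
    for pixel in pixels:
        # Snap each channel to the centre of its bucket so that
        # near-identical shades are counted as the same colour.
        bucket = tuple(
            min(c // step, levels - 1) * step + step // 2 for c in pixel
        )
        counts[bucket] += 1
    return [colour for colour, _ in counts.most_common(top_n)]


def to_hex(colour):
    """Format an (r, g, b) tuple as a six-digit hex string."""
    return "{:02x}{:02x}{:02x}".format(*colour)


# A tiny fake "frame": mostly red, some blue, a little green.
frame = [(250, 10, 10)] * 6 + [(10, 10, 240)] * 3 + [(20, 200, 30)]
palette = [to_hex(c) for c in dominant_colours(frame)]
print(palette)
```

The resulting hex values are the kind of thing you could then hand to a colour search. How Colrfindr actually passes its colours to Multicolr is not described here, so that part is left out.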

HoHoTo

That’s the Toronto Technology Party we are planning! In Toronto, on Monday, December 15. Of course this is a last-minute thing, but we are used to it; we bring you DemoCamp, Mesh and all sorts of Toronto geek gatherings, so putting our heads together and getting things done is nothing new.

We need you to get the word out to everyone in the software/technology space in Toronto as we would love to see everyone before the year end. The number of tickets for the party is limited so getting your ticket is a must – ’cause begging ain’t gonna get you in the door.

When: Monday, December 15, 2008, starting at 19:00

Where: The ModClub

Who: Geeks! and everyone else.

You are all welcome to join us. All proceeds from the party will go to supporting the Toronto Daily Food Bank. More details to come your way in the very near future but don’t let that stop you from getting your ticket(s) now.

OpenSource

From BusinessWeek: “For anyone who hasn’t been paying attention to the software industry lately, I have some bad news. The open-source business model is broken.”

Well, to some extent that’s true, but open source has brought incredible innovation and collaboration to the software industry … it just needs a bit of fixing.

The next 5000 days

Kevin Kelly on the next 5000 days of the web: we are all connected. This reminds me of the conversations Paul and I have been having about an image registry, but that’s for another blog post. Great talk.

Commenting: the idiot’s way

This idiot’s guide to posting comments on the internet from The Onion made me laugh out loud! Have a read for yourself, here is an excerpt:

"Later this evening, I intend to watch the video in question, click
the ‘reply’ link above the box reserved for user comments, and draft a
response, being careful to put as little thought into it as possible,
while making sure to use all capital letters and incorrect
punctuation," Mylenek said. "Although I do not yet know exactly what my
comment will entail, I can say with a great degree of certainty that it
will be incredibly stupid."

Mylenek, who rarely in his life has been capable of formulating an
idea or opinion worth the amount of oxygen required to express it, went
on to guarantee that the text of his comment would be misspelled to the
point of incomprehension [...]