Google Interview Question: How would you reorder 9 TB of Google data in under 3 seconds using only 2 KB of memory?

It is possible, but first I will have to make a few assumptions:

  • The 1 TB is stored as a string of characters:   char data[] = "743b520cd840..."
  • You have access to threads.
  • Creating threads costs no memory (distributed system).
  • 1 TB = 10^12 bytes (easier for calculations).

Well, now what you can do is apply counting sort. Since the data is a string, it can contain only 256 different character values, so you declare an array that stores the frequency of each character, something like:
long long int freq[256];

Now all you have to do is iterate through the data and update the frequencies. If you can loop 10^12 times in 3 seconds, then well and good: your answer will be stored in the  freq[]  array, and reading it from index 0 to 255 gives the data back in sorted order.

Now, a normal computer can do about 3×10^6 such operations in 3 seconds.
So you will have to place a thread at every index that is a multiple of 3×10^6, each thread counting its own slice. That will require about 334,000 threads (10^12 / (3×10^6) ≈ 334,000).

So after 3 seconds you will have your data sorted and compressed :) Hope this was a satisfactory answer. :)

UPDATE:

I became curious as to how fast counting sort can compress its data, so I actually ran some tests (link to code: Counting Sort).
Turns out that counting sort can actually compress 10^9 characters in 2.5 seconds. So the original 334,000 threads would now reduce to about 1,000 (10^12 / 10^9). And since we are talking about Google servers, which are much more powerful than my puny laptop, they can do it even faster and would probably require even fewer threads.

Note 1: Firstly, I am NOT an expert on threads and distributed systems, just an average C++ programmer. I did have fun answering this question though.
