Google Interview Question: How would you reorder 9 TB of Google data in under 3 seconds using only 2 KB of memory?
It is possible, but first I will have to make a few assumptions:
- The 1 TB is stored as a string of characters: char data[] = "743b520cd840..."
- You have access to an unlimited number of threads.
- Creating threads costs no memory. (Distributed system)
- 1 TB = 10^12 B (easier for calculations)
Well, now what you can do is apply counting sort. Since the data is a string, you can have only 256 different characters, so you declare an array that will store the frequency of each character, something like:

long long int freq[256];

(That is 256 × 8 B = 2 KB, exactly the memory budget.) Now all you have to do is iterate through the data and update the frequencies. If you could loop 10^12 times in 3 seconds then well and good; your answer would be stored in the freq[] array in sorted order. But a normal computer can only do about 3 × 10^6 calculations in 3 sec.
So you will have to place a thread at every index which is a multiple of 3 × 10^6. That will require about 334,000 threads.
So after 3 seconds you will have your data sorted and compressed :) Hope this was a satisfactory answer. :)
UPDATE:
I became curious as to how fast counting sort can actually compress its data, so I ran some tests (link to code: Counting Sort). Turns out that counting sort can compress 10^9 characters in 2.5 sec. So the original 334,000 threads would now reduce to about 1,000. And since we are talking about Google servers, which are much more powerful than my puny laptop, they can do it even faster and would probably require even fewer threads.
Note 1: Firstly, I am NOT an expert on threads and distributed systems, just an average C++ programmer. I did have fun answering this question, though.