Find missing numbers?

ThisGuy · 3/30/11

From Andy's recently resurrected post "DE Shaw interview questions":

" 3. The integers from 1 to n are stored in an array in a random fashion. but one integer is missing. write a program to find the missing integer."

Let's make this a bit harder: suppose k numbers are missing, how do we identify them? The solution to the original problem adapts pretty easily to give an O(kn) algorithm but I'm wondering if there's anything faster/more elegant.

Lyosha · 3/30/11

imo create array of booleans and check off numbers as found iteratively.

If you are afraid of that, radix sort will sort integers (and just about anything serializable [just about anything nowadays is...] for that matter...) in bigO of N time (well, technically bigO kN time but who actually cares about the K...).

ThisGuy · 3/30/11

Oh, of course I have to tie your hands behind your back: you're not allowed to use (very much) additional memory. (Or to put it in a less arbitrary way: what if n is so huge that your additional array of booleans won't fit in memory?)

About radix sort: I care about the k! Radix sort is linear time when your integers lie in a fixed range but here they don't. Radix sorting the integers 1 through N will take O(N log N) time.

Tuan Pham · 4/1/11

ThisGuy said:
From Andy's recently resurrected post "DE Shaw interview questions":

" 3. The integers from 1 to n are stored in an array in a random fashion. but one integer is missing. write a program to find the missing integer."

Let's make this a bit harder: suppose k numbers are missing, how do we identify them? The solution to the original problem adapts pretty easily to give an O(kn) algorithm but I'm wondering if there's anything faster/more elegant.

Sorry if this is a dumb question. But without additional memory, how can you find the k missing numbers in O(kn)? I can do it in O(n) (the time it takes to read in the list of numbers) for k = 1 and 2. But I don't know how to do it for bigger k.

Tsotne · 4/1/11

Tuan Pham said:
Sorry if this is a dumb question. But without additional memory, how can you find the k missing numbers in O(kn)? I can do it in O(n) (the time it takes to read in the list of numbers) for k = 1 and 2. But I don't know how to do it for bigger k.

Could you clarify your question? I don't have C/C++ on this computer and will do it straight in C#. It is very easy.

Tsotne · 4/1/11

I just did it and will upload it here.

Tuan Pham · 4/1/11

Problem: The integers from 1 to n are stored in an array in a random fashion. but k integers are missing. Write a program to find the missing integers.

Without using a boolean array to check off the numbers or storing them and sort, how can you find the k missing numbers in O(kn) (as claimed by @ThisGuy)?

@Tsotne: I believe you sorted the array, which is O(n*logn). And if you are only missing 1 number, you don't need to do that. Let N = n(n+1)/2. Every time you read in a number, just subtract it from N. The remainder is the missing one. This has complexity O(n) (for reading the list of numbers). You can use a similar trick for k = 2 (and possibly 3 and 4, but not higher).

Tsotne · 4/2/11

Tuan Pham said:
Problem: The integers from 1 to n are stored in an array in a random fashion. but k integers are missing. Write a program to find the missing integers.

Without using a boolean array to check off the numbers or storing them and sort, how can you find the k missing numbers in O(kn) (as claimed by @ThisGuy)?

@Tsotne: I believe you sorted the array, which is O(n*logn). And if you are only missing 1 number, you don't need to do that. Let N = n(n+1)/2. Every time you read in a number, just subtract it from N. The remainder is the missing one. This has complexity O(n) (for reading the list of numbers). You can use a similar trick for k = 2 (and possibly 3 and 4, but not higher).

Hi Tuan! I just did it on the first problem. I hadn't read the second while doing. Yes the second looks easy one too.

Tuan Pham · 4/2/11

I agree that it is easy if you sort the list. What if you are not allowed to sort the list?

Tsotne · 4/2/11

Tuan Pham said:
I agree that it is easy if you sort the list. What if you are not allowed to sort the list?

It is easy again. I would simply count to n in the vector and get the missing value. To be more specific, let's assume I have a vector holding n-1 numbers. n itself is the number from which random numbers are chosen. We cannot duplicate a number chosen so we have exactly one missing. So I would simply choose every next number in a vector and check (in for cycle for example)if it exists. If not, then that is the missing number. I'll just do it and upload here.

alain · 4/2/11

Tsotne said:
... We cannot duplicate a number chosen so we have exactly one missing. So I would simply choose every next number in a vector and check (in for cycle for example)if it exists. If not, then that is the missing number. I'll just do it and upload here.

what do you mean by check if it exists? That could be very expensive.

Tsotne · 4/2/11

alain said:
what do you mean by check if it exists? That could be very expensive.

Actually you are right but I did it at the first thought so I'll think how to simplify it. I'll be uploading it in a moment.

Tsotne · 4/2/11

@alain @Tuan Pham Here it is. I'll reduce the amount of codes when come up with new idea. Iteration is kinda rude way for this kind of problems.

Tsotne · 4/3/11

Any more ideas about how to implement it without member-by-member iterations???

alain · 4/3/11

what do you mean by member-by-member iteration? looping? I saw your program and you have 2 loops doing the search, worse case scenario means the algorithm is (O(n^2))

I was thinking about this problem today. If we only need to find one missing element, the algorithm already mentioned of subtracting the elements from the total seems like the right solution.

If there are multiple elements missing, things get more complicated. If the amount of memory is not restricted, my first idea was to hash the array. Then loop through all the real values to find the missing ones. Search on a hash is (O(1)) so the overall algorithm is (O(n)).

Tsotne · 4/4/11

Yes in member-by-member iteration I meant looping. What my program does is to go over all the elements of the array and check if each of them match the predefined number "j" which is increasing in a parent loop by 1. If the match confirms, variable (counter) jCount retains the value 0 so the missing number is caught. Either way, we cannot escape loop I think. All in all, I'll just do it for the second version you mentioned and for multiple missing elements too.

ThisGuy · 4/4/11

Tuan Pham said:
Let N = n(n+1)/2. Every time you read in a number, just subtract it from N. The remainder is the missing one. This has complexity O(n) (for reading the list of numbers). You can use a similar trick for k = 2 (and possibly 3 and 4, but not higher).

It sounds like you're thinking the same thing that I have in mind, so hopefully I didn't drive you crazy with the statement that "you're not allowed to use (very much) additional memory" since I don't think you can do this using only constant additional memory (I suppose that I was assuming that k << n so that even O(k) is not "very much"). As you say, it's easy to find out the sum of the missing numbers. For k = 1, this simply tells you which number is missing. For k > 1, knowing the sum of the numbers tells us their mean. For k = 2, this works particularly nicely: exactly one of the two missing numbers will be less than their mean m and now adding up all the numbers in our array less than m and subtracting that from m(m - 1)/2 will give us the smaller missing number which of course tells us the larger missing number.

For k > 2, you can do basically the same thing. Let's instead solve the more general problem: given an array of size n which contains each of the numbers between a and b at most once, find which numbers between a and b are missing. To do this, we go through the array adding up all the numbers between a and b as well as counting how many we find (which of course tells us how many are missing). As above, this tells us the mean m of the missing numbers. If only one number is missing then we are done, otherwise we just recurse to find the missing numbers between a and m - 1, and m + 1 and b. If that doesn't find all the missing numbers, it must be that m is itself an integer and missing which will then complete the set. Since there will be missing numbers both above and below the mean, both of the recursions will have a smaller number of missing numbers and induction tells us that the time taken for this will be O(nk). It seems to me like there should be some more efficient way to do this but I can't see how that would work.

How much memory is used is actually an interesting (to me, at least) combinatorics question: the memory usage will be equal to the depth of the recursion tree which clearly will be O(k) but I'd bet that it's actually O(log(k)). If that's true, though, I haven't figured out how to prove it.

Tuan Pham · 4/5/11

@ThisGuy: Thanks for the explanation. That was not what I had in mind

. I did the cases when k=1, 2,3,4 differently, and my method doesn't generalize like yours.

ThisGuy · 4/6/11

Oy, I was so focused on thinking recursively that I failed to notice how much extra calculation that's costing me. For example, after the first run through I've found the mean m of all the missing numbers. Now if I just recurse then I'll compute the sum of the numbers below m and the sum of the numbers above m independently, using 2 passes through the array and 2*n operations. But of course I could just compute that in a single pass through the array, keeping two running sums and adding each number to the appropriate one. The algorithm works out to something like iteratively building the recursion tree that would result from doing the computation recursively, each stage taking less than n*d operations where d is the depth of the current tree. So this algorithm will be O(n*D^2) where D is the depth of the full recursion tree, which I suspect means O(n*log(k)^2) although I can't actually prove that at the moment.

yongge · 4/6/11

I'm not trying to criticize the original question. The n is so huge and you are able to store these n numbers in memory (or in an external file), but you are not able to store the additional array of booleans. For me, it sounds very unrealistic. The answer to your original question can only increase the problem size n by two fold.

Your solution is quite interesting and very general. It opens my eyes

ThisGuy said:
Oh, of course I have to tie your hands behind your back: you're not allowed to use (very much) additional memory. (Or to put it in a less arbitrary way: what if n is so huge that your additional array of booleans won't fit in memory?)

Find missing numbers?

ThisGuy

Lyosha

Psychic in Training

ThisGuy

Tuan Pham

Tsotne

Tsotne

Attachments

Tuan Pham

Tsotne

Tuan Pham

Tsotne

alain

Older and Wiser

Tsotne

Tsotne

Attachments

Tsotne

alain

Older and Wiser

Tsotne

ThisGuy

Tuan Pham

ThisGuy

yongge