sábado, 25 de octubre de 2014

Finding the minimum common element between two arrays of integers


Find the common minimum element in two arrays



In my first post, I would like to write about the challenging  programming problem that I faced. Actually it is not so difficult, but it is the kind of problem that someone can face applying for a job in Facebook or Google. So this is the seducing part.


The problem is following: Let's supposes that we have two arrays of positive integers.


Such as:
array1={1,4,5,2,8,9,7}
array2={6,5,2,3,4,5,6,9}

The minimum common number between  these two arrays is 2.

Clearly, we are facing a problem of sorting and then finding the common number that, at the same time, is the minimum in both arrays.

In order to resolve it, I will propose two solutions, one under a programming environment C and another less complex in Perl.

First Solution in C

If we got two sorted arrays like:

{2,3,4,6,7,9} and {3,4,6,9,11}

clearly the common minimum element (cme) is 3. Then it seems that we need to sort both arrays.

Sorting the arrays is not a problem, we can use algorithms like merge-sorting or counting sort. The real problem to beat is how to get the common number without iterate through the two arrays. To do that I propose to use the strategy  that counting sort uses to sorting an array.

In counting sort, it uses an array to count the number of occurrence among the elements in the array, we can use that idea to extract the elements that appear more than 1 time.

Using the fact that the arrays are sorted, and considering the maximum value between the last elements of these two arrays, we can construct the array of occurrences with the size of that maximum value. In the previous case the maximum between 11 and 9 is 11.

0  0  1  2  2  0  2  1  0  2  0   1    occurrences
0  1  2  3  4  5  6  7  8  9 10 11   indexes

In this case, extracting the minimum common element is done in a time cost O(n). Because we can do it iterating through the array of  occurrence and stopping when the value is equals to 2.
 For this case, the element 2 is present just one time, but the element 3 is present 2 times, then we pick up that element and it is our answer to the problem.
Summarizing, the steps are:

  1. Order the arrays with counting sort approach   O(n)
  2. Remove the repeated elements                            O(n)
  3. Construct the array of occurrences with both arrays  O(n)
  4. Iterate through the array of occurrence from the begging and stopping when the value is equal to 2. Return the index when the iteration stopped. O(N)
The Code is presented bellow:

void countsort(int *a, int n)
{
        int *b, *result;
        int max = 0, i;
        for (i = 0; i < n; i++) {
                if (a[i] > max)
                        max = a[i];
        }
        ++max;
        b = (int *)calloc(max, sizeof(int));
        result = (int *)calloc(n, sizeof(int));
        for (i = 0; i < n; i++) {
                b[a[i]] += 1;
        }

        for (i = 1; i < max; i++) {
                b[i] = b[i - 1] + b[i];
        }

        for (i = n - 1; i >= 0; i--) {
                int temp = b[a[i]];
                result[temp - 1] = a[i];
                b[a[i]] -= 1;
        }

        for (i = 0; i < n; i++) {
                a[i] = result[i];
        }
}

void main(int argc, char * argv[])
{

    int array1[7]={4,3,4,5,6,2,4};
    int array2[3]={3,4,1};

    int n1=sizeof(array1)/sizeof(int);
    int n2=sizeof(array2)/sizeof(int);


    //ordering the arrays
    countsort(array1,n1);
    countsort(array2,n2);


    //removing repeated elements
    unique(array1,&n1);
    unique(array2,&n2);




    print(array1,n1);
    print(array2,n2);   
   
    int i;
    int max=array1[n1-1];
   
    if(max<array2[n2-1]) max=array2[n2-1];
   
    ++max;

    // creating the array of ocurrence
    int * count =(int *)calloc(max,sizeof(int));

    for(i=0;i<n1;i++)
    {
        ++count[array1[i]];
    }
   
    for(i=0;i<n2;i++)
    {
        ++count[array2[i]];
    }
    //extracting the common number
    for(i=0;i<max;i++)
        if(count[i]>=2) break;

    printf("the number is %d\n",i);
}




This approach is time cost O(n) which is very  efficient. However if we use a language like Perl the implementation turns into even simpler code.

Another point on this implementation is the use of memory. If we got an array of two elements like {2,1000}, the array of occurrence would be of 10001 elements full of zeros but for the index 2 and 1000.

Second Solution in Perl


In this case, we will use the hash tables structure to remove the repeated elements and count the repetitions.

The code is right bellow

#!/usr/bin/perl
use strict;
use warnings;

use sort '_mergesort';
my @array1 = (
4,3,4,5,6,2,4);
my @array2 = (
3,4,1);



# Removing repeated elements
my %fields1 = map {$_,1} @array1;
my %fields2 = map {$_,1} @array2;

@array1 = keys %fields1;
@array2 = keys %fields2;



# Ordering the both arrays
my @sorted_array1 = sort { $a <=> $b } @array1;
my @sorted_array2 = sort { $a <=> $b } @array2;



# Counting the occurrence
foreach(@sorted_array1){
        $fields1{$_} = 1;
}

foreach(@sorted_array2){
        ++$fields1{$_};
}

my @results = ();


# Extracting the common minimum element
foreach(keys %fields1)
{
        if($fields1{$_}>=2){
        push(@results,$_);
        }
}

@results = sort { $a <=> $b } @results;


print "minimum common value is $results[0]\n";
 

In this implementation the sorting process takes 0(n*log(n)) because the implementation of the sort function is merge-sort. Then the cost of the whole implementation is 0(n*log(n)) which is more expensive that the C implementation

 
This second solution has a simple implementation and uses very useful structure like hash tables. In contrast with the solution in C where the implementation is a little more complex, the implementation in Perl will show in front of the interviewers your domains on the programming structures.


Here is a table that summarizes the advantages and disadvantages of both approaches.




C Implementation
Perl Implementation
Advantages
  • Time Cost O(n)
  • Simple implementation
  • Use of properties of Hash tables
Disadvantages
  • If the max value is significantly bigger than the number of items, the implementation is not space-efficient
  • Time Cost O(n*log(n)) for sorting the arrays


Codes:
  1. Implementation in C here
  2. Implementation in Perl here