Wednesday, September 20, 2017

Deleting duplicate entries and sorting the rest

I learned an interesting (and very easy) way to remove duplicate entries as well as to sort them. Basically, in C++ it is achieved by creating a new set and initialize with values from the vector.  We shift all the hardwork to the std library to do this.

#include <iostream>
#include <vector>
#include <set>


void print(std::vector<int> &v)
{
    std::cout << "[";
    for(std::vector<int>::iterator it = v.begin(); it != v.end(); ++it)
    {
        std::cout << *it << ",";
    }
    std::cout << "]" << std::endl;
}


void print(std::set<int> &v)
{
    std::cout << "{";
    for(std::set<int>::iterator it = v.begin(); it != v.end(); ++it)
    {
        std::cout << *it << ",";
    }
    std::cout << "}" << std::endl;
}



int main()
{
    int inits[] = {1,10,3,2,2,4,4,5,6,7,8,8,9};
    std::vector<int> vec(inits, inits + sizeof(inits)/sizeof(inits[0]));
    print(vec);
    std::set<int> ves(vec.begin(), vec.end()); // both sorted & unique already
    print(ves);
}

Tuesday, September 19, 2017

Finding missing sequential numbers

Say you have an array with integer sequential numbers in it, but there are missing numbers in it.  How to find what are those numbers?  Assume the array is already sorted ascendant.

Solution
On the Internet, I found cases where there is only one number missing and the approach is to do sum of the numbers and verify if the sum S equals to n*(n+1)/2.  S(n) = n + (n-1) + (n-2) + .... + 0.  If not equal, then difference is equal to the missing number, because S + missing number must be = n*(n+1)/2.

def MissingSingleNum(A):
    sum = 0
    n = A[len(A)-1]
    for i in A:
        sum = sum + i
    computed = n*(n+1)/2
    if sum != computed:
        return computed - sum
    else:
        return "None"

A = [1,2,4,6,7,8]
MissingSingleNum(A)

Some weaknesses of the above algorithm it cannot find more than one missing number and numbers in the array should not be negative.  So in case of [-3,-2,-1,0,1,2,3,5], the computed sum is 8*(8+1)/2 = 36 (remember n is = number of elements in the array), where it is supposed to be 5, and the missing number is 4.  The same issue for A=[0,1,2,4,6,7,8] where [3,5] are missing, the algorithm cannot find that either.

To solve multiple missing numbers, we cannot use the above approach.  Basically we need to classify the cases.  Say we have A = [-2, 0, 2, 6, 12]
By looking at it, we know for the range [-2..12], the missing numbers are [-1,1,3,4,5,7,8,9,10,11].

One approach is, we scan from the lowest number to highest number and see if there are missing sequential numbers. Because we already know the array is sorted, we scan a range from A[0] to A[len-1] and check against a dictionary we build (see below).  The dictionary (or hash table) is very fast, it's only O(1) when the numbers are unique.

def IsExist(d,k):
    try:
        d[k]
        return True
    except:
        return False


def MissingNumbers(A):
    n = len(A)
    miss = []
    d = {}
    j = 0
    for e in A:
        d[e] = j
        j = j+1

    for i in range(A[0], A[n-1]):
        try:
            if not IsExist(d,i):
                miss.append(i)
        except:
                pass
                
    return miss


Test:

A[] = [0, 2, 4, 5, 6]
Missing numbers: [1, 3]

A[] = [-2, 0, 5, 16]
Missing numbers: [-1, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

A slightly better way or an improvement of above code is the factthat we can use <array>.index(value) to check if a value exists in the array (so no need to use dictionary).

def MissingNumbers(A):
    n = len(A)
    miss = []

    for i in range(A[0], A[n-1]):
        try:
            if A.index(i):
                pass
        except:
                miss.append(i)
                
    return miss


Wednesday, September 13, 2017

Finding two items in array that has sum of s

Suppose we have an array A with n items (A[0]...A[n-1]) and we want to find two members in the array (must be unique items, e.g, cannot be A[i] + A[i]) that has the sum of s.

With Brute-Force approach, we actually scan the array from i=0 to n-1 and check if the sum with its next item equals s.

For example, a Python code like below:

#!/usr/bin/python3

A = [-1, 3, 5, 6, -2, 10, 7, 9, 8, 4]

s = int(input("Enter the wanted sum: "))

# brute-force

for k in range(len(A)-1):
    for l in range(k+1,len(A)):
        if A[k] + A[l] == s:
            print("Found it: A[%d]+A[%d] = %d" % (k,l,s))
            print("\t%d+%d = %d" % (A[k],A[l],s))
            break


The problem with this approach is obviously not optimal (the complexity is O(n2)).

A better method is if we don't need to walk through the array second time (the second loop).  We still need to walk through each item in the list, though.

def ArrayToDict(A):
    h = {}
    for i in enumerate(A):
        key = i[1]
        val = i[0]
        h.setdefault(key, []).append(val)
    return h

def IsValExist(H,k):
H = ArrayToDict(A)
for enum in enumerate(A):

    ret = True
    try:
        H[k]
    except:
        ret = False
    return ret


    i = enum[0]
    v = m - A[i]
    if IsValExist(H,v):
        for iv in H[v]:
            if iv != i:
                print("GOT IT: A[%d](%d) + A[%d] (%d) = %d" % (i, A[i], iv, A[iv], m))


With this approach, the complexity is only O(n) (assuming only very few key duplicates, if any, in the hash/dictionary).

Sunday, September 10, 2017

Horse Racing Puzzle

Question: There are 25 horses available.  We want to select the fastest three horses out of this 25 but are allowed to do horse-racing with 5 horses at a time only.  How many minimum numbers of horse races we need to do to get the best 3?

Answer:

If you answer 11, it might not be wrong, but that's not optimal.  The key here is "can we quantify the speed of each horse doing the race?".  Say we do 5-horses racing, can we note the individual speed of each horse?  If we can, then the life is beautiful.

Here is the steps (if can measure the speed of each horse):

  1. Do 5-group races (5-horse for each group).  We record the speed of top-3 (say a,b,c,d,e correspond to race group, and 1, 2, and 3 correspond to the speed ranking, then we get S[a][1], S[a][2], S[a][3], S[b][1], S[b][2], S[b][3], ..., S[e][1], S[e][2], S[e][3]).  (Note: S[group][rank] := speed value for group and rank).
  2. Sort the table (We can flatten the 2-dimension to 1-dimension). For example S[a][1] = T[1], S[a][2]=T[2], ...., so on. 
  3. Pick the top-3.  We get the top-3 fastest horses.
  4. Total race = 5
What about if don't have the ability to measure the speed (we know the top-3 just from seeing the race)?
  1. Do 5-group races (say Group A, ..., E), each with 5 horses (total = 5 races)
  2. For each group, we position the top-3 winners. For example, for group A: A1>A2>A3, B1>B2>B3 so on. (A1, A2, ..., A5 are arbitraries)
  3. As we are interested in the top-3 only, we should remove the Rank 4 - rank 5 horses (e.g, eliminate all horses in rank 4 and 5.  Thus A4,A5, B4, B5, C4,C5,D4,D5,E4,E5 are eliminated)
  4. We don't know whether A3>B3, or A3>C3, or B3>E3.  The same thing for A2 vs B2 so on.
  5. We do another race (6th race): A1, B1, C1, D1, E1.  If we get the winner: E1>C1>D1>B1>A1 (E1 the 1st winner, C1 the 2nd, D1 the 3rd) so we can eliminate B1 and A1 (the 4th and 5th).
  6. Because we eliminate A1 (the 1st winner in Group A), A2 and A3 (2nd and 3rd winners of group A) are automatically eliminated as well.
  7. Because we eliminate the top winner in group B (B1), there is no chance B2 and B3 gonna be the winners so they would be eliminated as well.
  8. 7th race: we do race C1 against D1 against E1.  Say the outcome is E1>C1>D1.  This way, we are sure E1 would be the fastest of all, C1 would be the runner-up and D1 is in the third place.  What about C2, C3, D2, D3, E2, E3? There is no chance they would win (Hey, if the best in each group [rank 1] couldn't win the top, how the 2nd and 3rd place would win?)
The original table (after 5 races), we rank the winners of each group as follows:
Group A Group B Group C Group D Group E
Rank 1 A1 B1 C1 D1 E1
Rank 2 A2 B2 C2 D2 E2
Rank 3 A3 B3 C3 D3 E3
Rank 4 A4 B4 C4 D4 E5
Rank 5 A5 B5 C5 D5 E5

After eliminating 4th and 5th of each group:
Group A Group B Group C Group D Group E
Rank 1 A1 B1 C1 D1 E1
Rank 2 A2 B2 C2 D2 E2
Rank 3 A3 B3 C3 D3 E3
Rank 4




Rank 5





After 6th race (A1,B1,C1,D1,E1), the winners are E1>C1>D1>B1>A1.  We eliminate A1, B1 (4th and 5th winners of 6th race)
Group A Group B Group C Group D Group E
Rank 1
C1 D1 E1
Rank 2 A2 B2 C2 D2 E2
Rank 3 A3 B3 C3 D3 E3
Rank 4




Rank 5





Because the best of group A (A1) and the best of group B (B1) are eliminated (losers), there is no chance for 2nd and 3rd winner of group A (A2, A3)  and 2nd and 3rd winner of B (B2, B3) would win. We eliminate A1, A2, B1, and B2:

Group A Group B Group C Group D Group E
Rank 1
C1 D1 E1
Rank 2
C2 D2 E2
Rank 3

C3 D3 E3
Rank 4




Rank 5





We've got E1>C1>D1.  If the best of group D (D1) only sits in the 3rd place, the worse two in group D (D2 and D3) would definitely never have a chance to win.  We eliminate D2, D3.

C1 could only sit in the 2nd place, C2 is still possible to sit in the 3rd place, but the 3rd place in group C (C3) would not get a chance to sit in the last 3rd place either. Eliminate C3.
Group A Group B Group C Group D Group E
Rank 1
C1 D1 E1
Rank 2
C2
E2
Rank 3


E3
Rank 4




Rank 5





We already know, E1 is the best of all, so there is no point to do a race with the horse again.  Now we have E2, E3, C1, C2, D1 to race against each other.

7th race: Say we get E2>C1>C2.  We could eliminate E3, D1 then.  Now we have E1>E2>C1>C2.  We eliminate the 4th (C2).

We get the top-3 (E1>E2>C1) !!!

Group A Group B Group C Group D Group E
Rank 1
C1
E1
Rank 2


E2
Rank 3



Rank 4




Rank 5






NOTE:
After second thought, the 4th and 5th elimination in the beginning actually assumes these bottom two are the lowest of all.  This may be dangerous, as we split the race into groups, each group is independent each other.

For example, after 5 races performed, the speed of horses (in Miles/hour and ordered according to the rank) in
group A:  [20.0, 19.0, 18.0, 17.0, 16.0]
group B:  [25.0, 24.8, 24.6, 24.4, 24.2]

It is obvious, B4 and B5 (the bottom two) are still faster than even the fastest in group A.  If we eliminate indiscriminately against all 4th and 5th winners, B4 and B5 are got rid of, while the actual losers (all horses in group A) are picked up (at least A1, A2, A3 are selected, where they shouldn't be).  Now you see the problem?

It is similar to say, the top-3 smartest men among 5-idiots better than 4th and 5th place of smartest men among 5-smart men.

I dunno why the explanations for the issue on the Internet fall into the same fallacy (by the way, this question is one of Google's interview question, I believe).

Monday, August 28, 2017

Essential Open Source Projects for Software Development

I've crossed some these jargon recently when looking at job openings.  Interesting.  Apparently, everything we bought in the past to help full cycle software development now are available in open source (I mean, there are opensource alternatives for almost all the proprietary tools).

Here I list some of them:


  • To do code review (ala Code Collaborator): Gerrit
  • Version control: git, svn, cvs
  • Generic Scheduler for automation: Jenkins
  • Software test automation: Selenium WebDriver
  • Compiler set: GCC
  • Rich C++ Libraries: Boost
  • Dataplane SDK/library (Collection of libs & driver for fast data processing): DPDK
  • Data Packet Sniffer & Analyzer: Wireshark
  • Network Traffic Generator: Ostinato
  • Embedded Operating System: OpenWRT, etc.
  • Embedded Real-Time Operating System: FreeRTOS
  • Full package of tools and libs to build whole Linux and its packages (apps): Yocto
  • Opensource Electronic based on AVR chips: Arduino
  • IoT Development board (running OpenWRT): Onion
  • IoT Connectivity protocol framework: MQTT (pronounced as 'Mosquitto')
  • IoT devices: ESP32, Arduino
  • JTAG, ICE, OCD: OpenOCD