开发者

Find an array inside another larger array

开发者 https://www.devze.com 2023-01-20 05:58 出处:网络
I was recently asked to write 3 test programs for a job. They would be written using just core Java API\'s and any test framework of my choice. Unit tests should be implemented where appropriate.

I was recently asked to write 3 test programs for a job. They would be written using just core Java API's and any test framework of my choice. Unit tests should be implemented where appropriate.

Although I haven't received any feedback at all, I suppose they didn't like my solutions (otherwise I would have heard from them), so I decided to show my programs here and ask if this implementation can be considered good, and, if not, then why?

To avoid confusion, I'll ask only first one for now.

Implement a function that finds an array in another larger array. It should accept two arrays as parameters and it will return the index of the first array where the second array first occurs in full. Eg, findArray([2,3,7,1,20], [7,1]) should return 2.

I didn't try to find any existing solution, but instead wanted to do it myself.

Possible reasons: 1. Should be static. 2. Should use line comments instead of block ones. 3. Didn't check for null values first (I know, just spotted too late). 4. ?

UPDATE:

Quite a few reasons have been presented, and it's very difficult for me to choose one answer as many answers have a good solution. As @adietrich mentioned, I tend to believe they wanted me to demonstrate knowledge of core API (they even asked to write a function, not to write an algorithm).

I believe the best way to secure the job was to provide as many solutions as possible, including: 1. Implementation using Collections.indexOfSubList() method to show that I know core collections API. 2. Implement using brute-force approach, but provide a more elegant solution. 3. Implement using a search algorithm, for example Boyer-Moore. 4. Implement using combination of System.arraycopy() and Arrays.equal(). However not the best solution in terms of performance, it would show my knowledge of standard array routines.

Thank you all for your answers!

END OF UPDATE.

Here is what I wrote:

Actual program:

package com.example.common.utils;

/**
 * This class contains functions for array manipulations.
 * 
 * @author Roman
 *
 */
public class ArrayUtils {

    /**
     * Finds a sub array in a large array
     * 
     * @param largeArray
     * @param subArray
     * @return index of sub array
     */
    public int findArray(int[] largeArray, int[] subArray) {

        /* If any of the arrays is empty then not found */
        if (largeArray.length == 0 || subArray.length == 0) {
            return -1;
        }

        /* If subarray is larger than large array then not found */
        if (subArray.length > largeArray.length) {
            return -1;
        }

        for (int i = 0; i < largeArray.length; i++) {
            /* Check if the next element of large array is the same as the first element of subarray */
            if (largeArray[i] == subArray[0]) {

                boolean subArrayFound = true;
                for (int j = 0; j < subArray.length; j++) {
                    /* If outside of large array or elements not equal then leave the loop */
                    if (largeArray.length <= i+j || subArray[j] != largeArray[i+j]) {
                        subArrayFound = false;
                        break;
                    }
                }

                /* Sub array found - return its index */
                if (subArrayFound) {
                    return i;
                }

            }
        }

        /* Return default value */
        return -1;
    }

}

Test code:

package com.example.common.utils;

import com.example.common.utils.ArrayUtils;

import junit.framework.TestCase;

public class ArrayUtilsTest extends TestCase {

    private ArrayUtils arrayUtils = new ArrayUtils();

    public void testFindArrayDoesntExist() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {8,9,10};

        int expected = -1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistSimple() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {3,4,5};

        int expected = 2;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistFirstPosition() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {1,2,3};

        int expected = 0;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistLastPosition() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {5,6,7};

        int expected = 4;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayDoesntExistPartiallyEqual() {

        int[] largeArra开发者_如何学Goy = {1,2,3,4,5,6,7};
        int[] subArray = {6,7,8};

        int expected = -1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistPartiallyEqual() {

        int[] largeArray = {1,2,3,1,2,3,4,5,6,7};
        int[] subArray = {1,2,3,4};

        int expected = 3;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArraySubArrayEmpty() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {};

        int expected = -1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArraySubArrayLargerThanArray() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {4,5,6,7,8,9,10,11};

        int expected = -1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistsVeryComplex() {

        int[] largeArray = {1234, 56, -345, 789, 23456, 6745};
        int[] subArray = {56, -345, 789};

        int expected = 1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

}


The requirement of "using just core Java API's" could also mean that they wanted to see whether you would reinvent the wheel. So in addition to your own implementation, you could give the one-line solution, just to be safe:

public static int findArray(Integer[] array, Integer[] subArray)
{
    return Collections.indexOfSubList(Arrays.asList(array), Arrays.asList(subArray));
}

It may or may not be a good idea to point out that the example given contains invalid array literals.


Clean and improved code 

public static int findArrayIndex(int[] subArray, int[] parentArray) {
    if(subArray.length==0){
        return -1;
    }
    int sL = subArray.length;
    int l = parentArray.length - subArray.length;
    int k = 0;
    for (int i = 0; i < l; i++) {
        if (parentArray[i] == subArray[k]) {
            for (int j = 0; j < subArray.length; j++) {
                if (parentArray[i + j] == subArray[j]) {
                    sL--;
                    if (sL == 0) {
                        return i;
                    }

                }

            }
        }

    }
    return -1;
}


For finding an array of integers in a larger array of integers, you can use the same kind of algorithms as finding a substring in a larger string. For this there are many algorithms known (see Wikipedia). Especially the Boyer-Moore string search is efficient for large arrays. The algorithm that you are trying to implement is not very efficient (Wikipedia calls this the 'naive' implementation).

For your questions:

  1. Yes, such a method should be static
  2. Don't care, that's a question of taste
  3. The null check can be included, or you should state in the JavaDoc that null values are not allowed, or JavaDoc should state that when either parameter is null a NullPointerException will be thrown.


Well, off the top of my head:

  1. Yes, should be static.

  2. A company complaining about that would not be worth working for.

  3. Yeah, but what would you do? Return? Or throw an exception? It'll throw an exception the way it is already.

  4. I think the main problem is that your code is not very elegant. Too many checks in the inner loop. Too many redundant checks.

Just raw, off the top of my head:

public int findArray(int[] largeArray, int[] subArray) {

    int subArrayLength = subArray.length;

    if (subArrayLength == 0) {
        return -1;
    }

    int limit = largeArray.length - subArrayLength;

    int i=0;

    for (int i = 0; i <= limit; i++) {
        boolean subArrayFound = true;

        for (int j = 0; j < subArrayLength; j++) {
            if (subArray[j] != largeArray[i+j]) {
                subArrayFound = false;
                break;
            }

        /* Sub array found - return its index */
        if (subArrayFound) {
            return i;
        }
    }

    /* Return default value */
    return -1;
}

You could keep that check for the first element so you don't have the overhead of setting up the boolean and the for loop for every single element in the array. Then you'd be looking at

public int findArray(int[] largeArray, int[] subArray) {

    int subArrayLength = subArray.length;

    if (subArrayLength == 0) {
        return -1;
    }

    int limit = largeArray.length - subArrayLength;

    for (int i = 0; i <= limit; i++) {
        if (subArray[0] == largeArray[i]) {
            boolean subArrayFound = true;

            for (int j = 1; j < subArrayLength; j++) {
                if (subArray[j] != largeArray[i+j]) {
                    subArrayFound = false;
                    break;
                }

            /* Sub array found - return its index */
            if (subArrayFound) {
                return i;
            }
        }
    }

    /* Return default value */
    return -1;
}


Following is an approach using KMP pattern matching algorithm. This solution takes O(n+m). Where n = length of large array and m = length of sub array. For more information, check:

https://en.wikipedia.org/wiki/KMP_algorithm

Brute force takes O(n*m). I just checked that Collections.indexOfSubList method is also O(n*m).

public static int subStringIndex(int[] largeArray, int[] subArray) {
    if (largeArray.length == 0 || subArray.length == 0){
      throw new IllegalArgumentException();
}
    if (subArray.length > largeArray.length){
      throw new IllegalArgumentException();
}

    int[] prefixArr = getPrefixArr(subArray);
    int indexToReturn = -1;

    for (int m = 0, s = 0; m < largeArray.length; m++) {
      if (subArray[s] == largeArray[m]) {
        s++;
      } else {
        if (s != 0) {
          s = prefixArr[s - 1];
          m--;
        }
      }
      if (s == subArray.length) {
        indexToReturn = m - subArray.length + 1;
        break;
      }
    }

    return indexToReturn;
  }

  private static int[] getPrefixArr(int[] subArray) {
    int[] prefixArr = new int[subArray.length];
    prefixArr[0] = 0;

    for (int i = 1, j = 0; i < prefixArr.length; i++) {
      while (subArray[i] != subArray[j]) {
        if (j == 0) {
          break;
        }
        j = prefixArr[j - 1];
      }

      if (subArray[i] == subArray[j]) {
        prefixArr[i] = j + 1;
        j++;
      } else {
        prefixArr[i] = j;
      }

    }
    return prefixArr;
  }


A little bit optimized code that was posted before:

public int findArray(byte[] largeArray, byte[] subArray) {
    if (subArray.length == 0) {
        return -1;
    }
    int limit = largeArray.length - subArray.length;
    next:
    for (int i = 0; i <= limit; i++) {
        for (int j = 0; j < subArray.length; j++) {
            if (subArray[j] != largeArray[i+j]) {
                continue next;
            }
        }
        /* Sub array found - return its index */
        return i;
    }
    /* Return default value */
    return -1;
}


int findSubArr(int[] arr,int[] subarr)
{
    int lim=arr.length-subarr.length;

    for(int i=0;i<=lim;i++)
    {
        int[] tmpArr=Arrays.copyOfRange(arr,i,i+subarr.length);
        if(Arrays.equals(tmpArr,subarr))
            return i;   //returns starting index of sub array
    }
    return -1;//return -1 on finding no sub-array   
}

UPDATE:

By reusing the same int array instance:

int findSubArr(int[] arr,int[] subarr)
{
    int lim=arr.length-subarr.length;
    int[] tmpArr=new int[subarr.length];
    for(int i=0;i<=lim;i++)
    {
        System.arraycopy(arr,i,tmpArr,0,subarr.length);
        if(Arrays.equals(tmpArr,subarr))
          return i; //returns starting index of sub array
    }
    return -1;//return -1 on finding no sub-array   

}


I would suggest the following improvements:

  • make the function static so that you can avoid creating an instance
  • the outer loop condition could be i <= largeArray.length-subArray.length, to avoid a test inside the loop
  • remove the test (largeArray[i] == subArray[0]) that is redundant


Here's #indexOf from String:

/**
 * Code shared by String and StringBuffer to do searches. The
 * source is the character array being searched, and the target
 * is the string being searched for.
 *
 * @param   source       the characters being searched.
 * @param   sourceOffset offset of the source string.
 * @param   sourceCount  count of the source string.
 * @param   target       the characters being searched for.
 * @param   targetOffset offset of the target string.
 * @param   targetCount  count of the target string.
 * @param   fromIndex    the index to begin searching from.
 */
static int indexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j]
                    == target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}


First to your possible reasons:

  1. Yes. And the class final with a private constructor.
  2. Shouldn't use this kind of comments at all. The code should be self-explanatory.
  3. You're basically implicitly checking for null by accessing the length field which will throw a NullPointerException. Only in the case of a largeArray.length == 0 and a subArray == null will this slip through.

More potential reasons:

  • The class doesn't contain any function for array manipulations, opposed to what the documentation says.
  • The documentation for the method is very sparse. It should state when and which exceptions are thrown (e.g. NullPointerException) and which return value to expect if the second array isn't found or if it is empty.
  • The code is more complex than needed.
    1. Why is the equality of the first elements so important that it gets its own check?
    2. In the first loop, it is assumed that the second array will be found, which is unintentional.
    3. Unneeded variable and jump (boolean and break), further reducing legibility.
    4. largeArray.length <= i+j is not easy to grasp. Should be checked before the loop, improving the performance along the way.
    5. I'd swap the operands of subArray[j] != largeArray[i+j]. Seems more natural to me.
    6. All in all too long.
  • The test code is lacking more edge cases (null arrays, first array empty, both arrays empty, first array contained in second array, second array contained multiple times etc.).
  • Why is the last test case named testFindArrayExistsVeryComplex?

What the exercise is missing is a specification of the component type of the array parameters, respectively the signature of the method. It makes a huge difference whether the component type is a primitive type or a reference type. The solution of adietrich assumes a reference type (thus could be generified as further improvement), mine assumes a primitive type (int).

So here's my shot, concentrating on the code / disregarding documentation and tests:

public final class ArrayUtils {
    // main method

    public static int indexOf(int[] haystack, int[] needle) {
        return indexOf(haystack, needle, 0);
    }

    // helper methods

    private static int indexOf(int[] haystack, int[] needle, int fromIndex) {
        for (int i = fromIndex; i < haystack.length - needle.length; i++) {
            if (containsAt(haystack, needle, i)) {
                return i;
            }
        }
        return -1;
    }

    private static boolean containsAt(int[] haystack, int[] needle, int offset) {
        for (int i = 0; i < needle.length; i++) {
            if (haystack[i + offset] != needle[i]) {
                return false;
            }
        }
        return true;
    }

    // prevent initialization

    private ArrayUtils() {}
}


    byte[] arr1 = {1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 1, 3, 4, 56, 6, 7};
    byte[] arr2 = {9, 1, 3};

    boolean i = IsContainsSubArray(arr1, arr2);

 public static boolean IsContainsSubArray(byte[] Large_Array, byte[] Sub_Array){
    try {
        int Large_Array_size, Sub_Array_size, k = 0;

        Large_Array_size = Large_Array.length;
        Sub_Array_size = Sub_Array.length;

        if (Sub_Array_size > Large_Array_size) {
            return false;
        }
        for (int i = 0; i < Large_Array_size; i++) {
            if (Large_Array[i] == Sub_Array[k]) {
                k++;
            } else {
                k = 0;
            }
            if (k == Sub_Array_size) {
                return true;
            }
        }
    } catch (Exception e) {
    }
    return false;
}


Code from Guava:

import javax.annotation.Nullable;

/**
 * Ensures that an object reference passed as a parameter to the calling method is not null.
 *
 * @param reference an object reference
 * @param errorMessage the exception message to use if the check fails; will be converted to a
 *     string using {@link String#valueOf(Object)}
 * @return the non-null reference that was validated
 * @throws NullPointerException if {@code reference} is null
 */
public static <T> T checkNotNull(T reference, @Nullable Object errorMessage) {
    if (reference == null) {
        throw new NullPointerException(String.valueOf(errorMessage));
    }
    return reference;
}


/**
 * Returns the start position of the first occurrence of the specified {@code
 * target} within {@code array}, or {@code -1} if there is no such occurrence.
 *
 * <p>More formally, returns the lowest index {@code i} such that {@code
 * java.util.Arrays.copyOfRange(array, i, i + target.length)} contains exactly
 * the same elements as {@code target}.
 *
 * @param array the array to search for the sequence {@code target}
 * @param target the array to search for as a sub-sequence of {@code array}
 */
public static int indexOf(int[] array, int[] target) {
    checkNotNull(array, "array");
    checkNotNull(target, "target");
    if (target.length == 0) {
        return 0;
    }

    outer:
    for (int i = 0; i < array.length - target.length + 1; i++) {
        for (int j = 0; j < target.length; j++) {
            if (array[i + j] != target[j]) {
                continue outer;
            }
        }
        return i;
    }
    return -1;
}


I would to do it in three ways:

  1. Using no imports i.e. using plain Java statements.

  2. Using JAVA core APIs - to some extent or to much extent.

  3. Using string pattern search algorithms like KMP etc. (Probably the most optimized one.)

1,2 and 3 are all shown above in the answers. Here is approach 2 from my side:

public static void findArray(int[] array, int[] subArray) {

        if (subArray.length > array.length) {
            return;
        }

        if (array == null || subArray == null) {
            return;
        }

        if (array.length == 0 || subArray.length == 0) {
            return;
        }

        //Solution 1
        List<Integer> master = Arrays.stream(array).boxed().collect(Collectors.toList());
        List<Integer> pattern = IntStream.of(subArray).boxed().collect(Collectors.toList());

        System.out.println(Collections.indexOfSubList(master, pattern));

        //Solution2
        for (int i = 0; i <= array.length - subArray.length; i++) {
            String s = Arrays.toString(Arrays.copyOfRange(array, i, i + subArray.length));

            if (s.equals(Arrays.toString(subArray))) {
                System.out.println("Found at:" + i);
                return;
            }
        }
        System.out.println("Not found.");
    }


Using java 8 and lambda expressions:

String[] smallArray = {"1","2","3"};
final String[] bigArray = {"0","1","2","3","4"};
boolean result = Arrays.stream(smallArray).allMatch(s -> Arrays.stream(bigArray).anyMatch(b -> b.equals(s)));

PS: is important to have finalString[] bigArray for enclosing space of lambda expression.


FYI: if the goal is simply to search wether an array y is a subset of an array x, we can use this:

val x = Array(1,2,3,4,5)
val y = Array(3,4,5)
val z = Array(3,4,8)
x.containsSlice(y) // true
x.containsSlice(z) // false
0

精彩评论

暂无评论...
验证码 换一张
取 消