Why is "out of range" not thrown for 'substring(startIndex, endIndex)'

Developer https://www.devze.com 2023-01-06 18:18 Source: Internet
In Java I am using the substring() method and I'm not sure why it is not throwing an "index out of range" error.

The string abcde has indexes from 0 to 4. Indexing starts at 0, which is why I can call foo.substring(0) and get "abcde". The substring() method takes startIndex and endIndex as arguments.

Then why does substring(5) work? That index should be out of range. What is the explanation?

public class SubstringTest {
    public static void main(String[] args) {
        /*
        01234
        abcde
        */
        String foo = "abcde";
        System.out.println(foo.substring(0));
        System.out.println(foo.substring(1));
        System.out.println(foo.substring(2));
        System.out.println(foo.substring(3));
        System.out.println(foo.substring(4));
        System.out.println(foo.substring(5));
    }
}

This code outputs:

abcde
bcde
cde
de
e
     // foo.substring(5) outputs nothing (an empty line) here; isn't this out of range?

When I replace 5 with 6:

foo.substring(6)

Then I get an error:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
    String index out of range: -1


According to the Java API doc, substring throws an exception when the start index is greater than the length of the String.

IndexOutOfBoundsException - if beginIndex is negative or larger than the length of this String object.

In fact, they give an example much like yours:

"emptiness".substring(9) returns "" (an empty string)

I guess this means it is best to think of a Java String as follows, where each index sits between | marks:

|0| A |1| B |2| C |3| D |4| E |5|

In other words, a string has both a start index (0) and an end index (its length).
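The |0| A |1| ... |5| picture can be checked directly. A minimal sketch (the class and method names are illustrative, not from the answer): for a string of length n, every begin position from 0 through n is accepted, so substring(i) succeeds n + 1 times and the last call, at i == length(), yields "".

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringPositions {
    // Collects s.substring(i) for every valid begin position,
    // from 0 (before the first char) up to and including s.length().
    static List<String> suffixesAtEveryPosition(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            result.add(s.substring(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> suffixes = suffixesAtEveryPosition("abcde");
        // 6 positions for a 5-char string: "abcde", "bcde", "cde", "de", "e", ""
        System.out.println(suffixes.size() + " " + suffixes);
    }
}
```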


When you do foo.substring(5), it gets the substring starting at the position right after the "e" and ending at the end of the string. Incidentally, the start and end positions happen to be the same, so you get an empty string. You can think of an index not as an actual character in the string, but as a position between characters.

        ---------------------
String: | a | b | c | d | e |
        ---------------------
Index:  0   1   2   3   4   5
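The "same start and end position" point can be seen by comparing the one- and two-argument forms. A minimal sketch (class name illustrative): substring(5) on "abcde" gives the same result as substring(5, 5), and substring(i, i) is empty for every position i from 0 to length().

```java
public class EmptyWhenBeginEqualsEnd {
    public static void main(String[] args) {
        String foo = "abcde";
        // substring(5) runs from position 5 to the end, which is also position 5
        System.out.println(foo.substring(5).equals(foo.substring(5, 5)));   // true
        // Whenever begin == end (any position 0..length), the result is empty
        for (int i = 0; i <= foo.length(); i++) {
            System.out.println(i + " -> \"" + foo.substring(i, i) + "\"");
        }
    }
}
```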


It's because substring's begin index is "inclusive": it marks where the substring starts. Index 5 points to a location at the end of the string, AFTER the last displayed character, so the substring starting there is empty.

This is shown in the documentation: http://download.oracle.com/docs/cd/E17476_01/javase/1.4.2/docs/api/java/lang/String.html#substring(int)


I know this thread is quite old but this is such a fundamental problem that I think it warrants clarification.

The question is spot on. I view this as a software fault in the Java String.substring(int beginIndex, int endIndex) method.

http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring%28int,%20int%29.

From the Java Docs https://docs.oracle.com/javase/tutorial/java/nutsandbolts/arrays.html

Java/C/C++ and every other language that I know of do NOT view the array index as the 'divider' between array elements.

Parameters: beginIndex - the beginning index, inclusive. endIndex - the ending index, exclusive.

Either endIndex is misnamed, because the language does not allow memory access to the address at endIndex + 1, which would be required to include the last array element, OR endIndex is misdefined and should be: endIndex - the ending index, inclusive.

The most likely case is that the second parameter was misnamed. It should be: length - the length of the desired substring, beginning at beginIndex.

We know that Gosling based the Java syntax on the C/C++ languages for familiarity. From the C++ string class http://www.cplusplus.com/reference/string/string/substr/ we see the method definition is:

string substr (size_t pos = 0, size_t len = npos) const;

Note that the second parameter in the method definition is 'len' for length.

len Number of characters to include in the substring (if the string is shorter, as many characters as possible are used).

testString has 10 chars, index positions 0 to 9. Specifying an endIndex of 10 should always throw the IndexOutOfBoundsException() because testString has no index 10.

If we test the method in JUnit with concrete values, expecting it to behave like the C++ method, we expect:

String testString = "testString"; assertThat(testString.substring(4, 6), equalTo("String"));

but of course we get: Expected: "String" but was: "St"

The length of testString from index 0 to char 'g' in 'String' is 10 chars. If we use 10 as the 'endIndex' parameter,

String testString = "testString"; assertThat(testString.substring(4, 10), equalTo("String"));

"Pass" from JUnit.

If we rename parameter 2 to "lengthOfSubstringFromIndex0", you don't have to do the endIndex - 1 count, and it never throws the IndexOutOfBoundsException() that would be expected when specifying an endIndex of 10, which is out of range for the underlying array. http://docs.oracle.com/javase/7/docs/api/java/lang/IndexOutOfBoundsException.html
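The two JUnit assertions above can be reproduced in plain Java without a JUnit dependency (the class name is illustrative). This shows the documented contract for the two-argument form: endIndex is exclusive, so passing length() as endIndex takes everything through the last character, and the result's length is endIndex - beginIndex.

```java
public class EndIndexIsExclusive {
    public static void main(String[] args) {
        String testString = "testString";             // 10 chars, indices 0..9
        String st = testString.substring(4, 6);       // chars at 4 and 5 only
        String string = testString.substring(4, 10);  // chars at 4..9; 10 == length()
        System.out.println(st);                       // St
        System.out.println(string);                   // String
        // The result length is always endIndex - beginIndex
        System.out.println(string.length() == 10 - 4);  // true
    }
}
```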

This is just one of those times that you have to remember the idiosyncrasy of this method. The second parameter is not named correctly. The Java method signature should be:

public String substring(int beginIndex,
           int lengthOfSubstringFromIndex0)

Or the method could be redefined to match the C++ string::substr method. Redefining it, of course, would mean rewriting the entire internet, so it's not likely.
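For comparison, the C++-style (pos, len) semantics this answer prefers can be layered on top of Java's substring without changing it. A minimal sketch under stated assumptions: substr is a hypothetical helper, not part of java.lang.String; it throws StringIndexOutOfBoundsException where C++ would throw std::out_of_range (pos greater than the length), clamps len to the characters available as the C++ reference quoted above describes, and rejects a negative len because Java has no unsigned size_t.

```java
public class CppStyleSubstr {
    // Hypothetical helper emulating C++ std::string::substr(pos, len):
    // throws if pos > length, and clamps len to the characters available.
    static String substr(String s, int pos, int len) {
        if (pos < 0 || pos > s.length()) {
            throw new StringIndexOutOfBoundsException(
                    "pos " + pos + " out of range for length " + s.length());
        }
        if (len < 0) {
            throw new IllegalArgumentException("len must be non-negative: " + len);
        }
        // Java's endIndex is exclusive, so endIndex = pos + len, clamped to length().
        // The long arithmetic avoids int overflow for very large len values.
        int end = (int) Math.min((long) pos + len, s.length());
        return s.substring(pos, end);
    }

    public static void main(String[] args) {
        String testString = "testString";
        System.out.println(substr(testString, 4, 6));   // String
        System.out.println(substr(testString, 4, 100)); // String (len clamped)
        System.out.println(substr(testString, 10, 3));  // "" (pos == length is allowed)
    }
}
```

Note that even with C++ semantics, pos equal to the length is accepted and yields an empty string, just as Java's substring(5) does for "abcde".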


From String API javadoc:

public String substring(int beginIndex)
    Returns a new string that is a substring of this 
    string. The substring begins with the character 
    at the specified index and extends to the end of this string.

public String substring(int beginIndex, int endIndex)
    Returns a new string that is a substring of this 
    string. The substring begins at the specified beginIndex 
    and extends to the character at index endIndex - 1. Thus 
    the length of the substring is endIndex-beginIndex.

Examples:

"unhappy".substring(2) returns "happy" 
"Harbison".substring(3) returns "bison"
"emptiness".substring(9) returns "" (an empty string)

"hamburger".substring(4, 8) returns "urge"
"smiles".substring(1, 5) returns "mile"

Parameters:

beginIndex - the beginning index, inclusive.
Returns:
the specified substring.
Throws:
IndexOutOfBoundsException - if beginIndex is negative or 
larger than the length of this String object.
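The documented examples can be run as-is to confirm them. This is simply the Javadoc examples quoted above wrapped in a main method (class name illustrative); brackets are printed around the "emptiness" result so the empty string is visible.

```java
public class JavadocExamples {
    public static void main(String[] args) {
        // One-argument form: from beginIndex to the end of the string
        System.out.println("unhappy".substring(2));               // happy
        System.out.println("Harbison".substring(3));              // bison
        System.out.println("[" + "emptiness".substring(9) + "]"); // [] since 9 == length()
        // Two-argument form: endIndex is exclusive, result length is endIndex - beginIndex
        System.out.println("hamburger".substring(4, 8));          // urge
        System.out.println("smiles".substring(1, 5));             // mile
    }
}
```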

====

So this is by design. If you give the index as the size of the string, it returns an empty string.


substring(5) points to an existing index...it just happens to point to an empty string. substring(6), on the other hand, is just crazy talk. :)
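Which calls are "crazy talk" follows from the Javadoc bounds: beginIndex must be in 0..length(), and for the two-argument form beginIndex must not exceed endIndex, which must not exceed length(). A minimal sketch that probes those bounds; throwsIoobe is an illustrative helper, not a library method. StringIndexOutOfBoundsException is a subclass of IndexOutOfBoundsException, so catching the latter covers it.

```java
public class SubstringBounds {
    // Returns true if the call throws IndexOutOfBoundsException (or a subclass).
    static boolean throwsIoobe(Runnable call) {
        try {
            call.run();
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String foo = "abcde"; // length() == 5
        System.out.println(throwsIoobe(() -> foo.substring(5)));    // false: begin == length is allowed
        System.out.println(throwsIoobe(() -> foo.substring(6)));    // true: begin > length
        System.out.println(throwsIoobe(() -> foo.substring(-1)));   // true: negative begin
        System.out.println(throwsIoobe(() -> foo.substring(2, 6))); // true: end > length
        System.out.println(throwsIoobe(() -> foo.substring(3, 2))); // true: begin > end
    }
}
```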
