codingBat separateThousands using regex (and unit testing how-to)

Source: https://www.devze.com, 2022-12-27 20:12 (reposted from the web)

This question is a combination of regex practice and unit testing practice.

Regex part

I authored this problem separateThousands for personal practice:

Given a number as a string, introduce commas to separate thousands. The number may contain an optional minus sign, and an optional decimal part. There will not be any superfluous leading zeroes.

Here's my solution:

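String separateThousands(String s) {
  return s.replaceAll(
      String.format("(?:%s)|(?:%s)",
        // "rest": exactly 3 digits since the previous match, and another digit ahead
        "(?<=\\G\\d{3})(?=\\d)",
        // "first": optional minus and 1 to 3 leading digits, then whole triplets only
        "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"
      ),
      ","
  );
}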

The way it works is that it classifies commas into two types: the first, and the rest. In the above regex, the rest subpattern actually appears before the first. Every match is zero-length, and replaceAll inserts "," at each one.

The rest subpattern looks behind to see whether the previous match is followed by exactly 3 digits, and looks ahead to see whether there's another digit. It's a sort of chain-reaction mechanism triggered by the previous match.

The first subpattern looks behind for the ^ anchor, followed by an optional minus sign and between 1 and 3 digits. The rest of the string from that point must consist of triplets of digits that are not followed by another digit (so what comes next is either the end of the string or the decimal point).
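To make the two kinds of commas visible, the zero-length matches can be listed with Matcher.find() instead of being replaced. This is only a sketch: it reuses the two subpatterns unchanged, and the class name CommaPositions and the helper commaPositions are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaPositions {
    // The two subpatterns from the solution above, unchanged.
    static final String REST = "(?<=\\G\\d{3})(?=\\d)";
    static final String FIRST = "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))";
    static final Pattern COMMA =
        Pattern.compile(String.format("(?:%s)|(?:%s)", REST, FIRST));

    // Collects the index of every zero-length match, i.e. every place
    // where replaceAll would insert a comma.
    static List<Integer> commaPositions(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = COMMA.matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1000", "-12345", "1234567.89", "123.456"}) {
            System.out.println(s + " -> " + commaPositions(s));
        }
    }
}
```

For "1234567.89" the insertion points are indices 1 and 4: the one at index 1 comes from the first subpattern, and the one at index 4 comes from the rest subpattern chaining off the previous match through \G.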

My question for this part is:

  • Can this regex be simplified?
  • Can it be optimized further?
    • Ordering rest before first is deliberate, since first is only needed once
    • No capturing group

Unit testing part

As I've mentioned, I'm the author of this problem, so I'm also the one responsible for coming up with test cases for it. Here they are:

INPUT, OUTPUT
"1000", "1,000"
"-12345", "-12,345"
"-1234567890.1234567890", "-1,234,567,890.1234567890"
"123.456", "123.456"
".666666", ".666666"
"0", "0"
"123456789", "123,456,789"
"1234.5678", "1,234.5678"
"-55555.55555", "-55,555.55555"
"0.123456789", "0.123456789"
"123456.789", "123,456.789"
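The table above can be run as-is without a test framework. The following is a minimal sketch, assuming plain Java with no JUnit dependency; the class name SeparateThousandsTable and the failures() helper are illustrative and not part of the original problem.

```java
import java.util.ArrayList;
import java.util.List;

class SeparateThousandsTable {
    // The solution under test, copied from the question.
    static String separateThousands(String s) {
        return s.replaceAll(
            String.format("(?:%s)|(?:%s)",
                "(?<=\\G\\d{3})(?=\\d)",
                "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"),
            ",");
    }

    // INPUT, OUTPUT pairs, copied from the table above.
    static final String[][] CASES = {
        {"1000", "1,000"},
        {"-12345", "-12,345"},
        {"-1234567890.1234567890", "-1,234,567,890.1234567890"},
        {"123.456", "123.456"},
        {".666666", ".666666"},
        {"0", "0"},
        {"123456789", "123,456,789"},
        {"1234.5678", "1,234.5678"},
        {"-55555.55555", "-55,555.55555"},
        {"0.123456789", "0.123456789"},
        {"123456.789", "123,456.789"},
    };

    // Returns one message per failing case; an empty list means every case passed.
    static List<String> failures() {
        List<String> failed = new ArrayList<>();
        for (String[] c : CASES) {
            String actual = separateThousands(c[0]);
            if (!actual.equals(c[1])) {
                failed.add(c[0] + ": expected " + c[1] + " but got " + actual);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = failures();
        failed.forEach(System.out::println);
        System.out.println((CASES.length - failed.size()) + "/" + CASES.length + " passed");
    }
}
```

Keeping the cases in a data table means a new scenario is one extra row rather than a new test method.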

I haven't had much experience with industrial-strength unit testing, so I'm wondering if others can comment on whether this is good coverage, whether I've missed anything important, etc. (I can always add more tests if there's a scenario I've missed.)


This works for me:

return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");

The first time through, \G acts the same as ^, and the lookahead forces \d{1,3} to consume only as many characters as necessary to leave the match position at a three-digit boundary. After that, \d{1,3} consumes the maximum three digits every time, with \G to keep it anchored to the end of the previous match.
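The consuming behaviour described here can be observed by printing what group 1 captures on each match. This is a sketch only; the class name ChunkDemo and the helper chunks are hypothetical, while the pattern is the one from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChunkDemo {
    // The pattern from the answer above, unchanged.
    static final Pattern CHUNK =
        Pattern.compile("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))");

    // Lists the text consumed by each successive match (group 1).
    static List<String> chunks(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = CHUNK.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Same replacement as in the answer: each consumed chunk gets a trailing comma.
    static String separateThousands(String s) {
        return CHUNK.matcher(s).replaceAll("$1,");
    }

    public static void main(String[] args) {
        String s = "-1234567890.1234567890";
        System.out.println(chunks(s));
        System.out.println(separateThousands(s));
    }
}
```

For "-1234567890.1234567890" the chunks are "-1", "234" and "567": the first match is shortened until whole triplets remain, and each later match takes a full three digits. The final "890" is never matched because no triplet follows it, so no trailing comma is added.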

As for your unit tests, I would just make it clear in the problem description that the input will always be a valid number, with at most one decimal point.


When you state the requirements, are you intending for them to be enforced by your method?

The number may contain an optional minus sign, and an optional decimal part. There will not be any superfluous leading zeroes.

If your intent is to have the method detect when those constraints are violated, you will need to write additional unit tests to ensure that the contract is being enforced.

What about testing for 1234.5678.91011?

Do you expect your method to return 1,234.5678.91011, or to reject the input altogether? It's best to write a test that verifies your expectations.
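Both expectations can be pinned down in code. The sketch below first records what the question's regex does with that input, then shows one possible way to enforce the contract. The wrapper separateThousandsStrict, the VALID pattern, and the choice of IllegalArgumentException are all assumptions made for illustration; the original problem does not specify any of them.

```java
class StrictDemo {
    // The solution from the question, unchanged.
    static String separateThousands(String s) {
        return s.replaceAll(
            String.format("(?:%s)|(?:%s)",
                "(?<=\\G\\d{3})(?=\\d)",
                "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"),
            ",");
    }

    // ASSUMPTION: one reading of "valid number": optional minus, an integer part
    // without superfluous leading zeroes and/or a decimal part, at most one point.
    static final String VALID = "-?(?:(?:0|[1-9]\\d*)(?:\\.\\d+)?|\\.\\d+)";

    // Hypothetical wrapper that rejects input violating the stated constraints.
    static String separateThousandsStrict(String s) {
        if (!s.matches(VALID)) {
            throw new IllegalArgumentException("not a valid number: " + s);
        }
        return separateThousands(s);
    }

    public static void main(String[] args) {
        String bad = "1234.5678.91011";
        // Unchecked version: only the leading integer part is touched.
        System.out.println(separateThousands(bad));
        try {
            separateThousandsStrict(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whichever behaviour is chosen, a test for it documents the decision. As written, the unchecked method turns 1234.5678.91011 into 1,234.5678.91011.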
