Parsing a complex version string

Developer (开发者) https://www.devze.com 2023-02-10 06:30 Source: web

I have a string that's of the following scheme:

VersionNumber.VersionString-VersionNumber.VersionString

Such that the following example strings can be converted into arrays of information:

1. 1.x-2.x             => (1, 'x', 2, 'x')
2. 1.2-3.4             => (1, 2, 3, 4)
3. 1.2-3.4-beta5       => (1, 2, 3, '4-beta5')
4. 1.2-beta3-3.4       => (1, '2-beta3', 3, 4)
5. 1.2-beta3-4.5-beta6 => (1, '2-beta3', 4, '5-beta6')

The logic for the parse is:

  1. First element is everything before the first period.
  2. Second element is everything up to a hyphen immediately before a number.
  3. Third element always starts with a number and is everything up to the next period.
  4. Fourth element is everything after that period.

Notes:

  • Second element is an arbitrary string, but will never have a hyphen that immediately precedes a number (e.g. 2-3 is not valid, but 2-beta4 is).
  • Third element always starts with a number, and begins right after a hyphen.
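
The rules above can also be followed step by step, without one large regular expression. The sketch below uses Python, which is an assumption since the question names no language; `parse_version` is a hypothetical helper name, and every element is returned as a string rather than converted to a number.

```python
import re

def parse_version(text):
    """Split 'Number.String-Number.String' into four parts, or return None."""
    # Rule 1: the first element is everything before the first period.
    first, sep, rest = text.partition(".")
    if not sep:
        return None
    # Rule 2: the second element ends at the first hyphen that is
    # immediately followed by a digit.
    boundary = re.search(r"-(?=\d)", rest)
    if boundary is None:
        return None
    second = rest[:boundary.start()]
    # Rules 3 and 4: the third element runs to the next period and the
    # fourth element is whatever remains after it.
    third, sep, fourth = rest[boundary.end():].partition(".")
    if not sep:
        return None
    return (first, second, third, fourth)
```

Under these assumptions, `parse_version("1.2-beta3-4.5-beta6")` returns `('1', '2-beta3', '4', '5-beta6')`.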

I've been able to parse the first three cases using the following expression:

(.+?).(.+?)-(.+?).(.*)

But I'm not sure how to modify it to handle cases 4 and 5 (when the second element contains a hyphen). The two approaches I thought of were:

  1. Modify the second group to match everything before a hyphen immediately preceding a digit.
  2. Modify the second group to match everything until it hits a second hyphen only if the first hyphen immediately precedes a non-digit character.

Presumably the first approach is the correct/simplest way to do it, but I'm struggling with coming up with the correct regexp to express it.
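
One way to express the first approach is a lookahead: `-(?=\d)` matches a hyphen only when a digit follows it, without consuming the digit. This is a minimal Python sketch, not a definitive answer; the language and the names `PATTERN` and `split_version` are assumptions for illustration, and the periods are escaped so they match literally.

```python
import re

# Group 2 is lazy, so it stops at the first hyphen that the lookahead
# (?=\d) confirms is immediately followed by a digit.
PATTERN = re.compile(r"(.+?)\.(.+?)-(?=\d)(.+?)\.(.*)")

def split_version(text):
    """Return the four captured groups, or None if the text does not match."""
    match = PATTERN.fullmatch(text)
    return match.groups() if match else None
```

With this pattern, `split_version("1.2-beta3-3.4")` gives `('1', '2-beta3', '3', '4')`, because the hyphen before `beta3` fails the lookahead.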


Try this:

(.+?)\.(.*)-(.+?)\.(.*)

Actually, even this will work:

(.*)\.(.*)-(.*)\.(.*)

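Your problem was that the periods were not escaped, so each . matched any character rather than a literal period. The second group is also greedy here, so it extends to the last hyphen that still lets the rest of the pattern match, which is what handles a hyphen inside the VersionString.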
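
The effect of the escaping is easy to see in a small experiment. This is a Python sketch (the language is an assumption, since the question names none) that runs the question's original expression and the first suggested one against case 4:

```python
import re

# An unescaped "." accepts any character; an escaped "\." accepts only a period.
dot_matches_x = re.fullmatch(r"1.2", "1x2") is not None
escaped_dot_matches_x = re.fullmatch(r"1\.2", "1x2") is not None

case_4 = "1.2-beta3-3.4"
unescaped = re.fullmatch(r"(.+?).(.+?)-(.+?).(.*)", case_4).groups()
escaped = re.fullmatch(r"(.+?)\.(.*)-(.+?)\.(.*)", case_4).groups()

print(dot_matches_x, escaped_dot_matches_x)  # True False
print(unescaped)  # ('1', '2', 'b', 'ta3-3.4')
print(escaped)    # ('1', '2-beta3', '3', '4')
```

In the unescaped version the second `.` consumes the `e` of `beta3`, which is why the third group comes back as `b`.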

UPDATE:

So if a VersionString can contain periods/hyphens, following your parse logic, this should work:

(\d*)\.(.*)-(\d*)\.(.*)

It reads as follows:

  1. match a run of digits (your first VersionNumber)
  2. match everything between the first period and the last hyphen that is followed by digits and a period (the greedy match picks the last such hyphen)
  3. match the digits after that hyphen, up to the period
  4. match the rest

For example:

1.2-b.e.t.a.3-4.5-b.e.t.a.6 => ('1', '2-b.e.t.a.3', '4', '5-b.e.t.a.6')

It also works when the VersionString contains extra hyphens:

1.2-b-e.t-a.3-4.5-b-e.t-a.6 => ('1', '2-b-e.t-a.3', '4', '5-b-e.t-a.6')
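
To check the updated expression against every example in the thread, a small table-driven test is enough. The sketch below is Python (an assumed language); `SAMPLES` and `failures` are hypothetical names, and groups are compared as strings.

```python
import re

PATTERN = re.compile(r"(\d*)\.(.*)-(\d*)\.(.*)")

# Inputs from the question and from this answer, with the expected groups.
SAMPLES = {
    "1.x-2.x": ("1", "x", "2", "x"),
    "1.2-3.4": ("1", "2", "3", "4"),
    "1.2-3.4-beta5": ("1", "2", "3", "4-beta5"),
    "1.2-beta3-3.4": ("1", "2-beta3", "3", "4"),
    "1.2-beta3-4.5-beta6": ("1", "2-beta3", "4", "5-beta6"),
    "1.2-b.e.t.a.3-4.5-b.e.t.a.6": ("1", "2-b.e.t.a.3", "4", "5-b.e.t.a.6"),
    "1.2-b-e.t-a.3-4.5-b-e.t-a.6": ("1", "2-b-e.t-a.3", "4", "5-b-e.t-a.6"),
}

def failures(pattern, samples):
    """Return the inputs whose captured groups differ from the expected tuple."""
    bad = []
    for text, expected in samples.items():
        match = pattern.fullmatch(text)
        if match is None or match.groups() != expected:
            bad.append(text)
    return bad

print(failures(PATTERN, SAMPLES))  # []
```

One caveat: `\d*` also matches an empty string, so an input such as `.a-.b` is accepted with two empty version numbers. Replacing `\d*` with `\d+` rejects it.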


Can VersionString ever contain a dot? If not, this should work:

(\d+)\.([^.]+)-(\d+)\.(\S+)

The [^.]+ initially matches everything up to the next dot, but then backtracks to the last hyphen in that stretch that is followed by digits and a dot. If VersionString can contain a dot, you can use this:

(\d+)\.(\S+?)-(\d+)\.(\S+)

Matching digits explicitly in the VersionNumber part serves to enforce your "digit preceded by a hyphen" rule.

(Actually, (.+?) works just as well; I used (\S+?) because I was testing the regex plucking the version strings out of the full text of your message.)
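
The difference between the two expressions shows up only when the VersionString contains a dot. This is a short Python comparison (the language and the names `NO_DOTS`, `WITH_DOTS` and `groups` are assumptions for illustration):

```python
import re

NO_DOTS = re.compile(r"(\d+)\.([^.]+)-(\d+)\.(\S+)")
WITH_DOTS = re.compile(r"(\d+)\.(\S+?)-(\d+)\.(\S+)")

def groups(pattern, text):
    """Return the captured groups, or None when the whole text does not match."""
    match = pattern.fullmatch(text)
    return match.groups() if match else None

print(groups(NO_DOTS, "1.2-beta3-4.5-beta6"))    # ('1', '2-beta3', '4', '5-beta6')
print(groups(WITH_DOTS, "1.2-beta3-4.5-beta6"))  # ('1', '2-beta3', '4', '5-beta6')
print(groups(NO_DOTS, "1.2-b.e.t.a.3-4.5"))      # None
print(groups(WITH_DOTS, "1.2-b.e.t.a.3-4.5"))    # ('1', '2-b.e.t.a.3', '4', '5')
```

The `[^.]+` group cannot cross the dot in `b.e.t.a`, so the first expression finds no valid split. The lazy `\S+?` keeps growing until it reaches a hyphen followed by digits and a dot.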

EDIT: Per the comments below, here's the final version:

(\d+[^.]*)\.(\S+?)-(\d+[^.]*)\.(\S+)
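
For completeness, here is a sketch of how the final expression could be used to produce the mixed number/string tuples shown in the question. It is written in Python, which is an assumption; `parse` is a hypothetical helper, and converting digit-only elements to `int` is one reading of the expected output rather than something the thread specifies.

```python
import re

FINAL = re.compile(r"(\d+[^.]*)\.(\S+?)-(\d+[^.]*)\.(\S+)")

def parse(text):
    """Return the four elements, converting digit-only elements to int."""
    match = FINAL.fullmatch(text)
    if match is None:
        return None
    return tuple(
        int(part) if part.isdecimal() else part
        for part in match.groups()
    )

print(parse("1.2-3.4-beta5"))        # (1, 2, 3, '4-beta5')
print(parse("1.2-beta3-4.5-beta6"))  # (1, '2-beta3', 4, '5-beta6')
```

Because the number groups are `\d+[^.]*`, a VersionNumber only has to start with a digit, so `parse("1.0-2x.3")` returns `(1, 0, '2x', 3)`.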