Product code looks like abcd2343, how to split by letters and numbers?

Developer https://www.devze.com 2023-01-09 04:19 Source: web

I have a list of product codes in a text file, one code per line. They look like:

abcd2343 abw34324 abc3243-23A

So each code is letters followed by numbers and, sometimes, other characters.

I want to split on the first occurrence of a number.

import re
s = 'abcd2343 abw34324 abc3243-23A'
re.split(r'(\d+)', s)

> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']

Or, to split just before each run of digits that follows a non-digit, keeping the digits with the text after them:

re.findall(r'\d*\D+', s)
> ['abcd', '2343 abw', '34324 abc', '3243-', '23A']

\d+ matches 1-or-more digits.
\d*\D+ matches 0-or-more digits followed by 1-or-more non-digits.
\d+|\D+ matches 1-or-more digits or 1-or-more non-digits.
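To see the three patterns side by side, here is a small sketch that runs each one through re.findall on a single code from the question; the variable name code is just for illustration.

```python
import re

code = 'abc3243-23A'  # one product code from the question

# 1-or-more digits: only the digit runs are returned
print(re.findall(r'\d+', code))      # ['3243', '23']
# 0-or-more digits followed by 1-or-more non-digits
print(re.findall(r'\d*\D+', code))   # ['abc', '3243-', '23A']
# alternating runs of digits and non-digits
print(re.findall(r'\d+|\D+', code))  # ['abc', '3243', '-', '23', 'A']
```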

See the documentation of Python's re module for more about regex syntax.

re.split(pat, s) splits the string s using pat as the delimiter. If pat contains a capturing group (for example, the whole pattern wrapped in parentheses), re.split also returns the text matched by that group. For instance, compare:

re.split(r'\d+', s)
> ['abcd', ' abw', ' abc', '-', 'A']   # <-- just the non-matching parts

re.split(r'(\d+)', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']  # <-- both the non-matching parts and the captured groups
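Since the question asks to split only at the first number, one option (not part of the original answer) is to combine the capturing group with re.split's maxsplit argument, which stops after the first split. The helper name split_first_number is hypothetical.

```python
import re

def split_first_number(code):
    # Split once, at the first run of digits; the capturing group keeps that run.
    # Result: [text_before, first_digit_run, everything_after]
    return re.split(r'(\d+)', code, maxsplit=1)

print(split_first_number('abcd2343'))     # ['abcd', '2343', '']
print(split_first_number('abc3243-23A'))  # ['abc', '3243', '-23A']
print(split_first_number('abcd'))         # ['abcd'] (no digit, so no split)
```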

In contrast, re.findall(pat, s) returns only the parts of s that match pat:

re.findall(r'\d+', s)
> ['2343', '34324', '3243', '23']

Thus, if s ends with a digit, you can avoid a trailing empty string by using re.findall(r'\d+|\D+', s) instead of re.split(r'(\d+)', s):

s = 'abcd2343 abw34324 abc3243-23A 123'

re.split(r'(\d+)', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123', '']

re.findall(r'\d+|\D+', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123']
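Because the codes live in a text file, one per line, the findall approach can be applied to each line. This sketch uses io.StringIO to stand in for the real file (an assumption: in practice you would pass the file object from open(path)); the function name split_codes is hypothetical.

```python
import io
import re

def split_codes(lines):
    # For each non-blank line, return its alternating runs of digits and non-digits.
    return [re.findall(r'\d+|\D+', line.strip()) for line in lines if line.strip()]

# Stand-in for a hypothetical open('codes.txt'); each line holds one product code.
fake_file = io.StringIO('abcd2343\nabw34324\nabc3243-23A\n')
for parts in split_codes(fake_file):
    print(parts)
# ['abcd', '2343']
# ['abw', '34324']
# ['abc', '3243', '-', '23', 'A']
```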

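This function also handles decimal (float) and negative numbers:

import re

def separate_number_chars(s):
    # Group 1 matches signed decimals such as -12.1; group 2 matches signed integers.
    res = re.split(r'([-+]?\d+\.\d+)|([-+]?\d+)', s.strip())
    # Drop the None entries from the group that did not match, and any empty pieces.
    res_f = [r.strip() for r in res if r is not None and r.strip() != '']
    return res_f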

For example:

separate_number_chars('-12.1grams')
> ['-12.1', 'grams']
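One thing worth checking before using separate_number_chars on product codes: because the pattern allows an optional leading sign, a hyphen followed by digits is read as a negative number. A self-contained copy of the function shows this on one of the question's codes:

```python
import re

def separate_number_chars(s):
    # Group 1: signed decimal like -12.1; group 2: signed integer like -23.
    res = re.split(r'([-+]?\d+\.\d+)|([-+]?\d+)', s.strip())
    # re.split yields None for the group that did not participate, plus empty strings.
    return [r.strip() for r in res if r is not None and r.strip() != '']

print(separate_number_chars('-12.1grams'))   # ['-12.1', 'grams']
print(separate_number_chars('abc3243-23A'))  # ['abc', '3243', '-23', 'A']
```

If the hyphen in codes like abc3243-23A is a separator rather than a sign, dropping the [-+]? parts of the pattern avoids this.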

import re

line = 'abc3243-23A'  # one product code
m = re.match(r"(?P<letters>[a-zA-Z]+)(?P<the_rest>.+)$", line)

m.group('letters')   # 'abc'
m.group('the_rest')  # '3243-23A'

This covers your corner case of abc3243-23A: the letters group will be abc and the_rest will be 3243-23A.

Since the codes are on separate lines, you'll need to run this on one line at a time.
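Applied line by line, a sketch might look like the following. re.match returns None when a line does not start with letters followed by at least one more character, so the loop checks for that; the sample lines list stands in for the file contents and is an assumption.

```python
import re

pattern = re.compile(r"(?P<letters>[a-zA-Z]+)(?P<the_rest>.+)$")

lines = ['abcd2343', 'abw34324', 'abc3243-23A', '1234']  # stand-in for the file's lines
for line in lines:
    m = pattern.match(line.strip())
    if m is None:
        # No leading letters (or nothing after them): skip or handle as needed.
        print(line, '-> no match')
        continue
    print(m.group('letters'), m.group('the_rest'))
```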

import re

def firstIntIndex(string):
    # Return the index of the first digit in string, or -1 if there is none.
    m = re.search(r'\d', string)
    return m.start() if m else -1
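The index returned by firstIntIndex can then be used to slice a code into its two parts. This sketch repeats a compact equivalent of the function so it runs on its own; split_at_first_digit is a hypothetical helper, and returning the whole code plus '' when there is no digit is a design choice, not part of the original answer.

```python
import re

def firstIntIndex(string):
    # Index of the first digit, or -1 if the string has none.
    m = re.search(r'\d', string)
    return m.start() if m else -1

def split_at_first_digit(code):
    i = firstIntIndex(code)
    if i == -1:
        return code, ''  # no digit: everything counts as the letter part
    return code[:i], code[i:]

print(split_at_first_digit('abc3243-23A'))  # ('abc', '3243-23A')
print(split_at_first_digit('abcd'))         # ('abcd', '')
```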

To partition on the first digit:

parts = re.split(r'(\d.*)', 'abcd2343')      # => ['abcd', '2343', '']
parts = re.split(r'(\d.*)', 'abc3243-23A')   # => ['abc', '3243-23A', '']

So, provided the code contains at least one digit, the two parts are always parts[0] and parts[1].

Of course, you can apply this to multiple codes:

>>> import re
>>> s = "abcd2343 abw34324 abc3243-23A"
>>> results = [re.split(r'(\d.*)', pcode) for pcode in s.split(' ')]
>>> results
[['abcd', '2343', ''], ['abw', '34324', ''], ['abc', '3243-23A', '']]

If each code is on its own line, use s.splitlines() instead of s.split(' ').
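With one code per line, the same comprehension using splitlines() might look like this; the multi-line string stands in for the file's contents (an assumption: in practice, for example, the result of reading the file).

```python
import re

text = "abcd2343\nabw34324\nabc3243-23A\n"  # stand-in for the file's contents
# splitlines() drops the newline characters, so each pcode is a bare code.
results = [re.split(r'(\d.*)', pcode) for pcode in text.splitlines()]
print(results)
# [['abcd', '2343', ''], ['abw', '34324', ''], ['abc', '3243-23A', '']]
```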

Another option is to split once, at the first space that is followed by a digit:

import re
text = "MARIA APARECIDA 99223-2000 / 98450-8026"
parts = re.split(r' (?=\d)', text, maxsplit=1)
print(parts)

Output:

['MARIA APARECIDA', '99223-2000 / 98450-8026']
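The product codes in the question have no space before the digits, so the pattern above would not split them. Adapting the same lookahead idea (an assumption, not part of the original answer), a zero-width split between a non-digit and a digit works on Python 3.7 and later, where re.split accepts patterns that can match an empty string. The name split_letters_rest is hypothetical.

```python
import re

def split_letters_rest(code):
    # Split once, at the empty position after a non-digit and before a digit.
    return re.split(r'(?<=\D)(?=\d)', code, maxsplit=1)

print(split_letters_rest('abcd2343'))     # ['abcd', '2343']
print(split_letters_rest('abc3243-23A'))  # ['abc', '3243-23A']
```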
