开发者

Postgres regular expressions and regexp_split_to_array

开发者 https://www.devze.com 2023-02-17 20:28 出处:网络
In postgresql, I need to extract the first two words in the value for a given column.So if the value is \"hello world moon and stars\" or \"hello world moon\" or even just \"hello world\", I need \"he

In postgresql, I need to extract the first two words in the value for a given column. So if the value is "hello world moon and stars" or "hello world moon" or even just "hello world", I need "hello world".

I was hoping to use regexp_split_to_array but it doesn't seem that I can use this and access the elements returned in the same query?

Do I need to create a funct开发者_如何学JAVAion for what I'm trying to do?


I can't believe that 5 years ago and no one noticed that you can access elements from regexp_split_to_array function if you surround them with parenthesis.

I saw many people tried to access the elements of the table like this:

select regexp_split_to_array(my_field, E'my_pattern')[1] from my_table

The previous will return an error, but the following will not :

select (regexp_split_to_array(my_field, E'my_pattern'))[1] from my_table


You can use POSIX regular expressions with PostgreSQL's substring():

select substring('hello world moon' from E'^\\w+\\s+\\w+');

Or with a very liberal interpretation of what a word is:

select substring('it''s a nice day' from E'^\\S+\\s+\\S+');

Note the \S (non-whitespace) instead of \w ("word" character, essentially alphanumeric plus underscore).

Don't forget all the extra quoting nonsense though:

  • The E'' to tell PostgreSQL that you're using extending escaping.
  • And then double backslashes to get single backslashes past the string parser and in to the regular expression parser.

If you really want to use regexp_split_to_array, then you can but the above quoting issues apply and I think you'd want to slice off just the first two elements of the array:

select (regexp_split_to_array('hello world moon', E'\\s+'))[1:2];

I'd guess that the escaping was causing some confusion; I usually end up adding backslashes until it works and then I pick it apart until I understand why I needed the number of backslashes that I ended up using. Or maybe the extra parentheses and array slicing syntax was an issue (it was for me but a bit of experimentation sorted it out).


found one answer:

select split_part('hello world moon', ' ', 1) || ' ' || split_part('hello world moon', ' ', 2);


select substring(my_text from $$^\S+\s+\S+$$) from v;

  substring
-------------
 hello world
 hello world
 hello world
(3 rows)

where for the purpose of demonstration, v is:

create view v as select 'hello world moon and stars' as my_text union all 
                 select 'hello world mood' union all 
                 select 'hello world';

if you want to ignore whitespace at the beginning:

select substring(my_text from $$^\s*\S+\s+\S+$$) from v;
0

精彩评论

暂无评论...
验证码 换一张
取 消