I want to extract a word from a string column of a table.
description
===========================
abc order_id: 2 xxxx yyy aa
mmm order_id: 3 nn kk yw
Expected result set
order_id
===========================
2
3
The table will have at most 100 rows, the text length is ~256 characters, and the column always contains exactly one order_id. So performance is not an issue.
In Oracle, I can use REGEXP_SUBSTR for this problem. How would I solve this in MySQL?
Edit 1
I am using LOCATE and SUBSTR to solve the problem. The code is ugly. Ten minutes after writing it, I am cursing the guy who wrote such ugly code.
I didn't find the REGEXP_SUBSTR function in MySQL docs. But I am hoping that it exists..
Answer to: Why can't the table be optimized? Why is the data stored in such a dumb fashion?
The example I gave just illustrates the problem I am trying to solve. In the real scenario, I am using DB-based third-party queuing software for executing asynchronous tasks. The queue serializes the Ruby object as text. I have no control over the table structure or the data format. The tasks in the queue can be recurring. In our test setup, some of the recurring tasks are failing because of stale data. I have to delete these tasks to prevent the error. Such errors are not common, hence I don't want to maintain a normalized shadow table.
"I didn't find the REGEXP_SUBSTR function in MySQL docs. But I am hoping that it exists.."
Yes, it is supported starting from MySQL 8.0. From the Regular Expressions section of the MySQL manual:
REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])
Returns the substring of the string expr that matches the regular expression specified by the pattern pat, NULL if there is no match. If expr or pat is NULL, the return value is NULL.
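As a rough sketch of how that function could be applied to the sample rows from the question (assuming MySQL 8.0 or later; the inline `dataset` table simply stands in for the real table, whose name the question does not give):

```sql
-- Requires MySQL 8.0 or later, where REGEXP_SUBSTR is available.
SELECT
    -- The inner call matches 'order_id: <digits>';
    -- SUBSTRING then skips past the 'order_id: ' label to keep only the digits.
    SUBSTRING(
        REGEXP_SUBSTR(dataset.description, 'order_id: [0-9]+'),
        CHAR_LENGTH('order_id: ') + 1
    ) AS order_id
FROM
(
    SELECT 'abc order_id: 2 xxxx yyy aa' AS description
    UNION ALL SELECT 'mmm order_id: 3 nn kk yw'
) AS dataset;
```

For the two sample rows this should return 2 and 3. A row without an order_id would yield NULL, because REGEXP_SUBSTR returns NULL when there is no match.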
Like Konerak said, there is no equivalent of REGEXP_SUBSTR in MySQL versions before 8.0. You could do what you need using SUBSTRING logic, but it is ugly:
SELECT
    SUBSTRING(lastPart.end, 1, LOCATE(' ', lastPart.end) - 1) AS orderId
FROM
(
    SELECT
        SUBSTRING(dataset.description, LOCATE('order_id: ', dataset.description) + LENGTH('order_id: ')) AS end
    FROM
    (
        SELECT 'abc order_id: 2 xxxx yyy aa' AS description
        UNION SELECT 'mmm order_id: 3 nn kk yw' AS description
        UNION SELECT 'mmm order_id: 1523 nn kk yw' AS description
    ) AS dataset
) AS lastPart;
Edit: You could try this user-defined function, which provides access to Perl-style regular expressions in MySQL:
-- Backslashes are doubled because MySQL string literals treat \ as an escape character.
SELECT
    PREG_CAPTURE('/.*order_id:\\s(\\d+).*/', dataset.description, 1)
FROM
(
    SELECT 'abc order_id: 2 xxxx yyy aa' AS description
    UNION SELECT 'mmm order_id: 3 nn kk yw' AS description
    UNION SELECT 'mmm order_id: 1523 nn kk yw' AS description
) AS dataset;
Or you can do this and save yourself the ugliness, as long as the order_id is always the third space-separated word:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('abc order_id: 2 xxxx yyy aa', ' ', 3), ' ', -1);
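The positional query above relies on the id being the third word. A variant that splits on the 'order_id: ' label instead might look like the sketch below. It assumes, as the question states, that each description contains exactly one order_id; the inline `dataset` table is only a stand-in for the real table.

```sql
SELECT
    -- Inner call: keep everything to the right of the 'order_id: ' label.
    -- Outer call: keep everything to the left of the first space that follows.
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(dataset.description, 'order_id: ', -1),
        ' ',
        1
    ) AS order_id
FROM
(
    SELECT 'abc order_id: 2 xxxx yyy aa' AS description
    UNION ALL SELECT 'mmm order_id: 3 nn kk yw'
    UNION ALL SELECT 'mmm order_id: 1523 nn kk yw'
) AS dataset;
```

Because SUBSTRING_INDEX returns the whole string when the delimiter is absent, this form also copes with an id that sits at the very end of the text with no trailing space. The flip side is that a description with no 'order_id: ' label at all would return its first word rather than NULL.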
There is no MySQL equivalent. The MySQL REGEXP can be used for matching strings, but not for transforming them.
You can either try to work with stored procedures and a lot of REPLACE/SUBSTRING logic, or do it in your programming language - which should be the easiest option.
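If the stored-routine route is preferred, one possible shape is a small stored function that hides the string handling. This is only a sketch: the name `extract_order_id` is hypothetical, and the body assumes the 'order_id: <value> ' layout shown in the question.

```sql
-- Hypothetical helper; the function name is an assumption, not part of MySQL.
-- A single-statement body needs no BEGIN ... END and no DELIMITER change.
CREATE FUNCTION extract_order_id(description VARCHAR(256))
RETURNS VARCHAR(256)
DETERMINISTIC
RETURN SUBSTRING_INDEX(
    SUBSTRING_INDEX(description, 'order_id: ', -1),  -- text after the label
    ' ',
    1                                                -- up to the next space
);

-- Example call with one of the sample strings from the question.
SELECT extract_order_id('mmm order_id: 1523 nn kk yw') AS order_id;
```

Queries can then call the function instead of repeating the nested string functions inline.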
But are you sure your data format is well chosen? If you need the order_id, wouldn't it make sense to store it in a separate column, so you can add indexes, use joins, and the like?