Format list of urls in mysql

Developer https://www.devze.com 2023-03-17 19:02 Source: web
I have a list of a million or so URLs in a MySQL table.

I need to cleanse the data (extract the domains) so I can be confident about DISTINCT-type queries.

The data comes in several different forms:

www.domain.tld
domain.tld
http://domain.tld
https://vhost.domain.tld
domain.tld/

There are also invalid domains and empty values.

Ideally I'd like to do something along the lines of:

UPDATE table1 SET domain = website REGEXP '^(https?://)?[a-zA-Z0-9\\.\\-]+(/|$|\\?)'

Here domain is a new, empty column and website holds the original URL.


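You can't use regex like that in MySQL as is: the REGEXP operator only tests for a match and returns 1 or 0, so it can't hand back the matched text. Apparently you can use some UDFs that implement regex extraction and replacement, though. See: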

  • How to do a regular expression replace in MySQL?
  • https://launchpad.net/mysql-udf-regexp
  • http://www.mysqludf.org/lib_mysqludf_preg/
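
If installing a UDF isn't an option, the same cleanup could also be done in a small client-side script that reads each website value and writes the result back. Here is a minimal sketch of the extraction step in Python. The function name extract_domain is hypothetical, and the rules (lower-casing, requiring at least one dot, rejecting anything with a port or a non-http scheme) are my assumptions about what counts as a valid domain here, not something stated in the question.

```python
import re
from typing import Optional

# Same shape as the pattern in the question: an optional http/https scheme,
# then a host made of letters, digits, dots and hyphens, which must be
# followed by "/", "?" or the end of the string.
_HOST_RE = re.compile(r"^(?:https?://)?([a-z0-9.-]+)(?:/|\?|$)", re.IGNORECASE)


def extract_domain(website: Optional[str]) -> Optional[str]:
    """Return the lower-cased host of a URL-like string, or None if unusable."""
    if not website:
        return None  # NULL or empty column value
    match = _HOST_RE.match(website.strip())
    if match is None:
        return None  # does not look like a host at all
    host = match.group(1).lower()
    # Assumption: a usable domain has at least one dot and no empty labels,
    # so "localhost", ".com" and "a..b" are all rejected.
    if "." not in host or any(label == "" for label in host.split(".")):
        return None
    return host
```

Note that this keeps www.domain.tld and domain.tld as distinct values. If those should collapse for the DISTINCT queries, stripping a leading "www." would be an extra, deliberate step. Rows where the function returns None can be left NULL in the domain column and reviewed separately.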