How do I extract info from a block of URLs in PHP?

Developer https://www.devze.com 2022-12-29 16:00 Source: Internet
I have a list of URLs that can arrive in any format: one per line, separated by commas, with random text in between them, and so on. The URLs all come from two different sites and share a similar structure.

For this example, let's say it looks like this:

Random Text - http://www.domain2.com/variable-value
Random Text 2 - http://www.domain1.com/variable-value, http://www.domain1.com/variable-value, http://www.domain1.com/variable-value

http://www.domain1.com/variable-value
http://www.domain2.com/variable-value
http://www.domain1.com/variable-value http://www.domain2.com/variable-value http://www.domain1.com/variable-value

I need to extract two pieces of information: whether the URL belongs to domain1 or domain2, and the value that follows "variable-".

The result should be a multi-dimensional array in which each entry has two items: the domain and the value.

What's the best way of doing that?


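Here is one way to extract the URLs. The only limitation is that the URLs themselves must not contain a comma or whitespace. If that is acceptable:

// Double quotes are required; '\n' in single quotes is not a newline.
$lines = explode("\n", $urls);

foreach ($lines as $line)
{
    // A PCRE pattern needs delimiters; excluding \s stops a match at the next space.
    if (preg_match_all('/http:\/\/[^,\s]*variable-([^,\s]+)/', $line, $matches))
    {
        // $matches[0] holds the full URLs, $matches[1] the values after "variable-".
    }
}

By the way, the matches are stored in the $matches array.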

P.S.: Edited. I forgot to escape the backslash, and you should search the string line by line to ensure correct behaviour. You can test the regex at http://www.regex-tester.de/regex.html; it worked there with my regex.

P.P.S.: After further research I found this page: http://internet.ls-la.net/folklore/url-regexpr.html. It contains a regular expression for a URL. You could use it to extract the URLs first, and in a second step go through the URLs and extract the variable information by looking for e.g. variable-(\w+).
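The two-step idea could be sketched roughly as follows. This is only an illustration: the sample text and the value names (abc, def, ghi) are made up, and the URL pattern is a deliberately simplified stand-in that assumes a URL ends at whitespace or a comma, not the full expression from the linked page.

```php
<?php
// Hypothetical sample input; the domains are the placeholders from the question.
$text = "Random Text - http://www.domain2.com/variable-abc\n"
      . "http://www.domain1.com/variable-def, http://www.domain1.com/variable-ghi";

// Step 1: collect everything that looks like an http(s) URL.
// Simplifying assumption: a URL stops at whitespace or a comma.
preg_match_all('~https?://[^\s,]+~', $text, $found);

// Step 2: look inside each URL for the part that follows "variable-".
$values = array();
foreach ($found[0] as $url) {
    if (preg_match('~variable-(\w+)~', $url, $m)) {
        $values[] = $m[1];
    }
}

print_r($values);
```

Splitting the work this way keeps the URL pattern and the "variable-" pattern independent, so either one can be swapped for something stricter later.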


Use preg_split, preg_match, and parse_url:

// split urls
$urls = preg_split('!,\s+!', 'http://www.domain1.com/variable-value, http://www.domain2.com/variable-value, http://www.domain3.com/variable-value');

$result = array();

// check for domain and path variable
foreach ($urls as $url) {

    $parts = parse_url($url);
    $matches = array();
    // keep the host together with the value that follows "/variable-" in the path
    if (isset($parts['host'], $parts['path'])
        && preg_match('!^/variable-([^/]+)!', $parts['path'], $matches)) {
        $result[] = array($parts['host'], $matches[1]);
    }
}


$text = "http://www.domain1.com/variable-value1, http://www.domain2.com/variable-value2 http://www.domain1.com/variable-value3";
preg_match_all("/http:\\/\\/(.+?)\\/variable-([a-z0-9]+)/si", $text, $matches);
print_r($matches);

Result:

Array
(
    [0] => Array
        (
            [0] => http://www.domain1.com/variable-value1
            [1] => http://www.domain2.com/variable-value2
            [2] => http://www.domain1.com/variable-value3
        )

    [1] => Array
        (
            [0] => www.domain1.com
            [1] => www.domain2.com
            [2] => www.domain1.com
        )

    [2] => Array
        (
            [0] => value1
            [1] => value2
            [2] => value3
        )

)
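If one entry per URL is preferred, which is closer to the domain + value array the question asks for, something like the following might work. It is a sketch under assumptions: the pattern is a slightly tightened variant of the one above (the host may not contain slashes, whitespace, or commas), and the key names 'domain' and 'value' are hypothetical.

```php
<?php
$text = "http://www.domain1.com/variable-value1, http://www.domain2.com/variable-value2 http://www.domain1.com/variable-value3";

// PREG_SET_ORDER groups the captures per match instead of per capture group.
preg_match_all('~http://([^/\s,]+)/variable-([a-z0-9]+)~i', $text, $sets, PREG_SET_ORDER);

$pairs = array();
foreach ($sets as $set) {
    // $set[1] is the host, $set[2] the value after "variable-".
    $pairs[] = array('domain' => $set[1], 'value' => $set[2]);
}

print_r($pairs);
```

Each element of $pairs then holds the domain and the value for a single URL, so checking for domain1 versus domain2 becomes a plain string comparison on $pair['domain'].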