Crawling an HTML page using PHP?

Developer https://www.devze.com 2023-01-20 00:40 Source: Internet

This website lists over 250 courses in one list. I want to get the name of each course and insert it into my MySQL database using PHP. The courses are listed like this:

<td> computer science</td>
<td> media studies</td>
…

Is there a way to do that in PHP, instead of me having a mad data entry nightmare?

Regular expressions work well.

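// get the page; $url is assumed to hold the address of the course list
$page = file_get_contents($url);
// split into lines, tolerating Windows line endings
$lines = preg_split("/\r?\n/", $page);
foreach ($lines as $text) {
    // match a line holding a single <td> cell, allowing surrounding whitespace
    if (preg_match('/^\s*<td>(.*?)<\/td>\s*$/', $text, $matches)) {
        $course = trim($matches[1]);
        // insert $course into the database
    }
}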

See the documentation for preg_match.
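
As a rough sketch of the same idea without splitting the page into lines, preg_match_all can collect every cell in one call. The function name extractCourseNames and the inline $page string are assumptions made for illustration; they stand in for the real downloaded page and are not part of the question.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <td>...</td> cell.
// Only matches cells written as a bare <td> with no attributes.
function extractCourseNames(string $html): array
{
    // Non-greedy capture so each cell is matched separately;
    // "s" lets "." span newlines, "i" ignores tag case.
    preg_match_all('/<td>(.*?)<\/td>/is', $html, $matches);

    $names = [];
    foreach ($matches[1] as $cell) {
        // Drop nested tags (such as links), decode entities, trim whitespace.
        $name = trim(html_entity_decode(strip_tags($cell), ENT_QUOTES, 'UTF-8'));
        if ($name !== '') {
            $names[] = $name;
        }
    }
    return $names;
}

// Sample data standing in for the downloaded page (assumption).
$page = "<table>\n<tr><td> computer science</td></tr>\n<tr><td> media studies</td></tr>\n</table>";
print_r(extractCourseNames($page));
```

This only holds up while the markup stays as regular as the sample; cells with attributes or nested tables would need a different pattern or a real parser.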

How to parse HTML has been asked and answered countless times before. Regular expressions will work for your specific use case, but in general it is better and more reliable to use a proper parser for this task. Below is how to do it with DOM:

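$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep imperfect real-world HTML from flooding the output with warnings
$dom->loadHTMLFile('http://courses.westminster.ac.uk/CourseList.aspx');
foreach ($dom->getElementsByTagName('td') as $title) {
    echo trim($title->nodeValue), PHP_EOL; // one course name per line
}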

For inserting the data into MySQL, you should use the mysqli extension. Examples are plentiful on Stack Overflow, so please use the search function.
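
A minimal sketch of how the DOM extraction and a mysqli prepared statement might fit together. The table name courses, its name column, the connection details, and both function names are hypothetical; adjust them to your own schema.

```php
<?php
// Collect the trimmed text of every <td> in an HTML string using DOM.
function courseNamesFromHtml(string $html): array
{
    // Real-world HTML often triggers parser warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($dom->getElementsByTagName('td') as $cell) {
        $name = trim($cell->nodeValue);
        if ($name !== '') {
            $names[] = $name;
        }
    }
    // Remove duplicates and reindex from zero.
    return array_values(array_unique($names));
}

// Insert each name using a prepared statement, so the values are never
// interpolated into the SQL string.
// Assumes a table `courses` with a text column `name` (hypothetical schema).
function insertCourses(mysqli $db, array $names): void
{
    $stmt = $db->prepare('INSERT INTO courses (name) VALUES (?)');
    if ($stmt === false) {
        throw new RuntimeException($db->error);
    }
    foreach ($names as $name) {
        $stmt->bind_param('s', $name);
        $stmt->execute();
    }
    $stmt->close();
}

// Demo on sample markup (assumption); no database is needed for this part.
print_r(courseNamesFromHtml('<table><tr><td> computer science</td><td> media studies</td></tr></table>'));

// Usage sketch with placeholder credentials:
// $db = new mysqli('localhost', 'user', 'password', 'database');
// $html = file_get_contents('http://courses.westminster.ac.uk/CourseList.aspx');
// insertCourses($db, courseNamesFromHtml($html));
```

Keeping the parsing and the insertion in separate functions lets you check the extracted names before anything is written to the database.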

You can use this HTML parsing library for PHP to achieve this: http://simplehtmldom.sourceforge.net/

I ran into the same problem. A good class library for this is Simple HTML DOM: http://simplehtmldom.sourceforge.net/. It works much like jQuery.

Just for fun, here's a quick shell script to do the same thing.

curl http://courses.westminster.ac.uk/CourseList.aspx \
| sed '/<td>\(.*\)<\/td>/ { s/.*">\(.*\)<\/a>.*/\1/; b }; d;' \
| uniq > courses.txt
