help with linux shell script using wget and sed

开发者 https://www.devze.com 2023-02-07 18:31 Source: Internet
Hi, can someone help me set up a shell script that does the following?

  1. wget http://site.com/xap/wp7?p=1
  2. View the HTML and extract every ProductName from title="Free Shipping ProductName">. For example, from title="Free Shipping HD7-Case001"> the script should extract HD7-Case001.
  3. Output the names to products.txt
  4. Then repeat from step 1 with the URL http://site.com/xap/wp7?p=1, where "1" is the page number, up to page 50. For example: http://..wp7?p=1, http://..wp7?p=2, http://..wp7?p=3

I've done some research on my own and have written this much code myself. It definitely needs a lot more work.

#! /bin/sh
... 

while read page; do
wget -q -O- "http://site.com/xap/wp7?p=$page" | 
sed ...

done < "products.txt"


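#!/bin/bash

# Fetch pages 1..50 and collect every product name in products.txt.
# The redirect sits after "done" so later pages do not overwrite earlier ones.
for page in {1..50}
do
  wget -q "http://site.com/xap/wp7?p=$page" -O - \
    | tr '"' '\n' | grep "^Free Shipping " | cut -d ' ' -f 3
done > products.txt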

The tr is turning each double-quote into a newline, so the output of tr will be something like:

<html>
...
... <tag title=
Free Shipping [Product]
> ...

Basically, it's a way to put each Product on its own line.

Next, grep discards every line except the ones that start with "Free Shipping ", so its output should be like:

Free Shipping [Product1]
Free Shipping [Product2]
...

Next, cut extracts the third space-delimited field (this assumes the product name itself contains no spaces), so the output should be:

[Product1]
[Product2]
...
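
Since the question asks about sed, the grep and cut steps described above can also be folded into a single sed substitution. The following is only a minimal sketch under the same assumption about the page markup (title="Free Shipping ProductName">); extract_products is a hypothetical helper name, not something from the original post. Unlike cut -f 3, it keeps the whole name even if the name contains spaces.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): read HTML on stdin and print
# one product name per line, taken from title="Free Shipping NAME"> attributes.
extract_products() {
  # tr puts every double-quote-delimited chunk on its own line;
  # sed -n prints only the lines that start with "Free Shipping ",
  # with that prefix removed.
  tr '"' '\n' | sed -n 's/^Free Shipping //p'
}

# Possible use inside the page loop (not run here):
#   wget -q -O- "http://site.com/xap/wp7?p=$page" | extract_products
```

The function only defines the filter, so it can be tried on a saved page before pointing it at the live site.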


You can combine this with PHP for the XML parsing.

The wget bash script:

#!/bin/bash

for page in {1..50}
do
  # save each page to a temporary file, then let PHP extract the titles
  wget -q -O "/tmp/$page.xml" "http://site.com/xap/wp7?p=$page"
  php -q xml.php "$page" >> products.txt
done

xml.php

<?php
$file = '/tmp/'.$argv[1].'.xml';
// assuming the following format
//<Products><Product title="Free Shipping ProductName"/></Products>

$xml = simplexml_load_file($file);
// print the title attribute of every Product node, one per line
foreach ($xml->Product as $product) {
    echo $product->attributes()->title, "\n";
}
/* adjust this to parse/obtain only the node attribute you need */
?>

Not a great idea, but PHP's SimpleXML provides a simple way to parse XML.
Hope this gives you a starting point.
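
Both loops above use the {1..50} brace expansion, which is a bash feature and may not work under a plain /bin/sh like the one in the question. As a rough sketch of a portable alternative, a counter loop can generate the page URLs instead; page_urls is a hypothetical helper name, and the base URL and page count are passed in as arguments.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print one URL per line
# for pages 1..$2 of the base URL given in $1.
page_urls() {
  base=$1
  last=$2
  page=1
  while [ "$page" -le "$last" ]; do
    printf '%s?p=%s\n' "$base" "$page"
    page=$((page + 1))
  done
}

# Possible use with the pipeline from the first answer (not run here):
#   page_urls "http://site.com/xap/wp7" 50 | while read -r url; do
#     wget -q -O- "$url"
#   done | tr '"' '\n' | grep '^Free Shipping ' | cut -d ' ' -f 3 > products.txt
```

Generating the URLs separately from fetching them makes it easy to check the list before any request is sent.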
