Extract XML data/content from a URL using shell scripting

Developer https://www.devze.com 2023-02-17 00:56 Source: Internet

I need to download the XML content from a URL into file.xml. For example, given the URL http://www.pistonheads.co.uk/xml/news091.asp?c=26, I want to save its XML content to file.xml, like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="0.91">
<channel>
<title>PistonHeads (Motoring News)</title>
<link>http://www.pistonheads.com/news/</link>
<description>Motoring News</description>

<item>
<title>Bowler Nemesis Joins Spyker At CPP</title>
<description>Plans confired for Nemesis EXR road car to be built in Coventry</description>
</item>
</channel>
</rss>

I tried wget "url" -o file.xml, but when I open file.xml it contains only this:

http://www.pistonheads.co.uk/xml/news091.asp?c=26
           => `news091.asp?c=26'
Resolving www.pistonheads.co.uk... done.
Connecting to www.pistonheads.co.uk[xx.xxx.xxx.xx]... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5,016 text/xml

0K .... 100% 445.31 KB/s


13:37:13 (445.31 KB/s) - `news091.asp?c=26' saved 5016/5016

Is there another way to solve this?


If you want this as the output:

PistonHeads (Motoring News) http://www.pistonheads.com/news/ Motoring News

Then this will do the trick:

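# Fetch the feed to stdout (-q quiet, -O - writes to stdout), keep the first
# three title/link/description lines, strip the tags, and join them with spaces.
wget -q -O - 'http://www.pistonheads.co.uk/xml/news091.asp?c=26' \
  | grep -E '(title>|link>|description>)' | head -n 3 \
  | sed -e 's/.*>\([^>]*\)<.*/\1/' | tr '\n' ' '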
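The grep/sed/tr stages of the pipeline above can be tried without network access by wrapping them in a shell function that reads RSS from standard input (`grep -E` is the POSIX spelling of `egrep`). This is a minimal sketch, assuming the feed puts each `<title>`, `<link>` and `<description>` element on its own line, as in the sample in the question; the function name `rss_channel_summary` and the file name `sample.xml` are hypothetical.

```shell
# Hypothetical helper: print the channel's title, link and description on
# one line, using the grep/sed/tr stages of the answer's pipeline.
# Assumes one element per line, as in the sample feed from the question.
rss_channel_summary() {
  grep -E '(title>|link>|description>)' \
    | head -n 3 \
    | sed -e 's/.*>\([^>]*\)<.*/\1/' \
    | tr '\n' ' '
}

# Save the sample feed from the question to a local file (hypothetical name).
cat > sample.xml <<'EOF'
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="0.91">
<channel>
<title>PistonHeads (Motoring News)</title>
<link>http://www.pistonheads.com/news/</link>
<description>Motoring News</description>

<item>
<title>Bowler Nemesis Joins Spyker At CPP</title>
<description>Plans confired for Nemesis EXR road car to be built in Coventry</description>
</item>
</channel>
</rss>
EOF

rss_channel_summary < sample.xml
echo   # tr leaves no trailing newline, so add one
```

To use it on the live feed, pipe `wget -q -O - 'URL'` into `rss_channel_summary` instead of redirecting from a file. Because it relies on line-oriented regular expressions rather than an XML parser, it will break if the feed is reformatted (for example, with several elements on one line).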

If, however, you just want the document at that URL written to a file, use this:

wget -O file.xml 'http://www.pistonheads.co.uk/xml/news091.asp?c=2'

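Note the capital O: -O names the file the downloaded document is written to, whereas lowercase -o names the file wget writes its log messages to. That is why file.xml contained only the log in your attempt.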
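Since the symptom in the question was wget's log ending up in file.xml, a quick way to confirm a download worked is to check that the file starts with an XML declaration. This is a minimal sketch; the helper name `starts_with_xml_decl` and the demo files `good.xml` and `log.txt` are hypothetical, and it assumes the document begins with `<?xml` as this feed does (the declaration is optional in XML, so its absence does not by itself prove a file is not XML).

```shell
# Hypothetical helper: succeed if the file's first line begins with an
# XML declaration ("<?xml"), fail otherwise. In a basic regex, "?" is literal.
starts_with_xml_decl() {
  head -n 1 "$1" | grep -q '^<?xml'
}

# Typical use after downloading (requires network access):
#   wget -q -O file.xml 'http://www.pistonheads.co.uk/xml/news091.asp?c=26'
#   starts_with_xml_decl file.xml || echo "file.xml does not look like XML" >&2

# Local demo: one file that starts like the feed, one that starts like the log.
printf '%s\n' '<?xml version="1.0" encoding="ISO-8859-1"?>' '<rss version="0.91">' > good.xml
printf '%s\n' 'Resolving www.pistonheads.co.uk... done.' > log.txt

starts_with_xml_decl good.xml && echo "good.xml: starts with an XML declaration"
starts_with_xml_decl log.txt || echo "log.txt: does not look like XML"
```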

