What does this Perl XML filter look like in Python?

Developer https://www.devze.com 2023-02-05 12:46 Source: web
curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | perl -ne 'print "\t" if /<name>/; print "$2\n" if /<(title|name)>(.*)<\/\1>/;'

I have this shell script which gets the Atom feed, taking the username and password as command-line arguments. I was wondering if this type of thing is possible in Python, and if so, how I would go about doing it. The Atom feed is just regular XML.


Python does not lend itself to compact one-liners quite as well as Perl. This is primarily for three reasons:

  1. With Perl, whitespace is insignificant in almost all cases. In Python, whitespace is very significant.
  2. Perl has some helpful shortcuts for one-liners, such as perl -ne or perl -pe, that put an implicit loop around the line of code.
  3. There is a large body of cargo-cult Perl one-liners to do useful things.
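
Point 2 is the one you miss most. As a rough sketch (not a drop-in replacement for perl -n), the standard fileinput module gives Python the same implicit input loop as Perl's <> operator: it reads the files named on the command line, or standard input if there are none. The helper name each_line is made up for illustration:

```python
import fileinput


def each_line(files=None):
    """Yield input lines the way perl -n sees them.

    With files=None, fileinput reads the files named in sys.argv[1:],
    or sys.stdin if there are none, much like Perl's <> operator.
    """
    # FileInput (rather than fileinput.input) avoids fileinput's global state
    with fileinput.FileInput(files=files) as stream:
        for line in stream:
            yield line
```

A filter script can then loop with for line in each_line(): and, for example, print(line, end="") only when "<name>" in line, which mimics perl -ne 'print if /<name>/'. It still takes a few lines, which is exactly point 1.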

That all said, this Python is close to what you posted in Perl:

curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | python3 -c '
import sys
for s in sys.stdin:
    s = s.strip()
    if not s:
        print("\t", end="")   # blank line: print a tab, no newline
    else:
        print(s)
'

It is a little difficult to do better because, as stated in my comment, the Perl you posted is incomplete. You have:

perl -ne 'print "\t" if //; print "$2\n" if /(.*)/;'

Which is equivalent to:

LINE:
while (<>) {
  print "\t" if //;         # empty pattern: reuses the last successful
                            # match (or matches anything), so always true
  print "$2\n" if /(.*)/;   # nonsensical: prints the second group, but
                            # only one capture group is defined
}
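
The command in the question, with its tags intact, is line oriented, so its logic maps onto Python directly. A minimal line-by-line translation, written as a function over any iterable of lines so it can be tried without curl; the names perl_filter and TAG_RE are illustrative:

```python
import re

# Same pattern as the Perl: group 1 is the tag name, \1 forces the matching
# closing tag, group 2 is the text in between.
TAG_RE = re.compile(r"<(title|name)>(.*)</\1>")


def perl_filter(lines):
    """Line-by-line equivalent of the Perl one-liner in the question."""
    out = []
    for line in lines:
        if "<name>" in line:       # print "\t" if /<name>/
            out.append("\t")
        m = TAG_RE.search(line)    # print "$2\n" if /<(title|name)>.../
        if m:
            out.append(m.group(2) + "\n")
    return "".join(out)
```

Called as sys.stdout.write(perl_filter(sys.stdin)), it prints titles flush left and sender names indented by a tab, because the tab is printed first and the name follows on the same line, just as in the Perl.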

Edit

While it is trivial to rewrite that Perl in Python, here is something a bit better:

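#!/usr/bin/env python3
import sys
from xml.dom.minidom import parseString


def get_tagged_data(dom, tag, index=0):
    """Return the text of the index-th <tag> element ('' if it is empty)."""
    node = dom.getElementsByTagName(tag)[index].firstChild
    return node.data if node is not None else ""


# Read the raw bytes so the XML declaration decides the encoding
dom = parseString(sys.stdin.buffer.read())

# The first <title> is the feed title
print(get_tagged_data(dom, 'title'))

# <fullcount> holds the number of unread messages
count = int(get_tagged_data(dom, 'fullcount'))
print(count, "New Messages:")

# The feed may list fewer entries than fullcount, so stop at the last <name>
shown = min(count, len(dom.getElementsByTagName('name')))
for i in range(shown):
    name = get_tagged_data(dom, 'name', i)
    email = get_tagged_data(dom, 'email', i)
    print("  {0}: {1} <{2}>".format(i + 1, name, email))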

Now save that in a text file, run chmod +x on it, then:

curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | 
/path/pythonfile.py

It produces this:

Gmail - Inbox for xxxxxxx@gmail.com
2 New Messages:
  1: bob smith <bob@smith.com>
  2: Google Alerts <googlealerts-noreply@google.com>
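
minidom works, but the standard xml.etree.ElementTree module is usually more convenient. One wrinkle: Atom feeds typically declare a default XML namespace, and ElementTree then reports tags as {namespace}title. The sketch below sidesteps that by comparing local names only, so no namespace URI has to be hard-coded; the helper names local, child_text and summarize are made up for illustration:

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an ElementTree '{namespace}' prefix: '{ns}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Text of the first direct child whose local name is `name`, or ''."""
    for child in elem:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return ""


def summarize(feed_xml):
    """Return the same report as the minidom script, as a list of lines."""
    root = ET.fromstring(feed_xml)
    lines = [child_text(root, "title")]
    entries = [e for e in root if local(e.tag) == "entry"]
    count = child_text(root, "fullcount") or str(len(entries))
    lines.append(f"{count} New Messages:")
    for i, entry in enumerate(entries, start=1):
        # Sender details live in <author><name/><email/></author>
        author = next((c for c in entry if local(c.tag) == "author"), None)
        name = child_text(author, "name") if author is not None else ""
        email = child_text(author, "email") if author is not None else ""
        lines.append(f"  {i}: {name} <{email}>")
    return lines
```

A script ending in print("\n".join(summarize(sys.stdin.buffer.read()))) can replace /path/pythonfile.py in the pipeline above. Walking entries rather than indexing every <name> in the document keeps each name paired with the email from the same entry.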

Edit 2

And if you don't like that, here is a Python one-line filter:

curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | python3 -c '
import sys, re
for tag, text in re.findall(r"<(title|name)>(.*?)</\1>", sys.stdin.read()):
    print("\t" + text if tag == "name" else text)
'
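
Because re.findall runs over the whole input at once rather than line by line, it matters whether the group is greedy: if the feed arrives with several tags on one line, (.*) runs on to the last matching closing tag, while a non-greedy (.*?) stops at the first. A small check of that behaviour, using a made-up sample string:

```python
import re

# Same tag pattern as the one-liner; \1 requires the closing tag to match the opening one.
GREEDY = re.compile(r"<(title|name)>(.*)</\1>")
LAZY = re.compile(r"<(title|name)>(.*?)</\1>")

# Made-up sample: several tags on a single line
ONE_LINE = "<title>Inbox</title><entry><title>Hi</title><name>bob</name></entry>"

if __name__ == "__main__":
    # Greedy: the first match runs on to the last </title>
    print(GREEDY.findall(ONE_LINE))
    # -> [('title', 'Inbox</title><entry><title>Hi'), ('name', 'bob')]
    # Non-greedy: each element is matched separately
    print(LAZY.findall(ONE_LINE))
    # -> [('title', 'Inbox'), ('title', 'Hi'), ('name', 'bob')]
```

Regular expressions are still fragile for XML (attributes, CDATA, or entities such as &amp; break them), which is why the parser-based script is the more robust choice.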


You may use a URL opener from the standard urllib.request module (urllib2 in Python 2) with a handler for authentication. For example:

#!/usr/bin/env python3

import getpass
import sys
import urllib.request


def main(program, username=None, password=None, url=None):

    # Prompt for any argument that is missing
    username = username or input('Username: ')
    password = password or getpass.getpass('Password: ')
    url = url or 'https://mail.google.com/mail/feed/atom'

    # Create password manager
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, username, password)

    # Create HTTP Basic Authentication handler and URL opener
    authhandler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(authhandler)

    # Fetch URL and print content
    with opener.open(url) as response:
        print(response.read().decode('utf-8'))


if __name__ == '__main__':
    main(*sys.argv)

If you'd like to extract information from the feed too, you should check how to parse Password-Protected Feeds with feedparser.
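
A note on the handler approach: HTTPBasicAuthHandler only sends the credentials after the server answers with a 401 challenge, so each fetch costs an extra round trip. A common alternative, swapped in here rather than taken from the answer above, is to build the Authorization header yourself with base64 (Python 3.5+ also offers urllib.request.HTTPPasswordMgrWithPriorAuth for the same purpose). A minimal sketch, assuming the server accepts HTTP Basic authentication; make_request and fetch are hypothetical helper names:

```python
import base64
import urllib.request


def make_request(url, username, password):
    """Build a Request that carries HTTP Basic credentials up front."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


def fetch(url, username, password):
    """Fetch url with preemptive Basic auth; assumes a UTF-8 response body."""
    with urllib.request.urlopen(make_request(url, username, password)) as response:
        return response.read().decode("utf-8")
```

Only send Basic credentials over HTTPS, since base64 is an encoding, not encryption.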

