
How do I parse XML into "messages" and print them out in Scala using stream parsing?

Developer (https://www.devze.com), 2023-01-01 04:14, source: web
Now that I know how to parse XML in Scala as a stream, I need help understanding a non-trivial example.

I'd like to parse the following XML as a stream and send a message (print it to the console, for this example) whenever I've parsed out a full message.

I understand that stream-based parsing in Scala uses case classes to handle the different elements, but I'm just getting started and I don't quite understand how to do this.

I have this working in Java using a StAX parser, and I'm trying to translate that into Scala.

Any help would be greatly appreciated.

<?xml version="1.0" ?>
<messages>
<message>
   <to>john.doe@gmail.com</to>
   <from>jane.doe@gmail.com</from>
   <subject>Hi Nice</subject>
   <body>Hello this is a truly nice message!</body>
</message>
<message>
   <to>joe@gmail.com</to>
   <from>jane.doe@gmail.com</from>
   <subject>Hi Nice</subject>
   <body>Hello this is a truly nice message!</body>
</message>
</messages>


This is for Scala 2.8.

The typical way to process events is with a match expression. In my case, I always needed to keep track of the parent elements while processing (for instance, to know which tag a piece of text is in):

import scala.xml.pull._
import scala.io.Source
import scala.collection.mutable.Stack

// xml: a String holding the document from the question
val src = Source.fromString(xml)
val er = new XMLEventReader(src)
val stack = Stack[XMLEvent]()   // currently open elements (the parents)

// Print s indented by the current nesting depth.
def iprintln(s: String) = println((" " * stack.size) + s.trim)

while (er.hasNext) {
  er.next() match {
    case x @ EvElemStart(_, label, _, _) =>
      stack.push(x)
      iprintln("got <" + label + " ...>")
    case EvElemEnd(_, label) =>
      iprintln("got </" + label + ">")
      stack.pop()
    case EvText(text) =>
      iprintln(text)
    case EvEntityRef(entity) =>
      iprintln(entity)
    case _ => // ignore everything else
  }
}
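The stack is what tells you which element a piece of text belongs to, although the loop above only uses its size for indentation. Below is a minimal sketch of the same idea that swaps in the JDK's own StAX reader (javax.xml.stream, the API the question's Java version uses) so it runs without the scala-xml module. An immutable List plays the role of the Stack, and PathTracker and textWithPaths are hypothetical names.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ListBuffer

object PathTracker {
  // Returns (path, text) for every non-blank text node, e.g. ("messages/message/to", "john.doe@gmail.com").
  def textWithPaths(xml: String): List[(String, String)] = {
    val factory = XMLInputFactory.newInstance()
    // Ask the parser to coalesce adjacent character data so each text node arrives in one piece.
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val r = factory.createXMLStreamReader(new StringReader(xml))
    val out = ListBuffer[(String, String)]()
    var parents = List.empty[String] // open elements, innermost first (used as a stack)
    try {
      while (r.hasNext) {
        r.next() match {
          case XMLStreamConstants.START_ELEMENT =>
            parents = r.getLocalName :: parents // push
          case XMLStreamConstants.END_ELEMENT =>
            parents = parents.tail // pop
          case XMLStreamConstants.CHARACTERS if !r.isWhiteSpace =>
            out += (parents.reverse.mkString("/") -> r.getText.trim)
          case _ => // whitespace between tags, comments, etc.
        }
      }
    } finally r.close()
    out.toList
  }
}
```

Prepending to a List and taking its tail gives the same push/pop behaviour as the mutable Stack, and the path is rebuilt only when text is actually found.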

Because entity references arrive as separate events, you will probably need to convert them to text and combine them with the surrounding text.
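A minimal sketch of that merging step follows. To keep it runnable without the scala-xml module, it assumes a hypothetical Token type standing in for EvText and EvEntityRef, and it resolves only the five entities predefined by XML; any other name is written back as &name; (an assumption, not what scala.xml does).

```scala
// Hypothetical stand-ins for EvText / EvEntityRef, so this needs only the standard library.
sealed trait Token
case class Text(text: String) extends Token
case class EntityRef(name: String) extends Token

object EntityMerge {
  // The five entities predefined by the XML specification.
  private val predefined = Map(
    "amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'")

  // Resolve one entity name; unknown names are written back unchanged as "&name;".
  def resolve(name: String): String =
    predefined.getOrElse(name, "&" + name + ";")

  // Join a run of adjacent text and entity tokens into a single string.
  def merge(tokens: Seq[Token]): String =
    tokens.map {
      case Text(t)         => t
      case EntityRef(name) => resolve(name)
    }.mkString
}
```

For example, EntityMerge.merge(Seq(Text("Tom "), EntityRef("amp"), Text(" Jerry"))) returns "Tom & Jerry".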

In the XMLEventReader example I only used label, but you can also bind the other fields of EvElemStart(pre, label, attrs, scope) to extract more information, and you can add an if guard to match on more complex conditions.
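A small sketch of label patterns combined with guards is shown below. It uses a hypothetical ElemStart case class that mirrors the shape of EvElemStart, with attributes simplified to a Map[String, String] so no scala-xml dependency is needed. The priority attribute is also hypothetical, since the sample XML has no attributes.

```scala
// Hypothetical stand-in for EvElemStart(pre, label, attrs, scope); attributes simplified to a Map.
case class ElemStart(pre: String, label: String, attrs: Map[String, String])

object GuardDemo {
  def describe(ev: ElemStart): String = ev match {
    // Literal label plus a guard on a (hypothetical) attribute value.
    case ElemStart(_, "message", attrs) if attrs.get("priority").contains("high") =>
      "urgent message"
    case ElemStart(_, "message", _) =>
      "message"
    // Guard over the bound label, as in the to/from example.
    case ElemStart(_, label, _) if Set("to", "from").contains(label) =>
      "address field <" + label + ">"
    case ElemStart(_, label, _) =>
      "other <" + label + ">"
  }
}
```

Cases are tried top to bottom, so the more specific guarded case for message has to come before the plain one.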

Also, if you're using 2.7.x, I don't know whether the fix for http://lampsvn.epfl.ch/trac/scala/ticket/2583 was back-ported, so you may have trouble processing text that contains entities.

More to the point, here is a version that handles only the to and from elements for brevity (though I would not call it the Scala way):

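import scala.xml.pull._
import scala.io.Source

// xml: a String holding the document from the question.
// A fresh reader is needed because the previous loop consumed the old one.
val er = new XMLEventReader(Source.fromString(xml))

class Message {
  var to: String = _
  var from: String = _
  override def toString(): String =
    "from %s to %s".format(from, to)
}

// Text for the predefined XML entities; anything else is kept as &name;
def unquote(entity: String): String = entity match {
  case "amp"  => "&"
  case "lt"   => "<"
  case "gt"   => ">"
  case "quot" => "\""
  case "apos" => "'"
  case other  => "&" + other + ";"
}

var message: Message = null
var sb: StringBuilder = null   // non-null only while inside <to> or <from>

while (er.hasNext) {
  er.next() match {
    case EvElemStart(_, "message", _, _) =>
      message = new Message
    case EvElemStart(_, label, _, _) if List("to", "from") contains label =>
      sb = new StringBuilder
    case EvElemEnd(_, "to") =>
      message.to = sb.toString
      sb = null
    case EvElemEnd(_, "from") =>
      message.from = sb.toString
      sb = null
    case EvElemEnd(_, "message") =>
      println(message)
    case EvText(text) if sb != null =>
      sb ++= text
    case EvEntityRef(entity) if sb != null =>
      sb ++= unquote(entity)
    case _ => // ignore everything else
  }
}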
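For comparison, here is a sketch closer to the Scala way: an immutable case class per message and a tail-recursive loop instead of vars. It swaps in the JDK's StAX pull parser (javax.xml.stream, which the question's Java version already uses) rather than scala.xml.pull, so it needs no extra module. It assumes the four child elements are text-only, as in the sample; Message and MessageParser are hypothetical names, and a missing field becomes an empty string.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants, XMLStreamReader}
import scala.annotation.tailrec

// An immutable record instead of a class with vars.
case class Message(to: String, from: String, subject: String, body: String)

object MessageParser {
  private val fields = Set("to", "from", "subject", "body")

  // Streams `xml` and calls `onMessage` as soon as each </message> has been read.
  def parse(xml: String)(onMessage: Message => Unit): Unit = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    try loop(reader, Map.empty, onMessage)
    finally reader.close()
  }

  // Threads the fields collected so far for the current message through each step.
  @tailrec
  private def loop(r: XMLStreamReader, current: Map[String, String],
                   onMessage: Message => Unit): Unit =
    if (r.hasNext) {
      val updated = r.next() match {
        case XMLStreamConstants.START_ELEMENT if fields.contains(r.getLocalName) =>
          // getElementText returns the element's text and leaves the reader on its end tag.
          current + (r.getLocalName -> r.getElementText)
        case XMLStreamConstants.END_ELEMENT if r.getLocalName == "message" =>
          def field(name: String) = current.getOrElse(name, "")
          onMessage(Message(field("to"), field("from"), field("subject"), field("body")))
          Map.empty[String, String]
        case _ => current // whitespace, <messages>, <message>, etc.
      }
      loop(r, updated, onMessage)
    }
}
```

Calling MessageParser.parse(xml)(println) on the question's document prints each message as it completes, using the case class's generated toString, e.g. Message(john.doe@gmail.com,jane.doe@gmail.com,Hi Nice,Hello this is a truly nice message!). Passing a callback keeps the streaming behaviour: nothing is accumulated beyond the message currently being built.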