A friend and I recently implemented link grabbing in my Clojure IRC bot. When it sees a link, it slurps the page and grabs the title from the page. The problem is that it has to slurp the ENTIRE page just to grab the title.
How does one go about reading a page lazily until the first </title>?
Use line-seq, but don't forget to close the underlying stream when done.
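Here is a minimal sketch of the line-seq approach. It assumes Clojure 1.8+ (for `clojure.string/includes?`) and the default UTF-8 decoding of `clojure.java.io/reader`. The namespace, the function names and the regex used to pull out the title are illustrative choices, not part of the answer above. `with-open` closes the stream even though the rest of the page is never requested.

```clojure
(ns title-grab.line-seq
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn- closes-title?
  "True when `line` contains a closing </title> tag (case-insensitive)."
  [line]
  (str/includes? (str/lower-case line) "</title>"))

(defn title-from-lines
  "Lazily reads lines from `source` (anything clojure.java.io/reader accepts:
  a URL, a Reader, an InputStream, ...) and stops after the first line that
  contains </title>. Returns the trimmed title text, or nil if none is found."
  [source]
  (with-open [rdr (io/reader source)]
    (let [lines      (line-seq rdr)
          ;; the lines before the closing tag, plus the line containing it;
          ;; no further lines are requested from the reader after that
          head-lines (concat (take-while (complement closes-title?) lines)
                             (take 1 (drop-while (complement closes-title?) lines)))
          ;; str/join is eager, so all reading happens inside with-open
          head       (str/join "\n" head-lines)]
      (some-> (re-find #"(?is)<title[^>]*>(.*?)</title>" head)
              second
              str/trim))))
```

For example, `(title-from-lines (java.net.URL. "http://example.com/"))`, since `clojure.java.io/reader` accepts a URL directly. This only saves work when the markup actually contains line breaks before `</title>`.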
I wouldn't count on the HTML necessarily being split into lines in a sensible way; without looking outside of our own backyard, e.g. Compojure (or Hiccup currently, I guess) doesn't bother inserting line breaks, I believe (update: just checked Hiccup -- no line breaks).
What I'd suggest instead is lazy XML parsing (with clojure.contrib.lazy-xml) on top of a java.io.BufferedInputStream.
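The sketch below does not use lazy-xml itself; real-world HTML is often not well-formed XML. It applies the same idea of reading incrementally from a buffered stream and stopping early, but swaps the XML parser for a plain character scan. It reads one character at a time through the `BufferedReader` that `clojure.java.io/reader` returns and stops at the first `</title>`, so line breaks don't matter. The `max-chars` cap (65536 by default) is an arbitrary assumption meant to avoid reading huge pages that have no title. The names are hypothetical.

```clojure
(ns title-grab.chars
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn- ends-with-closing-title?
  "True when the characters read so far end with </title> (case-insensitive)."
  [^CharSequence sb]
  (let [n (.length sb)]
    (and (>= n 8)
         (= "</title>" (str/lower-case (str (.subSequence sb (- n 8) n)))))))

(defn title-from-chars
  "Reads `source` character by character and stops at the first </title>.
  Gives up (returning nil) at end of stream or after `max-chars` characters.
  Returns the trimmed title text, or nil."
  ([source] (title-from-chars source 65536))
  ([source max-chars]
   (with-open [^java.io.Reader rdr (io/reader source)]
     (let [sb (StringBuilder.)]
       (loop []
         (let [c (.read rdr)]
           ;; -1 means end of stream; also stop once the cap is reached
           (when-not (or (neg? c) (>= (count sb) max-chars))
             (let [ch (char c)]
               (.append sb ch)
               ;; only look for the closing tag right after a '>' arrives
               (if (and (= ch \>) (ends-with-closing-title? sb))
                 (some-> (re-find #"(?is)<title[^>]*>(.*?)</title>" (str sb))
                         second
                         str/trim)
                 (recur))))))))))
```

Reading one character at a time is cheap here because the `BufferedReader` fills its buffer in larger chunks. At most about one buffer's worth of data past `</title>` is fetched before the stream is closed.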