I've written a simple XML parser in Haskell. The function convertXML receives the contents of an XML file and returns a list of extracted values that are processed further.
One attribute of an XML tag also contains the URL of a product image, and I would like to extend the function to download the image whenever that tag is found.
convertXML :: (Text.XML.Light.Lexer.XmlSource s) => s -> [String]
convertXML xml = productToCSV products
  where
    productToCSV [] = []
    productToCSV (x:xs) = (getFields x) ++ (productToCSV
                            (elChildren x)) ++ (productToCSV xs)
    getFields elm = case (qName . elName) elm of
        "product" -> [attrField "uid", attrField "code"]
        "name" -> [trim $ strContent elm]
        "annotation" -> [trim $ strContent elm]
        "text" -> [trim $ strContent elm]
        "category" -> [attrField "uid", attrField "name"]
        "manufacturer" -> [attrField "uid",
                           attrField "name"]
        "file" -> [getImgName]
        _ -> []
      where
        attrField fldName = trim . fromJust $
                              findAttr (unqual fldName) elm
        getImgName = if (map toUpper $ attrField "type") == "FULL"
                       then
                         -- here I need some IO code
                         -- to download an image
                         -- fetchFile :: String -> IO String
                         attrField "file"
                       else []
    products = findElements (unqual "product") productsTree
    productsTree = fromJust $ findElement (unqual "products") xmlTree
    xmlTree = fromJust $ parseXMLDoc xml
Any idea how to insert IO code into the getImgName function, or do I have to completely rewrite convertXML as an impure function?
UPDATE II: Final version of the convertXML function. It is a hybrid pure/impure but clean solution, as suggested by Carl. The second component of the returned pair is an IO action that downloads the images, saves them to disk, and returns the list of local paths where the images are stored.
convertXML :: (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], IO [String])
convertXML xml = productToCSV products (return [])
  where
    productToCSV :: [Element] -> IO String -> ([String], IO [String])
    productToCSV [] _ = ([], return [])
    productToCSV (x:xs) (ys) = storeFields (getFields x)
                                 ( storeFields (productToCSV (elChildren x) (return []))
                                     (productToCSV xs ys) )
    getFields elm = case (qName . elName) elm of
        "product" -> ([attrField "uid", attrField "code"], return [])
        "name" -> ([trim $ strContent elm], return [])
        "annotation" -> ([trim $ strContent elm], return [])
        "text" -> ([trim $ strContent elm], return [])
        "category" -> ([attrField "uid", attrField "name"], return [])
        "manufacturer" -> ([attrField "uid",
                            attrField "name"], return [])
        "file" -> getImg
        _ -> ([], return [])
      where
        attrField fldName = trim . fromJust $
                              findAttr (unqual fldName) elm
        getImg = if (map toUpper $ attrField "type") == "FULL"
                   then
                     ( [attrField "file"], fetchFile url >>=
                                             saveFile localPath >>
                                               return [localPath] )
                   else ([], return [])
          where
            fName = attrField "file"
            localPath = imagesDir ++ "/" ++ fName
            url = attrField "folderUrl" ++ "/" ++ fName
    storeFields (x1s, y1s) (x2s, y2s) = (x1s ++ x2s, liftM2 (++) y1s y2s)
    products = findElements (unqual "product") productsTree
    productsTree = fromJust $ findElement (unqual "products") xmlTree
    xmlTree = fromJust $ parseXMLDoc xml
The better approach would be to have the function return the list of files to download as part of the result:
convertXML :: (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], [URL])
and download them in a separate function.
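As a rough sketch of that split, assuming a simplified stand-in for the parsed "file" tags: the FileTag record, collect and downloadAll are hypothetical names used only for illustration and are not part of Text.XML.Light. The pure function only gathers the file names and URLs; the downloading is done by whatever action the caller supplies.

```haskell
import Data.Char (toUpper)

-- Hypothetical, simplified stand-in for an XML "file" element;
-- in the real code these values would come from attrField.
data FileTag = FileTag
  { fileType  :: String  -- value of the "type" attribute
  , folderUrl :: String  -- value of the "folderUrl" attribute
  , fileName  :: String  -- value of the "file" attribute
  }

type URL = String

-- Pure part: return the CSV fields together with the URLs of the
-- images whose type is "FULL" (compared case-insensitively).
collect :: [FileTag] -> ([String], [URL])
collect tags = (map fileName full, map imageUrl full)
  where
    full       = filter ((== "FULL") . map toUpper . fileType) tags
    imageUrl t = folderUrl t ++ "/" ++ fileName t

-- Impure part: run a caller-supplied download action on every URL.
-- The action is a parameter, so no particular HTTP library is assumed.
downloadAll :: (URL -> IO a) -> [URL] -> IO [a]
downloadAll = mapM
```

With something like fetchFile from the question, the call site would then be roughly `let (fields, urls) = collect tags` followed by `downloadAll fetchFile urls` inside an IO block.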
The entire point of the type system in Haskell is that you can't do IO except with IO actions - values of type IO a. There are ways to violate this, but they run the risk of behaving entirely unlike what you'd expect, due to interactions with optimizations and lazy evaluation. So until you understand why IO works the way it does, don't try to make it work differently.
But a very important consequence of this design is that IO actions are first class. With a bit of cleverness, you could write your function like this:
convertXML :: (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], IO [Image])
The second item in the pair would be an IO action that, when executed, would give a list of the images present. That would avoid the need to have image loading code outside of convertXML, and it would allow you to do IO only if you actually needed the images.
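A minimal sketch of the idea, using String paths in place of an Image type. The names combine, imageEntry and plan are hypothetical, and the fetch action is passed in as a parameter rather than taken from any real library. Building the pair is pure; nothing is downloaded until the IO action in the second component is actually run.

```haskell
import Control.Monad (liftM2)

-- A list of CSV fields paired with an action that yields local image paths.
type Result = ([String], IO [String])

-- Merge two results: concatenate the fields, and sequence the two
-- actions so that their path lists are concatenated as well.
combine :: Result -> Result -> Result
combine (xs, a) (ys, b) = (xs ++ ys, liftM2 (++) a b)

-- Build the result for one image. The fetch action is only stored
-- here, not executed.
imageEntry :: (String -> IO String) -> String -> Result
imageEntry fetch name = ([name], fmap (: []) (fetch name))

-- Pure function that assembles the fields and one combined IO action.
plan :: (String -> IO String) -> [String] -> Result
plan fetch = foldr (combine . imageEntry fetch) ([], return [])
```

A caller would destructure the pair, use the fields immediately, and run the action only if the images are wanted, for example `let (fields, getImages) = plan fetch names` followed by `paths <- getImages`.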
I basically see two approaches:
- Let the function also return a list of the images it found, and process them with an impure function afterwards. Laziness will do the rest.
- Make the whole beast impure.
I generally like the first approach more.