开发者

Extending of pure function with IO code possible?

开发者 https://www.devze.com 2023-02-02 16:48 出处:网络
I\'ve written a simple XML parser in Haskell. The function convertXML recieves contents of a XML file and returns a list of extracted values that are further processed.

I've written a simple XML parser in Haskell. The function convertXML recieves contents of a XML file and returns a list of extracted values that are further processed.

One attribute of XML tag contains also an URL of a product image and I would like to extend the function to also download it if the tag is found.

convertXML ::  (Text.XML.Light.Lexer.XmlSource s) => s -> [String]
convertXML xml = productToCSV products
    where
        productToCSV [] = []
        productToCSV (x:xs) = (getFields x) ++ (productToCSV
                                (elChildren x)) ++ (productToCSV xs)
        getFields elm = case (qName . elName) elm of
                            "product" -> [attrField "uid", attrField "code"]
                            "name" -> [trim $ strContent elm]
                            "annotation" -> [trim $ strContent elm]
                            "text" -> [trim $ strContent elm]
                            "categor开发者_Python百科y" -> [attrField "uid", attrField "name"]
                            "manufacturer" -> [attrField "uid",
                                                attrField "name"]
                            "file" -> [getImgName]
                            _ -> []
            where
                attrField fldName = trim . fromJust $
                                        findAttr (unqual fldName) elm
                getImgName = if (map toUpper $ attrField "type") == "FULL"
                                then
                                    -- here I need some IO code
                                    -- to download an image
                                    -- fetchFile :: String -> IO String
                                    attrField "file"
                                else []
        products = findElements (unqual "product") productsTree
        productsTree = fromJust $ findElement (unqual "products") xmlTree
        xmlTree = fromJust $ parseXMLDoc xml

Any idea how to insert an IO code in the getImgName function or do I have to completely rewrite convertXML function to an impure version ?

UPDATE II Final version of convertXML function. Hybrid pure/impure but clean way suggested by Carl. Second parameter of returned pair is an IO action that runs images downloading and saving to disk and wraps list of local paths where are images stored.

convertXML ::  (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], IO [String])
convertXML xml = productToCSV products (return [])
    where
        productToCSV :: [Element] -> IO String -> ([String], IO [String])
        productToCSV [] _ = ([], return [])
        productToCSV (x:xs) (ys) = storeFields (getFields x)
                            ( storeFields (productToCSV (elChildren x) (return []))
                                (productToCSV xs ys) )
        getFields elm = case (qName . elName) elm of
                            "product" -> ([attrField "uid", attrField "code"], return [])
                            "name" -> ([trim $ strContent elm], return [])
                            "annotation" -> ([trim $ strContent elm], return [])
                            "text" -> ([trim $ strContent elm], return [])
                            "category" -> ([attrField "uid", attrField "name"], return [])
                            "manufacturer" -> ([attrField "uid",
                                                attrField "name"], return [])
                            "file" -> getImg
                            _ -> ([], return [])
            where
                attrField fldName = trim . fromJust $
                                        findAttr (unqual fldName) elm
                getImg = if (map toUpper $ attrField "type") == "FULL"
                            then
                                ( [attrField "file"], fetchFile url >>=
                                    saveFile localPath >>
                                    return [localPath] )
                                else ([], return [])
                    where
                        fName = attrField "file"
                        localPath = imagesDir ++ "/" ++ fName
                        url = attrField "folderUrl" ++ "/" ++ fName

        storeFields (x1s, y1s) (x2s, y2s) = (x1s ++ x2s, liftM2 (++) y1s y2s)
        products = findElements (unqual "product") productsTree
        productsTree = fromJust $ findElement (unqual "products") xmlTree
        xmlTree = fromJust $ parseXMLDoc xml


The better approach would be to have the function return the list of files to download as part of the result:

convertXML ::  (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], [URL])

and download them in a separate function.


The entire point of the type system in Haskell is that you can't do IO except with IO actions - values of type IO a. There are ways to violate this, but they run the risk of behaving entirely unlike what you'd expect, due to interactions with optimizations and lazy evaluation. So until you understand why IO works the way it does, don't try to make it work differently.

But a very important consequence of this design is that IO actions are first class. With a bit of cleverness, you could write your function as this:

convertXML ::  (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], IO [Image])

The second item in the pair would be an IO action that, when executed, would give a list of the images present. That would avoid the need to have image loading code outside of convertXML, and it would allow you to do IO only if you actually needed the images.


I basically see to approaches:

  1. let the function give out a list of found images too and process them with an impure function afterwards. Laziness will do the rest.
  2. Make the whole beast impure

I generally like the first approach more. d

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号