I've used the Haskell XML Toolbox (HXT) in the past. Something along the lines of:

    {-# LANGUAGE Arrows #-}

    quoteParser :: (ArrowXml a) => a XmlTree Quote
    quoteParser =
        hasName "Contents" /> hasName "StockQuote" >>> proc x -> do
            symbol ...

    ... IO (Maybe Quote)
    parseQuoteDocument xml =
        liftM listToMaybe . runX . single $
            readString xml >>> getChildren >>> quoteParser
This is nice. I like arrows. But I can't find any way to take a String and return an XmlTree to feed the parser. I only find functions that read documents. Is there any (ArrowXml a) => a String XmlTree function? – Rafael S. Calsaverini Jan 7 at 10:56
Ha! Found hread and xread. Thanks. – Rafael S. Calsaverini Jan 7 at 11:08
I'm having a problem with the first line. When it's present the parser can't get anything. I solved this by simply dropping 23 characters from the string. Is there a less hacky solution? – Rafael S. Calsaverini Jan 7 at 12:30
You shouldn't have to do that. I don't know HXT that well, but I think the issue is that the code above parses an XML fragment, not an XML document. Look at Text.XML.HXT.Arrow.ProcessDocument. – MtnViewMark Jan 7 at 15:04
@ephemient: while nifty, all those operator overloads make me feel like I'm reading Perl... it's scary! – Matthieu M. Jan 7 at 16:28
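For the first-line problem raised in the comments, one option is to let HXT parse the string as a whole document rather than as a fragment. The following is a minimal sketch, not the answer author's code. It assumes the hxt package (Text.XML.HXT.Core), a Symbol attribute on each StockQuote element as in the question, and validation switched off because no DTD is involved. The sample string and the name symbolsFromDocument are hypothetical.

```haskell
import Text.XML.HXT.Core

-- Hypothetical sample document, including an XML declaration.
sample :: String
sample = "<?xml version=\"1.0\"?><Contents><StockQuote Symbol=\"PETR3\"/></Contents>"

-- Parse the string as a complete document and collect the Symbol attribute
-- of every StockQuote element, wherever it occurs in the tree.
symbolsFromDocument :: String -> IO [String]
symbolsFromDocument xml =
  runX $ readString [withValidate no] xml
     >>> deep (isElem >>> hasName "StockQuote")
     >>> getAttrValue "Symbol"

main :: IO ()
main = symbolsFromDocument sample >>= print
```

Because readString handles the declaration itself, nothing has to be dropped from the front of the input by hand.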
There are plenty of XML libraries written for Haskell that can do the parsing for you. I recommend the library called xml (see hackage.haskell.org/package/xml). With it, you can simply write, e.g.:

    let contents = parseXML source
        quotes   = concatMap (findElements $ simpleName "StockQuote") (onlyElems contents)
        symbols  = map (findAttr $ simpleName "Symbol") quotes
        simpleName s = QName s Nothing Nothing
    print symbols

This snippet prints [Just "PETR3"] as a result for your example XML, and it's easy to extend for collecting all the data you need.
To write the program in the style you describe you should use the Maybe monad, as the xml lookup functions often return a Maybe String, signaling whether the tag, element or attribute could be found. Also see a related question: Which Haskell XML library to use?
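The Maybe-monad style can be sketched as follows. This is an illustration under stated assumptions, not the xml package's own API: Quote, quoteFromAttrs and the Price attribute are hypothetical (only Symbol appears in the question), and an association list stands in for the attributes of a StockQuote element so that the example needs only base.

```haskell
import Text.Read (readMaybe)

-- Hypothetical record; only "Symbol" is taken from the question.
data Quote = Quote { symbol :: String, price :: Double }
  deriving (Show, Eq)

-- Build a Quote from any attribute lookup that returns Maybe String.
-- Each step short-circuits to Nothing when an attribute is missing
-- or when the price does not parse as a number.
quoteFromAttrs :: (String -> Maybe String) -> Maybe Quote
quoteFromAttrs attr = do
  s <- attr "Symbol"
  p <- attr "Price" >>= readMaybe
  return (Quote s p)

main :: IO ()
main = do
  let complete = [("Symbol", "PETR3"), ("Price", "23.5")]
      partial  = [("Symbol", "PETR3")]
  print (quoteFromAttrs (`lookup` complete))
  print (quoteFromAttrs (`lookup` partial))
```

With the xml package, the lookup argument could be \name -> findAttr (unqual name) element for some Element, since findAttr returns a Maybe String.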
For simple XML parsing, you can't go wrong with tagsoup (see hackage.haskell.org/package/tagsoup).
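As a rough idea of what that looks like, here is a minimal sketch assuming the tagsoup package (Text.HTML.TagSoup) and a Symbol attribute on each StockQuote element as in the question. The sample string and the name symbols are hypothetical.

```haskell
import Text.HTML.TagSoup (parseTags, isTagOpenName, fromAttrib)

-- Hypothetical sample input; element and attribute names follow the question.
sample :: String
sample = "<Contents><StockQuote Symbol=\"PETR3\"/></Contents>"

-- Scan the flat tag stream and pull the Symbol attribute out of every
-- opening StockQuote tag. No tree is built and nesting is not checked.
symbols :: String -> [String]
symbols src =
  [ fromAttrib "Symbol" t | t <- parseTags src, isTagOpenName "StockQuote" t ]

main :: IO ()
main = print (symbols sample)
```

Note that fromAttrib returns an empty string when the attribute is absent, so a missing Symbol is not reported as an error.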
As long as you don't need to validate well-formedness or ensure that tags are well balanced. As much as I like tagsoup for HTML scraping, I think it's ill-suited for parsing well-structured XML files. – Michael Snoyman Jan 6 at 21:02
@Michael -- if I'm parsing someone else's irritating format, I generally don't care if they've got the details right, or I trust them to have done so or not depending on the competency of the vendor. I care about getting my information out, and robustly so if they go changing things on me. – sclv Jan 7 at 5:35
The following snippet uses xml-enumerator. It leaves date and time as text (parsing those is left as an exercise to the reader):

    {-# LANGUAGE OverloadedStrings #-}
    import Text.XML.Enumerator.Parse
    import Data.Text.Lazy (Text, unpack)

    data Quote = Quote
        { symbol :: Text
        , date   :: Text
        , time   :: Text
        , price  :: Float
        } deriving Show

    main = parseFile_ "test.xml" (const Nothing) $ parseContents

    parseContents = force "Missing Contents" $ tag'' "Contents" parseStockQuote

    parseStockQuote = force "Missing StockQuote" $ flip (tag' "StockQuote") return $ do
        s ...
There are other ways to use this library, but for something simple like this I threw together a SAX parser.

    import Prelude as P
    import Text.XML.Expat.SAX
    import Data.ByteString.Lazy as L

    parsexml txt = parse defaultParseOptions txt :: [SAXEvent String String]

    main = do
        xml ... filter stockquoteelement (parsexml xml)
      where
        stockquoteelement (StartElement "StockQuote" attrs) = True
        stockquoteelement _ = False

From there you can figure out where to go. You could also use Text.XML.Expat.Annotated in order to parse it into a structure that is more like what you are looking for above:

    parsexml txt = parse defaultParseOptions txt :: (LNode String String, Maybe XMLParseError)

And then use Text.XML.Expat.Proc to navigate the structure.