NSXMLParser RSS feed strange characters issue

Developer https://www.devze.com 2023-01-27 14:05 Source: Internet

Hi, I am trying to loop through an XML document using NSXMLParser and am having trouble with the description tag.

Some news websites put strange characters (HTML tags, <, >, a, etc.) in that tag, so the parsing does not go as expected. Could anyone provide some help?

Thanks


You'll need to convert entity references to the characters that they represent. Any HTML tags would either need to be stripped, or fed into a UIWebView.
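
As a rough illustration of the entity conversion, here is a minimal sketch that handles only the five predefined XML entities. The helper name DecodeBasicEntities is made up for this example, and a real feed may contain named HTML entities or numeric references that this does not cover.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of any Apple API): replaces the five
// predefined XML entity references with the characters they represent.
static NSString *DecodeBasicEntities(NSString *input) {
    NSMutableString *result = [NSMutableString stringWithString:input];

    // Parallel arrays of entity references and their replacements.
    // "&amp;" comes last so that "&amp;lt;" decodes to "&lt;" and not "<".
    NSArray *entities = [NSArray arrayWithObjects:
        @"&lt;", @"&gt;", @"&quot;", @"&apos;", @"&amp;", nil];
    NSArray *characters = [NSArray arrayWithObjects:
        @"<", @">", @"\"", @"'", @"&", nil];

    for (NSUInteger i = 0; i < [entities count]; i++) {
        [result replaceOccurrencesOfString:[entities objectAtIndex:i]
                                withString:[characters objectAtIndex:i]
                                   options:NSLiteralSearch
                                     range:NSMakeRange(0, [result length])];
    }
    return result;
}
```

You would call this on the accumulated description string once the element has ended, for example DecodeBasicEntities(@"Tom &amp; Jerry").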


To strip the HTML tags, you can use something like this:

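- (NSString *)flattenHTML:(NSString *)html {

    // The scanner walks the original string; html is reassigned as tags are removed.
    NSScanner *theScanner = [NSScanner scannerWithString:html];
    NSString *text = nil;

    while ([theScanner isAtEnd] == NO) {

        // Advance to the start of the next tag, ignoring the text before it.
        [theScanner scanUpToString:@"<" intoString:NULL];

        // Capture the tag from "<" up to (but not including) ">".
        // Only replace when something was scanned, so a nil or stale value is never used.
        if ([theScanner scanUpToString:@">" intoString:&text]) {
            html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@""];
        }
    }

    // Trim the whitespace left at either end once the tags are gone.
    html = [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    return html;
}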

Then you can simply replace other unwanted characters by string manipulation.
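
One example of that kind of clean-up, as a sketch only: removing tags tends to leave stray runs of spaces and line breaks behind, and they can be collapsed like this. CollapseWhitespace is a hypothetical helper name, not something from the answer above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collapses every run of whitespace and newlines
// into a single space and drops leading and trailing whitespace.
static NSString *CollapseWhitespace(NSString *input) {
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    // Splitting on each whitespace character yields empty strings
    // wherever two separators are adjacent; those are skipped below.
    NSArray *parts = [input componentsSeparatedByCharactersInSet:separators];
    NSMutableArray *words = [NSMutableArray array];
    for (NSString *part in parts) {
        if ([part length] > 0) {
            [words addObject:part];
        }
    }
    return [words componentsJoinedByString:@" "];
}
```

For instance, CollapseWhitespace(@"  Breaking \n\n news\t today ") gives @"Breaking news today".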

Hope this helps.

Thanks,

Madhup
