iOS - Simple NSData category to parse XML with Cyrillic
I have to parse an XML string held in NSData. Do you know a simple category for it? I have such a category for JSON, but I am forced to use XML. I tried to use XMLReader; its interface looks clean, but I found some issues:
Mysterious newline characters and spaces everywhere:
"comment_count" = {text = "\n \n 21";};
My Cyrillic symbols look like this:
"description_text" = {text = "\n \u041f\u0438\u043a\u0430\u0431\u0443\u0448};
Example XML:
<?xml version="1.0" encoding="utf-8" ?>
<news>
  <xml_count>43</xml_count>
  <hot_count>449</hot_count>
  <item type="text">
    <id>1469845</id>
    <rating>147</rating>
    <pluses>171</pluses>
    <minuses>24</minuses>
    <title> <![CDATA[Обновление огромного архива Пикабу!]]> </title>
    <comment_count>26</comment_count>
    <comment_link>http://pikabu.ru/story/obnovlenie_ogromnogo_arkhiva_pikabu_1469845</comment_link>
    <author>icq677555</author>
    <description_text> <![CDATA[Пикабушники, я обновил свой огромный архив текстовых постов из горячего!]]> </description_text>
  </item>
</news>
I realized what's going on. Your data samples are NSDictionary instances printed in the debugger. The issues you found are:
As XML was designed as an annotated text format, its whitespace (spaces, newlines) handling doesn't fit data usage well. You can either trim the resulting strings (
[stringVar stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]
), adapt XMLReader, or use the XML parser at http://ios.biomsoft.com/2011/09/11/simple-xml-to-nsdictionary-converter/ (which does so by default).
The funny output for Cyrillic characters is just the proper escaping of non-ASCII characters in the debugger output (which uses the old-style property list format). It's an artifact of the debugger output; your variables contain the proper characters.
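If you would rather not touch the parser, the trimming can also be applied after parsing. The following is only a minimal sketch: `TrimmedCopy` is a hypothetical helper name, not part of XMLReader, and it assumes the parsed result consists of nested NSDictionary, NSArray and NSString objects.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of XMLReader): returns a copy of a parsed
// object tree in which every NSString value has been trimmed.
static id TrimmedCopy(id object) {
    if ([object isKindOfClass:[NSString class]]) {
        return [(NSString *)object stringByTrimmingCharactersInSet:
                   [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    }
    if ([object isKindOfClass:[NSDictionary class]]) {
        NSDictionary *source = object;
        NSMutableDictionary *result =
            [NSMutableDictionary dictionaryWithCapacity:source.count];
        for (id key in source) {
            // Keys are kept as they are; only the values are trimmed.
            result[key] = TrimmedCopy(source[key]);
        }
        return result;
    }
    if ([object isKindOfClass:[NSArray class]]) {
        NSMutableArray *result = [NSMutableArray array];
        for (id element in (NSArray *)object) {
            [result addObject:TrimmedCopy(element)];
        }
        return result;
    }
    return object; // anything else (NSNumber, NSNull, ...) is left untouched
}
```

With a dictionary like the one in the question, `TrimmedCopy(parsed)[@"comment_count"][@"text"]` would be `@"21"`. To check the Cyrillic text, log the string value itself (for example `NSLog(@"%@", clean[@"description_text"][@"text"])`) rather than the whole dictionary; the escapes only appear when the containing dictionary is printed.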
BTW: While JSON contains implicit type information (strings are quoted, numbers are never quoted, etc.), XML without a schema file does not. So all simple values are parsed as strings, even if they are numbers.
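Since every value arrives as a string, numeric fields have to be converted explicitly. A small sketch of one way to do that with NSScanner follows; `IntegerFromXMLText` is a hypothetical helper name, and it assumes the field holds a plain decimal integer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: converts the text of an XML node to an NSNumber.
// Returns nil if the text is nil or is not a plain decimal integer.
static NSNumber *IntegerFromXMLText(NSString *text) {
    if (text == nil) {
        return nil;
    }
    NSString *trimmed = [text stringByTrimmingCharactersInSet:
                            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSScanner *scanner = [NSScanner scannerWithString:trimmed];
    NSInteger value = 0;
    // Require the whole string to be consumed, so "21abc" is rejected.
    if ([scanner scanInteger:&value] && [scanner isAtEnd]) {
        return @(value);
    }
    return nil;
}
```

Unlike `-[NSString integerValue]`, which returns 0 for text that is not a number, this variant lets the caller distinguish a real zero from a field such as `author` that is not numeric at all.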
Update:
The XML parser you're using still contains the old whitespace handling code described in "Pesky new lines and whitespace in XML reader class" (though the comment says otherwise). Apply the fix mentioned at the bottom of that answer, namely change the line:
[dictInProgress setObject:textInProgress forKey:kXMLReaderTextNodeKey];
to:
[dictInProgress setObject:[textInProgress stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] forKey:kXMLReaderTextNodeKey];