Python: Unicode equal comparison failed
This question is linked to searching for Unicode characters in Python.
I read a Unicode text file using Python's codecs module:

    codecs.open('story.txt', 'rb', 'utf-8-sig')

and I'm trying to search for strings in it. I'm getting the following warning:

    UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

Is there a special way of comparing Unicode strings?
You may use the == operator to compare unicode objects for equality.

    >>> s1 = u'hello'
    >>> s2 = unicode("hello")
    >>> type(s1), type(s2)
    (<type 'unicode'>, <type 'unicode'>)
    >>> s1 == s2
    True
    >>> s3 = 'hello'.decode('utf-8')
    >>> type(s3)
    <type 'unicode'>
    >>> s1 == s3
    True
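The session above is Python 2. As a sketch of the same idea in Python 3, where every str is already Unicode, the comparison needs no conversion at all. One caveat worth knowing when asking whether Unicode comparison needs "a special way": == compares code point by code point, so two strings that render identically can still be unequal unless you normalize them first with the standard unicodedata module.

```python
import unicodedata

s1 = 'hello'
s2 = str('hello')
s3 = b'hello'.decode('utf-8')  # decode bytes explicitly to get text

print(s1 == s2, s1 == s3)

# Two spellings of "café" that look the same on screen:
precomposed = 'caf\u00e9'   # é as a single code point
decomposed = 'cafe\u0301'   # e followed by a combining acute accent
print(precomposed == decomposed)  # unequal: different code points

# Normalizing both sides to the same form (NFC here) makes them compare equal.
print(unicodedata.normalize('NFC', precomposed) ==
      unicodedata.normalize('NFC', decomposed))
```

NFC is used here only as an example; NFD would work equally well as long as both sides use the same form.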
However, the warning indicates that you aren't comparing two unicode objects. You are comparing a unicode object with a str object, like so:

    >>> u'hello' == 'hello'
    True
    >>> u'hello' == '\x81\x01'
    __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
    False
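Notice that I compared a unicode object against a str whose bytes cannot be decoded with Python 2's default encoding (normally ASCII), so the implicit conversion fails and Python treats the two values as unequal.

Your program, I suspect, is comparing unicode objects with str objects whose contents are not plain ASCII (for example, UTF-8-encoded bytes). This seems to be the result of you (the programmer) not keeping track of which variable holds unicode, which holds UTF-8-encoded bytes, and which holds raw bytes read in from a file.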
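For comparison, here is a small sketch of the same mistake in Python 3, where bytes and str are separate types and no implicit decoding ever happens. Equality between them is simply False (Python only emits a BytesWarning if started with the -b flag), while substring search with mixed types raises TypeError. The variable names are illustrative.

```python
text = 'hello'
raw = b'hello'

# Python 3 never decodes implicitly: bytes and str are just unequal.
print(text == raw)

# Searching text with a bytes needle fails loudly instead of silently.
mixed_search_failed = False
try:
    raw in text
except TypeError as exc:
    mixed_search_failed = True
    print('TypeError:', exc)

# Decoding at the boundary makes the comparison meaningful.
print(raw.decode('utf-8') == text)
```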
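One way to stop guessing which variable holds what is to convert every value to text as soon as it enters your code. The helper below, as_text, is hypothetical (not from any library) and assumes UTF-8 as the encoding of incoming bytes; it is a Python 3 sketch.

```python
def as_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as str, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f'expected bytes or str, got {type(value).__name__}')


# A search term that arrived as UTF-8 bytes (e.g. read from a binary file).
needle = 'caf\u00e9'.encode('utf-8')
haystack = 'I met her at the caf\u00e9.'

print(as_text(needle) in haystack)
```

Raising on anything other than bytes or str is a deliberate choice: it surfaces type confusion immediately instead of letting a comparison quietly evaluate to False.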
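I recommend http://nedbatchelder.com/text/unipain.html, especially its advice to create a "Unicode sandwich": decode bytes to Unicode as early as possible, work only with Unicode inside your program, and encode back to bytes as late as possible.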
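A minimal Python 3 sketch of the Unicode sandwich applied to the question's scenario. It assumes a UTF-8 file that starts with a byte-order mark (created here in a temporary directory as a stand-in for story.txt) and uses the built-in open with the 'utf-8-sig' codec, which strips the BOM while decoding.

```python
import os
import tempfile

# Stand-in for story.txt: UTF-8 text preceded by a byte-order mark.
raw_bytes = '\ufeffOnce upon a time in M\u00fcnchen.\n'.encode('utf-8')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'story.txt')
    with open(path, 'wb') as f:
        f.write(raw_bytes)

    # Top slice: decode once, at input. 'utf-8-sig' drops the BOM.
    with open(path, encoding='utf-8-sig') as f:
        story = f.read()

# Filling: only str objects meet here, so comparisons behave as expected.
found = 'M\u00fcnchen' in story

# Bottom slice: encode once, at output.
output = f'found={found}'.encode('utf-8')
print(output)
```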