Python Unicode equal comparison failed


This question is linked to "Searching for Unicode characters in Python".

I read a Unicode text file using Python's codecs module:

codecs.open('story.txt', 'rb', 'utf-8-sig') 

and I'm trying to search for strings in it. I'm getting the following warning:

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

Is there a special way to compare Unicode strings?

You may use the == operator to compare unicode objects for equality.

>>> s1 = u'hello'
>>> s2 = unicode("hello")
>>> type(s1), type(s2)
(<type 'unicode'>, <type 'unicode'>)
>>> s1==s2
True
>>>
>>> s3='hello'.decode('utf-8')
>>> type(s3)
<type 'unicode'>
>>> s1==s3
True
>>>

But your error message indicates that you aren't comparing two unicode objects. You are probably comparing a unicode object to a str object, like so:

>>> u'hello' == 'hello'
True
>>> u'hello' == '\x81\x01'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

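See how I attempted to compare a unicode object against a str that cannot be decoded: '\x81\x01' is not valid UTF-8 (nor valid ASCII, which is the default codec Python 2 uses for this implicit conversion).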
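If it is not obvious which comparison triggers the warning, one option is to escalate UnicodeWarning to an exception so that Python 2 stops with a traceback at the offending line. The sketch below uses the standard warnings module; the function name strict_unicode_comparisons is made up for illustration. On Python 3 the code still runs, but this particular warning is no longer emitted for str/bytes comparisons.

```python
import warnings


def strict_unicode_comparisons():
    # Hypothetical helper name. Turns every UnicodeWarning into an exception,
    # so a failed unicode/str comparison raises instead of quietly
    # evaluating to False.
    warnings.simplefilter('error', UnicodeWarning)
```

Call it once near the start of the program while debugging, then remove it when the mixed comparison has been found.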

Your program, I suppose, is comparing unicode objects with str objects, and the contents of a str object are not a valid UTF8 encoding. This seems likely to be the result of you (the programmer) not knowing which variable holds unicode, which variable holds UTF8, and which variable holds the bytes read in from a file.
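One way to make it explicit what each variable holds is to decode byte strings yourself before comparing, rather than relying on the implicit conversion. This is a minimal sketch: to_text is a hypothetical helper, and UTF-8 is only an assumed default; use whatever encoding your bytes are really in.

```python
def to_text(value, encoding='utf-8'):
    # Hypothetical helper. Byte strings (str on Python 2, bytes on Python 3)
    # are decoded explicitly; text objects are returned unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both sides are text before the comparison happens.
same = to_text(b'hello') == to_text(u'hello')
```

With this approach, undecodable input such as '\x81\x01' raises UnicodeDecodeError at the point of decoding, instead of producing a warning and a False result later on.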

I recommend http://nedbatchelder.com/text/unipain.html, especially the advice to create a "Unicode sandwich."
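As a rough illustration of the sandwich idea (decode bytes on input, work only with text inside the program, encode on output), here is a small sketch. The function names are invented for this example, and io.open is used because it behaves the same on Python 2 and Python 3; the search term is assumed to be a unicode object already.

```python
import io


def load_story(path):
    # Input boundary: decode bytes to text once.
    # 'utf-8-sig' also drops a leading BOM if the file has one.
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        return f.read()


def count_matches(text, needle):
    # Middle of the sandwich: both arguments are expected to be text.
    return text.count(needle)


def save_report(path, text):
    # Output boundary: encode text to bytes once.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
```

For example, count_matches(load_story('story.txt'), u'hello') compares text with text, so no implicit decoding takes place.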

