Detecting character encodings in Python
Use `chardet`. It reads a byte string and infers the encoding from the byte patterns it contains.
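To make "inferring the encoding from byte patterns" concrete, here is a deliberately simplified, standard-library-only sketch. It is not how `chardet` works internally (its detection is far more thorough); the function name `guess_encoding` and the candidate list are assumptions made up for this illustration. It checks for a byte-order mark first and then falls back to trial decoding.

```python
import codecs
from typing import Optional, Sequence

# Byte-order marks that identify an encoding outright. The UTF-32 marks come
# first because the UTF-16 LE mark is a prefix of the UTF-32 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(
    data: bytes,
    candidates: Sequence[str] = ("utf-8", "shift_jis", "euc_jp"),
) -> Optional[str]:
    """Return a plausible codec name for `data`, or None if nothing fits.

    Hypothetical helper for illustration only.
    """
    # 1. A byte-order mark at the start is the strongest pattern.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Otherwise take the first candidate that decodes without error.
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

Trial decoding is crude: the order of `candidates` matters, and a byte string that is valid in several encodings is attributed to whichever comes first. That ambiguity is the reason to reach for a dedicated detector such as `chardet`.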
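```python
import chardet
from urllib.request import urlopen

# Download the whole page as bytes, then detect its encoding in one call.
with urlopen('http://qiita.com/') as response:
    html = response.read()

print(chardet.detect(html))  # e.g. {'confidence': 0.99, 'encoding': 'utf-8'}
```

This is the easy way, but it reads the entire response before detection can start. To work incrementally instead, create a `UniversalDetector` object and load the data little by little with the `feed` method.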
Main interfaces of `UniversalDetector`:

`detector.done`
: A property for checking completion; it becomes `True` once the confidence exceeds a certain threshold.

`detector.result`
: A property where the result is stored.

```python
from chardet.universaldetector import UniversalDetector
from urllib.request import urlopen

detector = UniversalDetector()
with urlopen('http://qiita.com/') as response:
    for line in response:
        detector.feed(line)  # pass the next line of bytes to the detector
        if detector.done:    # confident enough, so stop reading early
            break
detector.close()  # finalize detection before reading the result

print(detector.result)  # e.g. {'confidence': 0.99, 'encoding': 'utf-8'}
```
The flow is: `detector.feed` passes the response to `detector` one line at a time, `detector.done` checks whether detection is complete, and finally the result is displayed.

To study further
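As one way to take this further, the feed/done/close flow can be wrapped in a small reusable helper. The sketch below is an assumption-laden illustration, not part of `chardet`: the names `iter_chunks` and `detect_stream` are made up here, and the helper accepts any object that behaves like `UniversalDetector` (a `feed(bytes)` method, a `done` flag, `close()`, and a `result` dict). It reads fixed-size chunks rather than lines, since binary input is not guaranteed to contain newlines.

```python
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield chunks of at most `chunk_size` bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def detect_stream(detector, stream: BinaryIO, chunk_size: int = 4096) -> dict:
    """Feed `stream` to `detector` chunk by chunk and return its result.

    `detector` is assumed to behave like chardet's UniversalDetector.
    Hypothetical helper for illustration only.
    """
    for chunk in iter_chunks(stream, chunk_size):
        detector.feed(chunk)
        if detector.done:  # confident enough, so stop reading early
            break
    detector.close()  # finalize detection before reading the result
    return detector.result
```

With `chardet` installed, a call might look like `detect_stream(UniversalDetector(), f)`, where `f` is a file or HTTP response opened in binary mode. Because the loop stops as soon as `done` is set, only as much input is read as the detector needs.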