While doing natural language processing in Python, I needed a counter that only counts words starting with an uppercase letter, so I am writing it down here so I don't forget.
・ Counts words that start with a capital letter
・ Matches only when just the first letter of the word is capitalized (a word in all caps is not counted)
・ Also matches a capital letter followed by digits, or a single capital letter on its own
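As a quick illustration of these rules, here is a minimal sketch that checks a handful of tokens against the same pattern the script uses. The sample tokens are made up for illustration and do not come from the text being analyzed.

```python
import re

# Same pattern as in the script: a lone capital letter, or one capital
# letter followed by one or more lowercase letters / digits.
CAPITALIZED = re.compile(r"^[A-Z]$|^[A-Z][a-z0-9]+$")

# Hypothetical sample tokens, chosen to hit each case.
samples = ["Mitchell", "A", "B52", "NLP", "mammy", "iPhone", "African-Americans"]

matched = [s for s in samples if CAPITALIZED.match(s)]
print(matched)  # ['Mitchell', 'A', 'B52']
```

Note that "NLP" is rejected because the letters after the first are uppercase, and "African-Americans" is rejected because the hyphen is neither a lowercase letter nor a digit.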
import re
from collections import Counter
org_text = """
Of course a writer can write from the viewpoint of Southern slave owners who of course would be racist and
view black people with, at best, benevolent paternalism, without her characters'
attitudes necessarily being her own. But it's very apparent that Mitchell was wholly and uncritically sympathetic to her antebellum ancestors.
The Old South was a graceful, chivalrous land where slavery was not a horrible and oppressive institution creating generations of misery and oppression,
but a divinely-ordained means of preserving racial harmony. And for all that Mitchell,
like her characters, probably considered herself to be kind and affectionate to all the African-Americans she knew personally,
there isn't a single black character in the book who isn't an ignorant, semi-human ape -- which is literally how they are described.
Even beloved Mammy is repeatedly compared to a monkey.
"""
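# Split the text into a list of words (on whitespace, commas, periods, and parentheses)
words = re.split(r'\s|\,|\.|\(|\)', org_text)
# Keep only words that start with a capital letter:
# a single capital, or a capital followed by lowercase letters / digits
r = re.compile("^[A-Z]$|^[A-Z][a-z0-9]+$")
dict_word = [x for x in words if r.match(x)]
counter = Counter(dict_word)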
# Print each word with its count, most frequent first
for word, count in counter.most_common():
    print("%s,%d" % (word, count))
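If the same count is needed in more than one place, the steps could be wrapped in a function. The following is only a sketch under a few assumptions: the name `count_capitalized` is hypothetical, the split pattern is written as a character class, and `fullmatch` with `[A-Z][a-z0-9]*` is used as an equivalent way of writing the original pattern for tokens that contain no whitespace.

```python
import re
from collections import Counter


def count_capitalized(text):
    """Count tokens made of one capital letter, optionally followed
    by lowercase letters or digits (hypothetical helper)."""
    # Split on runs of whitespace, commas, periods, and parentheses.
    tokens = re.split(r"[\s,.()]+", text)
    pattern = re.compile(r"[A-Z][a-z0-9]*")
    # Empty strings produced by the split never match, so they are dropped here.
    return Counter(t for t in tokens if pattern.fullmatch(t))


sample = "The Old South was a graceful land. The South, of course."
for word, count in count_capitalized(sample).most_common():
    print(f"{word},{count}")
# The,2
# South,2
# Old,1
```

As in the original script, hyphens and apostrophes are not separators, so a token such as "African-Americans" stays in one piece and is not counted.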