While studying natural language processing, I generated a text file containing a word list. The file was so large that it was hard to inspect by eye, so I wanted a way to check whether an arbitrarily chosen word is in the list.
I referred to this article on Yukun's Blog.
It seems that the command line arguments are stored in the argv attribute of the sys module.
contain_or_not.py
import sys
r0 = open('vocab.txt', 'r')  # Open the file in read mode
vocab = r0.readlines()  # vocab.txt contains words separated by line breaks
r0.close()
argvs = sys.argv
words = argvs[1:]  # argvs[0] contains the script's file name at runtime
for word in words:
    if word in vocab:
        print(word + ' is in vocab.')
    else:
        print(word + ' is not in vocab.')
I didn't think about error handling because I'm the only one who uses it, but would it be more user-friendly to raise an error when no word is entered?
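As a rough idea of what that could look like, here is a minimal sketch that prints a usage message when no word is given. The function names `parse_words` and `main` and the wording of the message are my own assumptions for illustration, not part of the original script.

```python
import sys


def parse_words(argv):
    """Return the words given on the command line, or None if there are none."""
    words = argv[1:]  # argv[0] is the script's file name
    return words if words else None


def main(argv):
    """Return 0 if at least one word was given, 1 otherwise (hypothetical helper)."""
    words = parse_words(argv)
    if words is None:
        # Report the problem on stderr so it does not mix with normal output
        print('usage: python contain_or_not.py WORD [WORD ...]', file=sys.stderr)
        return 1
    return 0
```

In a real script, the return value of `main(sys.argv)` could be passed to `sys.exit()` so that the shell sees a non-zero status when no word was entered.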
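When I entered a word that was clearly contained in vocab.txt, the script printed "is not in vocab." Apparently the line feed character is the culprit: readlines() keeps the trailing line feed on each element, so the elements never match the bare word. Line breaks make the file easier for humans to read, but would it be better not to separate the words with line breaks in the file?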
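The following sketch reproduces the problem and shows one possible way around it. It uses `io.StringIO` with made-up contents as a stand-in for vocab.txt so that it runs on its own, and it stores the words in a set, which is my own choice here rather than something the original script does.

```python
import io

# Stand-in for vocab.txt so the sketch is self-contained (hypothetical contents)
fake_file = io.StringIO('apple\nbanana\ncherry\n')

raw = fake_file.readlines()  # each element keeps its trailing '\n'
print('apple' in raw)        # False: the list holds 'apple\n', not 'apple'

# Strip the line feed from each element; a set keeps lookups fast for a large list
vocab = {line.rstrip('\n') for line in raw}
print('apple' in vocab)      # True
```

With the line feeds stripped, the file can keep one word per line for readability without breaking the membership test.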
~~In the next post, I will write about how to remove the line feed symbol from each element of vocab.~~ I have now written it, and the modified version of the script is there as well: python note: map - do the same for each element of the list