One of the preparation tasks assigned at my university this time was the following.
① Create a database of 1000 English sentences using the Internet. Make sure that no sentence appears more than once.
As expected, it is hard to check 1000 sentences for duplicates by eye, so I decided to write a program for it in Python, which I have been practicing recently.
check.py
# -*- coding: utf-8 -*-
import sys

try:
    file_name = sys.argv[1]
except IndexError:
    # file_name is not bound when the argument is missing,
    # so show the script name (sys.argv[0]) in the usage message.
    print('Usage: {0} TEXTFILE'.format(sys.argv[0]))
    sys.exit(1)

with open(file_name, 'r') as f:
    # Duplicates disappear when the lines are put into a set (order is not guaranteed).
    # .rstrip() removes the trailing line break, so the last line does not need to end with one.
    unique_texts = {line.rstrip() for line in f}

# Write each unique sentence back out on its own line.
with open(file_name + '.disticted.txt', 'w') as f:
    f.writelines(text + '\n' for text in unique_texts)
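This program compares each line of the text as it is, so sentences that differ even slightly (for example in capitalization or punctuation) are treated as different. Without the .rstrip() call, a duplicate of the last sentence would not be removed when the file does not end with a line break, because the final line would lack the trailing "\n" that the other lines carry.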
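Because a set does not keep the original order, here is a minimal sketch of a variant that keeps each sentence in first-seen order and also reports which sentences were duplicated. The helper name `dedupe_lines` and the sample sentences are my own illustrative assumptions, not part of check.py, and it relies on dicts preserving insertion order (Python 3.7 or later).

```python
from collections import Counter


def dedupe_lines(lines):
    """Return (unique, duplicates) for an iterable of text lines.

    Hypothetical helper for illustration: unique keeps first-seen order,
    duplicates lists each sentence that appeared more than once.
    """
    # Strip trailing whitespace so a missing final line break does not matter.
    stripped = [line.rstrip() for line in lines]
    counts = Counter(stripped)
    # dict.fromkeys drops repeats while keeping insertion order (Python 3.7+).
    unique = list(dict.fromkeys(stripped))
    duplicates = [text for text in unique if counts[text] > 1]
    return unique, duplicates


if __name__ == "__main__":
    sample = ["I like tea.\n", "She reads books.\n", "I like tea."]
    unique, duplicates = dedupe_lines(sample)
    print(unique)      # ['I like tea.', 'She reads books.']
    print(duplicates)  # ['I like tea.']
```

Passing an open file object instead of `sample` should work the same way, since the function only iterates over lines.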
I incorporated hatchinee's edit request and revised this post.