When the number of folders and the depth of the hierarchy grow, files get duplicated inadvertently, and files with both similar and dissimilar names pile up in the tree, which tends to cause trouble when searching or editing.
In fact, when I used a migration tool to move a set of text files I had been using for a while into a notebook app that manages the hierarchy for you, it turned out there were multiple files with the same name. To clean this up, I wrote a program that produces a file list that makes the same-named files easy to find.
That's about it. Since the program outputs the complete list (sorted by file name), you can also visually check which entries are marked and which are unmarked but have similar names, which is what I did. (If you don't care about those, just look at the marked entries.)
Each line of the list is the file name and its folder name, separated by " : ". Lines are sorted in ascending order of file name. When a file name (before the extension) is the same as the previous one, " ***** duplicated file?? ****" is appended to the line. The search treats the current folder as the top level. The output file is named "out.txt" and is written as UTF-8 with LF-only line breaks, so please adapt it to your environment.
I'm still new to Python, so I put this together from information found on various websites. Opinions and advice are welcome.
ListDupFile.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import codecs
import os

oList = []
odata = ""
prev = "...."

# Walk the tree with the current folder as the top level and
# collect one "filename : folder" entry per file
for root, dirs, files in os.walk(u'.'):
    for file_ in files:
        itm = file_ + u' : ' + root
        oList.append(itm)

oList.sort()

# After sorting, entries with the same base name are adjacent,
# so compare each base name (before the extension) with the previous one
for data_ in oList:
    base = os.path.splitext(data_.split(u' : ')[0])[0]
    if prev == base:
        data_ = data_ + " ***** duplicated file?? ****"
    prev = base
    odata = odata + data_ + "\n"

# Write the list as UTF-8; "\n" is written as LF only
fout = codecs.open(u'out.txt', "w", "utf-8")
fout.write(odata)
fout.close()
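For reference, out.txt ends up looking like this (the file and folder names below are hypothetical examples, not output from an actual run):

memo.txt : ./old
memo.txt : ./work/backup ***** duplicated file?? ****
readme.md : ./project

If your environment is Python 3, here is a minimal sketch of the same idea (assuming Python 3; the built-in open() with encoding='utf-8' and newline='\n' reproduces the UTF-8 / LF-only output described above, so no codecs module is needed):

import os

entries = []
for root, dirs, files in os.walk('.'):
    for name in files:
        # keep (base name, output line) pairs so equal base names sort next to each other
        entries.append((os.path.splitext(name)[0], name + ' : ' + root))

entries.sort()

with open('out.txt', 'w', encoding='utf-8', newline='\n') as fout:
    prev = None
    for base, line in entries:
        if base == prev:
            line += ' ***** duplicated file?? ****'
        prev = base
        fout.write(line + '\n')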