Typos in the English comments of the patches I write (in C) have long been a problem, so I made a tool that takes a patch / diff as input and spell-checks the English words in its comments. This is a note about it.
Comments in the C / C++ /* ... */ format are supported.
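To illustrate what checking "comments in a patch" involves, here is a minimal sketch of pulling /* ... */ comments out of the lines a unified diff adds. It is not the tool's actual code: the function name `extract_added_comments` and the sample diff are made up for this example, and it only looks at added lines.

```python
import re

# Matches a C-style block comment, including ones that span several lines.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def extract_added_comments(diff_text):
    """Return the /* ... */ comments found in the lines a unified diff adds.

    Hypothetical helper for illustration only.
    """
    added = []
    for line in diff_text.splitlines():
        # Added lines start with '+'. Lines starting with '+++' are treated
        # as the file header; this is a simplification.
        if line.startswith("+") and not line.startswith("+++"):
            added.append(line[1:])
    return COMMENT_RE.findall("\n".join(added))


sample_diff = """\
--- a/foo.c
+++ b/foo.c
@@ -1,2 +1,5 @@
 int x;
+/*
+ * for this pupose. it should be fine
+ */
+int y; /* counter */
"""

comments = extract_added_comments(sample_diff)
for c in comments:
    print(repr(c))
```

The words inside each extracted comment can then be compared against a word list.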
The source is published on github and the repository is here.
$ git clone git@github.com:MasahikoSawada/Patch-Spell-Checker.git
$ export PATH=$PATH:/path/to/Patch-Spell-Checker
$ export WLIST_DIR=/path/to/Patch-Spell-Checker/wlist.d/
If you don't want to set the WLIST_DIR environment variable, specify the directory with -d every time you run the tool.
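The choice between -d and WLIST_DIR can be sketched as follows. This is only an assumption about how such a lookup could work, not the tool's real implementation: the function name `resolve_wlist_dir` is hypothetical, and letting -d take precedence over the environment variable is my guess.

```python
import argparse


def resolve_wlist_dir(argv, environ):
    """Pick the word-list directory from -d, falling back to WLIST_DIR.

    Hypothetical helper; the precedence of -d is an assumption.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", dest="wlist_dir", default=None)
    args = parser.parse_args(argv)
    wlist_dir = args.wlist_dir or environ.get("WLIST_DIR")
    if wlist_dir is None:
        raise SystemExit("set WLIST_DIR or pass -d")
    return wlist_dir


from_option = resolve_wlist_dir(["-d", "/tmp/wlist.d"], {"WLIST_DIR": "/opt/wlist.d"})
from_env = resolve_wlist_dir([], {"WLIST_DIR": "/opt/wlist.d"})
print(from_option)
print(from_env)
```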
$ git diff | PatchSpellChecker.py
"xl_heap_lock" might be wrong at line 13.
"+ * needed before releasing buffer. we can reuse xl_heap_lock "
"pupose" might be wrong at line 14.
"+ * for this pupose. it should be fine even if we crash midway "
"combocids" might be wrong at line 45.
"+ * for logical decoding we need combocids to properly decode the "
"pupose" is a typo for "purpose". "xl_heap_lock" and "combocids" are words that are not in the dictionary (more on this later).
Spell check execution example (executed on an existing source file)
Add the -s (--source-file) option.
$ PatchSpellChecker.py -f src/backend/postmaster/postmaster.c -s
"subprocess" might be wrong at line 11.
" * operations, mind you --- it just forks off a subprocess to do them "
"lock-manager" might be wrong at line 18.
" * and so it cannot participate in lock-manager operations. keeping "
You can register new words by writing them in a *.dict file in the directory given by the WLIST_DIR environment variable. Registering technical terms improves the accuracy of the spell check.
(There are free word lists on the net, so check the license and register them yourself.)
The formats accepted by the dictionary file are as follows.
- Document format (free English text can be pasted in as it is)
$ cat sentence.txt
PostgreSQL is a powerful open source object-relational database system.
It is fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored procedures.
- Word format (useful when defining your own jargon)
$ cat words.txt
PostgreSQL
is
a
ACID
database
system
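Both formats can be read the same way if the file is simply split into alphabetic words. The following is a minimal sketch under that assumption; `load_words` and `unknown_words` are hypothetical names, and the case-insensitive comparison is my choice, not confirmed behavior of the tool.

```python
import re

# A "word" here is a run of ASCII letters; this is a simplification.
WORD_RE = re.compile(r"[A-Za-z]+")


def load_words(text):
    """Collect lowercase words from dictionary text in either format."""
    return {w.lower() for w in WORD_RE.findall(text)}


def unknown_words(comment, known):
    """Return the words in a comment that are not in the known set."""
    return [w for w in WORD_RE.findall(comment) if w.lower() not in known]


document_style = (
    "PostgreSQL is a powerful open source object-relational database system."
)
word_style = "PostgreSQL\nis\na\nACID\ndatabase\nsystem\n"

known = load_words(document_style) | load_words(word_style)
result = unknown_words("PostgreSQL is a powerfull database", known)
print(result)  # ['powerfull']
```

With this approach, a document-format file and a word-format file contribute to the same set of known words, so the two can be mixed freely.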
If you want to extract words from an existing source file, use the -s and -w options together. If you pipe the output through sort and uniq, you get word data that can be used as it is.
$ PatchSpellChecker.py -f src/backend/postmaster/postmaster.c -s -w | sort | uniq
activity_buffer
addr
am_syslogger
antivirus
archiver
archive_recovery