Count specific strings in a file

How to count how many times a specific string appears in a file on Linux.

Situation example

hoge.txt


hogefugapiyohogefugapiyo
hogehogehogehogehogehoge

Suppose you want to know how many times "hoge" appears in a file like this. (The answer is 8.)

I tried to get the number of occurrences with built-in commands, but it didn't work, so I wrote a script in Python. (grep -c counts matching lines rather than occurrences, so several matches on one line are counted only once.)
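To make the difference concrete, here is a small sketch (Python 3, not part of the original script) that compares the number of lines containing "hoge", which is what a per-line count such as grep -c reports, with the total number of occurrences. The sample text is the content of hoge.txt above; the variable names are only for illustration.

```python
# Contents of hoge.txt from the example above.
sample = "hogefugapiyohogefugapiyo\nhogehogehogehogehogehoge\n"
search_word = "hoge"

lines = sample.splitlines()

# Number of lines that contain the word at least once (a per-line count).
matching_lines = sum(1 for line in lines if search_word in line)

# Total number of non-overlapping occurrences across all lines.
occurrences = sum(line.count(search_word) for line in lines)

print(matching_lines)  # 2
print(occurrences)     # 8
```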

A script that counts occurrences of a specific string

match_count.py


#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
import sys
import os.path
 
 
def clean_args(args):
    # Returns (is_valid, target_file_path, search_word).
    # With a single argument, read from standard input (path is None).
    if len(args) == 2:
        search_word = args[1]
        return (True, None, search_word)
    if len(args) != 3:
        print("[Usage] match_count.py [$filename] $search_word")
        return (False, None, None)

    target_file_path = args[1]
    search_word = args[2]

    if not os.path.exists(target_file_path):
        print("[Error] File does not exist.")
        return (False, None, None)

    return (True, target_file_path, search_word)
 
 
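def count_words(filename, search_word):

    if filename is not None:
        # This was written for Python 2.4, so the with statement is not available.
        stream = open(filename, 'r')
        try:
            return _count(stream, search_word)
        finally:
            # Close the file even if counting raises an error.
            stream.close()
    else:
        # No file name given: read from standard input (pipe support).
        return _count(sys.stdin, search_word)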
 
 
def _count(stream, search_word):
    counter = 0
    for line in stream:
        counter += line.count(search_word)
    return counter
 
   
def main():
 
    args = sys.argv
    (is_valid, filename, search_word) = clean_args(args)
    if not is_valid:
        # Exit with a non-zero status so callers can detect the failure.
        sys.exit(1)

    print(count_words(filename, search_word))
 
 
if __name__ == '__main__':
    main()
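
For reference, on a newer Python where the with statement is available, the same counting logic could be sketched as below. This is a rewrite for Python 3 under that assumption, not the original script, and count_in_stream and count_in_file are hypothetical names. Note that str.count counts non-overlapping matches and that a match split across two lines is not counted, which is also how the script above behaves.

```python
import io


def count_in_stream(stream, search_word):
    """Sum the non-overlapping occurrences of search_word on each line."""
    return sum(line.count(search_word) for line in stream)


def count_in_file(path, search_word):
    """Open path as text and count occurrences; `with` closes the file."""
    with open(path, "r") as stream:
        return count_in_stream(stream, search_word)


# Quick check against an in-memory stream with the contents of hoge.txt.
sample = io.StringIO("hogefugapiyohogefugapiyo\nhogehogehogehogehogehoge\n")
print(count_in_stream(sample, "hoge"))  # 8
```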

Create this file on Linux and give it execute permission (for example, chmod +x match_count.py).

How to use

$ ./match_count.py hoge.txt hoge
8

This prints the number of times hoge appears in the file.

I also made it work with pipes.

$ cat hoge.txt | ./match_count.py hoge
8

It can be used this way too. It might be more useful when you cat several files together.
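
As a rough sketch of that idea, Python's standard fileinput module can read several files in sequence, or standard input when the list of paths is empty, so the cat step becomes optional. This is written for Python 3 and is not part of the original script; count_in_files is a hypothetical helper name, and the temporary files exist only for the demo.

```python
import fileinput
import os
import tempfile


def count_in_files(paths, search_word):
    """Count occurrences of search_word across all files in paths.

    An empty list makes fileinput fall back to standard input.
    """
    counter = 0
    # fileinput.input() yields the lines of each file in turn.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            counter += line.count(search_word)
    return counter


# Demo: two temporary files that each hold the hoge.txt contents.
text = "hogefugapiyohogefugapiyo\nhogehogehogehogehogehoge\n"
paths = []
for _ in range(2):
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(text)
        paths.append(f.name)

total = count_in_files(paths, "hoge")
print(total)  # 16

# Clean up the demo files.
for path in paths:
    os.remove(path)
```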
