Using grep in Python

First of all, you are not iterating over the file properly. You can simply use for b in f: without the .readline() stuff.

Then your code will blow up in your face as soon as the filename contains any characters that have a special meaning in the shell. Use subprocess.call instead of os.system() and pass an argument list.

Here's a fixed version:

import subprocess

with open("query.txt", "r") as f:
    for line in f:
        line = line.rstrip()  # remove trailing whitespace such as "\n"
        # Passing a list avoids the shell, so special characters are safe
        subprocess.call(["/bin/grep", line, "my2.txt"])

However, you can improve your code even more by not calling grep at all.
Read my2.txt into a string instead and then use the re module to perform the search. If you do not need a regex at all, you can simply use if line in my2_content.
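
A minimal sketch of that idea, assuming you want the matching lines back the way grep prints them. The helper names grep_plain and grep_regex are made up for illustration, and the sample string stands in for the contents of my2.txt:

```python
import re

def grep_plain(query, text):
    """Return the lines of text that contain query as a plain substring."""
    return [line for line in text.splitlines() if query in line]

def grep_regex(pattern, text):
    """Return the lines of text in which the regular expression matches."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]

# Stand-in for open("my2.txt").read()
my2_content = "One two\ntwo two\ntwo three\n"

print(grep_plain("three", my2_content))   # ['two three']
print(grep_regex(r"^two", my2_content))   # ['two two', 'two three']
```

The plain version is enough when the queries are literal strings. Use the regex version only if the queries really are patterns.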

Your code scans the whole my2.txt file for each query in query.txt.

You want to:

  1. read all queries into a list
  2. iterate once over all lines of the text file and check each line against all queries.

Try this code:

with open("query.txt", "r") as f:
    queries = [l.strip() for l in f]

with open("my2.txt", "r") as f:
    for line in f:
        for query in queries:
            if query in line:
                # line already ends with a newline, so do not add another
                print(query, line, end="")

This isn't actually a good way to use Python, but if you have to do something like that, then do it correctly:

import subprocess

def grep_lines(filename, query_filename):
    # Run grep once per query line; the argument list bypasses the shell
    with open(query_filename, "r") as myfile:
        for line in myfile:
            subprocess.call(["/bin/grep", line.strip(), filename])

grep_lines("my2.txt", "query.txt")

And hope that your file doesn't contain any characters which have special meanings in regular expressions =)
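
If the queries are meant literally, you do not have to rely on hope. grep has a -F option that treats patterns as fixed strings, and in pure Python re.escape does the same job. A small sketch of the Python side follows. The function name literal_search and the sample data are invented for illustration:

```python
import re

def literal_search(query, text):
    """Return lines containing query, with every character taken literally."""
    regex = re.compile(re.escape(query))
    return [line for line in text.splitlines() if regex.search(line)]

# Made-up sample: as a regex, "." in "3.50" would also match the "x"
sample = "price: 3.50\nprice: 3x50\n"

print(literal_search("3.50", sample))  # ['price: 3.50']
```

Without the escape, the pattern 3.50 would match both lines, because the dot matches any character.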

Also, you might be able to do this with grep alone:

grep -f query.txt my2.txt

It works like this:

~ $ cat my2.txt 
One two
two two
two three
~ $ cat query.txt 
two two
three
~ $ python bar.py 
two two
two three
