Conditional search and replace with regexps in python

Sometimes you have a text manipulation job ahead of you that's a bit too complex to handle with a single sed command, but too much work to do by hand. Recently I had to deal with this at my intern position, when I had to migrate a decently sized code base from a set of deprecated APIs to their replacements (~900 changes).

The APIs I had to update were in Ruby code. The old calls looked as follows:

b.key "a"
b.key "Enter"
b.key_up "Ctrl"
b.key_down "Ctrl"

And they had to be turned into these:

b.keys.send "a"
b.keys.send :Enter
b.keys.up :Ctrl
b.keys.down :Ctrl

So not only do you need to change three different function calls, but you also need to change any special keys to Ruby symbols. On top of that, Ruby supports two different ways of calling functions (with and without parentheses), and the code base used both, with varying spacing. In the end you should be left with more consistent code that doesn't throw a gazillion deprecation warnings. :)

What is an intern to do? You hatch a master plan, of course! One built around a single regexp matching all the cases, plus Python's ability to replace each match with the result of a function call, to make everything magically happen. So, without further ado, here's my code:

import os
import os.path
import re
import sys
import tempfile

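# Create the temporary file next to the source file, in text mode,
# so the final rename never has to cross a filesystem boundary
source_path = os.path.abspath(sys.argv[1])
tmp = tempfile.NamedTemporaryFile(mode='w', dir=os.path.dirname(source_path), delete=False)
f = open(source_path)
print(tmp.name, source_path)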
called = False

def key_upgrade(key):
    if len(key) == 1:
        return '"%s"' % key
    else:
        return ':%s' % key.capitalize()

def replacement(match):
    global called
    called = True
    func = match.group(1)
    if func:
        func = func[1:]
    else:
        func = 'send'

    key = key_upgrade(match.group(2))

    trail = ''
    if match.group(3):
        trail = match.group(3).replace(')', '')

    return '.keys.%s %s%s' % (func, key, trail)

for line in f:
    tmp.write(re.sub(r'\.key(_down|_up|)[ (]\s*["\'](.+?)["\'](\s*[ )]?\s*[}#]?)?',
                    replacement, line))

tmp.close()
f.close()

# Only replace the original file if at least one substitution was made
if called:
    os.rename(tmp.name, os.path.abspath(f.name))
else:
    os.unlink(tmp.name)

As you can see, it uses a simple (probably too simple for "real" code, so be warned) file handling routine that creates a new temporary file, rips through the code file line by line, running the regexp against it, and writing the result to the temporary file.

The regexp substitution calls the function replacement whenever it matches, sending in a match object with three groups: the function suffix, the key name, and the trailing part of the call (which may be empty). These are then used to do the necessary manipulation on the key names, generate the right function call, and do some munging on the trailing parts of the call.
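To see the function-as-replacement idea in isolation, here is a minimal sketch that runs the same pattern over a few sample strings instead of a file. The upgrade_line helper and the sample inputs are made up for illustration, and the replacement function is a condensed variant of the one in the script, not a drop-in copy.

```python
import re

# Same pattern as in the script, compiled once for reuse.
PATTERN = re.compile(r'\.key(_down|_up|)[ (]\s*["\'](.+?)["\'](\s*[ )]?\s*[}#]?)?')

def key_upgrade(key):
    # Single characters stay quoted strings; named keys become Ruby symbols.
    if len(key) == 1:
        return '"%s"' % key
    return ':%s' % key.capitalize()

def replacement(match):
    # Group 1 is '_up', '_down' or '' (a plain .key call maps to send).
    func = match.group(1)[1:] or 'send'
    # Group 3 holds whatever trails the key; drop any closing parenthesis.
    trail = (match.group(3) or '').replace(')', '')
    return '.keys.%s %s%s' % (func, key_upgrade(match.group(2)), trail)

def upgrade_line(line):
    # Hypothetical helper: re.sub calls replacement() once per match.
    return PATTERN.sub(replacement, line)

for sample in ['b.key "a"', 'b.key("Enter")', 'b.key_up "Ctrl"', 'b.key_down("ctrl")']:
    print(sample, '->', upgrade_line(sample))
```

Lines that are already converted, such as b.keys.send "a", do not match the pattern, so running the sketch twice over the same text should leave it unchanged.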

The last group ((\s*[ )]?\s*[}#]?)?) is probably the most bewildering. It's used to filter out closing parentheses in calls that use them, even if they are followed by a comment or a brace, in a way that retains spacing. Purely aesthetic, but it leaves things looking neat and consistent, which was a design goal.
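A quick way to get a feel for that last group is to print what it captures for a handful of inputs. This is only a sketch for poking at the pattern: the trailing_group helper and the sample lines are invented here and are not part of the original script.

```python
import re

PATTERN = re.compile(r'\.key(_down|_up|)[ (]\s*["\'](.+?)["\'](\s*[ )]?\s*[}#]?)?')

def trailing_group(line):
    """Return what the third group captured, or None if the line has no match."""
    match = PATTERN.search(line)
    return match.group(3) if match else None

samples = [
    'b.key("a")',            # closing parenthesis only
    'b.key("a") # press a',  # parenthesis followed by a comment
    '{ b.key("a") }',        # parenthesis followed by a closing brace
    'b.key "a"',             # no parentheses at all
]

for line in samples:
    captured = trailing_group(line)
    # The script strips ')' from this group and keeps everything else.
    kept = (captured or '').replace(')', '')
    print(repr(line), '->', repr(captured), '->', repr(kept))
```

For b.key("a") # press a the group captures ') #', which becomes ' #' once the parenthesis is dropped, so the space before the comment survives the rewrite.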

When all the lines have been processed, it does some magic to put the new and improved file into place. It checks whether any replacements were made. If none were, it just deletes the temporary file. Otherwise it renames the temporary file to the name of the old one, replacing it. This should avoid a few of the problems inherent in doing this sort of thing and be reasonably fast, without trashing the old file if anyone is trying to run it while we're working.
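The write-to-a-temporary-file-then-rename step can be pulled out into a small function. What follows is a sketch under a few assumptions, not the code I ran: rewrite_in_place and its transform parameter are hypothetical names, it needs Python 3.3 or later for os.replace, and it detects changes by comparing lines rather than with a global flag.

```python
import os
import shutil
import tempfile

def rewrite_in_place(path, transform):
    """Rewrite the file at path line by line using transform(line).

    Hypothetical helper. Returns True if the file was replaced and
    False if nothing changed and the original was left alone.
    """
    path = os.path.abspath(path)
    changed = False
    # The temporary file lives in the same directory as the original,
    # so the final rename stays on a single filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
            mode='w', dir=os.path.dirname(path), delete=False) as tmp:
        for line in src:
            new_line = transform(line)
            changed = changed or new_line != line
            tmp.write(new_line)
    if changed:
        # Temporary files are created with restrictive permissions,
        # so copy the original mode bits before swapping the files.
        shutil.copymode(path, tmp.name)
        os.replace(tmp.name, path)
    else:
        os.unlink(tmp.name)
    return changed
```

Passing a function that applies the regexp substitution to a single line as transform would give roughly the same behaviour as the script, with the old file staying intact until the new one is completely written.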

After running this, I was left with a pretty massive diff, which I went through change by change, checking every single one to make absolutely sure my quickly hacked up code didn't break anything. This was made pretty easy by the use of git's interactive commit functionality (git commit --interactive), which may just be one of my favourite git features.

In summary, it was a very effective way to make my work more fun, and probably more efficient. Of course, since it's a quick hack, there are flaws in the code (it doesn't handle multiline calls, for example), but as noted it worked perfectly fine on the codebase I had, and I made sure to check the result (!!!), so everything went better than expected. :D
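The multiline limitation is easy to reproduce. The sketch below uses made-up sample input to show that a call split over several lines is never seen by the pattern when the file is processed one line at a time.

```python
import re

PATTERN = re.compile(r'\.key(_down|_up|)[ (]\s*["\'](.+?)["\'](\s*[ )]?\s*[}#]?)?')

# The same call written on one line and spread over three lines.
single = 'b.key("Enter")\n'
multi = ['b.key(\n', '  "Enter"\n', ')\n']

# On a single line the pattern finds the call.
print(PATTERN.search(single) is not None)

# Line by line, no individual line contains the whole call, so nothing matches.
print([PATTERN.search(line) is not None for line in multi])

# Joined back together the call does match, because \s also covers newlines.
print(PATTERN.search(''.join(multi)) is not None)
```

Running the substitution over whole files instead of single lines is not something I tried, so treat the last check as an observation about the pattern rather than a tested fix.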

This page contains a single entry by Daniel E. Bruce published on July 17, 2011 7:20 PM.
