Usefulness of itertools.cycle & re.sub

(… or at least the concept). I wanted to process a piece of plain text containing conventional double-quote marks so that they became HTML smart-quote entities (&ldquo; and &rdquo;). I was prepared to adopt a naive algorithm which assumed that alternate quotes would always pair up, something which obviously wouldn’t work for single quotes, since those double as apostrophes. I toyed with various ways of splitting the text up and joining it back together until I came across the slick combination of itertools.cycle and re.sub:

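import itertools
import re

# Alternate endlessly between the opening and the closing entity.
quotes = itertools.cycle(['&ldquo;', '&rdquo;'])

def sub(match):
  # re.sub calls this once per match; the match object itself is ignored.
  return next(quotes)

text = 'The "quick" brown "fox" jumps over the "lazy" dog.'
print(re.sub('"', sub, text))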

Obviously my itertools.cycle could trivially be written as a generator function with the body: while 1: yield '..'; yield '...', but why reinvent the wheel?
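
For comparison, here is a minimal sketch of that hand-rolled generator next to itertools.cycle. The helper name alternate is hypothetical (it isn't in my original code), and the sketch only checks the first few values, since both iterators are endless:

```python
import itertools

def alternate(first, second):
    # Hypothetical helper: yields first, second, first, second, ... forever.
    while True:
        yield first
        yield second

hand_rolled = alternate('&ldquo;', '&rdquo;')
library = itertools.cycle(['&ldquo;', '&rdquo;'])

# Take six values from each; both iterators would carry on indefinitely.
first_six_hand_rolled = [next(hand_rolled) for _ in range(6)]
first_six_library = [next(library) for _ in range(6)]
print(first_six_hand_rolled == first_six_library)
```

The library version has the small advantage of accepting any iterable, so cycling round three or more things needs no extra code.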

Update: Tom Lynn points out that this can be done with a straightforward regex:

text = re.sub(r'"([^"]*)"', r'&ldquo;\1&rdquo;', text)
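
The two approaches agree on well-formed text but, as far as I can tell, differ when a quote is left unclosed. The sketch below wraps each in a function to show this; the names smarten_with_cycle and smarten_with_pairs and the sample strings are made up for illustration:

```python
import itertools
import re

def smarten_with_cycle(text):
    # A fresh cycle per call, so every text starts with an opening quote.
    quotes = itertools.cycle(['&ldquo;', '&rdquo;'])
    return re.sub('"', lambda match: next(quotes), text)

def smarten_with_pairs(text):
    # Only quotes which actually pair up are replaced.
    return re.sub(r'"([^"]*)"', r'&ldquo;\1&rdquo;', text)

balanced = 'The "quick" brown "fox".'
unbalanced = 'The "quick" brown "fox.'

print(smarten_with_cycle(balanced))
print(smarten_with_pairs(balanced))
print(smarten_with_cycle(unbalanced))   # the stray quote becomes &ldquo;
print(smarten_with_pairs(unbalanced))   # the stray quote is left as "
```

Creating the cycle inside the function also sidesteps a quirk of a module-level cycle: after a text with an odd number of quotes, the next text processed would start with a closing quote.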

4 Comments so far »

  1. Tom Lynn said,

    Wrote on November 16, 2009 @ 2:51 pm

    You’ve got a typo: “&ldqot;”.

    More importantly, the magic here is coming from the re module, not itertools. I’d have used:

    text = re.sub(r'"([^"]*)"', r'&ldquo;\1&rdquo;', text)

  2. Tom Lynn said,

    Wrote on November 16, 2009 @ 2:55 pm

    There’s also the smartypants module to do this, if you prefer.

    http://pypi.python.org/pypi/smartypants/

  3. tim said,

    Wrote on November 16, 2009 @ 2:55 pm

    Thanks, Tom. I knew there’d be a good regex solution; I always seem to spend a few minutes poking at more-than-trivial regexes before giving up in disgust at my ignorance and producing solutions like the above. Thanks for the typo report, too. Fixed now.

    I suppose I could argue that my solution would scale better to some hypothetical need to cycle round three things but frankly I’d be scrambling for self-justification :)

  4. tim said,

    Wrote on November 16, 2009 @ 2:58 pm

    Heh. If it had taken me more than the 10 minutes it did to do what I described above I’d have looked around for the obviously-must-be-there prior art. ISTR that Sphinx uses Smartypants, so I’d have got there pretty fast. Thanks for the link.
