Usefulness of itertools.cycle & re.sub
(… or at least the concept). I wanted to process a piece of plain text containing conventional double-quote marks so that they became HTML smart-quote entities (&ldquo; and &rdquo;). I was prepared to adopt a naive algorithm which assumed that alternate quotes would always pair up, something which obviously wouldn’t work for single quotes, since those double as apostrophes. I toyed with various ways of splitting the text up and joining it back together until I came across the slick combination of itertools.cycle and re.sub:
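import itertools
import re

# Endlessly alternate between the opening and closing quote entities
quotes = itertools.cycle(['&ldquo;', '&rdquo;'])

def sub(match):
    # Called once per matched quote mark; returns the next entity in the cycle
    return next(quotes)

text = 'The "quick" brown "fox" jumps over the "lazy" dog.'
print(re.sub('"', sub, text))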
Obviously my itertools.cycle could trivially be written as a generator (while 1: yield '..'; yield '...'), but why reinvent the wheel?
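For comparison, here is a rough sketch of what that hand-rolled generator might look like. The names alternate and smarten are made up for this illustration, and the entities are the standard HTML names &ldquo; and &rdquo;. It assumes, as the naive algorithm does, that quote marks strictly alternate between opening and closing.

```python
import re

def alternate(first, second):
    """Hypothetical stand-in for itertools.cycle over exactly two items."""
    while True:
        yield first
        yield second

def smarten(text):
    # A fresh generator per call, so every text starts with an opening quote.
    quotes = alternate('&ldquo;', '&rdquo;')
    # re.sub calls the function once per match, left to right.
    return re.sub('"', lambda match: next(quotes), text)

print(smarten('The "quick" brown "fox".'))
# → The &ldquo;quick&rdquo; brown &ldquo;fox&rdquo;.
```

Creating the generator inside the function, rather than at module level, avoids one text's odd quote count throwing off the next text that gets processed.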
Update: Tom Lynn points out that this can be done with a straightforward regex:
text = re.sub(r'"([^"]*)"', r'&ldquo;\1&rdquo;', text)
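The two approaches agree on well-formed text but differ when a quote mark is left unpaired. The sketch below wraps each in a function (smarten_by_cycle and smarten_by_pairs are names invented here, not from the post) and runs both over a balanced and an unbalanced string, using the standard entity names &ldquo; and &rdquo;.

```python
import itertools
import re

def smarten_by_cycle(text):
    # Alternate entities for every quote mark, paired or not.
    quotes = itertools.cycle(['&ldquo;', '&rdquo;'])
    return re.sub('"', lambda match: next(quotes), text)

def smarten_by_pairs(text):
    # Only rewrite a quote mark when a closing partner follows it.
    return re.sub(r'"([^"]*)"', r'&ldquo;\1&rdquo;', text)

balanced = 'The "quick" brown "fox".'
unbalanced = 'A "quoted" word and a stray " mark.'

print(smarten_by_cycle(balanced) == smarten_by_pairs(balanced))
# → True
print(smarten_by_cycle(unbalanced))
# → A &ldquo;quoted&rdquo; word and a stray &ldquo; mark.
print(smarten_by_pairs(unbalanced))
# → A &ldquo;quoted&rdquo; word and a stray " mark.
```

Which behaviour is preferable for the stray mark probably depends on the text being processed; the paired regex at least leaves it visibly unconverted.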
Tom Lynn said,
Wrote on November 16, 2009 @ 2:51 pm
You’ve got a typo: “&ldqot;”.
More importantly, the magic here is coming from the re module, not itertools. I’d have used:
text = re.sub(r'"([^"]*)"', r'&ldquo;\1&rdquo;', text)
Tom Lynn said,
Wrote on November 16, 2009 @ 2:55 pm
There’s also the smartypants module to do this, if you prefer.
http://pypi.python.org/pypi/smartypants/
tim said,
Wrote on November 16, 2009 @ 2:55 pm
Thanks, Tom. I knew there’d be a good regex solution; I always seem to spend a few minutes poking at more-than-trivial regexes before giving up in disgust at my ignorance and producing solutions like the above. Thanks for the typo report, too. Fixed now.
I suppose I could argue that my solution would scale better to some hypothetical need to cycle round three things but frankly I’d be scrambling for self-justification :)
tim said,
Wrote on November 16, 2009 @ 2:58 pm
Heh. If it had taken me more than the 10 minutes it did to do what I described above I’d have looked around for the obviously-must-be-there prior art. ISTR that Sphinx uses Smartypants, so I’d have got there pretty fast. Thanks for the link.