Archive for Aide-memoire

How Does Python Handle Signals (on Windows)?

(Background: I’ve recently and less recently worked through a couple of issues with Python’s Ctrl-C handling under Windows. These required me to dig into the corners of Python’s signal-handling mechanism as interpreted on Windows. This post is something of an aide memoire for myself for the next time I have to dig.).

Signals are a Posix mechanism whereby User-space code can be called by Kernel-space code as a result of some event (which might itself have been initiated by other User-space code). At least, that’s the understanding of this ignorant Windows-based developer.

For practical purposes, it means you can set up your code to be called asynchronously by installing a signal handler for a particular signal. Lots more information, of course, over at Wikipedia. Windows (which is not short of native IPC mechanisms, asynchronous and otherwise) offers an emulation of the Posix signals via the C runtime library, and this is what Python mostly uses.

However, as you’ll see from the Python docs Python doesn’t allow arbitrary code to be called directly by the OS. Instead, it keeps track of what handlers you’ve set up via the signal module and then calls them when it’s got a moment. The “when it’s got a moment” means, essentially, that Modules/signalmodule.c:PyErr_CheckSignals is called all over the place, but especially is called via the eval-loop’s pending calls mechanism.

So what does this mean in term’s of Python’s codebase?

* The heart of the signal handling mechanism is in Modules/signalmodule.c

* The signal module keeps track in a Handlers structure of the Python handlers registered via the signal.signal function. When the mechanism fires up, it pulls the appropriate function out of that structure and calls it.

* Python registers Modules/signalmodule.c:signal_handler with the OS as a global signal handler which, when fired by the OS, calls Modules/signalmodule.c:trip_signal which indicates that the corresponding Python signal handler should be called at the next available point.

* The signal can be delivered by the OS (to the internal signal_handler function) at any point but the registered Python handler will only be run when PyErr_CheckSignals is run. This means that, at the very least, the Python signal handlers will not be run while a system call is blocking. It may be that whatever caused the signal will have caused the kernel to abort the blocking call, at which point Python takes over again and can check the signals. (This is what happens at points in the IO read/write loops). But if some uninterruptible device read hangs then Python will not regain control and no signal handler will execute.

* The main eval loop will check for raised signals via its pending calls mechanism, a C-level stack from which a function call can be popped every so often around the loop. The trip_signal function (called by the global signal_handler) adds to the queue of pending functions a wrapped call to PyErr_CheckSignals. This should result in the signals being checked a few moments later during the eval loop.

OK; so much for the whistlestop tour. How about Windows?

Well, for the most part, Windows operates just the same way courtesy of the C runtime library. But the signals which are raised and trapped are limited. And they probably resolve to the more Windows-y Ctrl-C and Ctrl-Break. I’m not going to touch on Ctrl-Break here, but the default Ctrl-C handling in Python is a bit of a mixed bag. We currently have a mixture of three things interacting with each other: the signal handling described above (where the default SIGINT handler raises PyErr_KeyboardInterrupt); the internal wrapper around the C runtime’s implementation of fgets which returns specific error codes if the line-read was interrupted; and some recently-added Windows event-handling which makes it easier to interrupt sleeps and other kernel objects from within Python).

That really was quick and I’ve brushed over a whole load of details; as I say, it’s more to remind me the next time I look at a related issue. But, hopefully it’ll give other interested people a headstart if they want to see how Python does things.(Background: I’ve recently and less recently worked through a couple of issues with Python’s Ctrl-C handling under Windows. These required me to dig into the corners of Python’s signal-handling mechanism as interpreted on Windows. This post is something of an aide memoire for myself for the next time I have to dig.).

Aide-memoire for Python hg clones

(This is because I use mercurial rarely enough and commit to Python even more rarely; so I always forget what incantations I used last time…)


hg clone http://hg.python.org/cpython hg.python.org
hg clone hg.python.org issue1234
cd issue1234


hg up 3.3
hg import --no-commit http://.../fixedit.patch

# Do whatever
# Edit Misc/NEWS


hg commit -m "... (Patch by ...)"
hg up default
hg merge 3.3

# Stuff happens including, probably, Misc/NEWS conflicting
# Copy Misc/NEWS.orig back to Misc/NEWS and re-edit


hg resolve -m Misc/NEWS

# Do whatever


hg commit -m "... (Patch by ...)"

(Watch out for push races if other devs have committed…)


hg push ssh://hg@hg.python.org/cpython

Usefulness of itertools.cycle & re.sub

(… or at least the concept). I wanted to process a piece of plain text which would include conventional double-quote marks in such a way that they became HTML smart quote characters (&ldquot; &rdquot;). I was prepared to adopt a naive algorithm which assumed that alternate quotes would always match up, something which obviously wouldn’t work for single quotes. I toyed with various ways of splitting the text up and joining it back together until I came across the slick combination of itertools.cycle and re.sub:

import itertools
import re

quotes = itertools.cycle (['&ldquot;', '&rdquot;'])
def sub (match):
  return quotes.next ()

text = 'The "quick" brown "fox" jumps over the "lazy" dog.'
print re.sub ('"', sub, text)

Obviously my itertools.cycle could trivially be written as: while 1: yield '..'; yield '...', but why reinvent the wheel?

Update: Tom Lynn points out that this can be done with a straightforward regex:

text = re.sub(r’”([^”]*)”‘, r’&ldquot;\1&rdquot;’, text)

Passing params to db-api queries

Falling mostly into the aide-memoire category, but in case it’s helpful to anyone else…

You have a more-or-less complex SQL query which you’re executing via, eg, pyodbc (or some other dbapi-compliant module) and you need to pass in a set of positional parameters. So you have a where clause which looks something like this (although with better names, obviously):

WHERE
(
  t1.x = ? AND
  (t2.y = ? OR (t3.z = ? AND t2.y < ?))
)
OR
(
  t1.x > ? AND
  (t2.y BETWEEN ? AND ?)
)

So your Python code has to pass in seven parameters, in the right order, several of which are probably the same value. And then you realise that the WHERE clause is slightly wrong. So you adjust it, but now you have eight parameters, and two of the previous ones have changed, and there’s a new one. And then…

There’s no way to use named params with pyodbc, so you end up with a list/tuple of positional parameters which you have to eyeball-match up with the corresponding question marks in the query:

import pyodbc

...

cursor.execute (
  SQL, [
  from_date, threshold, threshold, to_date, interval, threshold]
)

Unless… you use a derived table in the query and use that to generate pseudo-named parameters. This is possible in MSSQL; I don’t know if it would work with other databases, although I can’t see why not. So your code becomes something like (NB no attempt at consistency here; it’s an example):

SELECT
  *
FROM
  t1
JOIN t2 ON t2.t1_id = t1.id
JOIN
(
  SELECT
    from_date = ?,
    to_date = ?,
    max_value = ?,
    interval = ?,
    threshold = ?
) AS params ON
(
  t1.x = params.from_date AND
  (t2.y = params.threshold OR
    (t3.z = params.interval AND t2.y < params.to_date)
  )
)
OR
(
  t1.x > params.threshold AND
  (t2.y BETWEEN params.from_date  AND params.to_date)
)

All you need to do then is to line up the order of params in your cursor.execute with the order of columns in the params derived table.

Alternatives? Well, you could use an ORM of some sort — goodness knows there are enough of them about — but maybe, like me, you find that learning another syntax for something which you can do perfectly well in its native SQL is onerous. Another approach is to set up local variables in your executed statement and use these in much the same way, eg:

DECLARE
  @v_from_date DATETIME,
  @v_to_date DATETIME,
  @v_threshold INT

SELECT
  @v_from_date = ?,
  @v_to_date = ?,
  @v_threhold = ?

SELECT
  *
FROM
  ..
WHERE
  (t1.x < @v_from_date ...)

This works (and is, in fact, how we generate lightweight SQL-to-Excel reports). But there’s a bit more boilerplate involved.

smtplib and failed recipients

Just a quick aide-memoire and a note to anyone else who’s caught out… when using Python’s smtplib module to send email. If you’re like me, you may have missed the following documented behaviour:

This method will return normally if the mail is accepted for at least one recipient. Otherwise it will throw an exception. That is, if this method does not throw an exception, then someone should get your mail. If this method does not throw an exception, it returns a dictionary, with one entry for each recipient that was refused. Each entry contains a tuple of the SMTP error code and the accompanying error message sent by the server.

In other words, sendmail will return successfully even if some of the recipients couldn’t receive the email. You can work out which recipients failed from the dictionary returned. I’d assumed that if *any* recipient didn’t get the email then an exception would be raised. As I say, the actual behaviour is clearly documented, but just in case…