How Does Python Handle Signals (on Windows)?

(Background: I’ve recently and less recently worked through a couple of issues with Python’s Ctrl-C handling under Windows. These required me to dig into the corners of Python’s signal-handling mechanism as interpreted on Windows. This post is something of an aide memoire for myself for the next time I have to dig.).

Signals are a Posix mechanism whereby User-space code can be called by Kernel-space code as a result of some event (which might itself have been initiated by other User-space code). At least, that’s the understanding of this ignorant Windows-based developer.

For practical purposes, it means you can set up your code to be called asynchronously by installing a signal handler for a particular signal. Lots more information, of course, over at Wikipedia. Windows (which is not short of native IPC mechanisms, asynchronous and otherwise) offers an emulation of the Posix signals via the C runtime library, and this is what Python mostly uses.

However, as you’ll see from the Python docs Python doesn’t allow arbitrary code to be called directly by the OS. Instead, it keeps track of what handlers you’ve set up via the signal module and then calls them when it’s got a moment. The “when it’s got a moment” means, essentially, that Modules/signalmodule.c:PyErr_CheckSignals is called all over the place, but especially is called via the eval-loop’s pending calls mechanism.

So what does this mean in term’s of Python’s codebase?

* The heart of the signal handling mechanism is in Modules/signalmodule.c

* The signal module keeps track in a Handlers structure of the Python handlers registered via the signal.signal function. When the mechanism fires up, it pulls the appropriate function out of that structure and calls it.

* Python registers Modules/signalmodule.c:signal_handler with the OS as a global signal handler which, when fired by the OS, calls Modules/signalmodule.c:trip_signal which indicates that the corresponding Python signal handler should be called at the next available point.

* The signal can be delivered by the OS (to the internal signal_handler function) at any point but the registered Python handler will only be run when PyErr_CheckSignals is run. This means that, at the very least, the Python signal handlers will not be run while a system call is blocking. It may be that whatever caused the signal will have caused the kernel to abort the blocking call, at which point Python takes over again and can check the signals. (This is what happens at points in the IO read/write loops). But if some uninterruptible device read hangs then Python will not regain control and no signal handler will execute.

* The main eval loop will check for raised signals via its pending calls mechanism, a C-level stack from which a function call can be popped every so often around the loop. The trip_signal function (called by the global signal_handler) adds to the queue of pending functions a wrapped call to PyErr_CheckSignals. This should result in the signals being checked a few moments later during the eval loop.

OK; so much for the whistlestop tour. How about Windows?

Well, for the most part, Windows operates just the same way courtesy of the C runtime library. But the signals which are raised and trapped are limited. And they probably resolve to the more Windows-y Ctrl-C and Ctrl-Break. I’m not going to touch on Ctrl-Break here, but the default Ctrl-C handling in Python is a bit of a mixed bag. We currently have a mixture of three things interacting with each other: the signal handling described above (where the default SIGINT handler raises PyErr_KeyboardInterrupt); the internal wrapper around the C runtime’s implementation of fgets which returns specific error codes if the line-read was interrupted; and some recently-added Windows event-handling which makes it easier to interrupt sleeps and other kernel objects from within Python).

That really was quick and I’ve brushed over a whole load of details; as I say, it’s more to remind me the next time I look at a related issue. But, hopefully it’ll give other interested people a headstart if they want to see how Python does things.(Background: I’ve recently and less recently worked through a couple of issues with Python’s Ctrl-C handling under Windows. These required me to dig into the corners of Python’s signal-handling mechanism as interpreted on Windows. This post is something of an aide memoire for myself for the next time I have to dig.).

Just in case you thought it was easy…

From time to time, the idea of a standard Python “Enum” object is raised on the Python lists. You know the kind of thing: a lightweight means of mapping numbers to labels so you can do set_colour( without having a long module of manifest constants or magic numbers lying around all over your codebase.

It all sounds very straightforward, and Barry Warsaw had an existing module which seemed like a fairly good starting point, so PEP 435 was started and it all looked like it was just a formality.

Now, literally *hundreds* of mailing list posts and endless, endless threads later, GvR has just pronounced his approval of the PEP and it’s good to go.

If you — like me — thought “this one won’t be controversial”, then just point your search engine of choice at and look for “enum” or “435″, or just look at the archive for May alone (which only represents the final few days of details being thrashed out) to realise just how much discussion and work is involved in what appears to be quite a simple thing.

Of course, part of the problem is precisely the fact that the idea is so simple. I’m sure most people have rolled their own version of something like this. I know I have. You can get up and running with a simple “bunch” class, possibly throw in a few convenience methods to map values to names and then just get on with life. But when something’s got to go into the stdlib then it all becomes a lot more difficult, because everyone has slightly (or very) different needs; and everyone has slightly (or very) different ideas about what constitutes the most convenient interface.

And there’s always the danger of the “bikeshed” effect. If a PEP is proposing something perhaps quite fundamental but outside most people’s experience, then only people with sufficient interest and knowledge are likely to contribute. (Or argue). But an enum: everyone’s done that, and everyone’s got an interest, and an idea about how it should be done.

But, bikesheds aside, I’m glad that the Python community is prepared to refine its ideas to get the best possible solution into the standard library. As a developer, one naturally feels that one’s own ideas and needs represent everyone else’s. It’s only when you expose your ideas to the sometimes harsh winds of the community critique that you discover just how many different angles there are to something you thought was simple.

Thankfully, we have a BDFL (or, sometimes, a PEP Czar) to make the final decision. And, ultimately, that means that some people won’t see their needs being served in the way they want. But I think that that’s far preferable to a design-by-committee solution which tries to please everybody and ends up being cluttered.

Yesterday’s London Python Dojo

Yesterday was the March Python Dojo, hosted as usual by the ever-generous Fry-IT, with a book donated by O’Reilly. We started with a couple of not-so lightning talks from Tom Viner — talking about his team’s solution for last month’s puzzle — and Nicholas Tollervey — talking about bittorrent. An artfully-worded late question had @ntoll on his soapbox for a while on the subject of copyright and payment to artists, until someone spoiled it by suggesting that maybe we ought to write some code in Python!

After the usual, only slightly convoluted, voting experience, we decided to pick up one of last month’s runner-up challenges: creating a compression-decompression algorithm. Naturally most people started from some kind of frequency table, replacing the most common items with the smallest replacement. The approaches ranged from a hybrid Huffman-UTF8 encoding to an attempt to replace common words by a $n placeholder, where the n would increase as the word became less common. The winner for the most optimistic approach was a lossy algorithm which dropped every other word on compression, replacing it on decompression by the most likely from a lookup table. Tested against a corpus of Shakespeare’s works it produced some quite readable poetry.

As an aside, I can assert after a wide-ranging survey that (a) the preferred editor background is dark (black or dark-grey); and (b) in spite of all the tech at their fingertips, programmers still reach for pen and paper when they need to work something out!

The Great Thing About Open Source Is…

(No, this isn’t one of those New Year memes; it’s just a bit of a verbal ramble).

I wanted to get an app up-and-running on WebFaction. And, for reasons not necessarily entirely connected with reality and hard requirements, I decided to go for Python3. WF offers CherryPy 3.2 against Python 3.1 but — again, for no very good reason — I decided I wanted the very latest.

So: download cherrypy.latest.tgz and python.latest.tgz. Dum de dum, configure –prefix=$HOME && make && make install. Bit of fiddling about with WF’s cherrypy setup and… we have a problem. The cherrypy engine is, for some reason of its own, using threading._Timer rather than threading.Timer and it’s not there any more in Python 3.3.

Investigation shows that this was fixed in cherrypy a few months ago. Meanwhile things will probably go ok with Python 3.2. What to do? I decided on backtracking on Python (truth be told, I hadn’t checked the cherrypy logs at this point). Download python.32.tgz. Dum de dum… and… error with _posixsubprocess. Looks like it’s fixed in the 3.2 branch but not yet released.

I’m now in the situation where I’ll either have to use a hg checkout of cherrypy or an hg checkout of Python. Or both.

Lest this be seen as a dig at either the cherrypy or the Python guys, it’s certainly not. I’m grateful to them both for the work they’ve done and for the open source infrastructure they use which gives me the information and options I’ve outlined above. I could most certainly backtrack a couple of published versions of either perfectly well for my own needs.

It’s more of an endorsement of what open source offers you if you want (and I haven’t even talked about the fact that I could fork cherrypy or Python if I needed and still maintain a version in sync with the canonical one for my own use on WF).

All I need to do now is clone, config & make and I’m away.

Aide-memoire for Python hg clones

(This is because I use mercurial rarely enough and commit to Python even more rarely; so I always forget what incantations I used last time…)

hg clone
hg clone issue1234
cd issue1234

hg up 3.3
hg import --no-commit http://.../fixedit.patch

# Do whatever
# Edit Misc/NEWS

hg commit -m "... (Patch by ...)"
hg up default
hg merge 3.3

# Stuff happens including, probably, Misc/NEWS conflicting
# Copy Misc/NEWS.orig back to Misc/NEWS and re-edit

hg resolve -m Misc/NEWS

# Do whatever

hg commit -m "... (Patch by ...)"

(Watch out for push races if other devs have committed…)

hg push ssh://