Activity on distutils-sig

A long time ago (I don’t remember exactly: years) I subscribed to the Python distutils-sig mailing list. And then I unsubscribed because it was noisy and not terribly fruitful.

Now, chasing down a Windows-related pip issue, I’ve come across it again and discovered that there’s a shed-load of useful work going on there. I had no idea that distribute (a fork of setuptools) & setuptools had [re]merged as of setuptools v0.7, and I’d lost sight of the many PEPs on naming, versioning, distribution formats and the like. I still haven’t worked out which is which, but at least certain of them seem to have reached the stage where they’re the point of reference for other discussions — not discussion points in their own right. There’s an initiative to get pip into the main Python distribution — which I also had no idea about.

I’m especially happy to see Paul Moore holding up the Windows end of things in discussions — thanks, Paul! Despite our both being UK-based [*] and Windows types and long-term Python users, we’ve never actually met AFAIK.

I’ve resubscribed now and I hope to be able to contribute in some small way.

[*] I’m fairly sure — and he did recently make a reference to Gunga Din, which is something I’ve never heard outside this country.

Python meets BoA

[tl;dr photos here]

Last night’s London Python Dojo was held, for the first time, at the very spacious Canary Wharf offices of the Bank of America. They’re big users of Python and, as we were told in a brief introduction, were keen to give something back to the community.

They certainly did it in style. Their main reception is about the same size as Ealing Common. The meet-and-greet bar area where we had pizza on classy platters & beer served by bar staff is not much smaller than the whole of the offices of Fry IT, our long-standing default hosts. And the area below where a few of us gathered feels like a swimming pool with a long slide-like flight of stairs leading down. The function room where the main business of the evening was transacted was spacious with large tables (and *lots* of pencils!).

The guys at BoA had really done their prep work: power strips were already in place and every possible laptop-to-screen adapter was available. (For those who haven’t done this kind of thing: there’s *always* some kind of mismatch between a screen which can only take DVI-I and a Mac user who doesn’t have the Mini-HDMI-to-DisplayPort adapter. Or whatever: I use Windows which never has these problems ;) ).

As well as the friendly intro from one of the BoA guys, we had an enthusiastic lightning talk on Bitcoin from Sam Phippen (who comes in from Winchester or Bristol for the Dojos!). With over 30 people present, we had about 15 suggestions for the evening’s challenge, including old favourites (How does 20 Questions work, Nicholas?) and new ideas, some around the theme of banking. After the usual two rounds we settled on Steganography and made use of the generous table space (and pencils) which our hosts had provided.

The results are on GitHub (or will be, depending on when you’re reading this) as pull requests come in and are honoured. In short, two (three?) teams went for piggybacking on image bits; two teams (including the one I was with) encoded bits in the extraneous whitespace of a text document; and the last team tried to use Python’s indentation to carry information in some way which I couldn’t quite understand at the time. I think that every team bar the Python-indentation one had a working result[*]; ours even had unittests!

FWIW my first idea for our team was to encode the characters in Morse code (using spaces & tabs as dots & dashes). We finally settled on binary but I still think Morse would have been cooler — and we could have played the message out as a midi file for extra points!
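For flavour, here’s a from-memory sketch of the binary-in-whitespace idea — not the code we actually wrote on the night, and the function names are my own invention:

```python
# One bit of the secret per line of cover text: a trailing space
# means 0, a trailing tab means 1. Lines beyond the message are
# left untouched (no trailing whitespace).

def hide(cover_lines, secret):
    bits = "".join("{:08b}".format(b) for b in secret.encode("utf-8"))
    if len(bits) > len(cover_lines):
        raise ValueError("Cover text too short for this secret")
    stego = [
        line + (" " if bit == "0" else "\t")
        for line, bit in zip(cover_lines, bits)
    ]
    return stego + cover_lines[len(bits):]

def reveal(stego_lines):
    bits = "".join(
        "0" if line.endswith(" ") else "1"
        for line in stego_lines if line.endswith((" ", "\t"))
    )
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))
    return data.decode("utf-8")

cover = ["line number %d" % n for n in range(20)]
print(reveal(hide(cover, "hi")))  # hi
```

To the casual eye the stego text is identical to the cover text, which is rather the point — though any editor that strips trailing whitespace on save will silently destroy the message.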

Of course at the end we had a draw for O’Reilly’s usual generous contribution to proceedings along with an added bonus: an historical map of programming languages. Appropriately enough, the book was won by Sal who’d been the driving force behind Bank of America hosting the Dojo this month.

Next month we’ll probably delay by a week to come in after Europython. Not sure where we’ll be yet, but follow @ldnpydojo or look out on python-uk.

And, of course, big thanks to Bank of America for being our hosts this time round.

TJG

[*] And they may have got things working after a live “Aha!” moment by Al who was demo-ing. [UPDATE: Al was actually in another team per his comment below; so many teams, so short a memory span…]

How Does Python Handle Signals (on Windows)?

(Background: I’ve recently and less recently worked through a couple of issues with Python’s Ctrl-C handling under Windows. These required me to dig into the corners of Python’s signal-handling mechanism as interpreted on Windows. This post is something of an aide memoire for myself for the next time I have to dig.)

Signals are a Posix mechanism whereby User-space code can be called by Kernel-space code as a result of some event (which might itself have been initiated by other User-space code). At least, that’s the understanding of this ignorant Windows-based developer.

For practical purposes, it means you can set up your code to be called asynchronously by installing a signal handler for a particular signal. Lots more information, of course, over at Wikipedia. Windows (which is not short of native IPC mechanisms, asynchronous and otherwise) offers an emulation of the Posix signals via the C runtime library, and this is what Python mostly uses.
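As a minimal illustration — nothing Windows-specific here, just the basic API of the signal module:

```python
import signal

def on_interrupt(signum, frame):
    # Called (eventually) in the main thread when SIGINT arrives
    print("Caught signal", signum)

# signal.signal installs a handler and returns the one previously
# installed, so the old behaviour can be put back afterwards.
previous = signal.signal(signal.SIGINT, on_interrupt)
print(signal.getsignal(signal.SIGINT) is on_interrupt)  # True
signal.signal(signal.SIGINT, previous)  # restore
```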

However, as you’ll see from the Python docs, Python doesn’t allow arbitrary code to be called directly by the OS. Instead, it keeps track of what handlers you’ve set up via the signal module and then calls them when it’s got a moment. The “when it’s got a moment” means, essentially, that Modules/signalmodule.c:PyErr_CheckSignals is called all over the place, but especially via the eval-loop’s pending calls mechanism.

So what does this mean in terms of Python’s codebase?

* The heart of the signal handling mechanism is in Modules/signalmodule.c

* The signal module keeps track in a Handlers structure of the Python handlers registered via the signal.signal function. When the mechanism fires up, it pulls the appropriate function out of that structure and calls it.

* Python registers Modules/signalmodule.c:signal_handler with the OS as a global signal handler which, when fired by the OS, calls Modules/signalmodule.c:trip_signal which indicates that the corresponding Python signal handler should be called at the next available point.

* The signal can be delivered by the OS (to the internal signal_handler function) at any point but the registered Python handler will only be run when PyErr_CheckSignals is run. This means that, at the very least, the Python signal handlers will not be run while a system call is blocking. It may be that whatever caused the signal will have caused the kernel to abort the blocking call, at which point Python takes over again and can check the signals. (This is what happens at points in the IO read/write loops). But if some uninterruptible device read hangs then Python will not regain control and no signal handler will execute.

* The main eval loop will check for raised signals via its pending calls mechanism, a C-level stack from which a function call can be popped every so often around the loop. The trip_signal function (called by the global signal_handler) adds to the queue of pending functions a wrapped call to PyErr_CheckSignals. This should result in the signals being checked a few moments later during the eval loop.
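You can actually watch this deferral happen from pure Python. A sketch, assuming Python 3.8+ for signal.raise_signal:

```python
import signal

fired = []

def handler(signum, frame):
    fired.append(signum)

old = signal.signal(signal.SIGINT, handler)

# raise_signal invokes the C-level signal handler, which calls
# trip_signal; the Python-level handler then runs as soon as
# PyErr_CheckSignals is next called -- in practice, immediately.
signal.raise_signal(signal.SIGINT)

print(fired == [signal.SIGINT])  # True -- the handler has already run
signal.signal(signal.SIGINT, old)  # tidy up
```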

OK; so much for the whistlestop tour. How about Windows?

Well, for the most part, Windows operates just the same way courtesy of the C runtime library. But the signals which are raised and trapped are limited. And they probably resolve to the more Windows-y Ctrl-C and Ctrl-Break. I’m not going to touch on Ctrl-Break here, but the default Ctrl-C handling in Python is a bit of a mixed bag. We currently have a mixture of three things interacting with each other: the signal handling described above (where the default SIGINT handler raises KeyboardInterrupt); the internal wrapper around the C runtime’s implementation of fgets, which returns specific error codes if the line-read was interrupted; and some recently-added Windows event handling which makes it easier to interrupt sleeps and waits on kernel objects from within Python.
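The first of those three can be exercised without a console at all (again assuming 3.8+ for signal.raise_signal):

```python
import signal

# The stock handler for SIGINT is signal.default_int_handler, which
# simply raises KeyboardInterrupt -- so a Ctrl-C surfaces as an
# ordinary Python exception you can catch.
signal.signal(signal.SIGINT, signal.default_int_handler)

interrupted = False
try:
    signal.raise_signal(signal.SIGINT)  # stand-in for a real Ctrl-C
except KeyboardInterrupt:
    interrupted = True

print(interrupted)  # True
```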

That really was quick and I’ve brushed over a whole load of details; as I say, it’s more to remind me the next time I look at a related issue. But, hopefully it’ll give other interested people a headstart if they want to see how Python does things.

Just in case you thought it was easy…

From time to time, the idea of a standard Python “Enum” object is raised on the Python lists. You know the kind of thing: a lightweight means of mapping numbers to labels so you can do set_colour(Colours.red) without having a long module of manifest constants or magic numbers lying around all over your codebase.
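Something along these lines — roughly what PEP 435 specifies, via the enum module as it eventually landed in the stdlib (the Colours example is mine):

```python
from enum import Enum

class Colours(Enum):
    red = 1
    green = 2
    blue = 3

def set_colour(colour):
    # No magic numbers: callers pass Colours.red, not a bare 1
    return "Setting colour to {} ({})".format(colour.name, colour.value)

print(set_colour(Colours.red))          # Setting colour to red (1)
print(Colours(2) is Colours.green)      # True -- lookup by value
print(Colours["blue"] is Colours.blue)  # True -- lookup by name
```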

It all sounds very straightforward, and Barry Warsaw had an existing module which seemed like a fairly good starting point, so PEP 435 was started and it all looked like it was just a formality.

Now, literally *hundreds* of mailing list posts and endless, endless threads later, GvR has just pronounced his approval of the PEP and it’s good to go.

If you — like me — thought “this one won’t be controversial”, then just point your search engine of choice at mail.python.org/pipermail/python-dev and look for “enum” or “435”, or just look at the archive for May alone (which only represents the final few days of details being thrashed out) to realise just how much discussion and work is involved in what appears to be quite a simple thing.

Of course, part of the problem is precisely the fact that the idea is so simple. I’m sure most people have rolled their own version of something like this. I know I have. You can get up and running with a simple “bunch” class, possibly throw in a few convenience methods to map values to names and then just get on with life. But when something’s got to go into the stdlib then it all becomes a lot more difficult, because everyone has slightly (or very) different needs; and everyone has slightly (or very) different ideas about what constitutes the most convenient interface.
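By way of example, here’s the sort of home-grown thing I mean — a minimal sketch, with all the names invented for the purpose:

```python
# A "bunch" of class attributes plus one convenience method bolted on.

class Colours:
    red = 1
    green = 2
    blue = 3

    @classmethod
    def name_of(cls, value):
        # Reverse-map a value back to its label
        for name, v in vars(cls).items():
            if not name.startswith("_") and v == value:
                return name
        raise ValueError("No colour with value %r" % (value,))

print(Colours.name_of(2))  # green
```

Which works fine, right up until you want iteration, or comparison semantics, or pickling, or subclassing — at which point everyone’s hand-rolled version diverges from everyone else’s.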

And there’s always the danger of the “bikeshed” effect. If a PEP is proposing something perhaps quite fundamental but outside most people’s experience, then only people with sufficient interest and knowledge are likely to contribute. (Or argue). But an enum: everyone’s done that, and everyone’s got an interest, and an idea about how it should be done.

But, bikesheds aside, I’m glad that the Python community is prepared to refine its ideas to get the best possible solution into the standard library. As a developer, one naturally feels that one’s own ideas and needs represent everyone else’s. It’s only when you expose your ideas to the sometimes harsh winds of the community critique that you discover just how many different angles there are to something you thought was simple.

Thankfully, we have a BDFL (or, sometimes, a PEP Czar) to make the final decision. And, ultimately, that means that some people won’t see their needs being served in the way they want. But I think that that’s far preferable to a design-by-committee solution which tries to please everybody and ends up being cluttered.

Yesterday’s London Python Dojo

Yesterday was the March Python Dojo, hosted as usual by the ever-generous Fry-IT, with a book donated by O’Reilly. We started with a couple of not-so-lightning talks from Tom Viner — talking about his team’s solution for last month’s puzzle — and Nicholas Tollervey — talking about bittorrent. An artfully-worded late question had @ntoll on his soapbox for a while on the subject of copyright and payment to artists, until someone spoiled it by suggesting that maybe we ought to write some code in Python!

After the usual, only slightly convoluted, voting experience, we decided to pick up one of last month’s runner-up challenges: creating a compression-decompression algorithm. Naturally most people started from some kind of frequency table, replacing the most common items with the smallest replacement. The approaches ranged from a hybrid Huffman-UTF8 encoding to an attempt to replace common words by a $n placeholder, where the n would increase as the word became less common. The winner for the most optimistic approach was a lossy algorithm which dropped every other word on compression, replacing it on decompression by the most likely from a lookup table. Tested against a corpus of Shakespeare’s works it produced some quite readable poetry.
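To give the flavour of that last scheme, here’s a toy sketch — mine, not the winning team’s code, and with a deliberately tiny, made-up lookup table:

```python
# Drop every other word to "compress"; on decompression, reinsert the
# most likely follower of each surviving word from a lookup table.

LIKELY = {"to": "be", "or": "not", "that": "is"}

def compress(text):
    return " ".join(text.split()[::2])

def decompress(compressed):
    out = []
    for word in compressed.split():
        out.append(word)
        out.append(LIKELY.get(word, "something"))  # optimistic guess
    return " ".join(out)

print(compress("to be or not to be"))  # to or to
print(decompress("to or to"))          # to be or not to be
```

With a well-chosen corpus-derived table, a carefully-chosen input round-trips perfectly; with anything else you get the “quite readable poetry” mentioned above.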

As an aside, I can assert after a wide-ranging survey that (a) the preferred editor background is dark (black or dark-grey); and (b) in spite of all the tech at their fingertips, programmers still reach for pen and paper when they need to work something out!