Archive for October, 2009

Bottom-posting considered confusing?

Not a specifically Python-related post, although most of my mailing list activity is Python-related. Anyone who’s spent a more than negligible amount of time following any of the Python lists will have encountered the “We Bottom-Post Here” reaction to posters who top-post. Personally it makes sense to me, altho’ I do sometimes feel that people are too forthright about things of this sort.

That said, my experience among practically all my colleagues, friends and relations is that bottom-posting or inter-posting (ie snipping segments of an email and replying to them immediately afterwards) is downright confusing. You can blame the phenomenon on the default behaviour of certain commonly-used email clients if you like. But I’ve had several people reply to me in puzzled tones along the lines of “Why do you put your comments under my comments like that?” or even “I think you pressed the Send key too soon…” when I’d replied at the bottom of their email rather than at the top where they expected it.

Ultimately, When in Rome… I think bottom/interposting makes more sense, but if it makes less sense to my correspondents then I view that as my problem and not theirs.

Moratorium on Python language changes?

Guido’s proposing a moratorium on changes to the Python language, the idea being to enable alternative implementations to have a stable target to aim at. This only applies to the syntax of the language, not to the stdlib, nor to the underlying implementation: the announcement hints at acceptance for a GIL-free implementation if one came along.

I’m all for it, myself. Like other people, I believe that the stdlib needs lots of care and attention in several respects. Removing one focus of changes will result in at least some of that effort shifting to the stdlib. (Altho’ it will probably result in some of the effort simply going elsewhere :) ).

Adventures in spam & Spambayes

This is really a bit of a me-too article, but I thought it worth summarising a modest Python success story. My hosting provider offers IMAP access and allows me to set up my own cron and procmail configuration. I use Thunderbird on several (Windows) machines and very occasionally Squirrelmail or even mutt if that’s all the access I’ve got. I’ve advertised my mail at timgolden address pretty widely and I’m not at all surprised to be receiving a few hundred spams every day.

I suppose everyone has their way of coping with spam and I’ve been using Spambayes for quite a while via a procmail filter, but the bsddb database kept corrupting during training (a known but unsolved issue, it seems) and in the end I just left the hammie.db in the last known state, without retraining, and carried on as best I could, clearing out my Inbox every few days. Then all of a sudden I seemed to get onto someone’s list and the situation became unmanageable. So… back to Spambayes to see if I couldn’t find a solution.

Well, the result was a fresh install of Spambayes (from svn, fwiw), specifying a pickle database since it seems to be less prone to corruption and the volumes I’m dealing with aren’t high, a slight reshuffling of my folders, and the use of Menno Smits’ recently rehoused imapclient lib. The whole process is as follows:

  • A cron job scans my mail folders every few hours and gathers from-addresses from known-to-be-good folders into a white list.
  • Another pair of cron jobs runs Spambayes’ sb_mboxtrain trainer on the to-ham and to-spam folders and then uses imapclient to remove the contents of those folders.
  • When mail comes in, it is whitelisted if it comes from a known-good address; if not, it is passed to Spambayes.
  • Spambayes will tag it as ham, spam or unsure.
  • A further procmail rule will drop it in the Inbox if it’s considered ham, into the Spam folder if it’s considered spam, or into Suspect.
  • I scan the Suspect folder periodically (manually) and classify messages by moving them to the to-spam folder or copying them to the to-ham folder and then moving them to the Inbox or to some other folder.
  • Likewise, I move mail from the Inbox into one of the known-good folders so it will be whitelisted next time.
  • For the time being, I’m also scanning the Spam folder and fishing out the very occasional falsely-accused good email.
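The whitelist-then-classify decision in the steps above can be sketched in a few lines of Python. This is only an illustration of the logic: in the setup described here the routing is done by procmail rules, and the function names (sender_address, build_whitelist, route) and the classify callable standing in for Spambayes are all hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_address(raw_message):
    """Return the lower-cased address from the From: header, or '' if absent."""
    msg = message_from_string(raw_message)
    _name, addr = parseaddr(msg.get("From", ""))
    return addr.lower()


def build_whitelist(raw_messages):
    """Collect sender addresses from messages found in known-good folders."""
    return {addr for addr in map(sender_address, raw_messages) if addr}


def route(raw_message, whitelist, classify):
    """Pick a destination folder name for an incoming message.

    `classify` is a stand-in for Spambayes (an assumption, not its API):
    any callable returning 'ham', 'spam' or 'unsure'.
    """
    if sender_address(raw_message) in whitelist:
        return "Inbox"
    verdict = classify(raw_message)
    # Anything that is neither clearly ham nor clearly spam goes to Suspect.
    return {"ham": "Inbox", "spam": "Spam"}.get(verdict, "Suspect")
```

Passing the classifier in as a callable keeps the sketch independent of how Spambayes is actually invoked; only the three possible verdicts matter to the routing.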

The result is remarkable: Spambayes very quickly identifies ham/spam pretty much 100% correctly; I haven’t had any database corruptions so far (about a week now); and I’ll pretty soon ignore the Spam folder and drop anything spambayes calls spam into /dev/null. It’s a little risky, but life is short and my experience is that Spambayes very rarely gets it wrong.

The use of the imapclient libs was new this time round (the rest of the process was only very slightly tweaked from its previous incarnation). And this means less for me to check. Just copy/move the email to to-ham/to-spam and forget about it.
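For the curious, emptying a training folder after the trainer has run might look something like the following. It is a minimal sketch rather than my actual script: empty_folder is a made-up name, and it assumes a client object offering the select_folder, search, delete_messages and expunge methods of a logged-in imapclient.IMAPClient (the exact API may differ between imapclient versions).

```python
def empty_folder(client, folder):
    """Delete every message in `folder` and return how many were removed.

    `client` is assumed to behave like a logged-in imapclient.IMAPClient.
    """
    client.select_folder(folder)
    uids = client.search("ALL")
    if uids:
        client.delete_messages(uids)  # flags the messages as \Deleted
        client.expunge()              # permanently removes flagged messages
    return len(uids)


# Hypothetical usage; host, user and password are placeholders:
#
#   from imapclient import IMAPClient
#   client = IMAPClient("imap.example.com", ssl=True)
#   client.login("user", "password")
#   for folder in ("to-ham", "to-spam"):
#       empty_folder(client, folder)
#   client.logout()
```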

One small thing which came out of this was that I discovered I could have folders within folders on IMAP. I was sure I’d tried it previously and failed with some obscure error. This time, though, Thunderbird just told me that I could have either a folder-only folder or a mail-only folder, and created it quite happily. I rely heavily on the Nostalgy add-in to Thunderbird. It means I can have a full-width two-pane display without the folder tree and still move things easily from folder to folder.

In short, a couple of Python libs: Spambayes & imapclient coupled with the ubiquitous procmail and I’ve got a very functional spam filter in place.

Notes:

  • I did look at SPF, but somehow wasn’t sure if the DNS incantation I was using was correct and never took it further.
  • Not sure if greylisting is an option with this hosting service, although people report good results from it in general.

Getting WMI to work with Python 3.x

Well, that was easier than I thought…

Someone emailed me recently to say that he was new to Python but wanted to use WMI to query a bunch of machines in a University Computer Lab. He’d downloaded the module from my website but when he tried to import, he got this traceback… I was rather surprised: I don’t claim my code’s perfect, but the current release has been out in the field for a while and I’d have been surprised if something so serious hadn’t been picked up before. Anyway, as you probably guessed, he was using Python 3.1, having gone to python.org and downloaded the latest version. I simply advised him to go back to 2.6 where I knew there was no problem.

But of course, that got me thinking about porting to 3.x. I’ve more-or-less followed the progress of 2to3 and various people’s attempts at porting code to 3.x, and in particular Ned Batchelder’s article a few days ago got me thinking. And coding. And the result is wmi 1.4.2 (and counting), which not only runs on all Python versions from 2.4 to 3.1 but actually has a test suite to prove it. Plus a little web engine for browsing a machine’s WMI structure.
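To give a flavour of what supporting 2.4 through 3.1 from a single source involves, here is a small sketch of two common idioms. It is not lifted from the wmi module; text_type and describe are names invented for the illustration.

```python
import sys

# Python 2 has a separate unicode type; Python 3 only has str.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def describe(value):
    """Return a text representation of value, tolerating failures."""
    try:
        return text_type(value)
    except Exception:
        # "except Exception, err" is a syntax error on 3.x and
        # "except Exception as err" is a syntax error before 2.6,
        # so fetch the exception via sys.exc_info() instead.
        err = sys.exc_info()[1]
        return "<unprintable: %s>" % err
```

The sys.exc_info() dance is only needed because 2.4 and 2.5 are in the supported range; from 2.6 onwards the "as" form works everywhere.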

I’m in the process of porting the documentation over to Sphinx, but there’s a pre-release version (sans extra docs) on my website or you can follow the latest and greatest on Subversion.

2nd London Python Dojo & TDD

The 2nd London Python Dojo took place last night, space & food again courtesy of Fry-IT. The format was pretty much the same with the difference that the task was more of a program-y one and less of an API-y one. Which had the result that the audience was far more engaged (read: lots of opinionated backseat drivers) than on the previous week. It was still fun and the proposal that we essentially carry on with the same problem domain (a noughts & crosses game) next time was fairly well received.

What interested me a little more was the differences of approach among the developers present, both those up-front and those in the cheap seats. As I touched on last week, a Test-Driven Development technique was assumed (at least by the organisers). Now, as far as I can tell, while this is a perfectly valid approach to development, it isn’t of the essence of Dojo — ie you don’t need to do TDD for a Dojo to work. The point of a Dojo is rather to code and learn in front of others. Neither does it need to involve pair programming per se.

Now my point is not that I disagree with these techniques, altho’ I’m happily not using them myself in my every day life, but rather that a certain amount of the “suggestions” from the body of the audience was centred on their use. One or two of the coders were clearly not accustomed to working that way, or even aware that you could perhaps, and my own feeling is that this should be perfectly permissible. I’m not saying that anyone was booed off stage for launching in without a test, but there were several strong voices of encouragement in the crowd pointing out that a failing test had not been written (or any test, for that matter) as though True Development were impossible without one!

FWIW, my view on Test-Driven Development is rather like my view on Object-Oriented Development: that it’s an arrow one should certainly have in one’s quiver but that it isn’t always applicable. I realise that the comparison is not the most apt, but go with it for now. I appreciate that the people who were coding were not necessarily in their element and that I may not have been seeing TDD at its best, but there were not a few moments when I felt that a test was being written simply because it should be, according to the Mantra, without any thought to the program, design, goals, structures etc. At one point it was suggested that a particular function should return a string rather than print it to the screen as it would be easier to form a test. Now my view is that if a function needs to print a string then it needs to print a string. The *test* shouldn’t be driving the needs of your program: the requirements should be doing that. (In that case, it could well have been a pragmatic choice since the alternative would presumably have been to construct a mocked sys.stdout but still…).
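For what it’s worth, the two options are not mutually exclusive. The sketch below is not the Dojo code (render_board, print_board and capture are names I’ve made up, and the board layout is an assumption); it shows a function which returns the board as a string, a thin wrapper which prints it, and a test helper which captures sys.stdout so that the printing version can be checked as well. It uses contextlib.redirect_stdout, which is only available in recent Python 3 versions.

```python
import io
from contextlib import redirect_stdout


def render_board(cells):
    """Return a noughts & crosses board (a list of 9 strings) as text."""
    rows = [" | ".join(cells[i:i + 3]) for i in (0, 3, 6)]
    return "\n---------\n".join(rows)


def print_board(cells):
    """Print the board: the behaviour the program actually needs."""
    print(render_board(cells))


def capture(func, *args):
    """Call func(*args) and return whatever it wrote to sys.stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()
```

A test can then assert that capture(print_board, cells) equals render_board(cells) plus a trailing newline, so the function that prints still prints and nothing in the program has been bent to suit the test.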

As I say, I’m sure I wasn’t seeing TDD at its best and brightest. I would genuinely welcome a Masterclass Dojo (or whatever they’re called) where someone walks through a test-driven development to show how it might be done. As it was, I felt that the need to invent a test for something before you did anything about it left you seeing only the trees and failing to get a grasp of the wider wood. My 2.5d-worth.