Archive for Aide-memoire

Usefulness of itertools.cycle & re.sub

(… or at least the concept). I wanted to process a piece of plain text which would include conventional double-quote marks in such a way that they became HTML smart quote characters (&ldquot; &rdquot;). I was prepared to adopt a naive algorithm which assumed that alternate quotes would always match up, something which obviously wouldn’t work for single quotes. I toyed with various ways of splitting the text up and joining it back together until I came across the slick combination of itertools.cycle and re.sub:

import itertools
import re

quotes = itertools.cycle (['&ldquot;', '&rdquot;'])
def sub (match):
  return quotes.next ()

text = 'The "quick" brown "fox" jumps over the "lazy" dog.'
print re.sub ('"', sub, text)

Obviously my itertools.cycle could trivially be written as: while 1: yield '..'; yield '...', but why reinvent the wheel?

Update: Tom Lynn points out that this can be done with a straightforward regex:

text = re.sub(r’”([^”]*)”‘, r’&ldquot;\1&rdquot;’, text)

Passing params to db-api queries

Falling mostly into the aide-memoire category, but in case it’s helpful to anyone else…

You have a more-or-less complex SQL query which you’re executing via, eg, pyodbc (or some other dbapi-compliant module) and you need to pass in a set of positional parameters. So you have a where clause which looks something like this (although with better names, obviously):

WHERE
(
  t1.x = ? AND
  (t2.y = ? OR (t3.z = ? AND t2.y < ?))
)
OR
(
  t1.x > ? AND
  (t2.y BETWEEN ? AND ?)
)

So your Python code has to pass in seven parameters, in the right order, several of which are probably the same value. And then you realise that the WHERE clause is slightly wrong. So you adjust it, but now you have eight parameters, and two of the previous ones have changed, and there’s a new one. And then…

There’s no way to use named params with pyodbc, so you end up with a list/tuple of positional parameters which you have to eyeball-match up with the corresponding question marks in the query:

import pyodbc

...

cursor.execute (
  SQL, [
  from_date, threshold, threshold, to_date, interval, threshold]
)

Unless… you use a derived table in the query and use that to generate pseudo-named parameters. This is possible in MSSQL; I don’t know if it would work with other databases, although I can’t see why not. So your code becomes something like (NB no attempt at consistency here; it’s an example):

SELECT
  *
FROM
  t1
JOIN t2 ON t2.t1_id = t1.id
JOIN
(
  SELECT
    from_date = ?,
    to_date = ?,
    max_value = ?,
    interval = ?,
    threshold = ?
) AS params ON
(
  t1.x = params.from_date AND
  (t2.y = params.threshold OR
    (t3.z = params.interval AND t2.y < params.to_date)
  )
)
OR
(
  t1.x > params.threshold AND
  (t2.y BETWEEN params.from_date  AND params.to_date)
)

All you need to do then is to line up the order of params in your cursor.execute with the order of columns in the params derived table.

Alternatives? Well, you could use an ORM of some sort — goodness knows there are enough of them about — but maybe, like me, you find that learning another syntax for something which you can do perfectly well in its native SQL is onerous. Another approach is to set up local variables in your executed statement and use these in much the same way, eg:

DECLARE
  @v_from_date DATETIME,
  @v_to_date DATETIME,
  @v_threshold INT

SELECT
  @v_from_date = ?,
  @v_to_date = ?,
  @v_threhold = ?

SELECT
  *
FROM
  ..
WHERE
  (t1.x < @v_from_date ...)

This works (and is, in fact, how we generate lightweight SQL-to-Excel reports). But there’s a bit more boilerplate involved.

smtplib and failed recipients

Just a quick aide-memoire and a note to anyone else who’s caught out… when using Python’s smtplib module to send email. If you’re like me, you may have missed the following documented behaviour:

This method will return normally if the mail is accepted for at least one recipient. Otherwise it will throw an exception. That is, if this method does not throw an exception, then someone should get your mail. If this method does not throw an exception, it returns a dictionary, with one entry for each recipient that was refused. Each entry contains a tuple of the SMTP error code and the accompanying error message sent by the server.

In other words, sendmail will return successfully even if some of the recipients couldn’t receive the email. You can work out which recipients failed from the dictionary returned. I’d assumed that if *any* recipient didn’t get the email then an exception would be raised. As I say, the actual behaviour is clearly documented, but just in case…

NTLM Authentication using IIS, ISAPI-WSGI and cherrypy

(How’s that for a snappy post title?). This is really an aide-memoire for myself since I always make this kind of thing way more complicated than it needs to be. My mission (whether or not I choose to accept it) is to have a web interface to one of our internal apps which should support transparent browser-based authentication but should fail back gracefully to anonymous access. The app in question is a Helpdesk app and some years ago I wrote a cherrypy-based web interface to it since its own desktop interface is woeful. The cherrypy app’s been running fine for some years, with incremental improvements but I’m trying to push it out to a wider (internal) audience and it could do with the speed and stability boost which a fully-fledged webserver can give it.

The machine it’s running on (via the standard cherrypy server on port 8080) is already running IIS and the pywin32 isapi module, nicely extended by the ISAPI-WSGI project, makes transferring the app fairly plain sailing. But one of the benefits of transferring to IIS was to get as smooth a passthrough authentication as possible. I had been using a fairly crude http basic auth with an authentication check at the server end, but that’s far from ideal. I hoped that by switching to IIS (and given that the official company browser is IE) I could just flick a switch and get the browser’s identity transparently.

Well you can, but I put two stumbling blocks in my own path: when it didn’t work immediately I tried to make it far more complicated than I need have rather than looking for a simple solution; and identity isn’t the same as an access token. So, for my future self trying to remember how to do this, and for anyone else looking for this solution, here’s the easy bit. (I’m assuming you’ve installed some kind of ISAPI-WSGI app, altho’ the same pretty much applies elsewhere).

  • In the IIS MMC snap-in find the isapi-wsgi app you’ve installed. Right-click Properties. [Directory Security] tab. Anonymous access and authentication control [Edit]. Leave [Anonymous Access] ticked. Tick [Integrated Windows Authentication]. [Ok] all the way out.
  • In your WSGI app, look for the REMOTE_USER env var. If it’s set, you’ve got a remote user. If it’s not, call start_response (”401 Unauthorized”, [(”www-authenticate”, “NTLM”)]) and return []

That’s the really brief version, and is based around the IIS 5.1 which comes free with WinXP. IE users need do no more. FF users will probably already know that they need to add the web server to the about:config param with the unmemorable name (it’s got “ntlm” and “trusted-uris” in it). Don’t know if Chrome handles this.

For my purposes, I was using cherrypy so — when I get back to work — all I should have to do is: check cherrypy.request.login which keeps track of incoming REMOTE_USER / LOGIN_USER env vars for me; and do the “401 Unauthorized” dance if I need to possibly by means of a simple pre-request cherrypy tool. The graceful anonymity will be implemented by not displaying and/or allowing certain actions if there is no remote user. Not sure yet whether I need to fail back to basic or digest auth.

The trickier bit — and the bit I haven’t yet solved — is translating the “Authorization:” header which the browser sends into something which I can persuade Windows to use to give me a session token. With the session token, I can determine whether my user’s in certain security groups or not. (I can do something from the username alone by querying AD and that’s my fallback plan). The python-ntlm project looks like it’s done all the spadework here, and especially in the clientserver branch but they also talk about the pywin32 sspi module which I’ve so far managed to avoid having anything to do with.

GROUP BY in Python

When it comes to a certain class of data problem, my mind reaches for SQL… which is a problem when the data’s in Python. Obviously I could create an in-memory sqlite database just for the purpose of storing the data and then retrieving it with SQL. But that would be mild overkill. One such example is grouping data by, say, the first letter. I’ve known about the itertools.groupby function for a while but for some reason whenever I came to look at it, it never quite seemed to find my brain. Having now made the breakthrough I’m reminding myself here for future purposes:

SELECT
  LEFT (words.word, 1),
  COUNT (*)
FROM
(
  SELECT
    word = LOWER (w.word)
  FROM
    words AS w
) AS words
WHERE
  LEN (words.word) >= 2
GROUP BY
  LEFT (words.word, 1)

translates to

import os, sys
import itertools
import operator
import re

first_letter = operator.itemgetter (0)

text = open (os.path.join (sys.prefix, "LICENSE.txt")).read ()
words = set (w.lower () for w in re.findall (r"\w{2,}", text))
groups = itertools.groupby (sorted (words), first_letter)

for k, v in groups:
  print k, "=>", len (list (v))