Archive for Aide-memoire

NTLM Authentication using IIS, ISAPI-WSGI and cherrypy

(How’s that for a snappy post title?). This is really an aide-memoire for myself since I always make this kind of thing way more complicated than it needs to be. My mission (whether or not I choose to accept it) is to have a web interface to one of our internal apps which should support transparent browser-based authentication but should fail back gracefully to anonymous access. The app in question is a Helpdesk app and some years ago I wrote a cherrypy-based web interface to it since its own desktop interface is woeful. The cherrypy app’s been running fine for some years, with incremental improvements but I’m trying to push it out to a wider (internal) audience and it could do with the speed and stability boost which a fully-fledged webserver can give it.

The machine it’s running on (via the standard cherrypy server on port 8080) is already running IIS and the pywin32 isapi module, nicely extended by the ISAPI-WSGI project, makes transferring the app fairly plain sailing. But one of the benefits of transferring to IIS was to get as smooth a passthrough authentication as possible. I had been using a fairly crude http basic auth with an authentication check at the server end, but that’s far from ideal. I hoped that by switching to IIS (and given that the official company browser is IE) I could just flick a switch and get the browser’s identity transparently.

Well you can, but I put two stumbling blocks in my own path: when it didn’t work immediately I tried to make it far more complicated than I need have rather than looking for a simple solution; and identity isn’t the same as an access token. So, for my future self trying to remember how to do this, and for anyone else looking for this solution, here’s the easy bit. (I’m assuming you’ve installed some kind of ISAPI-WSGI app, altho’ the same pretty much applies elsewhere).

  • In the IIS MMC snap-in find the isapi-wsgi app you’ve installed. Right-click Properties. [Directory Security] tab. Anonymous access and authentication control [Edit]. Leave [Anonymous Access] ticked. Tick [Integrated Windows Authentication]. [Ok] all the way out.
  • In your WSGI app, look for the REMOTE_USER env var. If it’s set, you’ve got a remote user. If it’s not, call start_response (”401 Unauthorized”, [(”www-authenticate”, “NTLM”)]) and return []

That’s the really brief version, and is based around the IIS 5.1 which comes free with WinXP. IE users need do no more. FF users will probably already know that they need to add the web server to the about:config param with the unmemorable name (it’s got “ntlm” and “trusted-uris” in it). Don’t know if Chrome handles this.

For my purposes, I was using cherrypy so — when I get back to work — all I should have to do is: check cherrypy.request.login which keeps track of incoming REMOTE_USER / LOGIN_USER env vars for me; and do the “401 Unauthorized” dance if I need to possibly by means of a simple pre-request cherrypy tool. The graceful anonymity will be implemented by not displaying and/or allowing certain actions if there is no remote user. Not sure yet whether I need to fail back to basic or digest auth.

The trickier bit — and the bit I haven’t yet solved — is translating the “Authorization:” header which the browser sends into something which I can persuade Windows to use to give me a session token. With the session token, I can determine whether my user’s in certain security groups or not. (I can do something from the username alone by querying AD and that’s my fallback plan). The python-ntlm project looks like it’s done all the spadework here, and especially in the clientserver branch but they also talk about the pywin32 sspi module which I’ve so far managed to avoid having anything to do with.

GROUP BY in Python

When it comes to a certain class of data problem, my mind reaches for SQL… which is a problem when the data’s in Python. Obviously I could create an in-memory sqlite database just for the purpose of storing the data and then retrieving it with SQL. But that would be mild overkill. One such example is grouping data by, say, the first letter. I’ve known about the itertools.groupby function for a while but for some reason whenever I came to look at it, it never quite seemed to find my brain. Having now made the breakthrough I’m reminding myself here for future purposes:

  LEFT (words.word, 1),
  COUNT (*)
    word = LOWER (w.word)
    words AS w
) AS words
  LEN (words.word) >= 2
  LEFT (words.word, 1)

translates to

import os, sys
import itertools
import operator
import re

first_letter = operator.itemgetter (0)

text = open (os.path.join (sys.prefix, "LICENSE.txt")).read ()
words = set (w.lower () for w in re.findall (r"\w{2,}", text))
groups = itertools.groupby (sorted (words), first_letter)

for k, v in groups:
  print k, "=>", len (list (v))

To genexp or not to genexp - a cautionary tale

I don’t know about you but I’ve been increasingly comfortable using a genexp style when passing sequences as parameters. It just saves a bit of syntax clutter and generally does exactly what you want. To give a completely trivial example:

a = range (10)
print sorted (i for i in a if i < 7)

But (there’s always a But) I found myself having unexpected problems with a database-row class I’d recently beefed up and I couldn’t work out why. It’s one of these things I’m sure everyone’s done where you pass it the cursor description and the row values and it does the __getattr__ and __getitem__ stuff for you. Well, it was acting as though the cursor wasn’t passing the column names at all. I suspected the new(ish) version of pyodbc I’d installed lately, but it was soon clear that the problem was in the generated Row class itself.

Of course, it turned out that I was, as above, passing the column names as a genexp. And some early reference was consuming the generator, silently leaving nothing for a later reference to consume. Something like this:

q.execute ("SELECT blah FROM blah")
Row = row.Row (d[0] for d in q.description)
return [Row (i) for i in q.fetchall ()]

where the code in row.Row did something like this:

def Row (names):
  something = "-".join (names)
  ## Oops, consumed the generator
  description = dict ((name, index) for index, name in enumerate (names))
  ## Oh dear, nothing left to consume but not an error

You see that I’m not falling into the trap of returning the list of rows as a genexp, because I know I’m likely to be using it in several places, but it’s just that bit cleaner to pass as a genexp as a parameter, and I fell for it.

One more thing to look out for.

How do I get the window for a subprocess.Popen object?

[This only applies to Win32, obviously!]
Perhaps you want to be able to close an app cleanly by sending it a WM_CLOSE? Or maybe you want to minimize it on demand? To do that you have to get the hWnd of its top-level window. You’d have thought there’d be some kind of get-window-from-process-id API call, wouldn’t you? Well, according to all the available literature, there isn’t. (Maybe there is in .NET? I wouldn’t know). The code below pretty much illustrates the canonical approach: loop over available windows, find their process id, and compare against the process id you first thought of. Since a process could have multiple visible top-level windows, I’ve allowed for a list of HWNDs but this is probably overkill.

import subprocess
import time

import win32con
import win32gui
from win32gui import IsWindowVisible, IsWindowEnabled
import win32process

def get_hwnds_for_pid (pid):

  def callback (hwnd, hwnds):
    if IsWindowVisible (hwnd) and IsWindowEnabled (hwnd):
      _, found_pid = win32process.GetWindowThreadProcessId (hwnd)
      if found_pid == pid:
        hwnds.append (hwnd)
    return True

  hwnds = []
  win32gui.EnumWindows (callback, hwnds)
  return hwnds

if __name__ == '__main__':
  notepad = subprocess.Popen ([r"notepad.exe"])
  # sleep to give the window time to appear
  time.sleep (2.0)

  for hwnd in get_hwnds_for_pid (
    print hwnd, "=>", win32gui.GetWindowText (hwnd)
    win32gui.SendMessage (hwnd, win32con.WM_CLOSE, 0, 0)

SQL - always learning

I recently attended a course for SQL Server programmers upgrading from SQL Server 2000 to 2005 (as we are at work). Naturally enough I thought that I’d know everything there was to know about the existing product but would find out about the new stuff in 2005. To my surprise we learnt a technique entirely by the way which has been there forever.

If you do this kind of thing:

SET @v = ''
SELECT @v = @v + a.x + ' ' FROM a WHERE a.y = z

then @v will be a space-separated string of all the values of the the x column in table a. I always assumed it would simply give me the first or the last or some arbitrary value.

You live and learn.