Archive for Aide-memoire

GROUP BY in Python

When it comes to a certain class of data problem, my mind reaches for SQL… which is a problem when the data’s in Python. Obviously I could create an in-memory sqlite database just for the purpose of storing the data and then retrieving it with SQL. But that would be mild overkill. One such example is grouping data by, say, the first letter. I’ve known about the itertools.groupby function for a while but for some reason whenever I came to look at it, it never quite seemed to find my brain. Having now made the breakthrough I’m reminding myself here for future purposes:

SELECT
  LEFT (words.word, 1),
  COUNT (*)
FROM
(
  SELECT
    word = LOWER (w.word)
  FROM
    words AS w
) AS words
WHERE
  LEN (words.word) >= 2
GROUP BY
  LEFT (words.word, 1)

translates to

import os, sys
import itertools
import operator
import re

first_letter = operator.itemgetter (0)

text = open (os.path.join (sys.prefix, "LICENSE.txt")).read ()
words = set (w.lower () for w in re.findall (r"\w{2,}", text))
groups = itertools.groupby (sorted (words), first_letter)

for k, v in groups:
  print k, "=>", len (list (v))

To genexp or not to genexp - a cautionary tale

I don’t know about you but I’ve been increasingly comfortable using a genexp style when passing sequences as parameters. It just saves a bit of syntax clutter and generally does exactly what you want. To give a completely trivial example:

a = range (10)
print sorted (i for i in a if i < 7)

But (there’s always a But) I found myself having unexpected problems with a database-row class I’d recently beefed up and I couldn’t work out why. It’s one of these things I’m sure everyone’s done where you pass it the cursor description and the row values and it does the __getattr__ and __getitem__ stuff for you. Well, it was acting as though the cursor wasn’t passing the column names at all. I suspected the new(ish) version of pyodbc I’d installed lately, but it was soon clear that the problem was in the generated Row class itself.

Of course, it turned out that I was, as above, passing the column names as a genexp. And some early reference was consuming the generator, silently leaving nothing for a later reference to consume. Something like this:

q.execute ("SELECT blah FROM blah")
Row = row.Row (d[0] for d in q.description)
return [Row (i) for i in q.fetchall ()]

where the code in row.Row did something like this:

def Row (names):
  something = "-".join (names)
  ## Oops, consumed the generator
  description = dict ((name, index) for index, name in enumerate (names))
  ## Oh dear, nothing left to consume but not an error

You see that I’m not falling into the trap of returning the list of rows as a genexp, because I know I’m likely to be using it in several places, but it’s just that bit cleaner to pass as a genexp as a parameter, and I fell for it.

One more thing to look out for.

How do I get the window for a subprocess.Popen object?

[This only applies to Win32, obviously!]
Perhaps you want to be able to close an app cleanly by sending it a WM_CLOSE? Or maybe you want to minimize it on demand? To do that you have to get the hWnd of its top-level window. You’d have thought there’d be some kind of get-window-from-process-id API call, wouldn’t you? Well, according to all the available literature, there isn’t. (Maybe there is in .NET? I wouldn’t know). The code below pretty much illustrates the canonical approach: loop over available windows, find their process id, and compare against the process id you first thought of. Since a process could have multiple visible top-level windows, I’ve allowed for a list of HWNDs but this is probably overkill.

import subprocess
import time

import win32con
import win32gui
from win32gui import IsWindowVisible, IsWindowEnabled
import win32process

def get_hwnds_for_pid (pid):

  def callback (hwnd, hwnds):
    if IsWindowVisible (hwnd) and IsWindowEnabled (hwnd):
      _, found_pid = win32process.GetWindowThreadProcessId (hwnd)
      if found_pid == pid:
        hwnds.append (hwnd)
    return True

  hwnds = []
  win32gui.EnumWindows (callback, hwnds)
  return hwnds

if __name__ == '__main__':
  notepad = subprocess.Popen ([r"notepad.exe"])
  #
  # sleep to give the window time to appear
  #
  time.sleep (2.0)

  for hwnd in get_hwnds_for_pid (notepad.pid):
    print hwnd, "=>", win32gui.GetWindowText (hwnd)
    win32gui.SendMessage (hwnd, win32con.WM_CLOSE, 0, 0)

SQL - always learning

I recently attended a course for SQL Server programmers upgrading from SQL Server 2000 to 2005 (as we are at work). Naturally enough I thought that I’d know everything there was to know about the existing product but would find out about the new stuff in 2005. To my surprise we learnt a technique entirely by the way which has been there forever.

If you do this kind of thing:

DECLARE @v VARCHAR (1000)
SET @v = ''
SELECT @v = @v + a.x + ' ' FROM a WHERE a.y = z

then @v will be a space-separated string of all the values of the the x column in table a. I always assumed it would simply give me the first or the last or some arbitrary value.

You live and learn.

Essential Firefox & Thunderbird extensions

I use Firefox & Thunderbird as my main browser and mail client (respectively!) They’re both useful enough out of the box, but loads of people have published even more useful extensions which “enhance the user experience” as the blurb goes. I have a number of other addons installed, but my essentials are:

Thunderbird

  • Nostalgy - forget about the folder window; just a few keystrokes will do everything you need.
  • Correct Identity - I serve a number of email addresses (from completely different domains) through one mailbox. This extension makes a good guess as to which hat I should be wearing to reply to an email, based on the “To” address.

Firefox

  • Sage - no matter what other RSS reader I try, I keep coming back to Sage’s simplicity. It lacks a couple of things I might like, such as the ability to vary the refresh interval by feed, but it does what it does simply and well.
  • IE Tab - indispensable if you’re developing web pages or just trying to view the ones out there.
  • CSS Viewer - not as powerful as Firebug, but very handy for a quick check as to what’s going on with your page’s styling.