Archive for SQL

Passing params to db-api queries

Falling mostly into the aide-memoire category, but in case it’s helpful to anyone else…

You have a more-or-less complex SQL query which you’re executing via, eg, pyodbc (or some other dbapi-compliant module) and you need to pass in a set of positional parameters. So you have a where clause which looks something like this (although with better names, obviously):

WHERE
(
  t1.x = ? AND
  (t2.y = ? OR (t3.z = ? AND t2.y < ?))
)
OR
(
  t1.x > ? AND
  (t2.y BETWEEN ? AND ?)
)

So your Python code has to pass in seven parameters, in the right order, several of which are probably the same value. And then you realise that the WHERE clause is slightly wrong. So you adjust it, but now you have eight parameters, and two of the previous ones have changed, and there’s a new one. And then…

There’s no way to use named params with pyodbc, so you end up with a list/tuple of positional parameters which you have to eyeball-match up with the corresponding question marks in the query:

import pyodbc

...

cursor.execute(
  SQL,
  [from_date, threshold, interval, to_date, threshold, from_date, to_date]
)

Unless… you use a derived table in the query and use that to generate pseudo-named parameters. This is possible in MSSQL; I don’t know if it would work with other databases, although I can’t see why not. So your code becomes something like (NB no attempt at consistency here; it’s an example):

SELECT
  *
FROM
  t1
JOIN t2 ON t2.t1_id = t1.id
JOIN
(
  SELECT
    from_date = ?,
    to_date = ?,
    max_value = ?,
    interval = ?,
    threshold = ?
) AS params ON
(
  t1.x = params.from_date AND
  (t2.y = params.threshold OR
    (t3.z = params.interval AND t2.y < params.to_date)
  )
)
OR
(
  t1.x > params.threshold AND
  (t2.y BETWEEN params.from_date AND params.to_date)
)

All you need to do then is to line up the order of params in your cursor.execute with the order of columns in the params derived table.
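To keep that order in one place on the Python side, a small helper can build the positional list from a dict. (This is my own sketch, not part of pyodbc; the names and example values are invented.)

```python
# Hypothetical helper: the column order of the params derived table is the
# single source of truth, and the dict supplies values by name.
PARAM_ORDER = ["from_date", "to_date", "max_value", "interval", "threshold"]

def ordered_params(values, order=PARAM_ORDER):
    """Return the values as a list, in the order the derived table expects."""
    return [values[name] for name in order]

params = ordered_params({
    "from_date": "2008-01-01",
    "to_date": "2008-02-01",
    "max_value": 100,
    "interval": 7,
    "threshold": 10,
})
# params can now go straight to cursor.execute(SQL, params)
```

If the derived table changes, you edit PARAM_ORDER and the dict, and a missing name fails loudly with a KeyError instead of silently shifting every later parameter.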

Alternatives? Well, you could use an ORM of some sort — goodness knows there are enough of them about — but maybe, like me, you find that learning another syntax for something which you can do perfectly well in its native SQL is onerous. Another approach is to set up local variables in your executed statement and use these in much the same way, eg:

DECLARE
  @v_from_date DATETIME,
  @v_to_date DATETIME,
  @v_threshold INT

SELECT
  @v_from_date = ?,
  @v_to_date = ?,
  @v_threshold = ?

SELECT
  *
FROM
  ..
WHERE
  (t1.x < @v_from_date ...)

This works (and is, in fact, how we generate lightweight SQL-to-Excel reports). But there’s a bit more boilerplate involved.
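That boilerplate is regular enough to generate. A rough sketch, assuming you keep a list of (name, SQL type) pairs somewhere; the helper name and layout are my own invention:

```python
def make_preamble(params):
    """Build the DECLARE / SELECT boilerplate for a list of
    (name, sql_type) pairs, with one ? placeholder per variable."""
    declares = ",\n  ".join("@v_%s %s" % (name, sql_type) for name, sql_type in params)
    selects = ",\n  ".join("@v_%s = ?" % name for name, _ in params)
    return "DECLARE\n  %s\n\nSELECT\n  %s\n" % (declares, selects)

preamble = make_preamble([
    ("from_date", "DATETIME"),
    ("to_date", "DATETIME"),
    ("threshold", "INT"),
])
# Prepend this to the main query and pass the three values, in the same
# order, to cursor.execute.
```

The same list then drives both the generated SQL and the order of the Python-side parameters.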

Back-end to Front-end: a SQL developer’s perspective

More SQL than Python-related, but in fact the immediate inspiration came from this month’s PyMag, which is a bit database-oriented. In his article on good practice in database design, Brian Jones touches on an issue which I’ve felt many times before. There are, let us say, two kinds of developer: those who are primarily interested in the front-end and for whom a database is merely a convenient back-end store; and those who are primarily interested in the data, and for whom any front-end is merely a means to view that data. Obviously I’m polarising the positions here, but it’s surprising how often you can look at a third-party app and get an idea of which camp its developers belong to.

I style myself a database developer. This doesn’t mean I despise front-ends and prefer to look, Neo-like, at streams of data falling in green down my screen. (Although sometimes…) Rather, it means that I’m much more concerned about the data structures and their interactions at the database level, and keen that the UI do a good service by the data. This may of course reflect my almost complete lack of visual design flair: the best thing you can say about my interfaces is that they are workmanlike and competent, conforming as far as possible to established UI guidelines. They certainly don’t have any kind of panache.

To some extent this also reflects in my slowness to take up the various ORMs on offer, although I have recently started using Elixir for a personal project, and found it very workable. In general, though, I prefer my data raw, not cooked, and the typical examples of ORMs tend to be aimed at people who find data a bit of an embarrassment and want to get it out of the way as quickly as possible :)

Little gems in SQL Server 2005

I know it’s clear I don’t have all that much time for the “innovations” in SQL Server 2005, but there are two syntax additions which I’ve been yearning for for years (since I moved from a company which used Oracle, in fact): INTERSECT and EXCEPT (the latter known as MINUS in other SQL dialects). They complete the list of set operations, which has always included UNION, and in some circumstances they make for much clearer code.

We have, for example, a set of data on our database and we want to know what’s been added compared with a backup set from a few days back. So:

SELECT * FROM bus_barcode EXCEPT SELECT * FROM bus_barcode_bak

What could be simpler? Throw in a UNION ALL (and some messiness to cope with operator precedence) and you have symmetric difference:

SELECT * FROM (
  SELECT * FROM bus_barcode EXCEPT SELECT * FROM bus_barcode_bak
) AS live_bak
UNION ALL
SELECT * FROM (
  SELECT * FROM bus_barcode_bak EXCEPT SELECT * FROM bus_barcode
) AS bak_live

Brilliant!
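For comparison, Python’s set type has had exactly these operations all along. A toy sketch with rows as tuples (invented data, obviously, not our barcode tables):

```python
# Each tuple stands in for a row; sets ignore duplicates, as the
# SQL set operators do.
live = {("A", 1), ("B", 2), ("C", 3)}
bak = {("B", 2), ("C", 3), ("D", 4)}

added = live - bak        # EXCEPT: rows in live but not in the backup
removed = bak - live      # the other direction
sym_diff = live ^ bak     # symmetric difference, as in the UNION ALL query
```

The `^` operator does in one step what the two EXCEPTs plus UNION ALL do above.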

Using MSSQL from within Python

As you may have gathered from earlier posts I’m a professional database developer, currently using Microsoft SQL Server 2000 / 2005. Wherever it makes sense, I use Python to access and manipulate the data. Over the years I’ve tried out several different libraries for connecting to SQL Server so I present below what seems to be the current landscape.

ODBC Solutions

- pyodbc
- ceODBC
- mxODBC

ntwdblib.dll Solutions

- Object Craft’s MSSQL module
- pymssql

Other Solutions

- adodbapi
- the osql recipe (wrapping SQL Server’s command-line tools)

My own default choice these days is pyodbc. It’s active and pretty much feature-complete (they added the .nextset method at my request). It’s also the dbapi module of choice for sqlalchemy’s mssql backend. ceODBC is a relative newcomer, and it’s not clear what it offers which the others don’t. mxODBC is of course a stable and mature product which recently released v3, and probably the most fully-featured, but it does require a commercial license.

The ntwdblib solutions all suffer from the fact that MS is removing support for that method of db access after MSSQL 2005 (and even in SQL 2005 it’s only there in SQL 2000 compatibility mode), so their days are numbered. In addition, the Object Craft module hasn’t been updated for three years, with no supplied binary past Python 2.3. (I used it for years before that, so I can attest to its robustness.) The pymssql module also hasn’t been updated in about a year, and has some slightly odd issues with Unicode.

Of the two other solutions, the osql one is a bit of a curiosity, parsing the output from SQL Server’s commandline tools. As is noted in the comments on the recipe, it’s mostly handy when you’ve got no other possibilities, perhaps in an admin situation. The adodbapi module was moribund for several years, but has recently been picked up again by Vernon Cole who’s resurrected the SourceForge project and applied some of the outstanding patches. That said, I’m not aware that it offers anything which I can’t achieve with an ODBC solution.

Update (June 2008): adodbapi is now maintained as part of the pywin32 packages, as of release 211.

SQL - always learning

I recently attended a course for SQL Server programmers upgrading from SQL Server 2000 to 2005 (as we are at work). Naturally enough, I thought I’d know everything there was to know about the existing product but would find out about the new stuff in 2005. To my surprise, we learnt, entirely in passing, a technique which has been there forever.

If you do this kind of thing:

DECLARE @v VARCHAR (1000)
SET @v = ''
SELECT @v = @v + a.x + ' ' FROM a WHERE a.y = z

then @v will be a space-separated string of all the values of the x column in table a. I always assumed it would simply give me the first or the last or some arbitrary value.
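The Python equivalent of that accumulation, for comparison (toy data, not pulled from a database):

```python
xs = ["red", "green", "blue"]   # stand-ins for the values of a.x

v = ""
for x in xs:
    v = v + x + " "             # mirrors SELECT @v = @v + a.x + ' '
# v is now "red green blue " -- the same result as " ".join(xs) + " "
```

The T-SQL version does this per row of the SELECT, carrying the variable’s value across rows.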

You live and learn.