Reverse sorting on arbitrary key segments
I have a database-y requirement to sort a list by a series of keys which are user-defined. Each of the keys could be a forward or a reverse sort.
In essence I have a list of dictionaries:
rows = [ dict (name="John", age=30, id=123), dict (name="John", age=40, id=456), dict (name="Fred", age=20, id=567), ]
and a set of keys from the user:
keys = [
("name", "asc"),
("age", "desc")
]
and I want to call sorted with a key function to end up with:
[ dict (name="Fred", age=20, id=567), dict (name="John", age=40, id=456), dict (name="John", age=30, id=123), ]
Obviously, I can’t just use reversed=True since that would reverse all or nothing. The main sorting page on the wiki rather oddly suggests that you sort against the first column and then against the second.
Which would seem to leave you with (ie age order descending):
[ dict (name="John", age=40, id=456), dict (name="John", age=30, id=123), dict (name="Fred", age=20, id=567), ]
[Update: As several people have pointed out, the page actually suggests sorting against the second column reversed and then against the first, relying on the fact that Python’s sort is stable. But meanwhile, granted my misunderstood premise…]
The sorting-lists-of-dictionaries page comes closer, but assumes that all the keys are numeric.
What I’ve actually done is to create a Reversed class which simply reverses the sense of calling __lt__:
class Reversed (object):
def __init__ (self, value):
self.value = value
def __lt__ (self, other):
return self.value > other.value
and then to implement a key function like this:
def sorter (row):
return tuple (
(row[col] if dir=="asc" else Reversed (row[col]))
for (col, dir) in keys
)
and sorted:
sorted (rows, key=sorter)
Now, all this certainly works, but is it a sane way to achieve the end? I was convinced that there should be something fancy I could do with a key function alone, avoiding an auxiliary class, but since the column values could be any datatype, including user-defined ones, I can’t just use the usual -number or -timedelta tricks.
If it looks like this is a useful technique, I’ll add it to the sorting wiki page. But I thought it best to subject it public scrutiny first. To many eyeballs… etc. etc.
Kent Johnson said,
Wrote on October 15, 2009 @ 5:11 pm
You have mis-read the sorting page on the wiki. It is confusing because the primary sort in the example is the second element, the secondary sort is the first element. What it is saying is sort in reverse order of significance of the keys. Because Python sort is guaranteed to preserve order of equal items, this will give the result you want.
In [1]: from operator import itemgetter
In [2]: rows = [
…: dict (name=”John”, age=30, id=123),
…: dict (name=”John”, age=40, id=456),
…: dict (name=”Fred”, age=20, id=567),
…: ]
In [3]: rows.sort(key=itemgetter(’age’), reverse=True)
In [4]: rows
Out[4]:
[{’age’: 40, ‘id’: 456, ‘name’: ‘John’},
{’age’: 30, ‘id’: 123, ‘name’: ‘John’},
{’age’: 20, ‘id’: 567, ‘name’: ‘Fred’}]
In [5]: rows.sort(key=itemgetter(’name’))
In [6]: rows
Out[6]:
[{’age’: 20, ‘id’: 567, ‘name’: ‘Fred’},
{’age’: 40, ‘id’: 456, ‘name’: ‘John’},
{’age’: 30, ‘id’: 123, ‘name’: ‘John’}]
Wrote on October 15, 2009 @ 5:38 pm
TL;DR Great! If Raymond Hettinger comes up with another way, use that, but yours is great!
What I like about your approach:
1) trivial to see that the sort is correct
With Python’s stable sort, there is always a combination of mixing these three [(A) ascending sorts on a key, (B) reversed sorts on a key, (C) reversing the list] that will do the job, but it can be confusing to work out the pattern, and I don’t know the logic to compute the patter given any list of keys and asc/desc.
The !!!testing!!! on any approach is the killer. If you are not obligated to be correct on the corner cases, it is easy to implement.
Your technique, it is either 100% correct, or it will fail plainly.
2) faster than “cmp”
usually, implementing a combination of ascending and descending sorts is done with the “cmp” option of providing “sort” a “cmp” function. If the first column is an ascending sort, your approach will blow away any “cmp” function, and I believe your technique will be faster in all cases, than using “cmp”
3) less code than an implementation that mixes sorts with reverses
If somebody codes up a correct implementation that mixes [(A) ascending sorts on a key, (B) reversed sorts on a key, (C) reversing the list] to do the work, it will be faster and use less resources than your technique. But it will take more code.
Ian Collins said,
Wrote on October 15, 2009 @ 6:00 pm
Sorting by each key in turn does work if you take the keys in reverse order i.e. in your example, by age (descending) and then by name (ascending). It works because Python’s sort algorithm is “stable”, meaning that if item A starts off above item B and for the purposes of the sort A and B evaluate equal, A will still be above B after the sort. I think I read somewhere that the sort algorithm is guaranteed to remain stable in future Python releases.
Wrote on October 15, 2009 @ 7:20 pm
In your particular case something like this (untested) could have worked too:
sorted(rows,key=lambda x:(x[’names’],-x[’age’]))
Travis Vaught said,
Wrote on October 15, 2009 @ 9:04 pm
Have you tried a numpy structured array, rather than a list of dicts? The final example on this page seems to do what you want:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html
tim said,
Wrote on October 16, 2009 @ 8:59 am
Thanks to those who pointed out my misunderstanding of the wiki page. I knew that the Python sort was stable (altho’ I could never remember whether that’s guaranteed in all version / implementations or merely an artefact of the current CPython code). But I hadn’t appreciated the sort-backwards trick.
@pruebauno: As I mentioned, specific optimisations for known datatypes won’t help me for the general case I’m trying to achieve.
@Travis: I’ll have a look at that page but since I don’t have numpy or its relatives installed on any of the machines I use it would be quite an overhead to achieve one sort — no matter how efficiently :)
tim said,
Wrote on October 16, 2009 @ 9:05 am
@manuelmoeg: Thanks for the bouquets. I believe that the particular reason to avoid the cmp function approach (apart from the fact that it no longer exists in Python 3.x) is that it is called for *every* comparison while the key function is called only for every item in the iterable. If I’d had to decorate each item with an auxiliary class in a cmp function I think performance would have suffered unduly. (Altho’ I haven’t run any timings).