Thursday, April 9, 2015

dict.setdefault versus defaultdict

There is something annoying about dicts in Python: when you look up a key that is not there, you get a KeyError. The fix is to get a default value for keys that have no value yet. Actually, there are two ways to do that.

The first one is the dict.setdefault method. If the key is missing, it inserts the key with the given default value and returns that value. If the key is already there, it returns the existing value:

In [3]: t = {}

In [4]: a = t.setdefault("foo", "bar")

In [5]: a
Out[5]: 'bar'
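A short sketch of what setdefault really does. It stores the default in the dict, and a later call with a different default returns the value already stored. It is also handy with mutable defaults, because the list it returns is the one living in the dict. The names `t` and `groups` and the sample words are only illustrative:

```python
t = {}

# "foo" is missing: setdefault inserts "bar" and returns it.
a = t.setdefault("foo", "bar")
print(a)  # bar
print(t)  # {'foo': 'bar'}

# "foo" now exists: the new default "baz" is ignored.
b = t.setdefault("foo", "baz")
print(b)  # bar

# The returned list is the one stored in the dict,
# so appending to it updates the dict in place.
groups = {}
for word in ["apple", "avocado", "banana"]:
    groups.setdefault(word[0], []).append(word)
print(groups)  # {'a': ['apple', 'avocado'], 'b': ['banana']}
```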

(In case you're wondering, this is IPython's output. You really should use IPython as your REPL.)

The thing is that this solution makes assignment cumbersome: setdefault returns a value, not something you can assign to. For instance, when I want to count or aggregate values for a given key, I cannot do it this way:

In [6]: t = {}

In [7]: t.setdefault("foo", 0) += 1
  File "<ipython-input-7-15225840c433>", line 1
      t.setdefault("foo", 0) += 1
                                 ^
SyntaxError: can't assign to function call
# shit.
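Counting with setdefault alone is still possible. You just have to write the assignment back to the subscript yourself, and that is exactly the cumbersome part. A minimal sketch with made-up sample words:

```python
t = {}
for word in ["foo", "bar", "foo"]:
    # setdefault returns the current count (inserting 0 first if needed);
    # the incremented value must then be assigned back explicitly.
    t[word] = t.setdefault(word, 0) + 1
print(t)  # {'foo': 2, 'bar': 1}
```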

So you need something heavier: collections.defaultdict. A defaultdict is a subclass of dict. Whenever you access a missing key, it calls a factory function with no arguments, stores the result under that key and returns it. You just need to import it and provide the factory (I'll use a lambda here).

In [8]: from collections import defaultdict

In [9]: dd = defaultdict(lambda: 0)

In [10]: dd["foo"]
Out[10]: 0

In [11]: dd["bar"] += 1

In [12]: dd["bar"]
Out[12]: 1
# Hooray!
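The lambda works, but any callable that takes no arguments can serve as the factory. The built-in types are convenient choices: int() returns 0 and list() returns a new empty list. A sketch of counting and grouping with them, on made-up data:

```python
from collections import defaultdict

# int() returns 0, so this behaves like defaultdict(lambda: 0).
counts = defaultdict(int)
for word in ["foo", "bar", "foo"]:
    counts[word] += 1
print(dict(counts))  # {'foo': 2, 'bar': 1}

# list() gives each missing key its own fresh empty list.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}
```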

To conclude, you'll often need a default value when querying a missing key in a dict. The most flexible way to get one is probably defaultdict: since it is a subclass of dict, you can use it just like a standard dictionary.
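One thing to keep in mind when treating a defaultdict like a standard dictionary: only the subscript lookup `dd[key]` calls the factory, and it inserts the new value as a side effect. Membership tests with `in` and the `get()` method never call the factory. A small sketch:

```python
from collections import defaultdict

dd = defaultdict(int)

# Reading a missing key with [] calls the factory and stores the result.
print(dd["foo"])      # 0
print("foo" in dd)    # True: the key now exists

# `in` and get() leave the dict untouched.
print("bar" in dd)    # False
print(dd.get("bar"))  # None
print(dict(dd))       # {'foo': 0}
```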

Last remark: the collections module contains a lot of cool stuff. For instance, if what you actually want is to count hashable elements from an iterable, have a look at collections.Counter, which does the job for you.
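A minimal sketch of the same counting job with Counter, on made-up sample words:

```python
from collections import Counter

counts = Counter(["foo", "bar", "foo"])
print(counts["foo"])          # 2

# Missing keys read as 0, and unlike defaultdict, reading does not insert them.
print(counts["baz"])          # 0
print("baz" in counts)        # False

print(counts.most_common(1))  # [('foo', 2)]
```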