Multidimensional dicts


On 06Sep2019 22:48, Ralf M. <Ralf_M at t-online.de> wrote:
>Recently I wrote a quick and dirty script to do some counting and 
>statistics. When I re-read it a bit later I noticed that I had been 
>using two different ways to create two-dimensional (default-)dicts. 
>Now I'm wondering whether one of them is "better" or more pythonic 
>than the other.
>
>What I did:
>
>ddd_a = collections.defaultdict(set)
>ddd_a[(key1, key2)].add(foo)
>
>ddd_b = collections.defaultdict(lambda: collections.defaultdict(set))
>ddd_b[key1][key2].add(foo)
>
>Both work as expected.
>
>Trying to think about differences I only noticed that ddd_a more 
>easily generalises to more dimensions, and ddd_b has the benefit that 
>ddd_b[key1] is a dict, which might help if one "row" needs to be fed 
>to a function that expects a dict.
>
>More generally, ddd_a looks more symmetric (key1 and key2 are 
>exchangeable, if done consistently) and ddd_b looks more hierarchical 
>(like a tree traversed from root to leaves where key1, key2 etc. 
>determine which way to go at each level). ddd_b is also more similar 
>to how two-dimensional lists are done in Python.
>
>Any recommendations / comments as to which to prefer?

As you'd imagine, it depends on what you're doing.

If (key1,key2) form a Cartesian-like "space" of values, for example 
where the domain of key2 values is the same regardless of key1, I lean 
toward (key1,key2).

If (key1,key2) form a tree-like structure, such as the clause names and 
field values from a .ini config file:

  [clause1]
  field1 = 1
  field2 = 3

  [clause2]
  field2 = 9

I lean towards the ddd_b[key1][key2] approach.
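
As a rough sketch of how that .ini example maps onto a nested dict: the 
snippet below uses the standard library's configparser to read the same 
text. The names ini_text and tree are just illustrative, and note that 
configparser returns the values as strings.

```python
import configparser

# The same clauses and fields as in the .ini example above.
ini_text = """
[clause1]
field1 = 1
field2 = 3

[clause2]
field2 = 9
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Outer key is the clause name, inner key is the field name.
tree = {clause: dict(parser[clause]) for clause in parser.sections()}

# Each "row" is an ordinary dict, and rows need not share the same fields.
clause2_fields = sorted(tree["clause2"])
```

Here the set of fields differs per clause, which is what makes the 
tree shape a natural fit.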

So: are they a "flat" space or a tree structure? That is my normal rule 
of thumb for deciding how to key things.

In particular, if you need to ask "what are the key2 values for 
key1==x?" then you might want a tree structure.
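
To make that concrete, here is a small sketch comparing the two layouts 
for exactly that question. The records list is made-up sample data, and 
flat/nested stand in for your ddd_a/ddd_b.

```python
from collections import defaultdict

# Hypothetical sample data: (key1, key2, value) triples.
records = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3)]

flat = defaultdict(set)                          # keyed by (key1, key2)
nested = defaultdict(lambda: defaultdict(set))   # keyed by key1, then key2

for key1, key2, value in records:
    flat[(key1, key2)].add(value)
    nested[key1][key2].add(value)

# Flat: every key has to be scanned to find the key2 values for key1 == "a".
flat_row = sorted(k2 for (k1, k2) in flat if k1 == "a")

# Nested: the row is already a dict, so its keys are a direct lookup.
nested_row = sorted(nested["a"])
```

Both give the same answer; the flat version just has to walk the whole 
dict to get it, which matters if you ask that question often.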

The ddd_a (key1,key2) approach is easier to manage in terms of creating 
new nodes. OTOH, using a nested defaultdict can handle that work for 
you:

  ddd_d = collections.defaultdict(lambda: collections.defaultdict(int))

(Pick whatever type suits in place of "int", for example "set" as in 
your ddd_b.)
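
A minimal sketch of the nested defaultdict doing that work, using "int" 
for counting. The (clause, field) pairs are invented for illustration. 
One thing to be aware of: merely indexing a missing key creates the 
inner dict, so use "in" when you only want to test for presence.

```python
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))

# Hypothetical (clause, field) observations to count.
observations = [
    ("clause1", "field1"),
    ("clause1", "field1"),
    ("clause2", "field2"),
]
for clause, field in observations:
    counts[clause][field] += 1   # inner dicts and zero counts appear on demand

# Indexing a missing key inserts an empty inner dict as a side effect.
n_before = len(counts)
_ = counts["clause3"]
n_after = len(counts)

# A membership test does not create anything.
has_clause4 = "clause4" in counts
```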

Cheers,
Cameron Simpson <cs at cskk.id.au>