Will via plug on 5 Jul 2021 17:06:49 -0700



Re: [PLUG] Python nested dict data structure


JP,

If you want, you can let a write create a dictionary key. By catching the KeyError raised on the write (a missing dict key raises KeyError, not IndexError), you can create the necessary keys. Not my ideal way... but I have used it in production.
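
A minimal sketch of that write-and-catch approach, assuming the exception involved is the `KeyError` that Python raises for a missing dict key. The helper name `set_nested` is hypothetical, not something from this thread:

```python
def set_nested(d, key, subkey, value):
    """Write d[key][subkey], creating the sub-dict only if the write fails."""
    try:
        d[key][subkey] = value
    except KeyError:
        # d[key] did not exist yet, so create it together with the value.
        d[key] = {subkey: value}


mydict = {}
set_nested(mydict, 'Acme Inc', 'region', 'US')
set_nested(mydict, 'Acme Inc', 'counter', 1)
print(mydict)
```

The `try` path costs nothing extra once the parent key exists, which is the usual argument for this style when most writes hit existing keys.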

-Will C

On Mon, Jul 5, 2021, 19:59 JP Vossen via plug <plug@lists.phillylinux.org> wrote:
First, thanks to everyone who replied.  Second, sorry about the delay, life happened a couple of times.

I think what I've learned is that no, there's not a better way to do it, at least for me and my use cases.  That's good to know, thanks.

Note I'd never use `dict` in production, but I changed it to `mydict` in my example just to be safe.

This makes my example output easier to sanity check, thanks for the reminder Will:
```
print(json.dumps(mydict, indent=2, sort_keys=True))  # Pretty but needs: import json
```


#------------------------------------------------------
On 7/2/21 4:06 PM, brent saner wrote:
> If you *know* that that level will/should be a dict:
> d['foo'] = ({'bar': 'baz'} if not d.get('foo') else d.update({'bar': 'baz'}))
>
> It's still a PITA but it's less cycles and vertical space.

I do know that level will be a dict, so I think that might work for:
```
mydict[company] = ({'region': 'US'} if not mydict.get(company) else mydict.update({'region': 'US'}))
```

But I'm not 100% sure how the accumulator would work there.  Maybe via a lambda in there somewhere.  I'm also not sure that's easier to read, at least for me.  :-)
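
One thing worth checking with the one-liner above: `dict.update()` works in place and returns `None`, so when the key already exists the assignment would store `None`. A hedged sketch of a one-line "create or extend" that avoids this with dict unpacking (Python 3.5+ only, so not an option on Python 2 nodes); the `counter` subkey and the values are only illustrative:

```python
mydict = {}
company = 'Acme Inc'

# dict.update() mutates in place and returns None, so assigning its
# result would replace an existing sub-dict with None.
assert {'a': 1}.update({'b': 2}) is None

# One-line "create or extend": copy any existing sub-dict, then add the key.
mydict[company] = {**mydict.get(company, {}), 'region': 'US'}

# Same idea as an accumulator: default the old count to 0, then add.
for value in (1, 2):
    old = mydict.get(company, {})
    mydict[company] = {**old, 'counter': old.get('counter', 0) + value}

print(mydict)
```

This builds a new sub-dict on every write, so it trades some efficiency for staying a single expression.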

> You could also subclass DictObject (? It might just be Dict) or OrderedDict and add a custom method or replace the associated methods.
> https://stackoverflow.com/questions/2390827/how-to-properly-subclass-dict-and-override-getitem-setitem

I get the idea there, but that seems like even MORE work, and more messing with internals.  (OK, sort-of.  I know those capabilities exist for a reason, but...)
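
For reference, the subclassing route can be fairly small. This is a sketch only, assuming a hypothetical class name `AutoDict` and relying on the standard `__missing__` hook that `dict` calls when a key lookup fails:

```python
class AutoDict(dict):
    """Hypothetical dict subclass that creates missing sub-dicts on read."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when key is absent.
        value = self[key] = type(self)()
        return value


mydict = AutoDict()
company = 'Acme Inc'
mydict[company]['region'] = 'US'   # sub-dict is created on the fly

# "+=" on a missing counter would auto-create an empty AutoDict and then
# fail with TypeError, so the accumulator still needs an explicit default.
for value in (1, 2):
    mydict[company]['counter'] = mydict[company].get('counter', 0) + value

print(mydict)
```

Note that merely reading a missing key also creates it, which can hide typos in key names.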

> sent from my toaster.
:-)


#------------------------------------------------------
On 7/2/21 4:14 PM, Victor via plug wrote:
> Using the dictionary methods .setdefault() and .get() [1] is much more
> pythonic, but it is confusing the first few times you are mangling
> dictionaries like this. Merge and Update using | and |=, respectively,
> are new in 3.9 (I think) and worth playing with. I haven't gotten to
> use either of those much yet, so I don't know that they apply in your
> situation.
>
> ```
> #!/usr/bin/env python3
> # d.py--do I REALLY have to do all this crap just for NESTED ds!?
>
> # dict is a builtin name, not a reserved word. Shadowing it works, but I avoid it.
### Good point, thanks.
> d = {}
>
> # Add Nested Key/Value pair to d
> def add_nstkv(d, key, subkey, value):
>      d.setdefault(key, dict())
>      d[key][subkey] = value   # Add value
### OK, got that

> # Add/inc_nstkv a value in a nested d
> def inc_nstkv(d, key, subkey, value):
>      d.setdefault(key, dict())
>      d[subkey] = d.get(subkey, 1) + 1
### But that doesn't use `value`.  I guess I should have called
### it accumulate and not increment.  This doesn't work:
        d[subkey] = d.get(subkey, 1) += value
### And I think it overwrites the region value, depending on order
### (which will be unpredictable).

> company = 'Acme Inc'
> add_nstkv(d, company, 'region', 'US') # Does: d[company]['region']   = val
> inc_nstkv(d, company, 'counter', 1)   # Does: d[company]['counter'] += val
> inc_nstkv(d, company, 'counter', 2)
> print(d)
> ```
> [1] https://docs.python.org/3/library/stdtypes.html#typesmapping
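
A sketch of how the `.setdefault()`/`.get()` version might look with the accumulator writing to the sub-dict and actually using `value`. The name `acc_nstkv` is hypothetical, picked to reflect "accumulate" rather than "increment":

```python
def add_nstkv(d, key, subkey, value):
    """Does: d[key][subkey] = value, creating d[key] if needed."""
    d.setdefault(key, {})[subkey] = value


def acc_nstkv(d, key, subkey, value):
    """Does: d[key][subkey] += value, starting from 0 if subkey is new."""
    sub = d.setdefault(key, {})
    sub[subkey] = sub.get(subkey, 0) + value


d = {}
company = 'Acme Inc'
add_nstkv(d, company, 'region', 'US')
acc_nstkv(d, company, 'counter', 1)
acc_nstkv(d, company, 'counter', 2)
print(d)
```

Because the accumulator only touches `d[key][subkey]`, the order of the `add` and `acc` calls should not matter and the region value is left alone.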


#------------------------------------------------------
On 7/2/21 5:19 PM, Carlos M. Fernández wrote:
> What prevents you from doing this?:
>
>>>> d = dict()
>>>> d
> {}
>>>> company = "Acme, Inc."
>>>> company
> 'Acme, Inc.'
>>>> d[company] = {"region": "US", "counter": 1}
>>>> d
> {'Acme, Inc.': {'region': 'US', 'counter': 1}}
>>>> d[company]["counter"] += 1
>>>> d
> {'Acme, Inc.': {'region': 'US', 'counter': 2}}

That's how I started out, but since I don't have all the details at once, and I can't predict when I'll get them, or from which file, I ended up using keys that aren't there or overwriting data that was there.
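
The two failure modes described above can be reproduced in a few lines. This is only an illustration with made-up values:

```python
d = {}
company = 'Acme, Inc.'

# Failure 1: incrementing before the keys exist raises KeyError.
try:
    d[company]['counter'] += 1
except KeyError as err:
    print('KeyError:', err)

# Failure 2: assigning a whole sub-dict replaces what was already there.
d[company] = {'counter': 5}
d[company] = {'region': 'US'}      # the counter is gone
print(d)
```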


#------------------------------------------------------
On 7/2/21 8:34 PM, Will wrote:
> Going off of what Carlos was saying. Is this what you're looking for? As long as you make sure each level is either a dict() or list(), you should be easy.
...

That's what I'm trying to do, but I'm coming at it from Perl where it Just Works, though in a much harder to read way.

Also, I'm not sure where that middle list came from.  I didn't intend it, and if I use this one (thanks for that reminder), my code gives:
```
...print(json.dumps(mydict, indent=2, sort_keys=True))

### Output:
{
  "Acme Inc": {
    "counter": 3,
    "region": "US"
  }
}
...
```


#------------------------------------------------------

On 7/3/21 2:03 PM, Steve Litt via plug wrote:
> Carlos M. Fernández via plug said on Fri, 2 Jul 2021 17:19:32 -0400
...
> The preceding is how I've always done it. I'm not sure why big data
> structures in Python would be cumbersome or difficult to comprehend.

They're not, they are much simpler than in Perl.  Except in creating them without key errors or overwriting data that's already in there.

> It's only necessary to remember that any data structure can be made
> from arrays (Python lists) and sets of key/value pairs (Python dicts).
> Just build them up.

Yup.  I mostly use hashes of hashes (sigh, dicts), because the keys are self-documenting, and not indexes you have to map someplace, even if just in your head.

If you get the data structure right, the rest of the program writes itself.  If you get the structure wrong, it and the program will fight you every step of the way.

> Another data point: Python complex structures and JSON convert
> losslessly to each other. Perhaps the OP (Original Poster) would want to
> build his initial structure in JSON, and then convert to a structure.
> IIRC Python's JSON library is part of Python's standard libraries.

I love that Python <> JSON thing, but the entire point here is that I'm reading multiple TAB or CSV files to pull out fields keyed on company from each, then assembling the data structure as I encounter the elements I need.  I do this kind of thing a lot, and almost none of it is JSON.
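
A small sketch of that workflow using the standard `csv` module. The column names (`company`, `region`, `count`) and the in-memory "files" are assumptions for illustration; real code would pass objects from `open()` instead:

```python
import csv
import io

# Hypothetical stand-ins for two input files.
regions_tsv = io.StringIO("company\tregion\nAcme Inc\tUS\n")
events_csv = io.StringIO("company,count\nAcme Inc,1\nAcme Inc,2\n")

mydict = {}

# File 1: company -> region
for row in csv.DictReader(regions_tsv, delimiter='\t'):
    mydict.setdefault(row['company'], {})['region'] = row['region']

# File 2: accumulate a count per company
for row in csv.DictReader(events_csv):
    sub = mydict.setdefault(row['company'], {})
    sub['counter'] = sub.get('counter', 0) + int(row['count'])

print(mydict)
```

The two loops can run in either order, since each one creates the company entry only if it is not there yet.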

> Another technique I use is to create sub-dicts and sub-sub-dicts, etc,
> and when each is complete, bolt it onto the next higher level. More
> understandable that way.

That's sort-of how I solved the problem that led to this question.  I ended up using 2 different dicts, both with company as key, then assembling them at output time.  But I thought that was much more clunky, and much less scalable, since I now had to pass 2 objects in and out of some functions.  I can get away with that with 2, even if it's ugly.  But if I had to scale out much beyond that it gets too ugly.  It *should* work in one dict-of-dicts.  And it clearly does, just with a LOT more code than Perl would need (for this part).
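
For comparison, the two-dict workaround might look roughly like this at output time. The dict names and the extra `Initech` entry are hypothetical:

```python
# Hypothetical per-file results, both keyed on company.
regions = {'Acme Inc': 'US'}
counters = {'Acme Inc': 3, 'Initech': 7}

merged = {}
for company in set(regions) | set(counters):
    merged[company] = {}
    if company in regions:
        merged[company]['region'] = regions[company]
    if company in counters:
        merged[company]['counter'] = counters[company]

for company in sorted(merged):
    print(company, merged[company])
```

Every additional field means another dict to pass around and another `if` in the merge loop, which is the scaling problem described above.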


#------------------------------------------------------
On 7/3/21 4:38 PM, Hung Dinh via plug wrote:
> I would suggest the toolz library - https://toolz.readthedocs.io/en/latest/_modules/toolz/dicttoolz.html
> There are useful functions in there for handling higher-order dictionary operations. They will look very familiar if you are coming from a functional background.

Bummer:
```
ModuleNotFoundError: No module named 'toolz'
```

> For example, if I would do something like the original script, it would look something like:
>
> from toolz.dicttoolz import assoc_in, update_in
> dict = {}
> company = 'Acme Inc'
> dict = assoc_in(dict, [company, 'region'], 'US') # note: it's immutable, so a new dict is created
> dict = update_in(dict, [company, 'counter'], lambda x: (x or 0) + 1)   # x or 0 because x will be None the first time
> dict = update_in(dict, [company, 'counter'], lambda x: (x or 0) + 2)
> print(dict)

Hummmm, that's not very readable to me and `lambda`s make my head hurt.  :-)  I think something like that might solve the "new or update" issues elsewhere, though.  More thought needed.
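
A lambda-free, standard-library-only sketch of the same "path of keys" idea. The helper `add_in` is hypothetical and, unlike the toolz functions, it modifies the dict in place (Python 3 only because of the starred assignment):

```python
def add_in(d, keys, amount):
    """Hypothetical helper: d[k1][k2]... += amount, creating levels as needed."""
    *parents, last = keys
    for key in parents:
        d = d.setdefault(key, {})      # walk down, creating sub-dicts
    d[last] = d.get(last, 0) + amount  # missing counters start at 0


mydict = {}
company = 'Acme Inc'
mydict.setdefault(company, {})['region'] = 'US'
add_in(mydict, [company, 'counter'], 1)
add_in(mydict, [company, 'counter'], 2)
print(mydict)
```

The key list can be any depth, so the same helper would cover dicts nested more than two levels.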


Original post/question:
> On Fri, Jul 2, 2021, 15:51 JP Vossen via plug <plug@lists.phillylinux.org> wrote:
>
>     If I want to read multiple files, and build a data structure like this by adding different keys and values at different times, and/or incrementing an existing numerical value:
>     ```
>     {'Acme Inc': {'region': 'US', 'counter': 3}}
>     ```
>
>     In Perl, but using Pythonic syntax, this would auto-instantiate and Just Work:
>     ```
>     dict = {}
>     company = 'Acme Inc'
>     dict[company]['region']   = 'US'
>     dict[company]['counter'] += ave
>     ```
>
>     Clean, and simple...except that Perl's syntax for nested data structures is virtually unreadable and really hard to write, for me at least.  Overall I'm liking Python, but there are some things, like this, that are hurting my head.
>
>     In Python it's really easy to create that structure at one time, but that is NOT how most of my use-cases work.  I'm always pulling fields out of 2-3 different files and building a data structure in pieces over time.  Like company and region are in 1 file, and a whole bunch of records containing something I want to count are in another file(s), with the 'company' key to link them.
>
>     This works in Python 3 and 2 (sad to say I still have old nodes with only Python2):
>     ```
>     #!/usr/bin/env python3
>     # dict.py--do I REALLY have to do all this crap just for NESTED dicts!?
>
>     dict = {}
>
>     # Add Nested Key/Value pair to dict
>     def add_nstkv(dict, key, subkey, value):
>           if key not in dict:         # Key must exist or error
>               dict[key] = {}          # Create empty sub-dict
>           dict[key][subkey] = value   # Add value
>
>     # Add/inc_nstkv a value in a nested dict
>     def inc_nstkv(dict, key, subkey, value):
>           if key not in dict:              # Parent key must exist or error
>               dict[key] = {}               # Create empty sub-dict
>           if subkey not in dict[key]:      # Subkey must exist or error
>               dict[key][subkey]  = value   # Add value
>           else:
>               dict[key][subkey] += value   # inc_nstkv
>
>     company = 'Acme Inc'
>     add_nstkv(dict, company, 'region', 'US') # Does: dict[company]['region']   = val
>     inc_nstkv(dict, company, 'counter', 1)   # Does: dict[company]['counter'] += val
>     inc_nstkv(dict, company, 'counter', 2)
>     print(dict)
>     ```
>
>     Output:
>     ```
>     $ ./dict.py
>     {'Acme Inc': {'region': 'US', 'counter': 3}}
>     ```
>
>     I've tried `dict = defaultdict()` and `dict = defaultdict(int)` but both fail since I have mixed string and int values.  I've tried all kinds of other crazy stuff and nothing else worked either.  Most Google results are too-trivial `d[k] = v` and don't help with nested structs.
>
>     Do I *really* have to write all that extra code every time I want to do something this simple?  Or am I missing something?
---end original post
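
On the `defaultdict` attempts in the original post: a bare `defaultdict()` has no factory and behaves like a plain dict, and `defaultdict(int)` makes each missing top-level value an int, so neither can hold a sub-dict. A sketch with a `dict` factory instead, which may be closer to what was intended:

```python
from collections import defaultdict

# A dict factory gives every missing company an empty sub-dict, which can
# then hold mixed string and int values.
mydict = defaultdict(dict)
company = 'Acme Inc'

mydict[company]['region'] = 'US'
for value in (1, 2):
    mydict[company]['counter'] = mydict[company].get('counter', 0) + value

print(dict(mydict))
```

The inner accumulator still needs `.get(..., 0)` because only the outer level has a default.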

Thanks,
JP
--  -------------------------------------------------------------------
JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug