JP Vossen via plug on 10 Jul 2021 09:39:43 -0700



Re: [PLUG] Python nested dict data structure


Thanks Victor, that's much more what I thought I should be able to do!

I had to add the `company` key back in to the second part, but then it works at that level.  It took me several tries to get it to work at a level below that, but that's probably just me.  I also find this a bit hard to read, but that's also probably just me and might be better with more meaningful real-world variable names.

The only flaw is that the order in which fields are encountered in the input files matters, which I'll explain in a moment.

Modified code (works for Python2 or 3):
```
     1	#!/usr/bin/env python2
     2	# dict2.py--do I REALLY have to do all this crap just for NESTED dicts!?
     3	# 2021-07-10
     4	# From Victor in EM "Re: [PLUG] Python nested dict data structure"
     5	
     6	import json
     7	from collections import defaultdict
     8	
     9	d = dict()
    10	
    11	# Main
    12	company = 'Acme Inc'  # Key in both (all) files
    13	
    14	# If "company" key exists, add or set the "region" subkey;
    15	# else the "company" key should default to an empty dictionary then
    16	# add or set the "region" subkey.
    17	#
    18	# First read file 1, containing: Company\tRegion\tOther-stuff-I-don't-care-about-here
    19	d.setdefault(company, dict())['region'] = 'US'
    20	
    21	# If "counter" key exists, get its current value and add X;
    22	# If "counter" does not exist, return 0 as the default and add X.
    23	#
    24	# ...LATER...read file 2, containing *multiple records* of: Company\tthis\tthat\tCounter
    25	d[company]['counter'] = d[company].get('counter', 0) + 2
    26	d[company]['counter'] = d[company].get('counter', 0) + 3
    27	
    28	# ...STILL LATER...read file 3, containing even more crazy stuff
    29	d[company].setdefault('subkey', dict())['subsubkey'] = 'foo'
    30	d[company]['subkey']['subsubint'] = d[company]['subkey'].get('subsubint', 0) + 6
    31	
    32	#print(d)
    33	print(json.dumps(d, indent=2, sort_keys=True))  # Pretty but needs: import json
```

Expected and desired output:
```
{
  "Acme Inc": {
    "counter": 5,
    "region": "US",
    "subkey": {
      "subsubint": 6,
      "subsubkey": "foo"
    }
  }
}
```

That's all great.  But if I move line 30 above line 29, it fails:

```
$ ./dict2.py
Traceback (most recent call last):
  File "/home/jp/MyDocs/HOME/CODE/Python/dict2.py", line 29, in <module>
    d[company]['subkey']['subsubint'] = d[company]['subkey'].get('subsubint', 0) + 6
KeyError: 'subkey'

```

The same would happen at the `company` level too.  I know why...it's the original problem, the key doesn't exist yet.  My problem is that I am cherry-picking fields out of 2+ files and my input might be in any order.
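
One way to make the short form insensitive to input order, sketched here on the assumption that every level is a plain dict, is to call `.setdefault()` at each parent level instead of only the first. The names `rec` and `sub` are just illustrative, and the records are deliberately processed in reverse (file 3, then file 2, then file 1):

```python
import json

d = {}
company = 'Acme Inc'

# File 3 first: .setdefault() at every level creates any missing parent dict
sub = d.setdefault(company, {}).setdefault('subkey', {})
sub['subsubint'] = sub.get('subsubint', 0) + 6
d.setdefault(company, {}).setdefault('subkey', {})['subsubkey'] = 'foo'

# File 2 next: accumulate the counter, starting from 0 if it is missing
rec = d.setdefault(company, {})
rec['counter'] = rec.get('counter', 0) + 2
rec['counter'] = rec.get('counter', 0) + 3

# File 1 last
d.setdefault(company, {})['region'] = 'US'

print(json.dumps(d, indent=2, sort_keys=True))
```

This should print the same structure as the desired output above, at the cost of repeating the `.setdefault()` chain on every line that might run first.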

This longer code works in any order, as you'll see because I moved the lines around without changing the comments (works for Python2 or 3):
```
#!/usr/bin/env python2
# dict.py--do I REALLY have to do all this crap just for NESTED dicts!?
# JP, 2021-07-02

# https://www.geeksforgeeks.org/python-nested-dictionary/
# but... https://stackoverflow.com/questions/1024847/how-can-i-add-new-keys-to-a-dictionary
# And >EM "[PLUG] Python nested dict data structure" 2021-07-05

import json

mydict = {}  # Not a good idea to use "dict"

# Add Nested Key/Value pair to mydict
def add_nkv(mydict, key, subkey, value):
    if key not in mydict:         # Key must exist or error
        mydict[key] = {}          # Create empty sub-mydict
    mydict[key][subkey] = value   # Add value

# Accumulate a value in a nested mydict
def acc_nkv(mydict, key, subkey, value):
    if key not in mydict:              # Parent key must exist or error
        mydict[key] = {}               # Create empty sub-mydict
    if subkey not in mydict[key]:      # Subkey must exist or error
        mydict[key][subkey]  = value   # Add new value
    else:
        mydict[key][subkey] += value   # Accumulate value

# Main
company = 'Acme Inc'  # Key in both (all) files

# ...LATER...read file 2, containing *multiple records* of: Company\tthis\tthat\tCounter
acc_nkv(mydict, company, 'counter', 2)   # Does: mydict[company]['counter'] += val
acc_nkv(mydict, company, 'counter', 3)

# First, read file 1, containing: Company\tRegion\tOther-stuff-I-don't-care-about-here
add_nkv(mydict, company, 'region', 'US') # Does: mydict[company]['region']   = val

# ...STILL LATER...read file 3, containing even more crazy stuff
acc_nkv(mydict[company], 'subkey', 'subsubint', 6)     # Does: mydict[company]['subkey']['subsubint'] += val
add_nkv(mydict[company], 'subkey', 'subsubkey', 'foo') # Does: mydict[company]['subkey']['subsubkey'] = 'foo'

#print(mydict)  # Simple
print(json.dumps(mydict, indent=2, sort_keys=True))  # Pretty but needs: import json
```

Output:
```
$ ./dict.py
{
  "Acme Inc": {
    "counter": 5,
    "region": "US",
    "subkey": {
      "subsubint": 6,
      "subsubkey": "foo"
    }
  }
}
```
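
Note that the file 3 calls pass `mydict[company]` directly, so they still raise a KeyError if no earlier call has created the `company` key. A rough sketch of one way to cover any depth follows; `set_path`, `acc_path`, and the `_parent` helper are hypothetical names that do not come from the thread, and the sketch assumes every intermediate value is a dict:

```python
import json

def _parent(d, keys):
    """Walk down to the dict that holds the last key, creating missing levels."""
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    return d

def set_path(d, keys, value):
    """Does: d[k1][k2]...[kN] = value"""
    _parent(d, keys)[keys[-1]] = value

def acc_path(d, keys, value):
    """Does: d[k1][k2]...[kN] += value, starting from 0 if the key is missing"""
    parent = _parent(d, keys)
    parent[keys[-1]] = parent.get(keys[-1], 0) + value

mydict = {}
company = 'Acme Inc'

# File 3 first this time: no parent keys exist yet
acc_path(mydict, [company, 'subkey', 'subsubint'], 6)
set_path(mydict, [company, 'subkey', 'subsubkey'], 'foo')

# Then file 2 and file 1
acc_path(mydict, [company, 'counter'], 2)
acc_path(mydict, [company, 'counter'], 3)
set_path(mydict, [company, 'region'], 'US')

print(json.dumps(mydict, indent=2, sort_keys=True))
```

Passing the keys as a list keeps the call sites at one line each regardless of nesting depth, which may or may not read better than the two-level functions above.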


On 7/6/21 10:40 AM, Victor via plug wrote:
# Add/inc_nstkv a value in a nested d
def inc_nstkv(d, key, subkey, value):
      d.setdefault(key, dict())
      d[subkey] = d.get(subkey, 1) + 1
*************************************
d[subkey] = d.get(subkey, 0) + value
*************************************
### But that doesn't use `value`.  I guess I should have called
### it accumulate and not increment.  This doesn't work:
        d[subkey] = d.get(subkey, 1) += value
### And I think it overwrites the region value, depending on order
### (which will be unpredictable).

Oops, you're right that .get() should be using 0 as the default and
+value; fixed inline above. What led me to think of .setdefault() and
.get() was that you wrote their exact logic using other code plus the
explanation that you're pulling data from multiple sources where you
don't have a uniform dictionary output in mind. You can even eliminate
the functions you created entirely using .setdefault() and .get(), but
it's up to you if that diminishes readability. Example below.

> And I think it overwrites the region value, depending on order
I don't believe that's a problem, but maybe I'm not understanding your
expected output.

```
d = dict()
company = 'Acme Inc'

# If "company" key exists, add or set the "region" subkey; else the
# "company" key should default to an empty dictionary then add or set
# the "region" subkey.
d.setdefault(company, dict())['region'] = 'US'

# If "counter" key exists, get its current value and add X; if
# "counter" does not exist, return 0 as the default and add X.
d['counter'] = d.get('counter', 0) + 1
d['counter'] = d.get('counter', 0) + 2

print(d)
```
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
Thanks,
JP
--  -------------------------------------------------------------------
JP Vossen, CISSP | http://www.jpsdomain.org/ | http://bashcookbook.com/