A junior colleague (call him A) told me, "I want to create a dictionary with a hierarchical structure in Python." "I can do that in 5 seconds," I thought. In the end, it took me about 30 minutes to work out an answer and explain it. I would also forget how to do it if I were asked to implement it again, so I am keeping this as a memo.
To begin with, my junior's question was not simply how to build a doubly or triply nested dictionary by hand. He was given input like the following.
example.csv
A1,B1,C1,3
A1,B1,C2,1
A1,B1,C3,5
A1,B2,C1,4
A1,B2,C2,3
A1,B2,C3,1
A1,B3,C1,3
A1,B3,C2,2
A1,B3,C3,5
A2,B1,C1,3
A2,B1,C2,5
A2,B1,C3,3
A2,B2,C1,2
A2,B2,C2,1
A2,B2,C3,3
A2,B3,C1,4
A2,B3,C2,4
A2,B3,C3,5
He wanted to automatically create the following hierarchical dictionary.
{'A1': {'B1': {'C1': '3',
               'C2': '1',
               'C3': '5'},
        'B2': {'C1': '4',
               'C2': '3',
               'C3': '1'},
        'B3': {'C1': '3',
               'C2': '2',
               'C3': '5'}},
 'A2': {'B1': {'C1': '3',
               'C2': '5',
               'C3': '3'},
        'B2': {'C1': '2',
               'C2': '1',
               'C3': '3'},
        'B3': {'C1': '4',
               'C2': '4',
               'C3': '5'}}}
This turned out to be harder than I expected. The first thing I came up with was to use defaultdict.
import collections
hoge = collections.defaultdict(lambda: collections.defaultdict(lambda: collections.defaultdict(int)))
Nesting lambda expressions like this gives you a triple-nested dict. However, this method has its problems. First, you need to know the number of levels in advance, and since the number of levels may differ from element to element, it is not very general. Furthermore, a defaultdict whose default value is defined by a lambda expression is difficult to pickle. So I came up with a different method that uses only plain dicts.
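To make those two limitations concrete, here is a minimal sketch using only the standard library. The flag names `fourth_level_ok` and `pickle_ok` are mine, introduced purely for illustration.

```python
import collections
import pickle

# Triple-nested defaultdict, built the same way as the one-liner above.
hoge = collections.defaultdict(
    lambda: collections.defaultdict(lambda: collections.defaultdict(int))
)

# Three levels of keys work without creating the inner dicts by hand.
hoge["A1"]["B1"]["C1"] = 3

# A fourth level fails, because the innermost default is an int, not a dict.
try:
    hoge["A1"]["B1"]["C2"]["D1"] = 3
    fourth_level_ok = True
except TypeError:
    fourth_level_ok = False

# Pickling fails because the lambda default factories cannot be pickled
# by the standard pickle module.
try:
    pickle.dumps(hoge)
    pickle_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickle_ok = False

print(fourth_level_ok, pickle_ok)
```

Both flags should come out as False: the depth is fixed at three levels, and the lambdas get in the way of pickling. The plain-dict function that follows avoids both issues.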
import pprint


def make_tree_dict(inputs):
    tree_dict = {}
    for ainput in inputs:
        # Start each row at the root of the tree.
        pre_dict = tree_dict
        for j, key in enumerate(ainput):
            if j == len(ainput) - 2:
                # Last key of the row: store the value (the final field).
                pre_dict[key] = ainput[-1]
                break
            elif key not in pre_dict:
                # First time this key appears at this level: create a branch.
                pre_dict[key] = {}
            # Descend one level.
            pre_dict = pre_dict[key]
    return tree_dict


if __name__ == "__main__":
    pp = pprint.PrettyPrinter(width=10, compact=True)
    inputs = []
    with open("example.csv") as f:
        for line in f:
            line = line.rstrip().split(",")
            inputs.append(line)
    hoge = make_tree_dict(inputs)
    pp.pprint(hoge)
Running the above program produces a hierarchical dict with the structure shown above; the values are strings because they are read from the CSV without conversion. It is a strange-looking program: the contents of tree_dict are updated even though nothing is ever assigned into tree_dict directly, but it works. I wanted to write up a full explanation, but I don't have time, so not this time...
Incidentally, the above script also works for input in which the number of levels differs from row to row, as shown below.
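The short version, for what it's worth, is that pre_dict and tree_dict are two names for the same dict object, so a change made through one is visible through the other. A minimal sketch of what one row does inside the loop, written out step by step with the keys hard-coded for illustration:

```python
tree_dict = {}

# pre_dict is just another name for the same dict object as tree_dict.
pre_dict = tree_dict
print(pre_dict is tree_dict)  # → True

# Mutating through pre_dict is therefore visible through tree_dict.
pre_dict["A1"] = {}

# Rebinding pre_dict only moves the name one level down;
# it does not copy or replace anything inside tree_dict.
pre_dict = pre_dict["A1"]
pre_dict["B1"] = {}
pre_dict = pre_dict["B1"]

# Storing the leaf value through pre_dict lands deep inside tree_dict.
pre_dict["C1"] = "3"

print(tree_dict)  # → {'A1': {'B1': {'C1': '3'}}}
```

In other words, `pre_dict = pre_dict[key]` acts like a cursor walking down the tree, while tree_dict keeps pointing at the root.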
example2.csv
A1,B1,C1,1
A1,B1,C2,D1,3
A1,B1,C3,5
A1,B2,C1,D1,5
A1,B2,C2,2
A1,B2,C3,5
A1,B3,C1,2
A1,B3,C2,D1,4
A1,B3,C2,D2,10
A1,B3,C3,2
A2,B1,C1,4
A2,B1,C2,D1,5
A2,B1,C3,5
A2,B2,C1,D1,6
A2,B2,C2,3
A2,B2,C3,D1,8
A2,B3,C1,2
A2,B3,C2,5
A2,B3,C3,4
From this input, you get a dict like this.
example2_output
{'A1': {'B1': {'C1': '1',
               'C2': {'D1': '3'},
               'C3': '5'},
        'B2': {'C1': {'D1': '5'},
               'C2': '2',
               'C3': '5'},
        'B3': {'C1': '2',
               'C2': {'D1': '4',
                      'D2': '10'},
               'C3': '2'}},
 'A2': {'B1': {'C1': '4',
               'C2': {'D1': '5'},
               'C3': '5'},
        'B2': {'C1': {'D1': '6'},
               'C2': '3',
               'C3': {'D1': '8'}},
        'B3': {'C1': '2',
               'C2': '5',
               'C3': '4'}}}
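As a side note, the same variable-depth behavior can probably be written a little more compactly with dict.setdefault. This is only a sketch, not the function above: the name `make_tree_dict_setdefault` is hypothetical, and it assumes that every row has at least one key plus a value and that no key is used both as a leaf and as a branch.

```python
def make_tree_dict_setdefault(rows):
    """Hypothetical variant of make_tree_dict built on dict.setdefault."""
    tree = {}
    for row in rows:
        # Split the row into branch keys, the last key, and the value.
        *branch_keys, leaf_key, value = row
        node = tree
        for key in branch_keys:
            # Return the existing sub-dict, or create and return a new one.
            node = node.setdefault(key, {})
        node[leaf_key] = value
    return tree


# A few rows taken from example2.csv, with different depths per row.
rows = [
    ["A1", "B1", "C1", "1"],
    ["A1", "B1", "C2", "D1", "3"],
    ["A1", "B3", "C2", "D1", "4"],
    ["A1", "B3", "C2", "D2", "10"],
]
tree = make_tree_dict_setdefault(rows)
print(tree["A1"]["B3"]["C2"])  # → {'D1': '4', 'D2': '10'}
```

The values stay as strings here as well; converting them with int() would be a separate step.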
I will add this section if I have time...