I'm having trouble getting a nested dictionary (ngrams) to output in a specific format for use in a different program (Cytoscape, if anybody is curious lol). I need to preserve the values of the dictionary because I'm using a hidden Markov model as the basis for a small language model (SLM) that generates bad poetry, and I need to show the network that the SLM is moving through, for reasons.
My brain is too dead at the moment to properly explain a Markov model/Markov chain, but essentially it looks at the order of events in a system of 'randomly' occurring events and gives the probability of specific events happening consecutively.
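A minimal sketch of that idea, assuming a plain first-order chain over words. The names `count_transitions` and `to_probabilities` and the sample word list are made up for illustration and are not the actual model code:

```python
from collections import defaultdict


def count_transitions(words):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    # Convert to plain dicts so the result prints like an ordinary nested dict
    return {src: dict(targets) for src, targets in counts.items()}


def to_probabilities(counts):
    """Turn raw follow counts into per-source transition probabilities."""
    probs = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        probs[src] = {tgt: n / total for tgt, n in targets.items()}
    return probs


# Hypothetical sample text, chosen only to keep the counts easy to follow
words = "a b a c a c b".split()
counts = count_transitions(words)
probs = to_probabilities(counts)
```

The counts have the same nested shape as the dictionary in the question: outer key is the current word, inner key is the word that follows, value is how often that happened.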
Currently print(ngrams) outputs the following:
{'a': {'b': 1, 'c': 2, 'd': 1}, 'b': {'a': 3}, 'c': {'b': 2, 'a': 2}, 'd': {'d': 1, 'b': 1}}
The names of the nested dictionaries (the outer keys) are the initial word (source node), the inner keys are the words that follow the source node (target node), and the values are the number of times the interaction occurs (number of duplicate rows). In the final CSV, all interactionTypes will be <interactsWith>.
The content of the csv file must be as follows to be read by my network mapping program:
sourceNode,interactionType,targetNode
a,<interactsWith>,b
a,<interactsWith>,c
a,<interactsWith>,c
a,<interactsWith>,d
b,<interactsWith>,a
b,<interactsWith>,a
b,<interactsWith>,a
c,<interactsWith>,b
c,<interactsWith>,b
c,<interactsWith>,a
c,<interactsWith>,a
d,<interactsWith>,d
d,<interactsWith>,b
I first tried to output each of the source nodes as a list (i.e. the names of the subdictionaries as a list), expected output: ['a', 'b', 'c', 'd']
wowChart = {
    'a': {
        'b': 1,
        'c': 2,
        'd': 1
    },
    'b': {
        'a': 3
    },
    'c': {
        'b': 2,
        'a': 2
    },
    'd': {
        'd': 1,
        'b': 2
    }
}
sourceNode = list(wowChart.items())
print(sourceNode)
actual output:
[('a', {'b': 1, 'c': 2, 'd': 1}), ('b', {'a': 3}), ('c', {'b': 2, 'a': 2}), ('d', {'d': 1, 'b': 2})]
I kind of knew this wouldn't work. The next step is to use for x, obj in wowChart.items(): and go from there.
Not sure why I'm first trying to make lists of names, keys and values and then recombine them to get the desired output. There are almost certainly better ways to do this, but my mind is not working. Probably a skill issue tbh, I'm new to programming and haven't used pandas yet.
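For reference, a small sketch of why the attempt above returned tuples, reusing the wowChart data from the question. The names `pairs`, `source_nodes`, and `target_nodes` are made up here:

```python
wowChart = {
    "a": {"b": 1, "c": 2, "d": 1},
    "b": {"a": 3},
    "c": {"b": 2, "a": 2},
    "d": {"d": 1, "b": 2},
}

# .items() yields (key, value) pairs, which is why tuples came back
pairs = list(wowChart.items())

# Iterating a dict directly (or calling .keys()) yields only the keys
source_nodes = list(wowChart)

# The same trick on each inner dict gives the target nodes per source
target_nodes = {src: list(targets) for src, targets in wowChart.items()}
```

Here `source_nodes` holds the four outer keys in insertion order, which matches the expected output in the question.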
2 Answers
My first instinct would be to use a nested enumeration of sorts, which Python is good for. I don't explicitly use enumerate() in the below code because I don't believe you are interested in the indices for your target and source Nodes.
import csv
ngrams = {
    "a": {"b": 1, "c": 2, "d": 1},
    "b": {"a": 3},
    "c": {"b": 2, "a": 2},
    "d": {"d": 1, "b": 1},
}
with open("my_csv.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sourceNode", "interactionType", "targetNode"])
    # Iterate over sourceNode keys and interactions
    for key, di in ngrams.items():
        # Iterate over targetNode keys and interaction numbers
        for kj, vj in di.items():
            # Repeat the row once per counted interaction
            for _ in range(vj):
                # Writing row using desired sourceNode, interactionType, targetNode format
                writer.writerow([key, "<interactsWith>", kj])
This assumes you are using Python 3.x. If you're using Python 2.x, then change 'in ngrams.items()' to 'in ngrams.iteritems()' and change 'in di.items()' to 'in di.iteritems()'.
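If you want to check the rows before writing a real file, here is a rough sketch of the same nested loop pulled into a generator and written to an in-memory buffer. `edge_rows` is a hypothetical helper name, and `io.StringIO` stands in for the output file:

```python
import csv
import io


def edge_rows(ngrams, interaction="<interactsWith>"):
    """Yield one [source, interaction, target] row per counted occurrence."""
    for source, targets in ngrams.items():
        for target, count in targets.items():
            for _ in range(count):
                yield [source, interaction, target]


ngrams = {
    "a": {"b": 1, "c": 2, "d": 1},
    "b": {"a": 3},
    "c": {"b": 2, "a": 2},
    "d": {"d": 1, "b": 1},
}

# newline="" leaves line endings to the csv module, as with open(..., newline="")
buf = io.StringIO(newline="")
writer = csv.writer(buf)
writer.writerow(["sourceNode", "interactionType", "targetNode"])
writer.writerows(edge_rows(ngrams))

lines = buf.getvalue().splitlines()
```

With the sample data this gives the header plus 13 edge rows, one per counted interaction.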
Here is how I approach it
Code:
import csv
ngrams = {
    "a": {"b": 1, "c": 2, "d": 1},
    "b": {"a": 3},
    "c": {"b": 2, "a": 2},
    "d": {"d": 1, "b": 1},
}
# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open("/tmp/out.csv", "w", newline="") as buf:
    writer = csv.writer(buf)
    writer.writerow(["sourceNode", "interactionType", "targetNode"])
    for source, targets in ngrams.items():
        for target, count in targets.items():
            # Repeat the same row once per counted interaction
            rows = [[source, "<interactsWith>", target]] * count
            writer.writerows(rows)
Notes
- The algorithm is simple and straightforward
- The line rows = [[ ... ]] * count creates count identical rows
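One caveat about that line: list multiplication repeats references to the same inner list rather than copying it. That is harmless here because the rows are only written out and never modified. A small sketch of the difference (the variable names are illustrative only):

```python
source, target, count = "a", "c", 2

# Multiplying the outer list repeats a reference to the same inner list
rows = [[source, "<interactsWith>", target]] * count
shared = rows[0] is rows[1]

# A comprehension builds a separate list per row, which only matters
# if the rows are edited after they are created
independent = [[source, "<interactsWith>", target] for _ in range(count)]
independent[0][2] = "z"
```

After this runs, `shared` is True, while changing `independent[0]` leaves `independent[1]` untouched.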