python - How to convert nested dictionary to specific csv format for use in outside program? (visualization of Markov Network)

I'm having trouble getting a nested dictionary (ngrams) to output in the specific format needed by a different program (Cytoscape, if anybody is curious lol). I need to preserve the values of the dictionary because I'm using a hidden Markov model as the basis for a small language model (SLM) that generates bad poetry, and I need to show the network that the SLM is moving through, for reasons.

My brain is too dead at the moment to properly explain a Markov model/Markov chain, but essentially it looks at the order of a system of 'randomly' occurring events and gives the probability of specific events happening consecutively.
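To make the "probability of specific events happening consecutively" idea a bit more concrete, here is a minimal sketch of a first-order Markov chain over words. The token list and the helper names bigram_counts and transition_probabilities are hypothetical, made up for illustration; this is not necessarily how the ngrams dictionary in this post was built.

```python
from collections import Counter, defaultdict


def bigram_counts(tokens):
    """Count how often each word is directly followed by each other word."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    # Convert to plain nested dicts, the same shape as the ngrams dictionary
    return {word: dict(followers) for word, followers in counts.items()}


def transition_probabilities(counts):
    """Normalise each source word's counts so they sum to 1."""
    probabilities = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        probabilities[source] = {t: n / total for t, n in targets.items()}
    return probabilities


# Hypothetical input, chosen only to keep the example small
tokens = ["a", "b", "a", "c", "a", "c"]

counts = bigram_counts(tokens)
probs = transition_probabilities(counts)

print(counts)
print(probs)
```

In this toy input, 'a' is followed by 'b' once and by 'c' twice, so the chain would move from 'a' to 'c' with probability 2/3.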

Currently print(ngrams) outputs as follows:

{'a': {'b': 1, 'c': 2, 'd': 1}, 'b': {'a': 3}, 'c': {'b': 2, 'a': 2}, 'd': {'d': 1, 'b': 1}}

The names of the nested dictionaries are the initial words (source nodes), their keys are the words that follow the source node (target nodes), and the values are the number of times the interaction occurs (number of duplicate rows). In the final CSV, all interactionTypes will be <interactsWith>.

The content of the csv file must be as follows to be read by my network mapping program:

sourceNode,interactionType,targetNode
a,<interactsWith>,b
a,<interactsWith>,c
a,<interactsWith>,c
a,<interactsWith>,d
b,<interactsWith>,a
b,<interactsWith>,a
b,<interactsWith>,a
c,<interactsWith>,b
c,<interactsWith>,b
c,<interactsWith>,a
c,<interactsWith>,a
d,<interactsWith>,d
d,<interactsWith>,b

I first tried to output each of the source nodes as a list (i.e. the names of the sub-dictionaries as a list), expected output: ['a', 'b', 'c', 'd']

wowChart = {
    'a' : {
        'b' : 1,
        'c' : 2,
        'd' : 1
    },
    'b' : {
        'a' : 3 
    },
    'c' : {
        'b' : 2,
        'a' : 2
    },
    'd' : {
        'd' : 1,
        'b' : 2
    }

}

sourceNode = list(wowChart.items())
print(sourceNode)

actual output:

[('a', {'b': 1, 'c': 2, 'd': 1}), ('b', {'a': 3}), ('c', {'b': 2, 'a': 2}), ('d', {'d': 1, 'b': 2})]

I kind of knew this wouldn't work; the next step is for x, obj in wowChart.items(): and go from there.

Not sure why I'm first trying to make lists of names, keys and values and then recombine them to get the desired output. There are almost definitely better ways to do this, but my mind is not working. Probably a skill issue tbh, I'm new to programming and haven't used pandas yet.
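For reference, a minimal sketch of the two steps mentioned above, assuming the wowChart dictionary exactly as written in this post. Iterating over a dict yields its keys, so list(wowChart) gives the source nodes, while .items() gives (key, value) pairs. The names source_nodes and edges are mine, not part of the original attempt.

```python
wowChart = {
    'a': {'b': 1, 'c': 2, 'd': 1},
    'b': {'a': 3},
    'c': {'b': 2, 'a': 2},
    'd': {'d': 1, 'b': 2},
}

# Iterating a dict yields its keys, so this is the list of source nodes
source_nodes = list(wowChart)
print(source_nodes)

# Nested loop: one (source, target, count) triple per inner entry
edges = []
for source, targets in wowChart.items():
    for target, count in targets.items():
        edges.append((source, target, count))

print(edges)
```

The triples in edges already carry everything the CSV needs; each one only has to be repeated count times when written out.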

Asked Mar 17 at 22:17 by ranchh

2 Answers

My first instinct would be to use nested loops, which Python handles well. I don't explicitly use enumerate() in the code below because I don't believe you are interested in the indices of your source and target nodes.

import csv

ngrams = {
    "a": {"b": 1, "c": 2, "d": 1},
    "b": {"a": 3},
    "c": {"b": 2, "a": 2},
    "d": {"d": 1, "b": 1},
}

with open("my_csv.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sourceNode", "interactionType", "targetNode"])
    # Iterate over sourceNode keys and interactions
    for key, di in ngrams.items():
        # Iterate over targetNode keys and interaction numbers
        for kj, vj in di.items():
            for _ in range(vj):
                # Writing row using desired sourceNode, interactionType, targetNode format
                writer.writerow([key, "<interactsWith>", kj])

This assumes you are using Python 3.x. If you're using Python 2.x, then change 'in ngrams.items()' to 'in ngrams.iteritems()' and change 'in di.items()' to 'in di.iteritems()'.
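If you want to check the output before pointing the other program at it, one option is to write into an in-memory buffer and compare against the expected lines from the question. This is only a sketch: write_edges is a hypothetical helper name, and io.StringIO stands in for the real file.

```python
import csv
import io


def write_edges(ngrams, f):
    """Write one CSV row per occurrence of each source -> target pair."""
    writer = csv.writer(f)
    writer.writerow(["sourceNode", "interactionType", "targetNode"])
    for source, targets in ngrams.items():
        for target, count in targets.items():
            for _ in range(count):
                writer.writerow([source, "<interactsWith>", target])


ngrams = {
    "a": {"b": 1, "c": 2, "d": 1},
    "b": {"a": 3},
    "c": {"b": 2, "a": 2},
    "d": {"d": 1, "b": 1},
}

# In-memory text buffer instead of a file on disk
buf = io.StringIO()
write_edges(ngrams, buf)
lines = buf.getvalue().splitlines()

# Expected content, copied from the question
expected = [
    "sourceNode,interactionType,targetNode",
    "a,<interactsWith>,b",
    "a,<interactsWith>,c",
    "a,<interactsWith>,c",
    "a,<interactsWith>,d",
    "b,<interactsWith>,a",
    "b,<interactsWith>,a",
    "b,<interactsWith>,a",
    "c,<interactsWith>,b",
    "c,<interactsWith>,b",
    "c,<interactsWith>,a",
    "c,<interactsWith>,a",
    "d,<interactsWith>,d",
    "d,<interactsWith>,b",
]

print(lines == expected)
```

The same write_edges function works unchanged with a real file opened via open(..., "w", newline="").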

Here is how I approach it

Code:

import csv

ngrams = {
    "a": {"b": 1, "c": 2, "d": 1},
    "b": {"a": 3},
    "c": {"b": 2, "a": 2},
    "d": {"d": 1, "b": 1},
}

with open("/tmp/out.csv", "w", newline="") as buf:
    writer = csv.writer(buf)
    writer.writerow(["sourceNode", "interactionType", "targetNode"])

    for source, targets in ngrams.items():
        for target, count in targets.items():
            rows = [[source, "<interactsWith>", target]] * count
            writer.writerows(rows)

Notes

  • The algorithm is simple and straightforward
  • The line rows = [[ ... ]] * count creates a list of count references to the same row, which is fine here because the rows are only written, never modified
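One caveat about multiplying a list of lists, shown as a small sketch with made-up variable names: the result holds the same inner list several times, not copies. That makes no difference when the rows are only written to a CSV, but it matters if you later modify one of them.

```python
template = ["a", "<interactsWith>", "c"]

# List multiplication repeats the reference, not the contents
shared = [template] * 2
same_object = shared[0] is shared[1]
print(same_object)

# A comprehension with list(...) builds independent copies instead
independent = [list(template) for _ in range(2)]
independent[0][2] = "z"
print(independent[1][2])  # still the original target

# Changing one shared row changes every "row", since they are one object
shared[0][2] = "z"
print(shared[1][2])
```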
