java - Visualizing set hierarchies as color coded graphs - Stack Overflow

I have been reading quite a bit about graphing libraries for Java and JavaScript lately, but I haven't found a good way to do what I want to do.

Essentially, I have a hierarchy of sets over a bunch of elements (up to several thousand). These sets can be fully or partly overlapping, fully covering, or completely disjoint from one another. What I would like to do is display the following information:

  • The size of a set (in relation to the other sets)
  • A "heat" value (in color code) of a set calculated from the elements it covers
  • The full topology of the sets in a single graph (so that overlaps, intersections etc are displayed to the user)

Edit: Perhaps I should give an example of what I mean by sets and elements and partially overlapping hierarchies. The following is an over-simplified version of the kind of sets I deal with (note that the numbers 1-11, the letters a-h, and X represent elements which are comparable to one another):

Set1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
Set2 = {1, 2, 3, 4, 5, 6}
Set3 = {1, 2, 3}
Set4 = {1, 4, 5, 6, 7}
Set5 = {a, b, c, d, e, f, g, h}
Set6 = {a, b, c, d, e}
Set7 = {a, b, c, 7}
Set8 = {2, 4, 7, 8, c, f}
Set9 = {X}

I am not sure how I would go about displaying this information in an intuitive way. I have seen Voronoi graphs, which I really like visually; however, they have a different mathematical background, so I don't think I'll be able to portray the hierarchies I have in a proper manner. I would like to create these graphs at runtime (in the case of Java) or using JavaScript in the case of HTML deployment; either is perfectly fine. One constraint, however, is that the graphs need to be either created as, or exportable to, high-res vector graphics.

My questions in short:

  1. Is there a nice way to visualize the kind of data I have? If so does it exist in a readily implemented form (i.e. a library)?
  2. If there is no easy solution to the problem, in other words if I need to reinvent the wheel in this case, how do I go about implementing such a graph myself? What is a good starting point? What should I pay extra attention to?

Thanks!

Edit: A potential idea I had was to lay out all the elements in the universal set as a hexagonal grid with the desired color overlay, and then draw the boundaries for the sets. There are, however, several problems with that idea, in particular the problem of designating locations for the elements so that the sets are not split all over the graph. Any comments/suggestions?

Asked Jul 2, 2012 at 15:52 by posdef; edited Jul 25, 2012 at 20:00 by ErikE
  • How many sets are we talking about? For small numbers, symmetric Venn diagrams cover all the possibilities, though without paying particular heed to hierarchy. – AakashM Commented Jul 2, 2012 at 16:35
  • hundreds for sure, in many cases close to a thousand and sometimes even more... – posdef Commented Jul 2, 2012 at 16:51
  • Can you describe what the sets represent and how the visualization will help with analysis? – orangepips Commented Jul 2, 2012 at 17:15
  • Maybe you should look at Matlab. – Garrett Hall Commented Jul 2, 2012 at 17:15
  • A chord diagram might be a useful tool: mbostock.github.com/d3/ex/chord.html. Order the elements in each set and represent each set as an arc on the circle's edge; set intersections would be represented by chords between arcs, and chord color could serve as a heat map to indicate the degree of intersection. In that design there could be more than one chord drawn between a combination of arcs. – orangepips Commented Jul 3, 2012 at 17:03

4 Answers

Yes, this is a fairly well-studied problem. What you are describing is called a hypergraph. Each element can be represented as a vertex in a graph, and the sets are the hyperedges. The problem then becomes that of visualizing hypergraphs.
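As a rough illustration of the hypergraph view, the sketch below stores a few of the sets from the question as named hyperedges and derives, for every vertex (element), the hyperedges that contain it. The class `HypergraphSketch` and its methods are hypothetical names made up for this example; they do not come from any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: a hypergraph stored as named hyperedges (sets of elements).
class HypergraphSketch {
    // Hyperedge name -> elements it covers, kept in insertion order.
    final Map<String, Set<String>> edges = new LinkedHashMap<>();

    void addEdge(String name, String... elements) {
        edges.put(name, Set.of(elements));
    }

    // Vertex -> names of the hyperedges that contain it.
    Map<String, List<String>> incidence() {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> edge : edges.entrySet()) {
            for (String vertex : edge.getValue()) {
                result.computeIfAbsent(vertex, k -> new ArrayList<>()).add(edge.getKey());
            }
        }
        return result;
    }

    // A few of the sets from the question, used as sample data.
    static HypergraphSketch example() {
        HypergraphSketch h = new HypergraphSketch();
        h.addEdge("Set3", "1", "2", "3");
        h.addEdge("Set7", "a", "b", "c", "7");
        h.addEdge("Set8", "2", "4", "7", "8", "c", "f");
        h.addEdge("Set9", "X");
        return h;
    }
}
```

Calling `HypergraphSketch.example().incidence()` gives, for instance, the entry for element `2` listing both `Set3` and `Set8`, which is the overlap information a drawing would have to convey.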

Unfortunately there isn't a perfect, generalized solution to this since even the simplest graphs can have complex visualizations.

If your sets are relatively small (< 5 elements), you can use a regular graph drawing library like graphviz. To do this, simply connect all pairs of vertices within each set and give each set's edges a different color. This yields a drawing in which every set shows up as a differently colored clique.
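The pairwise-connection idea can be sketched as a small generator for Graphviz's DOT format. The class name `CliqueDot`, the color palette, and the graph name `sets` are assumptions chosen for illustration; element names are assumed to contain no double quotes, since no escaping is done.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns each set into a clique of same-colored edges in DOT.
class CliqueDot {
    // Assumed palette; colors repeat when there are more sets than entries.
    static final String[] COLORS = {"red", "blue", "darkgreen", "orange", "purple"};

    static String toDot(Map<String, List<String>> sets) {
        StringBuilder dot = new StringBuilder("graph sets {\n");
        int index = 0;
        for (Map.Entry<String, List<String>> entry : sets.entrySet()) {
            String color = COLORS[index++ % COLORS.length];
            List<String> members = entry.getValue();
            for (int a = 0; a < members.size(); a++) {
                // Declare the node so that single-element sets still appear.
                dot.append("  \"").append(members.get(a)).append("\";\n");
                // Connect this element to every later element of the same set.
                for (int b = a + 1; b < members.size(); b++) {
                    dot.append("  \"").append(members.get(a)).append("\" -- \"")
                       .append(members.get(b)).append("\" [color=").append(color).append("];\n");
                }
            }
        }
        return dot.append("}\n").toString();
    }
}
```

A set with n elements produces n(n-1)/2 edges, which is why this approach only stays readable for small sets. The resulting text can be passed to a Graphviz layout program, which can also write SVG.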

Have you considered a 2-dimensional grid:

  • Put the set number on one axis
  • Put the unique elements found in all sets on the other axis
  • Color each cell where an element is found in a set (by looking at that row and column's labels)
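A minimal sketch of that grid, assuming the sets are given as a map from set name to element list: columns are the distinct elements in first-seen order, and each row marks membership with `#`. The class name `IncidenceGrid` and the text rendering are illustrative choices only; a real implementation would draw colored cells instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: sets on one axis, distinct elements on the other.
class IncidenceGrid {
    // Distinct elements across all sets, in the order they are first seen.
    static List<String> columns(Map<String, List<String>> sets) {
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> members : sets.values()) {
            seen.addAll(members);
        }
        return new ArrayList<>(seen);
    }

    // One grid row: '#' where the set contains the column's element, '.' otherwise.
    static String row(List<String> members, List<String> columns) {
        Set<String> lookup = new HashSet<>(members);
        StringBuilder cells = new StringBuilder();
        for (String column : columns) {
            cells.append(lookup.contains(column) ? '#' : '.');
        }
        return cells.toString();
    }

    // Whole grid as text, one line per set.
    static String render(Map<String, List<String>> sets) {
        List<String> columns = columns(sets);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : sets.entrySet()) {
            out.append(entry.getKey()).append(' ')
               .append(row(entry.getValue(), columns)).append('\n');
        }
        return out.toString();
    }
}
```

With the rows and columns left in input order, the marked cells are scattered; the ordering problem discussed next is what makes the picture readable.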

While this visualization method would normally be inferior to some of the more complicated ones mentioned so far, it has the virtue of actually being possible when you have thousands of elements and thousands of sets.

The trick will be to order the rows and columns in a way that puts the most information together in a way useful to the user. My instinct says that the problem you're trying to solve is to make the colored cells be as "bloblike" as possible—if each set of adjacent colored cells is called an "area", to have the least number of distinct areas and for them to have the fewest holes in them.

That is a very complicated problem in its own right, but it could be at least partially solved by working out a closeness score for each set against every other set. What you're looking for are "islands" of closeness, so start with the pair of most alike sets, add them to the graph, and consider them a region. Recalculate your closeness numbers with the region replacing the pair it holds (averaging in some way?). Find the next closest pair of items (each item being a region or a set), and if that pair is within a certain threshold of closeness to any existing region in the graph, attach it to one side of that region; otherwise create a new, separate region (again removing the pair's closeness values and recomputing for the region itself). Eventually, all sets will be added to regions, and all regions will be joined. Two regions can be joined in four ways (flipping may be required), so which sides to attach in the graph could be decided by the closeness of the sets on the four edges of the two regions.
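The following is a deliberately simplified sketch of this idea, not the full region-merging procedure: it grows a single chain of rows instead of several regions, uses no threshold, and assumes Jaccard similarity (shared elements divided by all elements in either set) as the closeness score. The class name `GreedyRowOrder` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: orders sets so that similar sets end up in adjacent rows.
class GreedyRowOrder {
    // Assumed closeness score: |A intersect B| / |A union B|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return all.isEmpty() ? 0.0 : (double) shared.size() / all.size();
    }

    static List<String> order(Map<String, Set<String>> sets) {
        List<String> names = new ArrayList<>(sets.keySet());
        if (names.size() < 2) {
            return names;
        }
        // Seed the chain with the most similar pair of sets.
        int first = 0, second = 1;
        double best = -1.0;
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                double s = jaccard(sets.get(names.get(i)), sets.get(names.get(j)));
                if (s > best) { best = s; first = i; second = j; }
            }
        }
        Deque<String> chain = new ArrayDeque<>();
        chain.add(names.get(first));
        chain.add(names.get(second));
        Set<String> unplaced = new LinkedHashSet<>(names);
        unplaced.removeAll(chain);
        // Repeatedly attach the unplaced set that is closest to either end of the chain.
        while (!unplaced.isEmpty()) {
            String pick = null;
            boolean atFront = false;
            double bestSim = -1.0;
            for (String name : unplaced) {
                double front = jaccard(sets.get(name), sets.get(chain.peekFirst()));
                double back = jaccard(sets.get(name), sets.get(chain.peekLast()));
                if (front > bestSim) { bestSim = front; pick = name; atFront = true; }
                if (back > bestSim) { bestSim = back; pick = name; atFront = false; }
            }
            unplaced.remove(pick);
            if (atFront) { chain.addFirst(pick); } else { chain.addLast(pick); }
        }
        return new ArrayList<>(chain);
    }
}
```

The seed search is quadratic in the number of sets and each attachment step scans all unplaced sets, which should be acceptable for the hundreds to low thousands of sets mentioned in the comments. Only the two chain ends are compared, so this is cruder than recomputing closeness per region.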

While this may never give the optimal configuration, it should come up with something that has few regions compared to a random distribution.

Finally, some dynamic reordering might be useful, by allowing the user to select an interesting set or element, and use that as the seed for a completely rearranged graph, calculating each addition based on closeness to that element (and subsequently that region after being combined with another element), rather than overall lowest closeness of any.

Here is a diagram of the result, having done the above logic process on the example set of data in your question:

Deciding how to order the columns is complex, but basically you can get sort of reasonable results by moving columns to be adjacent when such a move won't disturb the colored block area of any already-added segments.

Additional thoughts:

  • Calculating set closeness is not just how many elements they have in common, but also how many elements they have that are not in common. If two pairs of sets have 3 elements in common between the pairs, but one has 5 non-shared elements and the other has 3 non-shared elements, then the pair with 3 non-shared elements is a closer match than the other.
  • After adding a set to the graph, there is an opportunity to reorder the elements. Stacking the elements as leftmost as possible is a good start for the first placement. After that, stacking most common elements leftmost seems good. After that, it breaks down. I wonder if getting the colored cells as close to the diagonal (from top left to bottom right) would also be a useful algorithm--this reminds me a little of the Design Structure Matrix though that only shows one-way dependencies rather than two-way relationships.
  • When a colored blob consists of sets that are completely disjoint from all other sets (like the set containing X in your example), it can be moved to a separate graph.
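The last point, splitting off sets that share nothing with the rest, can be sketched as a breadth-first search over "shares at least one element" links. The class name `DisjointGroups` is hypothetical, and the pairwise comparison is the simplest possible approach rather than an optimized one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: groups sets that are linked, directly or indirectly, by shared elements.
class DisjointGroups {
    static List<List<String>> groups(Map<String, Set<String>> sets) {
        List<String> names = new ArrayList<>(sets.keySet());
        Set<String> seen = new HashSet<>();
        List<List<String>> result = new ArrayList<>();
        for (String start : names) {
            if (!seen.add(start)) {
                continue; // already assigned to an earlier group
            }
            List<String> group = new ArrayList<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String current = queue.poll();
                group.add(current);
                for (String other : names) {
                    // Two sets are linked when they have at least one element in common.
                    if (!seen.contains(other)
                            && !Collections.disjoint(sets.get(current), sets.get(other))) {
                        seen.add(other);
                        queue.add(other);
                    }
                }
            }
            result.add(group);
        }
        return result;
    }

    // The nine sets from the question.
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> s = new LinkedHashMap<>();
        s.put("Set1", Set.of("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"));
        s.put("Set2", Set.of("1", "2", "3", "4", "5", "6"));
        s.put("Set3", Set.of("1", "2", "3"));
        s.put("Set4", Set.of("1", "4", "5", "6", "7"));
        s.put("Set5", Set.of("a", "b", "c", "d", "e", "f", "g", "h"));
        s.put("Set6", Set.of("a", "b", "c", "d", "e"));
        s.put("Set7", Set.of("a", "b", "c", "7"));
        s.put("Set8", Set.of("2", "4", "7", "8", "c", "f"));
        s.put("Set9", Set.of("X"));
        return s;
    }
}
```

On the question's data this yields two groups: Set1 through Set8 stay together, because Set7 and Set8 bridge the numeric and the lettered elements, while Set9 ends up alone and could be drawn in its own graph.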

There are many approaches to this problem, but personally I'd draw a sort of Venn chart using dynamically generated SVG with a tool like Raphael JS and color it the way I want. Raphael also has APIs such as Set that let you give full, detailed information about the elements and their relations. Their SVG to Code converter will also likely help in understanding how you can generate the SVG elements.
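The answer suggests Raphael JS; as a stand-in, the sketch below writes raw SVG text directly from Java, which also satisfies the vector-graphics requirement. It draws one circle per set, with area proportional to set size and a blue-to-red fill derived from a heat value in [0, 1]. The class name `SvgSetCircles`, the color ramp, and the side-by-side layout are all assumptions; positioning circles so that overlaps are visible is not attempted here, and set names are assumed to need no XML escaping.

```java
import java.util.Locale;

// Hypothetical helper: emits one SVG circle per set, sized by cardinality and colored by heat.
class SvgSetCircles {
    // Maps heat 0.0 to blue and 1.0 to red; values outside [0, 1] are clamped.
    static String heatColor(double heat) {
        double clamped = Math.max(0.0, Math.min(1.0, heat));
        int red = (int) Math.round(255 * clamped);
        return String.format(Locale.ROOT, "#%02x00%02x", red, 255 - red);
    }

    static String svg(String[] names, int[] sizes, double[] heats) {
        int largest = 1;
        for (int size : sizes) {
            largest = Math.max(largest, size);
        }
        StringBuilder out = new StringBuilder();
        out.append("<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"")
           .append(120 * names.length).append("\" height=\"120\">\n");
        for (int i = 0; i < names.length; i++) {
            // Radius scales with the square root of the size so that area tracks size.
            double radius = 50.0 * Math.sqrt((double) sizes[i] / largest);
            out.append(String.format(Locale.ROOT,
                    "  <circle cx=\"%d\" cy=\"60\" r=\"%.1f\" fill=\"%s\"><title>%s</title></circle>\n",
                    60 + 120 * i, radius, heatColor(heats[i]), names[i]));
        }
        return out.append("</svg>\n").toString();
    }
}
```

The returned string can be written to a `.svg` file as is. The same structure maps onto Raphael calls if the drawing is done in the browser instead.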

Alternatively, you could use tools like Venn charts, which seem to be easily adaptable to this scenario. There's also Flotr2, which can create bubble charts, or even Canvas Express.

A little more tweaking with any of the latter tools will enable you to get it properly done...

I do not have a solution for getting your data into the proper format. Take a look at sigma.js, an MIT-licensed JavaScript library for drawing graphs. I haven't looked at the data it accepts, but it may be worth a look.
