What is the most efficient/elegant way to parse a flat table into a tree?

"YOU AND THE ART OF ONLINE DATING" is the only product on the market that will take you step-by-step through the process of online dating, provide you with the resources to help ensure success. Get it now!

There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods: Adjacency List (the "parent" column) and Path Enumeration (the dotted-numbers in your name column) Another solution is called Nested Sets and it can be stored in the same table too. Read Trees and erarchies in SQL for Smarties by Joe Celko for a lot more information on these designs I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data.It requires another table, but then querying trees is pretty easy CREATE TABLE ClosureTable ( ancestor_id INT NOT NULL REFERENCES FlatTable(id), descendant_id INT NOT NULL REFERENCES FlatTable(id), PRIMARY KEY (ancestor_id, descendant_id) ) Store all paths in the Closure Table, where there is a direct ancestry from one node to another.

Include a row for each node to reference itself. For example, using the data set you showed in your question: INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES (1,1), (1,2), (1,4), (1,6), (2,2), (2,4), (3,3), (3,5), (4,4), (5,5), (6,6) Now you can get a tree starting at node 1 like this: SELECT f. * FROM FlatTable f JOIN ClosureTable a ON (f.

Id = a. Descendant_id) WHERE a. Ancestor_id = 1 The output (in MySQL client) looks like the following: id | +----+ | 1 | | 2 | | 4 | | 6 In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1 Re: comment from e-satis about immediate children (or immediate parent).

You can add a path_length column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance) INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES (1,1,0), (1,2,1), (1,4,2), (1,6,1), (2,2,0), (2,4,1), (3,3,0), (3,5,1), (4,4,0), (5,5,0), (6,6,0) Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1 SELECT f. * FROM FlatTable f JOIN ClosureTable a ON (f.

Id = a. Descendant_id) WHERE a. Ancestor_id = 1 AND path_length = 1; +----+ | id | +----+ | 2 | | 6 Re comment from @ashraf: "How about sorting the whole tree by name?

Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name and sort by the name SELECT f. Name FROM FlatTable f JOIN ClosureTable a ON (f.Id = a. Descendant_id) WHERE a.

Ancestor_id = 1 ORDER BY f.name.

Read "Trees and erarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs. I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

CREATE TABLE ClosureTable ( ancestor_id INT NOT NULL REFERENCES FlatTable(id), descendant_id INT NOT NULL REFERENCES FlatTable(id), PRIMARY KEY (ancestor_id, descendant_id) ); Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question: INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES (1,1), (1,2), (1,4), (1,6), (2,2), (2,4), (3,3), (3,5), (4,4), (5,5), (6,6); Now you can get a tree starting at node 1 like this: SELECT f.

* FROM FlatTable f JOIN ClosureTable a ON (f. Id = a. Descendant_id) WHERE a.

Ancestor_id = 1; The output (in MySQL client) looks like the following: +----+ | id | +----+ | 1 | | 2 | | 4 | | 6 | +----+ In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1. Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES (1,1,0), (1,2,1), (1,4,2), (1,6,1), (2,2,0), (2,4,1), (3,3,0), (3,5,1), (4,4,0), (5,5,0), (6,6,0); Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1. SELECT f.

* FROM FlatTable f JOIN ClosureTable a ON (f. Id = a. Descendant_id) WHERE a.

Ancestor_id = 1 AND path_length = 1; +----+ | id | +----+ | 2 | | 6 | +----+ Re comment from @ashraf: "How about sorting the whole tree by name? " Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name. SELECT f.Name FROM FlatTable f JOIN ClosureTable a ON (f.

Id = a. Descendant_id) WHERE a. Ancestor_id = 1 ORDER BY f.name.

1 This is very elegant, thank you. Bonus point awarded. ;-) I see one small drawback though - as it stores the child relation explicitly and implicitly, you need to do a lot of careful UPDATEing for even a small shift in the tree structure.

– Tomalak Oct 11 '08 at 9:51 3 True, every method of storing tree-structures in a database requires some work, either when creating or updating the tree, or when querying trees and subtrees. Choose the design based on which you'd like to be simpler: writing or reading. – Bill Karwin Oct 11 '08 at 19:29 The "parent" design you showed in your question is easy to maintain, and easy to query if you only need immediate parent or immediate child.

But not possible to get a full tree in one query, unless you use proprietary syntax like Oracle's CONNECT BY. – Bill Karwin Oct 11 '08 at 19:32 Please add the output of the last Select statement (using the ClosureTable) to add clarity to this answer. Thank You.

– Aaron Hoffman Apr 18 '09 at 20:35 @Aaron Hoffman: Done. I just created a trivial "flattable" table with a single column "id". – Bill Karwin Apr 18 '09 at 21:22.

If you use nested sets (sometimes referred to as Modified Pre-order Tree Traversal) you can extract the entire tree structure or any subtree within it in tree order with a single query, at the cost of inserts being more expensive, as you need to manage columns which describe an in-order path through thee tree structure. For django-mptt, I used a structure like this: id parent_id tree_id level lft rght -- --------- ------- ----- --- ---- 1 null 1 0 1 14 2 1 1 1 2 7 3 2 1 2 3 4 4 2 1 2 5 6 5 1 1 1 8 13 6 5 1 2 9 10 7 5 1 2 11 12 Which describes a tree which looks like this (with id representing each item): 1 +-- 2 | +-- 3 | +-- 4 | +-- 5 +-- 6 +-- 7 Or, as a nested set diagram which makes it more obvious how the lft and rght values work: __________________________________________________________________________ | Root 1 | | ________________________________ ________________________________ | | | Child 1.1 | | Child 1.2 | | | | ___________ ___________ | | ___________ ___________ | | | | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | | 1 2 3___________4 5___________6 7 8 9___________10 11__________12 13 14 | |________________________________| |________________________________| | |__________________________________________________________________________| As you can see, to get the entire subtree for a given node, in tree order, you simply have to select all rows which have lft and rght values between its lft and rght values. It's also simple to retrieve the tree of ancestors for a given node.

The level column is a bit of denormalisation for convenience more than anything and the tree_id column allows you to restart the lft and rght numbering for each top-level node, which reduces the number of columns affected by inserts, moves and deletions, as the lft and rght columns have to be adjusted accordingly when these operations take place in order to create or close gaps. I made some development notes at the time when I was trying to wrap my head around the queries required for each operation. In terms of actually working with this data to display a tree, I created a tree_item_iterator utility function which, for each node, should give you sufficient information to generate whatever kind of display you want.

More info about MPTT: Trees in SQL Storing erarchical Data in a Database Managing erarchical Data in MySQL.

That's an up in any case. Thanks for your time (and brains). – Tomalak Oct 13 '08 at 17:24.

As of Oracle 9i, you can use CONNECT BY. SELECT LPAD(' ', (LEVEL - 1) * 4) || "Name" AS "Name" FROM (SELECT * FROM TMP_NODE ORDER BY "Order") CONNECT BY PRIOR "Id" = "ParentId" START WITH "Id" IN (SELECT "Id" FROM TMP_NODE WHERE "ParentId" = 0) As of SQL Server 2005, you can use a recursive common table expression (CTE). WITH NodeList ( Id , ParentId , Level , Order ) AS ( SELECT Node.Id , Node.

ParentId , 0 AS Level , CONVERT(varchar(MAX), Node. Order) AS Order FROM Node WHERE Node. ParentId = 0 UNION ALL SELECT Node.

Id , Node. ParentId , NodeList. Level + 1 AS Level , NodeList.

Order + '|' + CONVERT(varchar(MAX), Node. Order) AS Order FROM Node INNER JOIN NodeList ON NodeList.Id = Node. ParentId ) SELECT REPLICATE(' ', NodeList.

Level * 4) + Node.Name AS Name FROM Node INNER JOIN NodeList ON NodeList. Id = Node. Id ORDER BY NodeList.

Order Both will output the following results. Name 'Node 1' ' Node 1.1' ' Node 1.1.1' ' Node 1.2' 'Node 2' ' Node 2.1.

1 Thats convenient... Too bad I have SQL Server only. – Tomalak Oct 11 '08 at 9:41 1 @Tomalak - I added a SQL Server solution. – Eric Weilnau Jun 11 '10 at 16:41 1 Thanks!

:-) – Tomalak Jun 11 '10 at 22:51 This is exactly what I need to solve a problem at work (using SQL Server) but I really can't understand what's going on here and how to implement this in my query. Can you please add some more explanation? – Ziv Oct 26 at 11:51.

Well given the choice, I'd be using objects. I'd create an object for each record where each object has a children collection and store them all in an assoc array (/hashtable) where the Id is the key. And blitz through the collection once, adding the children to the relevant children fields.Simple.

But because you're being no fun by restricting use of some good OOP, I'd probably iterate based on: function PrintLine(int pID, int level) foreach record where ParentID == pID print level*tabs + record-data PrintLine(record. ID, level + 1) PrintLine(0, 0) Edit: this is similar to a couple of other entries, but I think it's slightly cleaner. One thing I'll add: this is extremely SQL-intensive.It's nasty.

If you have the choice, go the OOP route.

What I wrote was real quick between meetings. What I don't understand is why you and I are the only ones with a real solution and we have no up votes at all. I'm going to up vote yours answer because your code is a bit cleaner.

– wcm Oct 10 '08 at 18:16 That is what I meant with "no frameworks" - you are using LINQ, aren't you? Regarding your first paragraph: The result set is already there, why copying all the info to a new object structure first? (I was not clear enough on that fact, sorry) – Tomalak Oct 11 '08 at 10:15 Tomalak - no the code is pseudo-code.

Of course you'd have to break things out into proper selects and iterators... and a real syntax! Why OOP? Because you can exactly mirror the structure.It keeps things nice and it just happens to be more efficient (only one select) – Oli Oct 12 '08 at 9:52 I had no repeated selects in mind either.

Regarding OOP: Mark Bessey said in his answer: "You can emulate any other data structure with a hashmap, so that's not a terrible limitation. ". Your solution is correct, but I think there is some room fore improvement even without OOP.

– Tomalak Oct 13 '08 at 8:45.

Assuming that you know that the root elements are zero, here's the pseudocode to output to text: function PrintLevel (int curr, int level) //print the indents for (i=1; i.

You can emulate any other data structure with a hashmap, so that's not a terrible limitation. Scanning from the top to the bottom, you create a hashmap for each row of the database, with an entry for each column. Add each of these hashmaps to a "master" hashmap, keyed on the id.

If any node has a "parent" that you haven't seen yet, create an placeholder entry for it in the master hashmap, and fill it in when you see the actual node. To print it out, do a simple depth-first pass through the data, keeping track of indent level along the way. You can make this easier by keeping a "children" entry for each row, and populating it as you scan the data.As for whether there's a "better" way to store a tree in a database, that depends on how you're going to use the data.

I've seen systems that had a known maximum depth that used a different table for each level in the hierarchy. That makes a lot of sense if the levels in the tree aren't quite equivalent after all (top level categories being different than the leaves).

Thanks for sticking to the "plain English" constraint. :-) In fact, that is pretty close to how I already do it. – Tomalak Oct 11 '08 at 9:37.

This was written quickly, and is neither pretty nor efficient (plus it autoboxes alot, converting between int and Integer is annoying! ), but it works. It probably breaks the rules since I'm creating my own objects but hey I'm doing this as a diversion from real work :) This also assumes that the resultSet/table is completely read into some sort of structure before you start building Nodes, which wouldn't be the best solution if you have hundreds of thousands of rows.

Public class Node { private Node parent = null; private List children; private String name; private int id = -1; public Node(Node parent, int id, String name) { this. Parent = parent; this. Children = new ArrayList(); this.Name = name; this.

Id = id; } public int getId() { return this. Id; } public String getName() { return this.Name; } public void addChild(Node child) { children. Add(child); } public List getChildren() { return children; } public boolean isRoot() { return (this.

Parent == null); } @Override public String toString() { return "id=" + id + ", name=" + name + ", parent=" + parent; } } public class NodeBuilder { public static Node build(List> input) { // maps id of a node to it's Node object Map nodeMap = new HashMap(); // maps id of a node to the id of it's parent Map childParentMap = new HashMap(); // create special 'root' Node with id=0 Node root = new Node(null, 0, "root"); nodeMap. Put(root.getId(), root); // iterate thru the input for (Map map : input) { // expect each Map to have keys for "id", "name", "parent" ... a // real implementation would read from a SQL object or resultset int id = Integer. ParseInt(map.

Get("id")); String name = map. Get("name"); int parent = Integer. ParseInt(map.

Get("parent")); Node node = new Node(null, id, name); nodeMap. Put(id, node); childParentMap. Put(id, parent); } // now that each Node is created, setup the child-parent relationships for (Map.

Entry entry : childParentMap.entrySet()) { int nodeId = entry.getKey(); int parentId = entry.getValue(); Node child = nodeMap. Get(nodeId); Node parent = nodeMap. Get(parentId); parent.

AddChild(child); } return root; } } public class NodePrinter { static void printRootNode(Node root) { printNodes(root, 0); } static void printNodes(Node node, int indentLevel) { printNode(node, indentLevel); // recurse for (Node child : node.getChildren()) { printNodes(child, indentLevel + 1); } } static void printNode(Node node, int indentLevel) { StringBuilder sb = new StringBuilder(); for (int I = 0; I resultSet = new ArrayList>(); resultSet. Add(newMap("1", "Node 1", "0")); resultSet. Add(newMap("2", "Node 1.1", "1")); resultSet.

Add(newMap("3", "Node 2", "0")); resultSet. Add(newMap("4", "Node 1.1.1", "2")); resultSet. Add(newMap("5", "Node 2.1", "3")); resultSet.

Add(newMap("6", "Node 1.2", "1")); Node root = NodeBuilder. Build(resultSet); printRootNode(root); } //convenience method for creating our dummy data private static Map newMap(String id, String name, String parentId) { Map row = new HashMap(); row. Put("id", id); row.

Put("name", name); row. Put("parent", parentId); return row; } }.

I always find it hard to filter the algorithm-specific part from the implementation-specific part when presented with a lot of source code. That's why I asked for a solution that was not language-specific in the first place. But it does the job, so thanks for your time!

– Tomalak Oct 11 '08 at 12:19 I see what you mean now, if it's not obvious the main algorithm is in NodeBuilder.build() - I probably could have done a better job of summing this up. – matt be Oct 13 '08 at 20:05.

Here is how I do it currently: (This solution assumes you actually can index into the result set, as indicated in the question. ) Make sure the result set is "ORDER BY Order" so all nodes with the same parent come out in the right order. Iterate the result set, write down each row index (not node Id!) in a hashmap children, keyed by ParentId: {0:1,3, 1:2,6, 2:4, 3:5} Depth-first recurse into children, starting with the default parent node "0".

The row index in the child array is used to pull out any other information from the result set. Maintain info about the row index and its nesting level (e.g. Recursion depth), building this array (let's call it sequence): {row:1,lvl:0}, {row:2,lvl:1}, {row:4,lvl:2}, {row:6,lvl:1}, {row:3,lvl:0}, {row:5,lvl:1} Use that array to access the result set sequentially (simple loop) any time I need to print out the tree. Assuming I have to do it more than once, I spare me the repeated recursion.

Pseude-Code: children = {} foreach row in resultSet parentId = row. Fields'ParentId' if parentId not in children childrenparentId = end if chidrenparentId. Push(row.

Index) next row function buildSequence(parentId = 0, depth = 0) sequence = foreach rowIndex in childrenparentId sequence. Push( {row: rowIndex, lvl: depth} ) thisId = resultSetrowIndex. Fields'Id' if thisId in children sequence.

Append( buildSequence(thisId, depth + 1) ) end if next rowIndex return sequence end function sequence = buildSequence().

If nested hash maps or arrays can be created, then I can simply go down the table from the beginning and add each item to the nested array. I must trace each line to the root node in order to know which level in the nested array to insert into. I can employ memoization so that I do not need to look up the same parent over and over again.

Edit: I would read the entire table into an array first, so it will not query the DB repeatedly. Of course this won't be practical if your table is very large. After the structure is built, I must do a depth first traverse through it and print out the HTML.

There's no better fundamental way to store this information using one table (I could be wrong though ;), and would love to see a better solution ). However, if you create a scheme to employ dynamically created db tables, then you opened up a whole new world at the sacrifice of simplicity, and the risk of SQL hell ;).

I'd rather not change the DB layout just because a new level of sub-nodes is needed. :-) – Tomalak Oct 11 '08 at 12:06.

If elements are in tree order, as shown in your example, you can use something like the following Python example: delimiter = '. ' stack = for item in items: while stack and not item. Startswith(stack-1+delimiter): print "" stack.pop() print "" print item stack.

Append(item) What this does is maintain a stack representing the current position in the tree. For each element in the table, it pops stack elements (closing the matching divs) until it finds the parent of the current item. Then it outputs the start of that node and pushes it to the stack.

If you want to output the tree using indenting rather than nested elements, you can simply skip the print statements to print the divs, and print a number of spaces equal to some multiple of the size of the stack before each item. For example, in Python: print " " * len(stack) You could also easily use this method to construct a set of nested lists or dictionaries. Edit: I see from your clarification that the names were not intended to be node paths.

That suggests an alternate approach: idx = {} idx0 = for node in results: child_list = idxnode. Id = child_list idxnode.ParentId. Append((node, child_list)) This constructs a tree of arrays of tuples(!).

Idx0 represents the root(s) of the tree. Each element in an array is a 2-tuple consisting of the node itself and a list of all its children. Once constructed, you can hold on to idx0 and discard idx, unless you want to access nodes by their ID.

Your approach explicitly relies on the clients's ability to display nested structures (DIVs). Thats not exactly what I had in mind, what if the output should be plain text, or a flat HTML table? – Tomalak Oct 11 '08 at 9:20 Updated the answer to clarify.

– Nick Johnson Oct 14 '08 at 11:53.

Bill's answer is pretty gosh-darned good, this answer adds some things to it which makes me which SO supported threaded answers. Anyway I wanted to support the tree structure and the Order property. I included a single property in each Node called leftSibling that does the same thing Order is meant to do in the original question (maintain left-to-right order).

Mysql> desc nodes ; +-------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | name | varchar(255) | YES | | NULL | | | leftSibling | int(11) | NO | | 0 | | +-------------+--------------+------+-----+---------+----------------+ 3 rows in set (0.00 sec) mysql> desc adjacencies; +------------+---------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+---------+------+-----+---------+----------------+ | relationId | int(11) | NO | PRI | NULL | auto_increment | | parent | int(11) | NO | | NULL | | | child | int(11) | NO | | NULL | | | pathLen | int(11) | NO | | NULL | | +------------+---------+------+-----+---------+----------------+ 4 rows in set (0.00 sec) More detail and SQL code on my blog. Thanks Bill your answer was helpful in getting started!

To Extend Bill's SQL solution you can basically do the same using a flat array. Further more if your strings all have the same lenght and your maximum number of children are known (say in a binary tree) you can do it using a single string (character array). If you have arbitrary number of children this complicates things a bit... I would have to check my old notes to see what can be done.

Then, sacrificing a bit of memory, especially if your tree is sparse and/or unballanced, you can, with a bit of index math, access all the strings randomly by storing your tree, width first in the array like so (for a binary tree): String nodeArray = L0root, L1child1, L1child2, L2Child1, L2Child2, L2Child3, L2Child4 ... yo know your string length, you know it I'm at work now so cannot spend much time on it but with interest I can fetch a bit of code to do this. We use to do it to search in binary trees made of DNA codons, a process built the tree, then we flattened it to search text patterns and when found, though index math (revers from above) we get the node back... very fast and efficient, tough our tree rarely had empty nodes, but we could searh gigabytes of data in a jiffy.

I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.

What is the most efficient/elegant way to parse a flat table into a tree?

Related Questions

What's the most efficient/elegant way to delete elements from a matrix in MATLAB?

Most efficient way to copy elements from multidimensional array to flat array?

Using CSS, what is the most elegant way to vertical align a short title and a description, not using a table?

Most Efficient MySQL query to update a table from identical memory table?

In mysql, what is an efficient way to get only rows in a table by a distinct value in one of that table's columns?

Efficient Way to store a table of checkbox data into MySQL database?