I am saving a graph to a database. It has a structure like:
from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

# Defined before Node, because Node.children refers to it
association_table = Table(
    "parent_child_association",
    Base.metadata,
    Column("parent_id", Integer, ForeignKey("nodes.id"), primary_key=True),
    Column("child_id", Integer, ForeignKey("nodes.id"), primary_key=True),
)

class SuperNode(Base):
    __tablename__ = "supernodes"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    # nodes has two foreign keys to supernodes, so name the one used here
    nodes = relationship("Node", back_populates="supernode", foreign_keys="Node.supernode_id")

class Node(Base):
    __tablename__ = "nodes"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    supernode_id = Column(Integer, ForeignKey("supernodes.id"), nullable=False)
    sub_node_id = Column(Integer, ForeignKey("supernodes.id"), nullable=True)
    supernode = relationship("SuperNode", back_populates="nodes", foreign_keys=[supernode_id])
    sub_node = relationship("SuperNode", foreign_keys=[sub_node_id])
    children = relationship(
        "Node",
        secondary=association_table,
        primaryjoin=id == association_table.c.parent_id,
        secondaryjoin=id == association_table.c.child_id,
        backref="parents",
    )
FYI: I am using SQLAlchemy 2.0 with aiosqlite.
Now, I have trouble creating a statement that will get me a SuperNode by id, and then the entire hierarchy of nodes, their sub_nodes, the nodes of those, etc., as well as all children of every node, their sub_nodes, their nodes, their children, etc., so I get everything in the hierarchy of a SuperNode when I select it. I've been playing with CTEs, but with those I can end up getting tonnes of rows, and I have to rebuild the structure (which is acceptable as plan B, but I still have issues getting SuperNode.nodes.sub_node.nodes and so on). The other option would be using selectinload, but that will get me only a fixed number of layers, which is just not good enough.
CTEs seem like a way to do this, perhaps the best way? But I can't quite understand how to use them in a case like this, and I have not been able to find any good examples.
Just a small EDIT: I have a solution that uses recursion in Python coupled with lazy loading to do this, but I do not want to use lazy loading. With any sort of depth, it takes seconds to run.
So if you can tell me how, what, and why, I would be very happy indeed :D
Asked 2 days ago by rasmus91; edited yesterday.
1 Answer
Well, in this case you can use a recursive CTE to navigate through your node hierarchy. This works because a recursive CTE runs its recursive part again and again, each time against the rows produced by the previous pass, until no new rows come back.
So how can you do this? First, define the CTE for the hierarchical query: start with the Node instances that sit directly under the SuperNode, then recursively join back to the Node table for the sub-nodes.
Then use select with SQLAlchemy to query the nodes and their sub-nodes through the CTE.
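To make the two steps concrete before bringing in the ORM, here is a minimal sketch of the kind of SQL involved, run through the standard-library sqlite3 module. The table and column names follow the models in the question; the demo rows and the helper names (build_demo_db, collect_hierarchy_ids) are made up for illustration. It folds both hops (sub_node and children) into a single recursive term, and it uses UNION rather than UNION ALL so that a parent/child cycle cannot recurse forever.

```python
import sqlite3

# Tables shaped like the models in the question (names taken from there)
SCHEMA = """
CREATE TABLE supernodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    supernode_id INTEGER NOT NULL REFERENCES supernodes(id),
    sub_node_id INTEGER REFERENCES supernodes(id)
);
CREATE TABLE parent_child_association (
    parent_id INTEGER REFERENCES nodes(id),
    child_id INTEGER REFERENCES nodes(id),
    PRIMARY KEY (parent_id, child_id)
);
"""

HIERARCHY_SQL = """
WITH RECURSIVE node_cte(id, sub_node_id) AS (
    -- anchor: nodes directly under the requested supernode
    SELECT id, sub_node_id FROM nodes WHERE supernode_id = :root
    UNION
    -- recursive term: nodes of the collected node's sub_node,
    -- plus the collected node's children
    SELECT n.id, n.sub_node_id
    FROM node_cte AS c
    LEFT JOIN parent_child_association AS a ON a.parent_id = c.id
    JOIN nodes AS n ON n.supernode_id = c.sub_node_id OR n.id = a.child_id
)
SELECT id FROM node_cte ORDER BY id
"""


def build_demo_db():
    """Hypothetical demo data: node 1 points at supernode 2, nodes 3 and 4 form a cycle."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO supernodes VALUES (?, ?)",
        [(1, "root"), (2, "inner"), (3, "other")],
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?, ?)",
        [(1, "a", 1, 2), (2, "b", 1, None), (3, "c", 2, None),
         (4, "d", 3, None), (5, "e", 3, None)],
    )
    conn.executemany(
        "INSERT INTO parent_child_association VALUES (?, ?)",
        [(3, 4), (4, 3)],
    )
    return conn


def collect_hierarchy_ids(conn, supernode_id):
    """Return the ids of every node reachable from the given supernode."""
    rows = conn.execute(HIERARCHY_SQL, {"root": supernode_id})
    return [row[0] for row in rows]


print(collect_hierarchy_ids(build_demo_db(), 1))
```

With this demo data the call for supernode 1 collects nodes 1 to 4 and leaves out node 5, which is not reachable through sub_node or children.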
An example with SQLAlchemy below:
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload

# Assuming you have an AsyncSession instance
async def get_supernode_with_hierarchy(session: AsyncSession, supernode_id: int):
    # Step 1: anchor of the CTE, the nodes directly under the supernode
    node_cte = (
        select(Node.id, Node.sub_node_id)
        .where(Node.supernode_id == supernode_id)
        .cte(name="node_cte", recursive=True)
    )
    # Step 2a: recursive term, the nodes of the supernode a collected node points to
    via_sub_node = select(Node.id, Node.sub_node_id).join(
        node_cte, Node.supernode_id == node_cte.c.sub_node_id
    )
    # Step 2b: recursive term, the children of a collected node
    via_children = (
        select(Node.id, Node.sub_node_id)
        .join(association_table, association_table.c.child_id == Node.id)
        .join(node_cte, association_table.c.parent_id == node_cte.c.id)
    )
    # Step 3: combine the anchor and the recursive terms. UNION drops duplicates,
    # so a cycle in the graph cannot recurse forever.
    # (Two recursive terms need SQLite 3.34 or newer.)
    node_cte = node_cte.union(via_sub_node, via_children)
    # Step 4: load every node of the hierarchy at once, eagerly loading the
    # relationships needed to walk it
    nodes_query = (
        select(Node)
        .where(Node.id.in_(select(node_cte.c.id)))
        .options(
            selectinload(Node.children),
            selectinload(Node.sub_node).selectinload(SuperNode.nodes),
        )
    )
    # Keep a reference so the loaded nodes stay in the session's identity map
    nodes = (await session.scalars(nodes_query)).all()
    # Step 5: load the SuperNode itself together with its first layer of nodes
    supernode_query = (
        select(SuperNode)
        .where(SuperNode.id == supernode_id)
        .options(selectinload(SuperNode.nodes))
    )
    result = await session.execute(supernode_query)
    return result.scalar_one_or_none()
So, how does this work?
The anchor of the CTE (node_cte) selects all the nodes directly under the SuperNode.
The recursive part finds everything below them, in two ways. Joining the Node table to the CTE result (Node.supernode_id == node_cte.c.sub_node_id) fetches the nodes of each node's sub_node, and joining through association_table fetches each node's children.
The union operation combines the anchor with both recursive terms. I used union rather than union_all so that duplicate rows are dropped and a cycle in the graph cannot recurse forever.
The Node query then loads every node whose id the CTE collected, and uses selectinload for children, sub_node and sub_node.nodes. Because every node in the hierarchy is part of that one result, these relationships should be populated at every depth instead of a fixed number of layers. Other relationships, such as parents, are not loaded by this.
The final step selects the SuperNode with its nodes. I used await session.execute(...) to execute the query asynchronously, which falls in line with aiosqlite usage.
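If you go with plan B from the question and fetch flat rows instead of ORM objects, the structure can be rebuilt in Python without further queries. The following is only a sketch under assumptions: the row shapes (id, name, supernode_id, sub_node_id) and (parent_id, child_id), the demo data, and the helper names index_rows and walk are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical flat rows, shaped like the nodes and association tables
NODE_ROWS = [
    (1, "a", 1, 2),      # node 1 sits under supernode 1, its sub_node is supernode 2
    (2, "b", 1, None),
    (3, "c", 2, None),
    (4, "d", 3, None),
]
EDGE_ROWS = [(3, 4), (4, 3)]  # parent_id, child_id (contains a cycle on purpose)


def index_rows(node_rows, edge_rows):
    """Group the flat rows once so every lookup afterwards is a dict access."""
    nodes = {}
    nodes_of = defaultdict(list)      # supernode id -> node ids
    children_of = defaultdict(list)   # parent node id -> child node ids
    for node_id, name, supernode_id, sub_node_id in node_rows:
        nodes[node_id] = (name, sub_node_id)
        nodes_of[supernode_id].append(node_id)
    for parent_id, child_id in edge_rows:
        children_of[parent_id].append(child_id)
    return nodes, nodes_of, children_of


def walk(supernode_id, nodes, nodes_of, children_of):
    """Return (depth, name) pairs depth-first; each node is visited at most once."""
    seen = set()
    out = []

    def visit(node_id, depth):
        if node_id in seen or node_id not in nodes:
            return
        seen.add(node_id)
        name, sub_node_id = nodes[node_id]
        out.append((depth, name))
        for child_id in children_of[node_id]:
            visit(child_id, depth + 1)
        if sub_node_id is not None:
            for inner_id in nodes_of[sub_node_id]:
                visit(inner_id, depth + 1)

    for node_id in nodes_of[supernode_id]:
        visit(node_id, 0)
    return out


nodes, nodes_of, children_of = index_rows(NODE_ROWS, EDGE_ROWS)
for depth, name in walk(1, nodes, nodes_of, children_of):
    print("  " * depth + name)
```

The seen set plays the same role here that dropping duplicates plays in the query: a node reached twice, for example through a cycle, is only expanded once. For very deep graphs an explicit stack would avoid Python's recursion limit.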