Tags: bill-of-material, columns, connect, database, downstream, example, mysql, oracle, prior, quantity_per, represents, rows, situation, sql, syntax, upstream
question on start with connect by prior syntax
I have a situation that represents a bill-of-material. It has three columns - downstream part, upstream part, quantity_per. For example, rows in it would be like this:
car chassis 1
car wheel 4
wheel tire 1
wheel nut 4
It is easy to extract an indented bill of material using the start with ... connect by prior... syntax.
I have a slightly more involved problem. I want to figure out that for a car, I need 16 nuts (i.e. at the leaf-level node). That is, I want to multiply and accumulate values of specific columns. Can this be done using the recursive SQL? I want to avoid having to write PL-SQL.
Leave a comment...
- 10 Comments
- There may be better solutions, but mine involves a function and a SQL query
SQL> create or replace function multiply_qty(p_str varchar2) return number is
2 l_num number:= 1;
3 l_idx number;
4 l_str varchar2(2000);
6 l_str := p_str;
7 while instr(l_str,'/') > 0 loop
8 l_idx := instr(l_str,'/');
9 l_num := l_num * to_number(NVL(substr(l_str,1,l_idx-1),1));
10 dbms_output.put_line('No is ' || l_num);
11 l_str := substr(l_str,l_idx+1);
12 end loop;
13 l_num := l_num * to_number(l_str);
14 dbms_output.put_line('Final No is ' || l_num);
15 return l_num;
16 end multiply_qty;
SQL> with bom as
3 select 'car' downstream, 'chassis' upstream, 1 qty from dual
4 union all
5 select 'car' downstream, 'wheel' upstream, 4 qty from dual
6 union all
7 select 'wheel' downstream, 'tire' upstream, 1 qty from dual
8 union all
9 select 'wheel' downstream, 'nut' upstream, 4 qty from dual
11 select sys_connect_by_path(upstream,'/') BOM, multiply_qty(sys_connect_by_path(qty ,'/')) quant
ity from bom
12 connect by prior upstream = downstream
13 start with downstream = 'car'
/wheel/nut 16#1; Sat, 23 Feb 2008 15:23:00 GMT
- Thanks, Sundar. No one else has replied, so I guess there is no 'slick' way of doing this, and the only alternative is to write a function as you did. I will probably end up using the same function, and write the query with an 'IS_LEAF', since I am looking for only the leaf-level
Srini#2; Sat, 23 Feb 2008 15:24:00 GMT
- Not the most inspiring title ;-)
If you'd indicated that you had some devious problem to solve with CONNECT .. BY no doubt the vultures would have found this...
...for a number of approaches.#3; Sat, 23 Feb 2008 15:25:00 GMT
- Wow! Thank you! That is just perfect - it is exactly what I am looking for. I think I will spend quite some time looking at it, and it should be well worth the time. I was trying to see if there is a slick way to do MRP-type calculations using just SQL, and this was the first and biggest step.
What would an inspiring title be that would catch people's eyes? "To loop is pl-sql, to recurse, sql" perhaps? :-)#4; Sat, 23 Feb 2008 15:26:00 GMT
- > What would an inspiring title be that would catch
> people's eyes? "To loop is pl-sql, to recurse, sql"
> perhaps? :-)
"Hierarchical weighted total" might be more inspiring title indeed. Let me take this opportunity to plug a snippet from "SQL Design Patterns" book.
-- Hierarchical Weighted Total ---
Before exploring aggregation on graphs, let's have a quick look to aggregation on trees. Aggregation on trees is much simpler and has a designated name: hierarchical total. A typical query is "Find combined salary of all the employees under (direct and indirect) supervision of King". This query, however, has no inherent complexity, and effectively reduces to familiar task of finding a set all node's descendants.
In graph context, the hierarchical total query is commonly referred to as bill of materials (BOM). Consider bicycle parts assembly
Part SubPart Quantity
-- -- --
Bicycle Wheel 2
Wheel Reflector 2
Bicycle Frame 1
Frame Reflector 2
Parts with the same name are assumed to be identical. They are modeled as nodes in an (acyclic directed) graph. The edges specify the construction flow:
1. Assemble each wheel from the required parts (including reflectors).
2. Assemble frame from the required components (including reflectors).
3. Assemble bicycle from the required parts (including frame and wheels).
Suppose we want to order the total list of parts for bike assembly. How do we calculate each part quantity? Unlike hierarchical total for a tree we can't just add the quantities, as they multiply along each path.
Figure 6.11: Bicycle assembly graph. The total number of reflector parts should be calculated as the sum of parts aggregated along each path. The path /Bicycle/Wheel/Reflector contribute to 2*2=4 parts: bicycle has 2 wheels, each wheel has 2 reflectors. Likewise, the path /Bicycle/Frame/Reflector contribute to 1*2=2 more parts.
Therefore, there are two levels of aggregation here, multiplication of the quantities along each path, and summation along each alternative path.
Aggregation on Graphs
Two levels of aggregation fit naturally into graphs queries. Consider finding a shortest path between two nodes. First, we add the distances along each path, and then we choose a path with minimal length.
Double aggregation is not something unique to graph queries, however. Consider
Find the sum of the salaries grouped by department. Select maximum of them.
When expressing such a query in SQL, we accommodate the first sentence as an inner subquery inside the outer query corresponding to the second sentence
select max(salaryExpences) from (
select deptno, sum(sal) salaryExpenses
group by dept
Hierarchical weighted total query has the same structure. The first level of aggregation where we join edges into paths is analogous to the group by subquery from the salary expense query. Formally, it is a transitive closure, which is enhanced with additional aggregates, or generalized transitive closure.
Figure 6.12: Generalized transitive closure. There are several aggregates naturally associated with each path: the first edge, the last edge, the path length, or any aggregate on the edge weights.
Let's suggest some hypothetical SQL syntax for generalized transitive closure
select distinct first(Part), last(SubPart), product (Quantity)
connect by prior Part = later SubPart
Unfamiliar syntax requires clarification:
§ The "product" is a non-standard aggregate from chapter 4.
§ The "first" and "last" refer to the first and last edges in the path, correspondingly. These aggregates are unique to ordered structures such as (directed) graphs.
§ Although we didn't use it in the example, concatenating edge labels into a string is one more natural aggregation function, which is unique to graphs. The "list" aggregate function is a standard way to accommodates it.
§ The "later" keyword is just a syntactic sugar fixing apparent asymmetry caused by the prior keyword.
§ There is no "start with" clause, which is, in fact, redundant. It is an outer query where we'll restrict paths to those originated in the 'Bicycle' node.
The generalized transitive closure query is enveloped with second level of aggregation, which is accomplished by standard means
select leaf, sum(factoredQuantity) from (
select product(Quantity) factoredQuantity,
first(Part) root, last(SubPart) leaf
connect by prior Part = later SubPart
) where root = 'Bicycle'
group by leaf
Enough theory, what do we do in real world to implement an aggregated weighted total query? Let's start with Oracle, because the proposed syntax for generalized transitive closure resembles Oracle connect by.
First, we have to be able to refer to the first and the last edges in the path
select connect_by_root(Part) root, SubPart leaf
connect by prior Part = SubPart
Unlike our fictional syntax, Oracle treats edges in the path asymmetrically. Any column from the AssemblyEdges table is assumed to (implicitly) refer to the last edge. This design dates back to version 7. The connect_by_root function referring to the first edge has been added in version 10. It is remarkable how this, essentially ad-hock design proved to be successful in practice.
Next, Oracle syntax has several more path aggregate functions:
1. level – the length of the path.
2. sys_connect_by_path – which is essentially the list aggregate in our fictitious syntax.
3. connect_by_is_leaf – an indicator if there is no paths which contain the current path as a prefix.
4. connect_by_is_cycle – an indicator if the first edge is adjacent to the last.
Unfortunately, we need an aggregate which is a product of edge weights, and not a string which is a concatenation of weights produced by sys_connect_by_path. You might be tempted to hack a function which accepts a string of the edge weights, parses it, and returns the required aggregate value.
Nah, too easy! Can we solve the problem without coding a function (even a simple one)? Yes we can, although this solution would hardly be more efficient. The critical idea is representing any path in the graph as a concatenation of three paths:
§ a prefix path from the start node to some intermediate node i
§ a path consisting of a single weighted edge (i,j)
§ a postfix path from the node i to the end node
Figure 6.13: A path from node x to node y is a composition of the path from node x to node i, the edge (i, j), and the path from j to y. The edge (i, j) can be positioned anywhere along the path from x to y.
Then, we freeze the path from x to y, while interpreting the edge (i,j) as a variable. All we need to do is aggregating the weights of all those edges.
Let's assemble the solution piece by piece. First, all the paths in the graphs are expressed as
with TClosure as (
select distinct connect_by_root(Part) x, SubPart y,
sys_connect_by_path('['||Part||','||SubPart||'>',' ') path
connect by nocycle Part=prior SubPart
select Part, Part, '' from AssemblyEdges
select SubPart, SubPart, '' from AssemblyEdges
This is essentially reflexive transitive closure relation enhanced with the path column. Paths are strings of concatenated edges; each edge is sugarcoated into '['||Part||','||SubPart||'>', which helps visual perception, but is inessential for the solution.
Next, we join two paths and intermediate edge together, and group by paths
... , PathQuantities as (
select t1.x, t2.y,
from TClosure t1, AssemblyEdges e, TClosure t2
where t1.y = e.Part and e.SubPart = t2.x
group by t1.x, t2.y, t1.p||' ['||e.Part||','||e.SubPart||'>'||t2.p
There we have all the paths with quantities aggregated along them. Let's group the paths by the first and last node in the path while adding the quantities
select x, y, sum(Quantity)
group by x, y
This query is almost final, as it needs only a minor touch: restricting the node x to 'Bicycle' and interpreting the y column as a Part in the assembly
select y Part, sum(Quantity)
where x = 'Bicycle'
group by x, y
Let's move on to recursive SQL solution. It turns out to be quite satisfactory
with TCAssembly as (
select Part, SubPart, Quantity AS factoredQuantity
where Part = 'Bicycle'
select te.Part, e.SubPart, e.Quantity * te.factoredQuantity
from TCAssembly te, AssemblyEdges e
where te.SubPart = e.Part
) select SubPart, sum(Quantity) from TCAssembly
group by SubPart
The most important, it accommodated the inner aggregation with non-standard aggregate effortlessly! Second, the cycle detection issue that plagued recursive SQL in the section on transitive closure is not a problem for directed acyclic graphs.
Hierarchical total turns out to be surprisingly useful in the following two sections. Selecting all the subsets of items satisfying some aggregate criteria – Basket generation -- appears to have little with hierarchical total, at first. The other, rather unexpected application is Comparing hierarchies.#5; Sat, 23 Feb 2008 15:28:00 GMT
- Thanks, Vadim. I think it is good that you 'plugged' your book, because I am considering buying it. I noticed that I could not find it on Amazon, and at B&N.
Srini#6; Sat, 23 Feb 2008 15:29:00 GMT
- Thank you for encouraging words. There is a certain disconnection between me and the publisher who is apparently not used to publishing that kind of books. There is even a threat of canceling the publication. Are you reading this, Don?
BTW, here is a link to sample chapter:#7; Sat, 23 Feb 2008 15:30:00 GMT
- Here is the solution I am finally going with. I got this idea from the AskTom site whose URL has been posted in this thread.
Table MYTABLE columns - parentpart, childpart, quantity
SELECT T1.PARENTPART, T1.CHILDPART,
FROM MYTABLE T2
START WITH T2.CHILDPART = T1.CHILDPART
CONNECT BY PRIOR PARENTPART = CHILDPART
START WITH ...
CONNECT BY PRIOR CHILDPART = PARENTPART
The key thing is to note the inner recursive SQL - it goes the other way around. Also, I have not included the DECODE to take care of lousy data (nulls etc.) unlike in Tom's example.#8; Sat, 23 Feb 2008 15:30:00 GMT
Thanks for your post. While I got a quick solution from AskTom site, I will spend more time on your post when I find the time. I am definitely interested in figuring out graph traversals - that should solve whole classes of problems I encounter.
Incidentally, the A--> x, y --> that you posted - isn't that computationally VERY expensive? Of course, it may be that I do not yet correctly comprehend what you have posted. BTW, the tables I am likely to deal with might have half a million rows. So, a triple join is no walk-in-the-park. Perhaps, at some point, I will give up 'functional', think 'imperative' and go with PL-SQL.
Srini#9; Sat, 23 Feb 2008 15:32:00 GMT
- Hello Vadim
I see that on the Rampant website it's slated for publication in October 2007. Based on the contents list and the sample chapter you provided the link to, I'll definately buy a copy when it's published. If it is dropped by Rampant, would you consider publishing it as an eBook, or through another publisher maybe?
David#10; Sat, 23 Feb 2008 15:33:00 GMT