Shortest Path Analysis of Large Networks by SQL and PL/SQL

Let k be an iteration number, starting with k = 0, which will correspond to an input root node, r say.

Let n_k denote a node, n, reachable in a minimum of k steps from the root node, and let {n_k} denote the set of all such nodes. In a tree network (i.e. a network without loops), there is a single path from the root to any other reachable node. If the network has loops then there may be multiple paths between nodes.

We might write the set of all nodes reachable in a minimum of k steps from the root node, r, as:

N_k(r) = {n_k}(r)

Consider the set of all nodes that have a link connecting them to a node in N_i(r), but which are not themselves in any of the sets N_k(r) for k = 0,…,i. We can write this symbolically (dropping the (r) for concision), where N and L are the complete node and link sets respectively:

The union represents the fact that the new node may be at either end of the link, and note that a node in the set N_i+1 may occur in multiple links from the set N_i, but only appears once in the node set since set elements are unique.

Now we need to show that under our method of construction, N_i+1 matches our definition of N_k in general. This requires the following conditions to hold:

All nodes in N_i+1 are reachable in i + 1 steps
This follows from the fact that the nodes are obtained as a single step from nodes reachable by definition in i steps
All nodes in N_i+1 are not reachable in < i + 1 steps
This follows from the fact that by definition N_k contains all nodes reachable in k steps where k < i + 1, and by construction N_i+1 excludes these nodes
No nodes not in N_i+1 are reachable in i + 1 steps
Any node reachable in i + 1 steps must be linked to a node reachable in i steps, which must by definition be in N_i, and by construction this would lead to the inclusion of the node reachable in i + 1 steps in N_i+1

Given that we know that the root node is reachable in zero steps, we have:

N₀ = {r}

It follows, by induction, that the recursive definition of N_i+1 from N_i allows us to generate N_k for all k > 0.

Generating All Minimum Length Paths

A path, p, to a node, n, is a sequence of nodes starting from the root node with each adjacent pair of nodes appearing in the link set L.

We showed how to generate all the node sets reachable from a root node in a minimum number of steps. The generation process obtains the (i + 1)’th set by taking all the nodes linked to the i’th set and not in a prior set. We can extend this process to generate also all the minimum length paths associated with the nodes, by adding on these linked nodes to the prior paths. This can be expressed thus:

where (p, n) means the path obtained by adding the new node n onto the prior path in P_i, and P_k represents the set of all paths of length k that are minimum length paths to a node reachable from the root node, r.

P₀ = {}

It’s easy to see how this allows us to generate P_k for all k > 0 in the same way as the iteration scheme for the node sets, N_k.

Generating A Single Minimum Length Path Per Node

We have seen how to generate all the minimum length paths to each node, but suppose we want to keep just one path for each node. We can do this by using a ranking function that uniquely ranks all minimum length paths to a node, say R(p_i, n_i) for each minimum length i, with:

R(p^*_i, n_i) = 1 for a single path to node n_i

Then we can write the iteration scheme for the set of optimal minimum length paths of length i + 1, P^*_i+1 based on those of length i thus:

P^*₀ = {}

Note that this guarantees that each node will have a single minimum length path generated, and also that any sub-path to a node generated as part of a path to a higher level node will be the same as the original path.

Shortest Paths and Spanning Trees

↑ Contents
↓ 3 Subnetworks
↓ Foreign Keys
↓ Brightkite
↓ Bacon Numbers

It is clear that the network comprised by the single minimum length paths to each node constitutes a spanning tree for the subnetwork connected to the root, i.e. a network connecting all the subnetwork nodes without loops.

It is important for efficient implementation to note that the tree network can be described by (node/prior node) links, so that each node does not need to have an explicit list of each node in its path, only the prior node.

Subnetwork Grouper Algorithm

↑ Two Algorithms

Any undirected network can be partitioned into subsets in which each node in a subnetwork is reachable from any other node in the same subnetwork, but not from any other node. With N and L representing the full sets of nodes and links, we could write the sets of partitioning subsets NP and LP, say, in terms of the node and link subsets Nⁱ and Lⁱ, respectively:

The sets are unique for the given network and of course the set cardinalities are the same:

Now let us see how we can generate the node subsets recursively. Suppose that we have the first i node subsets:

and let us write the set of all nodes not in the first i subsets thus:

and suppose that we have a ranking function defined on a node and node set pair, R(n, {n}), with:

R(n^*, {n}) = 1 for a single node n^* in {n}

Now, assuming we know how to generate the complete connected subnetwork, N(r), for any given root, r, as discussed in the first section above, we can generate the next node subset thus:

Example Datasets

We apply the algorithms to a range of network datasets. The first two small datasets comprise a made-up example, and a foreign key network from an Oracle database. The brightkite dataset represents a social network and comes from a university web site, and there are several datasets of varying sizes from another university web site representing actors’ appearances in films. Further details are given below.

Dataset	#Nodes	#Links	#Subnetworks	#Isolated Nodes	#Isolated Links	Maxlev
three_subnets	14	13	3	1	1	3
foreign_keys	289	319	43	0	19	8
brightkite	58,228	214,078	547	0	362	10
bacon/small	161	3,342	1	0	0	5
bacon/top250	12,466	583,993	15	0	0	6
bacon/pre1950	134,131	8,095,294	2,432	1,489	425	13
bacon/only_tv_v	744,374	22,503,060	12,198	8,659	3,539	11
bacon/no_tv_v	2,386,567	87,866,033	55,276	22,598	24,050	10
bacon/post1950	2,696,175	101,597,227	60,544	26,225	25,638	10
bacon/full	2,800,309	109,262,592	62,557	27,513	26,372	10

Maxlev represents the longest shortest path encountered from the Min Pathfinder algorithm over all subnetworks, and depends on the root nodes selected.

3 Subnetworks

Dataset	#Nodes	#Links	#Subnetworks	#Isolated Nodes	#Isolated Links	Maxlev
three_subnets	14	13	3	1	1	3

This is the present author’s specially constructed demo network, as shown in the diagram in the earlier section describing the shortest path problems. It is designed to cover multiple scenarios in a small dataset, and has three subnetworks of which one is a single node (isolated node in the table), another has two nodes with one link (isolated link in the table). The other has the following features:

a three-node linear section
a three-node section with two nodes in a line and the third linking to the first as well as to the root, forming a loop
a two-node link joined at one node by two other nodes to the root, giving two paths of equal length to each of the two nodes

The view links_load_v is created with hard-coded node names, with a typical record looking like this:

NODE_NAME_FR	NODE_NAME_TO
S1-N0-1	S1-N1-1

Foreign Keys

Dataset	#Nodes	#Links	#Subnetworks	#Isolated Nodes	#Isolated Links	Maxlev
foreign_keys	289	319	43	0	19	8

Database tables can be considered to be in a directed network defined by foreign keys. We will use the foreign keys visible to the SYS account in an Oracle 21c database, and we’ll treat the links as undirected. Here is a diagram of one of the subnetworks, with tables from HR, OE and PM schemas:

Note that the links are shown with their base directionality, but may be traversed in either direction; also, note self-joins such as on the Employee entity are discarded, as are duplicated node pairs, so that only one of the Employee-Department links is included in the network data model links table.

The source data are collected into the table fk_links (Load Table in the diagram) from the sys schema by the following command:

CREATE TABLE shortest_path_sql.fk_links(
                constraint_name,
                table_fr,
                table_to
) AS
SELECT Lower(con_f.constraint_name || '|' || con_f.owner),
       con_f.table_name || '|' || con_f.owner,
       con_t.table_name || '|' || con_t.owner
  FROM all_constraints                  con_f
  JOIN all_constraints                  con_t
    ON con_t.constraint_name            = con_f.r_constraint_name
   AND con_t.owner                      = con_f.r_owner
 WHERE con_f.constraint_type            = 'R'
   AND con_t.constraint_type            = 'P'
   AND con_f.table_name NOT LIKE '%|%'
   AND con_t.table_name NOT LIKE '%|%'

A typical row in the table looks like this, with ‘

’ delimiting the schema name from the element name:

CONSTRAINT_NAME	TABLE_FR	TABLE_TO
dept_mgr_fk\|hr	DEPARTMENTS\|HR	EMPLOYEES\|HR

Brightkite

Friendship network of Brightkite users

Dataset	#Nodes	#Links	#Subnetworks	#Isolated Nodes	#Isolated Links	Maxlev
brightkite	58,228	214,078	547	0	362	10

This dataset comes from the Stanford University link above, which states:

Brightkite was once a location-based social networking service provider where users shared their locations by checking-in. The friendship network was collected using their public API, and consists of 58,228 nodes and 214,078 edges.

The source dataset has the links duplicated in reverse to represent non-directionality (which doubles the number of links above), but we have excluded these duplicates, as we handle non-directionality in the code instead.

The data file here contains lines consisting of just a pair of numbers separated by a comma, eg:

6520,34162

Bacon Numbers

Bacon Numbers Datasets from Oberlin College

As mentioned in the introduction the Bacon Number problem is based on links defined by actors appearing in the same film. Any pair of actors can appear in multiple films together, but only a single link will be created in the network data model for a given pair. Datasets are taken from Oberlin College Computer Science department:

Dataset	File Name	Oberlin Description	#Lines in File
bacon/small	imdb.small.txt	a 1,817 line file with just a handful of performers (161), fully connected	1,817
bacon/top250	imdb.top250.txt	a 14,339 line file listing just the top 250 movies on IMDB. (Disconnected groups of foreign films.)	14,339
bacon/pre1950	imdb.pre1950.txt	a 966,338 line file with movies made before 1950	1,014,465
bacon/only_tv_v	imdb.only-tv-v.txt	a 2,021,636 line file with only made for TV and direct to video movies	2,302,907
bacon/no_tv_v	imdb.no-tv-v.txt	a 5,793,218 line file without the made for TV and direct to video movies (best for the canonical Kevin Bacon game)	6,871,415
bacon/post1950	imdb.post1950.txt	a 6,848,516 line file with the movies made after 1950	8,159,857
bacon/full	imdb.full.txt	all 7,814,854 lines of IMDB for you to search through	9,174,322

The actual numbers of lines in the files are slightly larger than numbers mentioned in the Oberlin descriptions in some cases. The data files here contain lines consisting of an actor name separated by a ‘|’ delimiter from a film name, eg:

Arnold Schwarzenegger|Terminator, The (1984)

We can construct the network links as pairs of actors, in one direction only, like this:

SELECT src.actor, 
       dst.actor
FROM bacon_&TP._ext src
JOIN bacon_&TP._ext dst
  ON dst.film = src.film AND dst.actor > src.actor

This forms the basis of the Node Names Link view shown in the data model diagram above, with distinct node identifier pairs inserted into the network data model links table along with the node identifier/name pairs. In this example in particular the source node and link names are long strings, but using auto-generated numeric identifiers for the nodes keeps the network data model tables compact.

&TP in the SQL above represents a sqlplus variable that allows us to use the same script for each of the Bacon Numbers datasets.

The largest of The Bacon datasets results in a heavily looped network of 2,800,309 nodes and 109,262,592 links.

Data Model

↑ Contents
↓ Node Name Views
↓ Network Data Model
↓ Shortest Path Solution Tables
↓ Net Pipe Package

This section provides detail on the data model used. The example datasets are stored either in a script, or a database table, or on files accessed by Oracle external tables, as indicated in the diagram.

The shortest path algorithms are applied to tables of links and nodes containing the data for a single dataset at a time, populated by views specific to the dataset. The algorithms provide solutions to the problems in the two solution tables, which are queried in the scripts and joined to the nodes table where applicable.

The separate package for network analysis by depth-first recursion uses a separate view or table, links_v, which is here a table populated from the links table. it has a pipelined function returning network structure records, which can be queried and joined to the nodes table.

Node Name Views

↑ Data Model
↓ links_load_v
↓ nodes_load_v

links_load_v

↑ Node Name Views

This view selects the links as node name pairs from the source dataset.

Column	Type
node_name_fr	VARCHAR2(300)
node_name_to	VARCHAR2(300)

nodes_load_v

↑ Node Name Views

This view selects the node names from the source dataset.

Column	Type
node_name	VARCHAR2(300)

Network Data Model

↑ Data Model
↓ nodes
↓ links

nodes

↑ Network Data Model

This table holds the nodes. An auto-generated numeric unique identifier is assigned to each node name, which is then used in the links table.

Column	Type	Qualifier
id	NUMBER	GENERATED ALWAYS AS IDENTITY PRIMARY KEY
node_name	VARCHAR2(300)

links

↑ Network Data Model

This table holds the links. It consists of unique node id pairs.

Column	Type
node_id_fr	NUMBER
node_id_to	NUMBER

Both columns have non-unique indexes, and are logical foreign key references to nodes.id. This design keeps the links table compact, while the node names can be quite long, especially for the Bacon Numbers datasets.

Shortest Path Solution Tables

↑ Data Model
↓ min_tree_links
↓ node_roots

min_tree_links

↑ Shortest Path Solution Tables

This table holds a set of (node id, prior node id) pairs that constitutes the links for a spanning tree network connecting all the nodes in the subnetwork defined by the root, along with the path lengths. This tree network can easily be traversed using simple recursive SQL.

Column	Type	Note
node_id	NUMBER	Node id for all nodes connected to some root node, with unique index
node_id_prior	NUMBER	Node id for node prior to current node in shortest path from root
lev	NUMBER	Level, or path length, to node from root node in shortest path

node_roots

↑ Shortest Path Solution Tables

This table holds the root node ids for all nodes in the network. It groups the nodes into subnetworks identified by root node id.

Column	Type	Note
node_id	NUMBER	Node id for all nodes, with unique index
root_node_id	NUMBER	Root node id for the current node
lev	NUMBER	Level, or path length, to current node from root node

Net Pipe Package

↑ Data Model
↓ links_v

links_v

↑ Net Pipe Package

The Net_Pipe package has a pipelined function that returns all the links in a network with paths from root nodes for each subnetwork, identified by depth first recursion. The paths are not the shortest paths, and in fact are generally much longer than the shortest paths. It uses a table or view named links_v, as follows:

Column	Type
link_id	VARCHAR2(15)
node_id_fr	VARCHAR2(15)
node_id_to	VARCHAR2(15)

Both node id columns have non-unique indexes. The tables is populated from the links table, using string casting of the ids, as follows:

INSERT INTO links_v (
  link_id,
  node_id_fr,
  node_id_to
)
SELECT To_Char(node_id_fr) || '-' || To_Char(node_id_to),
       To_Char(node_id_fr),
       To_Char(node_id_to)
  FROM links

Network Paths by SQL

↑ Contents
↓ All Paths
↓ Minimum Length Paths

In this section we will show how pure SQL can be used to find netowrk paths, firstly all acyclic paths, and then shortest paths. THese methods are fairly simple, and workable for small data sets, but become unrealistic for larger problems owing to slow performance, as we’ll demonstrate later.

All Paths

↑ Network Paths by SQL
↓ SQL for All Paths
↓ SQL for All Paths: Performance

A query using a recursive subquery factor (also known as a Common Table Expression, or CTE) can be used to find all acyclic paths.

SQL for All Paths

↑ All Paths

WITH paths (node_id, lev) AS (
    SELECT &root_id_var, 0
      FROM DUAL
     UNION ALL
    SELECT CASE WHEN lnk.node_id_fr = pth.node_id THEN lnk.node_id_to ELSE lnk.node_id_fr END,
           pth.lev + 1
      FROM paths pth
      JOIN links lnk
        ON (lnk.node_id_fr = pth.node_id OR lnk.node_id_to = pth.node_id)
)  SEARCH DEPTH FIRST BY node_id SET line_no
CYCLE node_id SET cycle TO '*' DEFAULT ' '
SELECT n.node_name, 
       Substr(LPad ('.', 1 + 2 * p.lev, '.') || p.node_id, 2) node,
       p.lev
  FROM paths p
  JOIN nodes n
    ON n.id = p.node_id
 WHERE cycle = ' '
  ORDER BY p.line_no

Notes on SQL

The SELECT list of the recursive subquery consists of node id and level
The anchor branch selects the root node from dual
In the recursive branch links are joined to the nodes of the previous iteration using either the from or the to field of the link
The specified SEARCH clause provides for the output displayed below, with siblings ordered by node_id
As the network contains cycles, we need the CYCLE clause, and suppress the cycle nodes using the cycle pseudocolumn in the WHERE clause
The lev column is used to indent the solution records to show the network structure

Solution for three_subnets

NODE_NAME   NODE        LEV
----------- ---------- ----
S1-N0-1     1             0
S1-N1-1     ..2           1
S1-N2-1     ....7         2
S1-N3-1     ......10      3
S1-N1-2     ..3           1
S1-N1-3     ....4         2
S1-N2-2     ....8         2
S1-N1-3     ..4           1
S1-N1-2     ....3         2
S1-N2-2     ......8       3
S1-N1-4     ..5           1
S1-N2-3     ....9         2
S1-N1-5     ......6       3
S1-N3-2     ......11      3
S1-N1-5     ..6           1
S1-N2-3     ....9         2
S1-N1-4     ......5       3
S1-N3-2     ......11      3

18 rows selected.

SQL for All Paths: Performance

↑ All Paths

Execution Plan for three_subnets

The execution plan is obtained by adding the hint after the main SELECT keyword:

/*+ gather_plan_statistics XPLAN_ALL_PATHS */

and then executing a wrapper API procedure, passing in the marker code:

EXEC Utils.W(Utils.Get_XPlan(p_sql_marker => 'XPLAN_ALL_PATHS'));

This produces output:

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |              |      1 |        |     18 |00:00:00.01 |     110 |       |       |          |
|   1 |  SORT ORDER BY                             |              |      1 |   1054 |     18 |00:00:00.01 |     110 |  2048 |  2048 | 2048  (0)|
|   2 |   MERGE JOIN                               |              |      1 |   1054 |     18 |00:00:00.01 |     110 |       |       |          |
|   3 |    TABLE ACCESS BY INDEX ROWID             | NODES        |      1 |     14 |     12 |00:00:00.01 |       2 |       |       |          |
|   4 |     INDEX FULL SCAN                        | SYS_C0016204 |      1 |     14 |     12 |00:00:00.01 |       1 |       |       |          |
|*  5 |    SORT JOIN                               |              |     12 |   1054 |     18 |00:00:00.01 |     108 |  2048 |  2048 | 2048  (0)|
|*  6 |     VIEW                                   |              |      1 |   1054 |     18 |00:00:00.01 |     108 |       |       |          |
|   7 |      UNION ALL (RECURSIVE WITH) DEPTH FIRST|              |      1 |        |     39 |00:00:00.01 |     108 |  4096 |  4096 | 4096  (0)|
|   8 |       FAST DUAL                            |              |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |       NESTED LOOPS                         |              |      4 |   1053 |     38 |00:00:00.01 |     108 |       |       |          |
|  10 |        RECURSIVE WITH PUMP                 |              |      4 |        |     18 |00:00:00.01 |       0 |       |       |          |
|* 11 |        TABLE ACCESS FULL                   | LINKS        |     18 |      3 |     38 |00:00:00.01 |     108 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("N"."ID"="P"."NODE_ID")
filter("N"."ID"="P"."NODE_ID")
6 - filter("P"."CYCLE"=' ')
11 - filter(("LNK"."NODE_ID_FR"="PTH"."NODE_ID" OR "LNK"."NODE_ID_TO"="PTH"."NODE_ID"))

Performance Considerations

For tree networks (i.e. networks without loops) each node has only one path from the designated root, and the SQL solution can obtain these efficiently, as it can for small looped networks. In larger looped networks, the number of paths can become extremely large, and so finding all of them can be very resource-intensive; this would also be true for methods other than pure queries.

Minimum Length Paths

↑ Network Paths by SQL
↓ SQL for Minimum Length Paths 1: One Recursive Subquery
↓ One Recursive Subquery: Performance
↓ SQL for Minimum Length Paths 2: Two Recursive Subqueries
↓ Two Recursive Subqueries: Performance

A query using a recursive subquery factor can be used to find the acyclic paths of minimum length, with some extra complexity, including a ranking subquery. This is based on an article I wrote in April 2015, SQL for Shortest Path Problems.

SQL for Minimum Length Paths 1: One Recursive Subquery

WITH paths (node_id, rnk, lev) AS (
    SELECT &root_id_var, 1, 0
      FROM DUAL
     UNION ALL
    SELECT CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to ELSE l.node_id_fr END,
            Rank () OVER (PARTITION BY CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to 
                                                                          ELSE l.node_id_fr END
                          ORDER BY p.node_id),
            p.lev + 1
      FROM paths p
      JOIN links l
        ON p.node_id IN (l.node_id_fr, l.node_id_to)
     WHERE p.rnk = 1
)  SEARCH DEPTH FIRST BY node_id SET line_no
CYCLE node_id SET lp TO '*' DEFAULT ' '
, node_min_levs AS (
    SELECT node_id,
           Min (lev) KEEP (DENSE_RANK FIRST ORDER BY lev) lev,
           Min (line_no) KEEP (DENSE_RANK FIRST ORDER BY lev) line_no
      FROM paths
     GROUP BY node_id
)
SELECT /*+ gather_plan_statistics XPLAN_RSF_1 */ 
       n.node_name, 
       Substr(LPad ('.', 1 + 2 * m.lev, '.') || m.node_id, 2) node, 
       m.lev lev
  FROM node_min_levs m
  JOIN nodes n
    ON n.id = m.node_id
 ORDER BY m.line_no

Notes on SQL

The SELECT list of the recursive subquery now has an extra field, rnk
The new rnk field has the rank of each record for a given node at each iteration, based on the prior node id
At each iteration only the record of rank 1 is joined to new links, avoiding duplication
The new subquery, node_min_levs, selects the preferred record of minimum length for each node

Output for three_subnets

Now the 18 records for all acyclic paths are reduced to 11, for the minimum (preferred) path to each node:

NODE_NAME   NODE        LEV
----------- ---------- ----
S1-N0-1     1             0
S1-N1-1     ..2           1
S1-N2-1     ....7         2
S1-N3-1     ......10      3
S1-N1-2     ..3           1
S1-N2-2     ....8         2
S1-N1-3     ..4           1
S1-N1-4     ..5           1
S1-N2-3     ....9         2
S1-N3-2     ......11      3
S1-N1-5     ..6           1

11 rows selected.

One Recursive Subquery: Performance

Execution Plan for bacon/small

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |       |      1 |        |    161 |00:00:07.90 |    5032K|       |       |          |
|   1 |  SORT ORDER BY                              |       |      1 |    381G|    161 |00:00:07.90 |    5032K| 18432 | 18432 |16384  (0)|
|*  2 |   HASH JOIN                                 |       |      1 |    381G|    161 |00:00:07.90 |    5032K|  1449K|  1449K| 1664K (0)|
|   3 |    TABLE ACCESS FULL                        | NODES |      1 |    161 |    161 |00:00:00.01 |       7 |       |       |          |
|   4 |    VIEW                                     |       |      1 |    381G|    161 |00:00:07.90 |    5032K|       |       |          |
|   5 |     SORT GROUP BY                           |       |      1 |    381G|    161 |00:00:07.90 |    5032K| 31744 | 31744 |28672  (0)|
|   6 |      VIEW                                   |       |      1 |    381G|    220K|00:00:07.81 |    5032K|       |       |          |
|   7 |       UNION ALL (RECURSIVE WITH) DEPTH FIRST|       |      1 |        |    220K|00:00:07.77 |    5032K|    19M|  1646K|   17M (0)|
|   8 |        FAST DUAL                            |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |        WINDOW SORT                          |       |     79 |    381G|    220K|00:00:00.72 |   57440 |   478K|   448K|  424K (0)|
|  10 |         NESTED LOOPS                        |       |     79 |    381G|    220K|00:00:00.52 |   57440 |       |       |          |
|  11 |          RECURSIVE WITH PUMP                |       |     79 |        |   3590 |00:00:00.01 |       0 |       |       |          |
|* 12 |          TABLE ACCESS FULL                  | LINKS |   3590 |     45 |    220K|00:00:00.70 |   57440 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N"."ID"="M"."NODE_ID")
12 - filter(("P"."NODE_ID"="L"."NODE_ID_FR" OR "P"."NODE_ID"="L"."NODE_ID_TO"))

Performance Considerations

As with searching for all paths, the SQL solution can obtain the minimum length paths efficiently for tree networks and smaller looped networks. In larger looped networks, however, the number of paths overall can become extremely large, and can be much larger than the number of minimum length paths.

This poses a problem specific to the use of recursive subquery SQL: Although the recursive subquery discards all but one path to a given node at a given iteration, it has no access to other paths to the node that may have been reached at earlier iterations, and so may persist with longer paths that will be discarded in the later ranking subquery.

In May 2015 I explained how we can use the idea of approximative solutions to mitigate this problem. SQL for Shortest Path Problems 2: A Branch and Bound Approach. In summary, the idea is that we start with a recursive subquery that truncates paths after a specified number of iterations. The result set from this subquery provides a list of nodes with the minimum length found for each node.

As the number of iterations is limited not all nodes may be in the list, but for those that are we know that it’s not worth persisting with any node found in another search that has a path length longer than that found in the subquery. We can therefore run a similar recursive subquery with the first one as an input used to discard paths early, but without the iteration limit, and hence find the full solution more efficiently.

Run Statistics for Example Datasets: One Recursive Subquery

Dataset	#Nodes	#Links	Root Node	#Nodes	Maxlev	#Secs	query_min_paths_1_*.log
three_subnets	14	13	S1-N0-1	11	3	0.01	three_subnets_s1-n0-1
foreign_keys	289	319	ORDDCM_STORED_TAGS_WRK\|ORDDATA	15	5	0.01	sys_fks_orddcm_stored_tags_wrk-orddata
bacon/small	161	3,342	Willie Allemang	161	5	8	small_willie_allemang
bacon/top250	12,466	583,993	William O’Malley (II)	11,803	7	NA	top250william_omalley(ii)

The bacon/top250 run did not complete after running for several hours, so no larger datasets were tried.

SQL for Minimum Length Paths 2: Two Recursive Subqueries

The following query implements the idea mentioned above, and is essentially as given in the May 2015 article mentioned.

WITH paths_truncated (node_id, lev, rn) AS (
    SELECT &root_id_var, 0, 1
      FROM DUAL
     UNION ALL
    SELECT CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to ELSE l.node_id_fr END,
           p.lev + 1,
           Row_Number () OVER (PARTITION BY CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to 
                                                                               ELSE l.node_id_fr END
                               ORDER BY p.node_id)
      FROM paths_truncated p
      JOIN links l
        ON p.node_id IN (l.node_id_fr, l.node_id_to)
     WHERE p.rn = 1
       AND p.lev < &LEVMAX
)
CYCLE node_id SET lp TO '*' DEFAULT ' '
, approx_best_paths AS (
    SELECT node_id,
           Max (lev) KEEP (DENSE_RANK FIRST ORDER BY lev) lev
      FROM paths_truncated
     GROUP BY node_id
), paths (node_id, lev, rn) AS (
    SELECT &root_id_var, 0, 1
      FROM DUAL
     UNION ALL
    SELECT CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to 
                                              ELSE l.node_id_fr END,
           p.lev + 1,
           Row_Number () OVER (PARTITION BY CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to
                                                                               ELSE l.node_id_fr END
                               ORDER BY p.node_id)
      FROM paths p
      JOIN links l
        ON p.node_id IN (l.node_id_fr, l.node_id_to)
      LEFT JOIN approx_best_paths b
        ON b.node_id = CASE WHEN l.node_id_fr = p.node_id THEN l.node_id_to 
                                                          ELSE l.node_id_fr END
     WHERE p.rn = 1
       AND p.lev < Nvl (b.lev, 1000000)
)  SEARCH DEPTH FIRST BY node_id SET line_no CYCLE node_id SET lp TO '*' DEFAULT ' '
, node_min_levs AS (
    SELECT node_id,
           Min (lev) KEEP (DENSE_RANK FIRST ORDER BY lev) lev,
           Min (line_no) KEEP (DENSE_RANK FIRST ORDER BY lev) line_no
      FROM paths
     GROUP BY node_id
)
SELECT n.node_name, 
       Substr(LPad ('.', 1 + 2 * m.lev, '.') || m.node_id, 2) node, 
       m.lev lev
  FROM node_min_levs m
  JOIN nodes n
    ON n.id = m.node_id
 ORDER BY m.line_no

Notes on SQL

The first recursive subquery, paths_truncated, searches for paths as in the first query above, but now terminates after LEVMAX iterations
approx_best_paths finds the shortest path length of each node reached in paths_truncated
The second recursive subquery, paths, searches for paths again, this time essentially without iteration limit
It uses an outer join to approx_best_paths to truncate any paths longer than previously obtained for a node
The subquery node_min_levs and main section operate as in the first query

Two Recursive Subqueries: Performance

↑ Contents
↓ Function for Min Pathfinder
↓ Code Timing Min Pathfinder

Execution Plan for Bacon/top250

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                      |       |      1 |        |  11803 |00:12:55.42 |     130M|  72673 |  46194 |       |       |          |         |
|   1 |  SORT ORDER BY                                        |       |      1 |     18E|  11803 |00:12:55.42 |     130M|  72673 |  46194 |   903K|   523K|  802K (0)|         |
|*  2 |   HASH JOIN                                           |       |      1 |     18E|  11803 |00:12:55.41 |     130M|  72673 |  46194 |  1896K|  1896K| 2177K (0)|         |
|   3 |    TABLE ACCESS FULL                                  | NODES |      1 |  12466 |  12466 |00:00:00.01 |      46 |      0 |      0 |       |       |          |         |
|   4 |    VIEW                                               |       |      1 |     18E|  11803 |00:12:55.40 |     130M|  72673 |  46194 |       |       |          |         |
|   5 |     SORT GROUP BY                                     |       |      1 |     18E|  11803 |00:12:55.40 |     130M|  72673 |  46194 | 38912 | 38912 | 1495K (1)|    3072K|
|   6 |      VIEW                                             |       |      1 |     18E|    672K|00:12:54.66 |     130M|  72371 |  45892 |       |       |          |         |
|   7 |       UNION ALL (RECURSIVE WITH) DEPTH FIRST          |       |      1 |        |    672K|00:12:54.53 |     130M|  72371 |  45892 |    30M|  1975K| 1966K (3)|         |
|   8 |        FAST DUAL                                      |       |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   9 |        WINDOW SORT                                    |       |    128 |     18E|    672K|00:11:57.72 |      65M|  36987 |  36857 |  2628K|   735K| 1347K (1)|    3072K|
|* 10 |         FILTER                                        |       |    128 |        |    672K|00:11:57.25 |      65M|  36404 |  36274 |       |       |          |         |
|  11 |          MERGE JOIN OUTER                             |       |    128 |     18E|   1821K|00:11:57.02 |      65M|  36404 |  36274 |       |       |          |         |
|  12 |           SORT JOIN                                   |       |    128 |     18E|   1821K|00:06:38.56 |      24M|   8736 |   8606 |    22M|  1751K| 1936K (5)|      22M|
|  13 |            NESTED LOOPS                               |       |    128 |     18E|   1821K|00:06:39.34 |      24M|    130 |      0 |       |       |          |         |
|  14 |             RECURSIVE WITH PUMP                       |       |    128 |        |  21483 |00:00:00.11 |       1 |    130 |      0 |       |       |          |         |
|* 15 |             TABLE ACCESS FULL                         | LINKS |  21483 |     95 |   1821K|00:07:40.14 |      24M|      0 |      0 |       |       |          |         |
|* 16 |           SORT JOIN (REUSE)                           |       |   1821K|     70T|   1203K|00:05:17.98 |      41M|  27668 |  27668 |   372K|   372K|  330K (0)|         |
|  17 |            VIEW                                       |       |      1 |     70T|  11625 |00:05:17.18 |      41M|  27668 |  27668 |       |       |          |         |
|  18 |             SORT GROUP BY                             |       |      1 |     70T|  11625 |00:05:17.18 |      41M|  27668 |  27668 | 12288 | 12288 | 1546K (1)|    2048K|
|  19 |              VIEW                                     |       |      1 |     70T|   1666K|00:00:25.89 |      41M|  27365 |  27365 |       |       |          |         |
|  20 |               UNION ALL (RECURSIVE WITH) BREADTH FIRST|       |      1 |        |   1666K|00:00:25.48 |      41M|  27365 |  27365 |    27M|  1901K| 1810K (0)|         |
|  21 |                FAST DUAL                              |       |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|  22 |                WINDOW SORT                            |       |      6 |     70T|   1666K|00:04:53.82 |      18M|  27365 |  22591 |    46M|  2408K| 2188K (8)|      44M|
|  23 |                 NESTED LOOPS                          |       |      6 |     70T|   1666K|00:04:50.98 |      18M|   4774 |      0 |       |       |          |         |
|  24 |                  RECURSIVE WITH PUMP                  |       |      6 |        |  16426 |00:00:00.25 |       2 |   4774 |      0 |       |       |          |         |
|* 25 |                  TABLE ACCESS FULL                    | LINKS |  16426 |     95 |   1666K|00:06:32.63 |      18M|      0 |      0 |       |       |          |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N"."ID"="M"."NODE_ID")
10 - filter("P"."LEV"<NVL("B"."LEV",1000000))
15 - filter(("P"."NODE_ID"="L"."NODE_ID_FR" OR "P"."NODE_ID"="L"."NODE_ID_TO"))
16 - access("B"."NODE_ID"=CASE "L"."NODE_ID_FR" WHEN "P"."NODE_ID" THEN "L"."NODE_ID_TO" ELSE "L"."NODE_ID_FR" END )
filter("B"."NODE_ID"=CASE "L"."NODE_ID_FR" WHEN "P"."NODE_ID" THEN "L"."NODE_ID_TO" ELSE "L"."NODE_ID_FR" END )
25 - filter(("P"."NODE_ID"="L"."NODE_ID_FR" OR "P"."NODE_ID"="L"."NODE_ID_TO"))

Performance Considerations

The second query for minimum length paths improves performance by obtaining a partial, approximative solution to enable early truncation of some of the paths in the subquery for the full solution.

However, the use of a hard-coded iteration limit in the first subquery has obvious limitations. If we make the limit too low, the first subquery will provide too little information to optimize the second; on the other hand, if we make it too large then the approximative subquery itself will have too much work to do.

This might lead one to consider having, instead of a single approximative subquery, a sequence of them, starting with a very small limit and increasing it gradually, with each subquery providing the input to the next one. Perhaps, instead of a single subquery truncating after LEVMAX iterations we would have LEVMAX subqueries truncating in sequence after from 1 to LEVMAX iterations, prior to the unlimited subquery.

The approach might well give better performance for larger looped networks, but the hard-coding of a fixed number of subqueries suggests that we may be over-extending the use of pure SQL here. If we were to think about this more prcedurally, we could have a single loop having a single section of code implementing each iteration (which could be an SQL statement). We could then save the results of each iteration to feed into the next.

Run Statistics for Example Datasets: Two Recursive Subqueries

LM is the value of the LEVMAX parameter mentioned above, while Maxlev is the maximum level encountered.

Dataset	#Nodes(all)	#Links	Root Node	#Nodes(sub)	Maxlev	Truncate at	#Secs	query_min_paths_2_*.log
three_subnets	14	13	S1-N0-1	11	3	3	0.02	three_subnets_s1-n0-1_3
foreign_keys	289	319	ORDDCM_STORED_TAGS_WRK\|ORDDATA	47	5	5	0.01	sys_fks_orddcm_stored_tags_wrk-orddata_5
brightkite	58,228	214,078	6	56,739	10	5	559	brightkite_6_5
bacon/small	161	3,342	Willie Allemang	161	5	5	0.1	small_willie_allemang_5
bacon/top250	12,466	583,993	William O’Malley (II)	11,803	7	5	796	top250william_omalley(ii)_5
bacon/top250	12,466	583,993	William O’Malley (II)	11,803	7	10	1,663	top250william_omalley(ii)_10

The results show that the two-step algorithm is much faster than the one-step algorithm, and on the larget dataset tried, a LEVMAX value of 5 performed better than a value of 10. However, we’ll see that the PL/SQL algorithm described in a later section outperfoms the run time of 13 minutes by a large margin, so I have not tried larger datasets with the pure SQL algorithms.

Min Pathfinder

In this section we present the PL/SQL procedure, Ins_Min_Tree_Links, implementing the Min Pathfinder algorithm and show the results across all datasets. We then show the output from code timing the procedure, which may provide ideas for tuning possibilities.

Function for Min Pathfinder

↑ Min Pathfinder
↓ Ins_Min_Tree_Links
↓ Query for Spanning Tree
↓ Run Statistics for Example Datasets - Ins_Min_Tree_Links

The box below has a PL/SQL function taking a root node id parameter and returning the number of records inserted into a solution table, min_tree_links. It inserts all connected nodes with path length and the prior node id in the path. The node id and prior node id pairs form a spanning tree for the subnetwork that can be easily traversed using a recursive query.

Ins_Min_Tree_Links

↑ Function for Min Pathfinder
↓ Notes on Ins_Min_Tree_Links
↓ Results for three_subnets Dataset - Ins_Min_Tree_Links

FUNCTION Ins_Min_Tree_Links(
            p_root_node_id                 PLS_INTEGER)
            RETURN                         PLS_INTEGER IS
  l_lev         PLS_INTEGER := 0;
  l_ins         PLS_INTEGER;
  l_ins_tot     PLS_INTEGER := 0;
BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE min_tree_links';
  INSERT INTO min_tree_links VALUES (p_root_node_id, '', 0);
  LOOP
    INSERT INTO min_tree_links
    SELECT CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                      ELSE lnk.node_id_fr END,
           Min (mlp_cur.node_id),
           l_lev + 1
      FROM min_tree_links mlp_cur
      JOIN links lnk
        ON (lnk.node_id_fr = mlp_cur.node_id OR lnk.node_id_to = mlp_cur.node_id)
      LEFT JOIN min_tree_links mlp_pri
        ON mlp_pri.node_id = CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                                        ELSE lnk.node_id_fr END
     WHERE mlp_pri.node_id IS NULL
       AND mlp_cur.lev = l_lev
     GROUP BY CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                         ELSE lnk.node_id_fr END;
    l_ins := SQL%ROWCOUNT;
    COMMIT;
    l_ins_tot := l_ins_tot + l_ins;
    EXIT WHEN l_ins = 0;
    l_lev := l_lev + 1;
  END LOOP;
  RETURN l_ins_tot;
END Ins_Min_Tree_Links;

Notes on Ins_Min_Tree_Links

↑ Ins_Min_Tree_Links

Truncate the solution table, min_tree_links, and insert the root node at level 0
Loop while records are inserted
- Insert a new node record at the next level:
  - for every link that is connected to a node at the current level:
    - that does not exist in the table for any prior level
    - and does not appear at the next level for any other link with a higher ranked path
- Commit
- Increment level and inserts counter
- Exit when no records inserted
Return the number of records inserted

Results for three_subnets Dataset - Ins_Min_Tree_Links

↑ Ins_Min_Tree_Links

The results can be queried from the solution table, joining the nodes table to get the node name:

SELECT mlp.lev, nod.node_name, nod_p.node_name node_name_prior
  FROM min_tree_links mlp
  JOIN nodes nod ON nod.id = mlp.node_id
  LEFT JOIN nodes nod_p ON nod_p.id = mlp.node_id_prior
 ORDER BY 1, 3, 2

Here is the result for three_subnets:

 LEV NODE_NAME   NODE_NAME_PRIOR
---- ----------- ---------------
   0 S1-N0-1
   1 S1-N1-1     S1-N0-1
     S1-N1-2     S1-N0-1
     S1-N1-3     S1-N0-1
     S1-N1-4     S1-N0-1
     S1-N1-5     S1-N0-1
   2 S1-N2-1     S1-N1-1
     S1-N2-2     S1-N1-2
     S1-N2-3     S1-N1-4
   3 S1-N3-1     S1-N2-1
     S1-N3-2     S1-N2-3

Note that the LEV (short for level) column is the length of the shortest path, and corresponds to the iteration number within the algorithm.

Query for Spanning Tree

↑ Function for Min Pathfinder
↓ Query Output for three_subnets

The solution table can also be used to traverse the spanning tree via a simple recursive query.

WITH paths (node_id, lev) AS (
    SELECT &root_id_var, 0
      FROM DUAL
     UNION ALL
    SELECT mtl.node_id, 
           pth.lev + 1
      FROM paths pth
      JOIN min_tree_links mtl
        ON mtl.node_id_prior = pth.node_id
)  SEARCH DEPTH FIRST BY node_id SET line_no
SELECT n.node_name, Substr(LPad ('.', 1 + 2 * p.lev, '.') || p.node_id, 2) node, p.lev lev
  FROM paths p
  JOIN nodes n
    ON n.id = p.node_id
  ORDER BY p.line_no

Notice that the query has no CYCLE clause because the solution table contains a tree network by construction.

The paths are shown by indentation and ordering in the query shown, but it is easy to add an explicit path as a separate column formatted as a delimited string if desired.

Query Output for three_subnets

↑ Query for Spanning Tree

NODE_NAME  NODE         LEV
---------- ----------- ----
S1-N0-1    1              0
S1-N1-1    ..2            1
S1-N2-1    ....7          2
S1-N3-1    ......10       3
S1-N1-2    ..3            1
S1-N2-2    ....8          2
S1-N1-3    ..4            1
S1-N1-4    ..5            1
S1-N2-3    ....9          2
S1-N3-2    ......11       3
S1-N1-5    ..6            1

Run Statistics for Example Datasets - Ins_Min_Tree_Links

↑ Function for Min Pathfinder

Dataset	#Nodes	#Links	Root Node	#Nodes	#Maxlev	Ela(s)	CPU(s)	ins_min_tree_links_*.log
three_subnets	14	13	S1-N0-1	11	3	0.3	0.3	three_subnets_s1-n0-1
foreign_keys	289	319	ORDDCM_STORED_TAGS_WRK\|ORDDATA	15	5	0.2	0.2	sys_fks_orddcm_stored_tags_wrk-orddata
brightkite	58,228	214,078	6	56,739	10	1.6	1.6	brightkite_6
bacon/small	161	3,342	Willie Allemang	161	5	0.2	0.2	small_willie_allemang
bacon/top250	12,466	583,993	William O’Malley (II)	11,803	7	4	4	top250william_omalley(ii)
bacon/pre1950	134,131	8,095,294	Andreas Aabel	128,787	13	46	45	pre1950_andreas_aabel
bacon/only_tv_v	744,374	22,503,060	Mike Ogas	680,060	12	206	200	only_tv_v_mike_ogas
bacon/no_tv_v	2,386,567	87,866,033	Mike Ogas	2,221,752	10	1,683	734	no_tv_v_mike_ogas
bacon/post1950	2,696,175	101,597,227	Mike Ogas	2,523,185	9	2,004	913	post1950_mike_ogas
bacon/full	2,800,309	109,262,592	Mike Ogas	2,623,450	10	3,746	1,349	full_mike_ogas

Notice that for the smaller problems the CPU time is close to the elapsed time, while for the three much larger problems the elapsed time is around 2.5 times larger. This may be related to the insertion of records and file system writes.

Note that three log files are too large to push to GitHub, so extracts files were pushed instead:

ins_min_tree_links_no_tv_v_mike_ogas-extract-for-github.log
ins_min_tree_links_post1950_mike_ogas-extract-for-github.log
ins_min_tree_links_full_mike_ogas-extract-for-github.log

Code Timing Min Pathfinder

↑ Min Pathfinder
↓ Code Timing Ins_Min_Tree_Links
↓ Code Timing Results - Ins_Min_Tree_Links
↓ Execution Plan for bacon/only_tv_v Dataset

The box below shows the function Ins_Min_Tree_Links with calls made to the custom code timing package, Timer_Set.

Code Timing Ins_Min_Tree_Links

↑ Code Timing Min Pathfinder

FUNCTION Ins_Min_Tree_Links(
            p_root_node_id                 PLS_INTEGER)
            RETURN                         PLS_INTEGER IS
  l_lev         PLS_INTEGER := 0;
  l_ins         PLS_INTEGER;
  l_ins_tot     PLS_INTEGER := 0;
  l_ts_id       PLS_INTEGER := Timer_Set.Construct('Ins_Min_Tree_Links: ' || p_root_node_id);
BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE min_tree_links';
  INSERT INTO min_tree_links VALUES (p_root_node_id, '', 0);
  LOOP
    INSERT INTO min_tree_links
    SELECT /*+ gather_plan_statistics XPLAN_MTL */ 
           CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                      ELSE lnk.node_id_fr END,
           Min (mlp_cur.node_id),
           l_lev + 1
      FROM min_tree_links mlp_cur
      JOIN links lnk
        ON (lnk.node_id_fr = mlp_cur.node_id OR lnk.node_id_to = mlp_cur.node_id)
      LEFT JOIN min_tree_links mlp_pri
        ON mlp_pri.node_id = CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                                        ELSE lnk.node_id_fr END
     WHERE mlp_pri.node_id IS NULL
       AND mlp_cur.lev = l_lev
     GROUP BY CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                         ELSE lnk.node_id_fr END;
    l_ins := SQL%ROWCOUNT;
    COMMIT;
    l_ins_tot := l_ins_tot + l_ins;
    Timer_Set.Increment_Time(l_ts_id, 'Level: ' || l_lev || ', nodes: ' || l_ins);
    EXIT WHEN l_ins = 0;
    l_lev := l_lev + 1;
  END LOOP;
  Utils.W(Utils.Get_XPlan(p_sql_marker => 'XPLAN_MTL'));
  Utils.W(Timer_Set.Format_Results(l_ts_id));
  RETURN l_ins_tot;
END Ins_Min_Tree_Links;

The timer set is constructed with the root node id parameter in the timer set name
A timing call to Increment_Time is made after the insert at each iteration, with level and number of records inserted in the timer name
The timing results are written after the loop

Code Timing Results - Ins_Min_Tree_Links

↑ Code Timing Min Pathfinder

Code timing results for bacon/only_tv_v Dataset:

Timer Set: Ins_Min_Tree_Links: 10001, Constructed at 30 Jul 2022 16:07:35, written at 16:11:01
==============================================================================================
Timer                       Elapsed         CPU       Calls       Ela/Call       CPU/Call
-----------------------  ----------  ----------  ----------  -------------  -------------
Level: 0, nodes: 38            0.05        0.04           1        0.04700        0.04000
Level: 1, nodes: 5169          0.04        0.03           1        0.04200        0.03000
Level: 2, nodes: 202118       13.77       13.72           1       13.76500       13.72000
Level: 3, nodes: 358824      104.69      100.59           1      104.69100      100.59000
Level: 4, nodes: 100099       75.15       74.11           1       75.14900       74.11000
Level: 5, nodes: 11298         9.61        9.61           1        9.60600        9.61000
Level: 6, nodes: 1865          1.15        1.14           1        1.14700        1.14000
Level: 7, nodes: 421           0.29        0.30           1        0.28900        0.30000
Level: 8, nodes: 170           0.16        0.16           1        0.16200        0.16000
Level: 9, nodes: 39            0.10        0.09           1        0.09700        0.09000
Level: 10, nodes: 11           0.07        0.08           1        0.07000        0.08000
Level: 11, nodes: 7            0.07        0.08           1        0.07300        0.08000
Level: 12, nodes: 0            0.07        0.06           1        0.07200        0.06000
(Other)                        0.39        0.39           1        0.39400        0.39000
-----------------------  ----------  ----------  ----------  -------------  -------------
Total                        205.60      200.40          14       14.68600       14.31429
-----------------------  ----------  ----------  ----------  -------------  -------------
[Timer timed (per call in ms): Elapsed: 0.02061, CPU: 0.02245]

The results show elapsed and CPU times for each timer, as well as the level and the numbers of records inserted. As expected, times go up roughly in line with the number of records inserted.

This instrumentation is useful to see how the algorithm progresses through the iterations, although in this case we did not identify any tuning opportunities.

Execution Plan for bacon/only_tv_v Dataset

↑ Code Timing Min Pathfinder
↓ Execution Plan

The gather_plan_statistics hint is used to allow the execution plan to be obtained for an instance of the insert, using a marker string, XPLAN_MTL, to allow the correct record to be identified in the v$sql system view.

INSERT INTO min_tree_links
SELECT /*+ gather_plan_statistics XPLAN_MTL */ 
       CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                  ELSE lnk.node_id_fr END,
       Min (mlp_cur.node_id),
       l_lev + 1
  FROM min_tree_links mlp_cur
  JOIN links lnk
    ON (lnk.node_id_fr = mlp_cur.node_id OR lnk.node_id_to = mlp_cur.node_id)
  LEFT JOIN min_tree_links mlp_pri
    ON mlp_pri.node_id = CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                                    ELSE lnk.node_id_fr END
 WHERE mlp_pri.node_id IS NULL
   AND mlp_cur.lev = l_lev
 GROUP BY CASE WHEN lnk.node_id_fr = mlp_cur.node_id THEN lnk.node_id_to 
                                                     ELSE lnk.node_id_fr END;
                                                 ELSE lnk.node_id_fr END;

The plan is written out using a custom wrapper function Utils.Get_XPlan, that calls DBMS_XPlan.Display_Cursor, and is passed the marker string.

Utils.W(Utils.Get_XPlan(p_sql_marker => 'XPLAN_MTL'));

Execution Plan

↑ Execution Plan for bacon/only_tv_v Dataset

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                  |                 |      1 |        |      0 |00:00:00.07 |    6777 |       |       |          |         |
|   1 |  LOAD TABLE CONVENTIONAL          | MIN_TREE_LINKS  |      1 |        |      0 |00:00:00.07 |    6777 |       |       |          |         |
|   2 |   HASH GROUP BY                   |                 |      1 |      2 |      0 |00:00:00.07 |    6777 |  1161K|  1161K| 1488K (0)|         |
|   3 |    VIEW                           | VW_ORE_BC29D05C |      1 |      2 |      0 |00:00:00.07 |    6777 |       |       |          |         |
|   4 |     UNION-ALL                     |                 |      1 |        |      0 |00:00:00.07 |    6777 |       |       |          |         |
|*  5 |      HASH JOIN ANTI               |                 |      1 |      1 |      0 |00:00:00.04 |    3388 |  1106K|  1106K| 1071K (0)|         |
|   6 |       NESTED LOOPS                |                 |      1 |     33 |     10 |00:00:00.01 |    1709 |       |       |          |         |
|   7 |        NESTED LOOPS               |                 |      1 |     33 |     10 |00:00:00.01 |    1699 |       |       |          |         |
|*  8 |         TABLE ACCESS FULL         | MIN_TREE_LINKS  |      1 |      1 |      7 |00:00:00.01 |    1683 |       |       |          |         |
|*  9 |         INDEX RANGE SCAN          | LINKS_FR_N1     |      7 |     33 |     10 |00:00:00.01 |      16 |       |       |          |         |
|  10 |        TABLE ACCESS BY INDEX ROWID| LINKS           |     10 |     33 |     10 |00:00:00.01 |      10 |       |       |          |         |
|  11 |       TABLE ACCESS FULL           | MIN_TREE_LINKS  |      1 |      1 |    680K|00:00:00.01 |    1679 |       |       |          |         |
|* 12 |      HASH JOIN ANTI               |                 |      1 |      1 |      0 |00:00:00.03 |    3389 |  1106K|  1106K|  823K (0)|         |
|  13 |       NESTED LOOPS                |                 |      1 |     33 |     12 |00:00:00.01 |    1710 |       |       |          |         |
|  14 |        NESTED LOOPS               |                 |      1 |     33 |     12 |00:00:00.01 |    1698 |       |       |          |         |
|* 15 |         TABLE ACCESS FULL         | MIN_TREE_LINKS  |      1 |      1 |      7 |00:00:00.01 |    1682 |       |       |          |         |
|* 16 |         INDEX RANGE SCAN          | LINKS_TO_N1     |      7 |     33 |     12 |00:00:00.01 |      16 |       |       |          |         |
|* 17 |        TABLE ACCESS BY INDEX ROWID| LINKS           |     12 |     33 |     12 |00:00:00.01 |      12 |       |       |          |         |
|  18 |       TABLE ACCESS FULL           | MIN_TREE_LINKS  |      1 |      1 |    680K|00:00:00.01 |    1679 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("MLP_PRI"."NODE_ID"=CASE "LNK"."NODE_ID_FR" WHEN "MLP_CUR"."NODE_ID" THEN "LNK"."NODE_ID_TO" ELSE "LNK"."NODE_ID_FR" END )
8 - filter("MLP_CUR"."LEV"=:B1)
9 - access("LNK"."NODE_ID_FR"="MLP_CUR"."NODE_ID")
12 - access("MLP_PRI"."NODE_ID"=CASE "LNK"."NODE_ID_FR" WHEN "MLP_CUR"."NODE_ID" THEN "LNK"."NODE_ID_TO" ELSE "LNK"."NODE_ID_FR" END )
15 - filter("MLP_CUR"."LEV"=:B1)
16 - access("LNK"."NODE_ID_TO"="MLP_CUR"."NODE_ID")
17 - filter(LNNVL("LNK"."NODE_ID_FR"="MLP_CUR"."NODE_ID"))

The UNION-ALL step shows that the links join with the OR condition has been used to split into two UNION query sections
Each section loops over the min_tree_links records at the current level
- It joins all links via the from or to nodes, using the relevant index
- It outer-joins to min_tree_links, using a hash join, and excludes any record where a join is made (HASH JOIN ANTI)

The plan looks reasonable. For the small three_subnets there is a different plan that does not use the indexes.

Subnetwork Grouper

↑ Contents
↓ Procedure for Subnetwork Grouper
↓ Code Timing Subnetwork Grouper
↓ Oracle Profilers for Subnetwork Grouper
↓ Comparison with Net_Pipe Array-Based Package

In this section we present the PL/SQL procedure, Ins_Node_Roots, implementing the Subnetwork Grouper algorithm and show the results across all datasets. We then show the output from code timing the procedure, which provides ideas for tuning possibilities.

This is the base version, which is tuned for improved performance in a later section.

Procedure for Subnetwork Grouper

↑ Subnetwork Grouper
↓ Ins_Node_Roots
↓ Notes on Ins_Node_Roots
↓ Results for three_subnets Dataset - Ins_Node_Roots
↓ Run Statistics for Example Datasets - Ins_Node_Roots

The box below shows the PL/SQL procedure implementing this algorithm. It calls the Min Pathfinder function within a loop to find the shortest paths for the subnetwork connected to a given root, then inserts the nodes found against the root; the next iteration takes an unassigned node as the new root until all nodes are assigned.

Ins_Node_Roots

↑ Procedure for Subnetwork Grouper

PROCEDURE Ins_Node_Roots IS
  l_root_id           PLS_INTEGER;
  l_ins_tot           PLS_INTEGER;
BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE node_roots';
  LOOP
    BEGIN
      SELECT id INTO l_root_id FROM nodes WHERE id NOT IN (SELECT node_id FROM node_roots) AND ROWNUM = 1;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN 
        l_root_id := NULL;
    END;
    EXIT WHEN l_root_id IS NULL;
    l_ins_tot := Ins_Min_Tree_Links(l_root_id);
    INSERT INTO node_roots tgt
    SELECT node_id, l_root_id, lev FROM min_tree_links;
  END LOOP;
END Ins_Node_Roots;

Notes on Ins_Node_Roots

↑ Procedure for Subnetwork Grouper

Truncate the solution table, node_roots
Loop while a new root node is found
- Select a new root node id from nodes not in node_roots
  - Exit loop when none found
- Call Ins_Min_Tree_Links to populate the solution table, min_tree_links, for the new root node
- Insert all nodes in min_tree_links into node_roots against the new root node

Results for three_subnets Dataset - Ins_Node_Roots

↑ Procedure for Subnetwork Grouper
↓ Node count by root node
↓ Subnet count by number of nodes
↓ Subnet count by maximum level

The results can be queried in detail from the solution table, but we choose to aggregate the results in three ways:

Node count by root node

↑ Results for three_subnets Dataset - Ins_Node_Roots

SELECT nod.root_node_id, nod_r.node_name, COUNT() n_nodes, MAX(nod.lev) maxlev
  FROM node_roots nod
  JOIN nodes nod_r
    ON nod_r.id = nod.root_node_id
 GROUP BY nod.root_node_id, nod_r.node_name
 ORDER BY 3

Here is the result for three_subnets:

ROOT_NODE_ID NODE_NAME     N_NODES     MAXLEV
------------ ---------- ---------- ----------
          14 S3-N0-1             1          0
          12 S2-N0-1             2          1
           1 S1-N0-1            11          3
                        ----------
sum                             14

Subnet count by number of nodes

↑ Results for three_subnets Dataset - Ins_Node_Roots

WITH count_by_root AS (
    SELECT nod.root_node_id, COUNT() n_nodes
      FROM node_roots nod
     GROUP BY nod.root_node_id
)
SELECT n_nodes, COUNT() n_subnets
  FROM count_by_root
 GROUP BY n_nodes
 ORDER BY 1

Here is the result for three_subnets:

   N_NODES  N_SUBNETS
---------- ----------
         1          1
         2          1
        11          1
----------
        14

Subnet count by maximum level

↑ Results for three_subnets Dataset - Ins_Node_Roots

WITH count_by_root AS (
    SELECT nod.root_node_id, MAX(nod.lev) maxlev
      FROM node_roots nod
     GROUP BY nod.root_node_id
)
SELECT maxlev, COUNT() n_subnets
  FROM count_by_root
 GROUP BY maxlev
 ORDER BY 1

Here is the result for three_subnets:

    MAXLEV  N_SUBNETS
---------- ----------
         0          1
         1          1
         3          1

Run Statistics for Example Datasets - Ins_Node_Roots

↑ Procedure for Subnetwork Grouper

Dataset	#Nodes	#Links	#Subnets	#Maxlev	Ela(s)	CPU(s)	ins_node_roots_base_ts_*.log
three_subnets	14	13	3	3	0.07	0.06	three_subnets
foreign_keys	289	319	43	5	0.2	0.2	sys_fks
brightkite	58,228	214,078	547	10	7	7	brightkite
bacon/small	161	3,342	1	5	0.1	0.1	small
bacon/top250	12,466	583,993	15	6	1.9	1.9	top250
bacon/pre1950	134,131	8,095,294	2,432	13	85	83	pre1950
bacon/only_tv_v	744,374	22,503,060	19,641	11	1,714	1,704	only_tv_v
bacon/no_tv_v	2,386,567	87,866,033	55,276	11	16,108	15,852	no_tv_v
bacon/post1950	2,696,175	101,597,227	60,544	10	19,736	19,403	post1950
bacon/full	2,800,309	109,262,592	62,557	11	20,631	19,950	full

Notice that even for the three much larger problems the elapsed time is not much larger than the CPU. We’ll see from the code timing section later that most of the time is used in a single Select, and not in the insertion of records and consequent file system writes.

Code Timing Subnetwork Grouper

↑ Subnetwork Grouper
↓ Code Timing Ins_Node_Roots
↓ Code Timing Results - Ins_Node_Roots

The box below shows the Ins_Node_Roots procedure with calls to the custom code timing package, Timer_Set.

Code Timing Ins_Node_Roots

↑ Code Timing Subnetwork Grouper

PROCEDURE Ins_Node_Roots IS
  l_root_id           PLS_INTEGER;
  l_ins_tot           PLS_INTEGER;
  l_ts_id             PLS_INTEGER := Timer_Set.Construct('Ins_Node_Roots');
  l_suffix            VARCHAR2(60);
BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE node_roots';
  LOOP
    BEGIN
      SELECT id INTO l_root_id FROM nodes WHERE id NOT IN (SELECT node_id FROM node_roots) AND ROWNUM = 1;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN 
        l_root_id := NULL;
    END;
    Timer_Set.Increment_Time(l_ts_id, 'SELECT id INTO l_root_id');
    EXIT WHEN l_root_id IS NULL;
    l_ins_tot := Ins_Min_Tree_Links(l_root_id);

    l_suffix := CASE WHEN l_ins_tot = 0  THEN '(1 node)'
                     WHEN l_ins_tot = 1  THEN '(2 nodes)'
                     WHEN l_ins_tot = 2  THEN '(3 nodes)'
                     WHEN l_ins_tot < 40 THEN '(4-39 nodes)'
                     ELSE                     '(root node ' || l_root_id || ', size: ' || (l_ins_tot + 1) || ')'
                END;
    Timer_Set.Increment_Time(l_ts_id, 'Insert min_tree_links ' || l_suffix);
    INSERT INTO node_roots tgt
    SELECT node_id, l_root_id, lev FROM min_tree_links;
    Timer_Set.Increment_Time(l_ts_id, 'Insert node_roots ' || l_suffix);
  END LOOP;
  Utils.W(Timer_Set.Format_Results(l_ts_id));
END Ins_Node_Roots;

The timer set is constructed in the declaration section with identifier l_ts_id
A timing call to Increment_Time is made in three places within the main loop:
- After the query for the next root id
- After the call to Ins_Min_Tree_Links
- After the insert into node_roots from min_tree_links
The timing results are written after the loop

The call to Increment_Time passes in a timer name, and for the second two calls a suffix is appended to a base name. The suffix is based on the number of nodes in the subnetwork identified at that iteration. This allows these two timing sections to be aggregated according to subnetwork size: subnetworks are categorized as being of size 1, 2, 3, 4-39 nodes or more than 39 nodes. Timings for the first 4 categories are aggregated over all networks of the relevant size category, while larger networks are timesd separately by including their root node id in the timer name.

Code Timing Results - Ins_Node_Roots

↑ Code Timing Subnetwork Grouper

Code timing results for bacon/only_tv_v Dataset:

In the results below, 69 lines are removed at the ‘…’ for brevity.

Timer Set: Ins_Node_Roots, Constructed at 30 Jul 2022 16:15:48, written at 16:44:22
===================================================================================
Timer                                                   Elapsed         CPU       Calls       Ela/Call       CPU/Call
---------------------------------------------------  ----------  ----------  ----------  -------------  -------------
SELECT id INTO l_root_id                                1517.43     1506.68       19642        0.07725        0.07671
Insert min_tree_links (root node 579, size: 680060)      122.95      120.31           1      122.94500      120.31000
Insert node_roots (root node 579, size: 680060)            4.10        4.05           1        4.10400        4.05000
Insert min_tree_links (4-39 nodes)                        20.21       23.07        5317        0.00380        0.00434
Insert node_roots (4-39 nodes)                             1.56        1.61        5317        0.00029        0.00030
Insert min_tree_links (root node 646, size: 58)            0.01        0.01           1        0.00800        0.01000
Insert node_roots (root node 646, size: 58)                0.00        0.00           1        0.00000        0.00000
Insert min_tree_links (3 nodes)                            7.14        7.29        2091        0.00341        0.00349
Insert node_roots (3 nodes)                                0.50        0.62        2091        0.00024        0.00030
Insert min_tree_links (1 node)                            24.91       24.76        8659        0.00288        0.00286
Insert node_roots (1 node)                                 2.18        1.75        8659        0.00025        0.00020
Insert min_tree_links (2 nodes)                           11.74       11.67        3539        0.00332        0.00330
Insert node_roots (2 nodes)                                0.88        1.42        3539        0.00025        0.00040
...
(Other)                                                    0.00        0.00           1        0.00100        0.00000
---------------------------------------------------  ----------  ----------  ----------  -------------  -------------
Total                                                   1714.02     1703.66       58925        0.02909        0.02891
---------------------------------------------------  ----------  ----------  ----------  -------------  -------------
[Timer timed (per call in ms): Elapsed: 0.01282, CPU: 0.01282]

File: ins_node_roots_base_ts_only_tv_v.log

The results show a total elapsed time of 1,714 seconds and virtually all of that comes from the included lines, and 90% from the timer named SELECT id INTO l_root_id. This tells us that if we want to improve performance we need first to focus on that code section. Further, we note that the query was called 19,642 times with a time per call of 0.1 second, suggesting that significant time savings could be made either by reducing the number of calls, or by tuning the query itself, and in the next section we’ll consider both angles.

The next biggest time is 123 seconds from a single subnetwork of size 680060, but we probably can’t improve this as it comes from a single call to the function Ins_Min_Tree_Links that we have already reviewed.

We see 8,659 calls were made for '(1 node)' suffix timers and 3,539 for the '(2 nodes)' ones, with elapsed times totalling 27 seconds and 13 seconds, respectively. Each call would also correspond to an instance of SELECT id INTO l_root_id, making about 26% of the total for that line.

1 and 2 node subnetworks are obviously of trivial 1 node or 1 link structure, and in a later section we show how we can insert the node_roots records for these two special cases in two single inserts prior to the main algorithm.

Oracle Profilers for Subnetwork Grouper

↑ Subnetwork Grouper
↓ Flat Profiler
↓ Hierarchical Profiler
↓ Notes on the Standard Profilers

In this section we show the results of applying Oracle’s two PL/SQL profilers to the Ins_Node_Roots procedure on the bacon/only_tv_v dataset. I wrote a set of four articles on the use of profilers, including how to set them up, starting here PL/SQL Profiling 1: Overview. Here, we will just show the calling code blocks, and the results obtained from querying the profiling tables.

Flat Profiler

↑ Oracle Profilers for Subnetwork Grouper
↓ Calling Flat Profiler
↓ Run header (PLSQL_PROFILER_RUNS)
↓ Profiler data summary by unit (PLSQL_PROFILER_DATA)
↓ Profiler data by unit and line (PLSQL_PROFILER_DATA)
↓ Profiler data by time (PLSQL_PROFILER_DATA)

The flat profiler (DBMS_Profiler) provides line level elapsed time and call information.

Calling Flat Profiler

We can call the flat profiler to profile the call to Ins_Node_Roots as follows, with a custom script to query back the results from the DBMS_Profiler tables:

VAR RUN_ID NUMBER
DECLARE
  l_result           PLS_INTEGER;
BEGIN
  l_result := DBMS_Profiler.Start_Profiler(
          run_comment => 'Profile for Ins_Node_Roots',
          run_number  => :RUN_ID);
  Shortest_Path_SQL_Base.Ins_Node_Roots;
  l_result := DBMS_Profiler.Stop_Profiler;
END;
/
@..\..\dprof_queries :RUN_ID

The custom query script, dprof_queries.sql, queries the profiling tables mentioned in parentheses below.

Run header (PLSQL_PROFILER_RUNS)

Run header (PLSQL_PROFILER_RUNS)

 Run Id Time         Seconds    Microsecs
------- -------- ----------- ------------
     48 07:14:12    1989.250   1989250000

Profiler data overall summary (PLSQL_PROFILER_DATA)

    Seconds    Microsecs    Calls
----------- ------------ --------
   1968.085   1968084999   455322

This shows top level summary information, such as the total elapsed time taken.

Profiler data summary by unit (PLSQL_PROFILER_DATA)

Unit                        Unit#     Seconds    Microsecs    Calls
------------------------- ------- ----------- ------------ --------
<anonymous>                     1       0.000          224        3
...
SHORTEST_PATH_SQL_BASE          2    1964.486   1964486392   416035

19644 rows selected.

This shows timing and call information by program unit, which includes a large number of anonymous units as well as our own package.

Profiler data by unit and line (PLSQL_PROFILER_DATA)

    Seconds    Microsecs       Min S       Max S    Calls Unit                        Unit# Type              Line# Line Text
----------- ------------ ----------- ----------- -------- ------------------------- ------- --------------- ------- --------------------------------------------------------------------------------------------------------------
000            0       0.000       0.000        0 <anonymous>                     1 ANONYMOUS BLOCK       1
...
067        66802       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY          3 FUNCTION Ins_Min_Tree_Links(
006         5536       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY          7 l_lev         PLS_INTEGER := 0;
001         1132       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY          9 l_ins_tot     PLS_INTEGER := 0;
519     31518896       0.001       0.008    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         11 EXECUTE IMMEDIATE 'TRUNCATE TABLE min_tree_links';
828      4828400       0.000       0.001    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         12 INSERT INTO min_tree_links VALUES (p_root_node_id, '', 0);
000            0       0.000       0.000    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         13 LOOP
850    128849762       0.000      58.133    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         15 INSERT INTO min_tree_links
071        70908       0.000       0.000    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         30 l_ins := SQL%ROWCOUNT;
897      1897130       0.000       0.001    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         31 COMMIT;
010         9888       0.000       0.000    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         32 l_ins_tot := l_ins_tot + l_ins;
015        15468       0.000       0.000    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         33 EXIT WHEN l_ins = 0;
005         4704       0.000       0.000    11733 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         34 l_lev := l_lev + 1;
006         5939       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         37 RETURN l_ins_tot;
037        37230       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         39 END Ins_Min_Tree_Links;
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         41 PROCEDURE Ins_All_Min_Path_Links(
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         44 l_lev         PLS_INTEGER := 0;
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         46 l_ins_tot     PLS_INTEGER := 0;
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         48 EXECUTE IMMEDIATE 'TRUNCATE TABLE min_tree_links';
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         49 INSERT INTO min_tree_links VALUES (p_root_node_id, '', 0);
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         50 LOOP
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         52 INSERT INTO min_tree_links
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         65 l_ins := SQL%ROWCOUNT;
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         66 COMMIT;
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         67 l_ins_tot := l_ins_tot + l_ins;
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         68 EXIT WHEN l_ins = 0;
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         69 l_lev := l_lev + 1;
000            0       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         73 END Ins_All_Min_Path_Links;
000            3       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         75 PROCEDURE Ins_Node_Roots IS
003         3204       0.000       0.003        1 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         81 EXECUTE IMMEDIATE 'TRUNCATE TABLE node_roots';
000            0       0.000       0.000    19642 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         82 LOOP
829   1789829185       0.004       0.556    19642 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         85 SELECT id INTO l_root_id FROM nodes WHERE id NOT IN (SELECT node_id FROM node_roots) AND ROWNUM = 1;
000            0       0.000       0.000        1 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         87 WHEN NO_DATA_FOUND THEN
000            0       0.000       0.000        1 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         88 l_root_id := NULL;
000            1       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         89 END;
006         5717       0.000       0.000    19642 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         90 EXIT WHEN l_root_id IS NULL;
019        18785       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         91 l_ins_tot := Ins_Min_Tree_Links(l_root_id);
318      7317698       0.000       3.780    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         93 INSERT INTO node_roots tgt
000            2       0.000       0.000        1 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         98 END Ins_Node_Roots;

157179 rows selected.

This shows timing and call information by program unit and line number, with the line text being queried back by the custom query script. It shows that 1,789 seconds are used by line 85, the root node selector. It shows 129 seconds used by line 15, the insert into min_tree_links.

Profiler data by time (PLSQL_PROFILER_DATA)

↑ Oracle Profilers for Subnetwork Grouper
↓ Calling Hierarchical Profiler
↓ Run header (DBMSHP_RUNS)
↓ Functions called (DBMSHP_FUNCTION_INFO)
↓ BPF Recursive Subquery Factor Tree Query (DBMSHP_PARENT_CHILD_INFO, DBMSHP_FUNCTION_INFO)
↓ HTML Reports File

    Seconds    Microsecs       Min S       Max S    Calls Unit                        Unit# Type              Line# Line Text
----------- ------------ ----------- ----------- -------- ------------------------- ------- --------------- ------- --------------------------------------------------------------------------------------------------------------
829   1789829185       0.004       0.556    19642 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         85 SELECT id INTO l_root_id FROM nodes WHERE id NOT IN (SELECT node_id FROM node_roots) AND ROWNUM = 1;
850    128849762       0.000      58.133    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         15 INSERT INTO min_tree_links
519     31518896       0.001       0.008    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         11 EXECUTE IMMEDIATE 'TRUNCATE TABLE min_tree_links';
318      7317698       0.000       3.780    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         93 INSERT INTO node_roots tgt
828      4828400       0.000       0.001    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         12 INSERT INTO min_tree_links VALUES (p_root_node_id, '', 0);
897      1897130       0.000       0.001    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         31 COMMIT;
071        70908       0.000       0.000    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         30 l_ins := SQL%ROWCOUNT;
067        66802       0.000       0.000        0 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY          3 FUNCTION Ins_Min_Tree_Links(
037        37230       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         39 END Ins_Min_Tree_Links;
019        18785       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         91 l_ins_tot := Ins_Min_Tree_Links(l_root_id);
015        15468       0.000       0.000    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         33 EXIT WHEN l_ins = 0;
010         9888       0.000       0.000    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         32 l_ins_tot := l_ins_tot + l_ins;
006         5939       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         37 RETURN l_ins_tot;
006         5717       0.000       0.000    19642 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         90 EXIT WHEN l_root_id IS NULL;
006         5536       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY          7 l_lev         PLS_INTEGER := 0;
005         4704       0.000       0.000    11733 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         34 l_lev := l_lev + 1;
003         3204       0.000       0.003        1 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         81 EXECUTE IMMEDIATE 'TRUNCATE TABLE node_roots';
001         1400       0.000       0.001        1 <anonymous>                 17358 ANONYMOUS BLOCK       3
001         1149       0.000       0.001        1 <anonymous>                 12529 ANONYMOUS BLOCK       3
001         1132       0.000       0.000    19641 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY          9 l_ins_tot     PLS_INTEGER := 0;
001          914       0.000       0.000        1 <anonymous>                 15474 ANONYMOUS BLOCK       3
...
000            0       0.000       0.000        1 <anonymous>                   929 ANONYMOUS BLOCK       5
000            0       0.000       0.000        1 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         87 WHEN NO_DATA_FOUND THEN
000            0       0.000       0.000        1 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         88 l_root_id := NULL;
000            0       0.000       0.000    19642 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         82 LOOP
000            0       0.000       0.000    31374 SHORTEST_PATH_SQL_BASE          2 PACKAGE BODY         13 LOOP

157179 rows selected.

This shows similar information to the previous query, but ordered by seconds descending, highlighting the most intensive lines.

Hierarchical Profiler

The hierarchical profiler (DBMS_HProf) provides elapsed time and call information hierarchically by function, with SQL statements treated as functions.

Calling Hierarchical Profiler

We can call the hierarchical profiler to profile the call to Ins_Node_Roots as follows, where HProf_Utils is a custom wrapper package, with a custom script, hprof_queries.sql, to query back the results from the profiler tables:

VAR RUN_ID NUMBER
BEGIN
  HProf_Utils.Start_Profiling;
  Shortest_Path_SQL_Base.Ins_Node_Roots;
  :RUN_ID := HProf_Utils.Stop_Profiling(
    p_run_comment => 'Profile for Ins_Node_Roots',
    p_filename    => 'hp_ins_node_roots_&SUB..html');
END;
/
@..\..\hprof_queries :RUN_ID

The custom query script, hprof_queries.sql, queries the profiling tables mentioned in parentheses below.

Note that the hierarchical profiler also produces a set of reports in HTML format, in a directory on the database server, when a value is passed for p_filename.

Run header (DBMSHP_RUNS)

 Run Id Run Time                          Microsecs   Seconds Comment
------- ---------------------------- -------------- --------- ------------------------------------------------------------
     41 30-JUL-22 06.10.57.482000 PM     1668506614  1668.510 Profile for Ins_Node_Roots

This shows top level summary information, such as the total elapsed time taken.

Functions called (DBMSHP_FUNCTION_INFO)

Owner              Module                     Sy Function                            Line# Subtree MicroS Function MicroS   Calls
------------------ ------------------------- --- ----------------------------------- ----- -------------- --------------- -------
LIB                HPROF_UTILS                 3 STOP_PROFILING                         61             22              22       1
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE      4 INS_MIN_TREE_LINKS                      3      169705305         1195428   19641
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE      5 INS_NODE_ROOTS                         75     1668506464          332444       1
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE     10 __dyn_sql_exec_line11                  11       20610292         9775994   19641
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE     11 __dyn_sql_exec_line81                  81          76008           75449       1
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE     12 __static_sql_exec_line12               12        4413730         4413730   19641
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE     13 __static_sql_exec_line15               15      141321652       141321652   31378
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE     14 __static_sql_exec_line31               31        2164203         2164203   31378
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE     15 __static_sql_exec_line85               85     1490685875      1490685875   19642
SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE     16 __static_sql_exec_line93               93        7706832         7706832   19641
SYS                DBMS_HPROF                  6 STOP_PROFILING                        747              0               0       1
SYS                DBMS_HPROF                 17 __static_sql_exec_line700             700            128             128       1
SYS                DICTIONARY_OBJ_NAME         7 DICTIONARY_OBJ_NAME                     1          45323           45323   39284
SYS                DICTIONARY_OBJ_OWNER        8 DICTIONARY_OBJ_OWNER                    1         812866          812866   39284
SYS                IS_VPD_ENABLED              9 IS_VPD_ENABLED                          1        6934407          395925   39284
SYS                IS_VPD_ENABLED             18 __static_sql_exec_line22               22        6538482         6538482   39284
                                               1 __anonymous_block                       0       10763826         2971230   19642
                                               2 __plsql_vm                              0       10834857           71031   19642

This shows timing and call information by function, which includes dynamic and static SQL statements, as well as our own package program units.

Line# 85, __static_sql_exec_line85 is the root node selector and has a function time of 1,491 seconds, which is included in the subtree time of 1,669 seconds for the function INS_NODE_ROOTS.

BPF Recursive Subquery Factor Tree Query (DBMSHP_PARENT_CHILD_INFO, DBMSHP_FUNCTION_INFO)

Function tree                                                 Sy Owner              Module                    Inst.  Subtree MicroS Function MicroS   Calls Row
------------------------------------------------------------ --- ------------------ ------------------------- ------ -------------- --------------- ------- ---
INS_NODE_ROOTS                                                 5 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE               1668506464          332444       1   1
  __static_sql_exec_line85                                    15 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE               1490685875      1490685875   19642   2
  INS_MIN_TREE_LINKS                                           4 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE                169705305         1195428   19641   3
    __static_sql_exec_line15                                  13 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE                141321652       141321652   31378   4
    __dyn_sql_exec_line11                                     10 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE                 20610292         9775994   19641   5
      __plsql_vm                                               2                                              1 of 2       10834298           71027   19641   6
        __anonymous_block                                      1                                              1 of 2       10763826         2971230   19642   7
          IS_VPD_ENABLED                                       9 SYS                IS_VPD_ENABLED            1 of 2        6934407          395925   39284   8
            __static_sql_exec_line22                          18 SYS                IS_VPD_ENABLED            1 of 2        6538482         6538482   39284   9
          DICTIONARY_OBJ_OWNER                                 8 SYS                DICTIONARY_OBJ_OWNER      1 of 2         812866          812866   39284  10
          DICTIONARY_OBJ_NAME                                  7 SYS                DICTIONARY_OBJ_NAME       1 of 2          45323           45323   39284  11
    __static_sql_exec_line12                                  12 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE                  4413730         4413730   19641  12
    __static_sql_exec_line31                                  14 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE                  2164203         2164203   31378  13
  __static_sql_exec_line93                                    16 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE                  7706832         7706832   19641  14
  __dyn_sql_exec_line81                                       11 SHORTEST_PATH_SQL  SHORTEST_PATH_SQL_BASE                    76008           75449       1  15
    __plsql_vm                                                 2                                              2 of 2            559               4       1  16
__static_sql_exec_line700                                     17 SYS                DBMS_HPROF                                  128             128       1  22
STOP_PROFILING                                                 3 LIB                HPROF_UTILS                                  22              22       1  23
  STOP_PROFILING                                               6 SYS                DBMS_HPROF                                    0               0       1  24

This shows the hierarchy of funtion calls (including SQL statements as functions), and shows times used by function calls by parent function.

It shows INS_NODE_ROOTS using 1,669 seconds in total, with 1,491 seconds used by the function __static_sql_exec_line85, with an additional 170 seconds used in the call to INS_MIN_TREE_LINKS, and most of the remaining 8 seconds used by the function __static_sql_exec_line93, which inserts from min_tree_links into node_roots.

HTML Reports File

↑ Oracle Profilers for Subnetwork Grouper

The hierarchical profiler produces an HTML file on the database server (here called: hp_ins_node_roots_only_tv_v.html) which includes several reports displayed as HTML tables, the first of which is shown in the screenshot below:

Notes on the Standard Profilers

The profilers allow us to quickly highlight hotspots within our PL/SQL code at line or function level without adding any code to the package profiled.

They don’t provide domain-specific breakdowns of resource usage such as time spent on subnetworks of different sizes, as custom code timing can.

Comparison with Net_Pipe Array-Based Package

↑ Subnetwork Grouper
↓ Net_Pipe Code - Calling SQL and Main Block
↓ Net_Pipe Results with Timing - Brightkite Dataset
↓ Run Statistics for Example Datasets - Ins_Node_Roots vs Net_Pipe

In May 2015 I posted an article, PL/SQL Pipelined Function for Network Analysis, describing a pipelined function that returns the structure of each subnetwork, along with subnetwork identifiers.

This was an array-based solution that explored the subnetworks using depth-first recursion in PL/SQL, within a higher-level algorithm that selected new root nodes. I had found this useful for listing database foreign key network structures, but being array-based it is obviously limited in the size of networks it can be used for.

The depth-first recursion produces paths that are typically much longer than the shortest paths.

Net_Pipe Code - Calling SQL and Main Block

↑ Comparison with Net_Pipe Array-Based Package

Call to Net_Pipe for Network Summary

PROMPT Network summary 2 - grouped by numbers of nodes
WITH network_counts AS (
SELECT root_node_id,
       Count(DISTINCT node_id) n_nodes
  FROM TABLE(Net_Pipe.All_Nets)
 GROUP BY root_node_id
)
SELECT n_nodes "#Nodes",
       COUNT(*) "#Networks"
  FROM network_counts
 GROUP BY n_nodes
 ORDER BY 1
/

Net_Pipe Main Block

BEGIN
  g_line_no := 0;
  FOR r_nod IN (SELECT Trim(node_id_fr) root_node_id FROM links_v
                 UNION
                SELECT Trim(node_id_to) FROM links_v) LOOP
    IF g_line_no = 0 THEN
      Timer_Set.Increment_Time(g_ts_id, 'r_nod cursor initial');
    ELSE
      Timer_Set.Increment_Time(g_ts_id, 'r_nod cursor remainder');
    END IF;
    IF g_node_hash.EXISTS(r_nod.root_node_id) THEN CONTINUE; END IF; -- skip if in prior subnetwork
    init_Net;
    Timer_Set.Increment_Time(g_ts_id, 'init_Net');
    expand_Node(p_root_node_id  => r_nod.root_node_id,
                p_dirn          => ' ',
                p_node_id       => r_nod.root_node_id,
                p_link_id       => ROOT_LINK_ID,
                p_node_level    => 0,
                p_is_loop       => FALSE);
    Timer_Set.Increment_Time(g_ts_id, 'expand_Node');
    FOR i IN 1..g_net_tab.COUNT LOOP
      PIPE ROW (g_net_tab(i));
    END LOOP;
    Timer_Set.Increment_Time(g_ts_id, 'PIPE ROW');
  END LOOP;
  Utils.W(Timer_Set.Format_Results(g_ts_id));
END All_Nets;

Net_Pipe Results with Timing - Brightkite Dataset

↑ Comparison with Net_Pipe Array-Based Package

Network summary 2 - grouped by numbers of nodes

 #Nodes  #Networks
------- ----------
      2        362
      3        103
      4         40
      5         21
      6          9
      7          3
      8          2
      9          1
     10          2
     11          2
     49          1
  56739          1

12 rows selected.

Timer Set: All_Nets, Constructed at 03 Aug 2022 07:25:25, written at 07:28:00
=============================================================================
Timer                      Elapsed         CPU       Calls       Ela/Call       CPU/Call
----------------------  ----------  ----------  ----------  -------------  -------------
r_nod cursor initial          0.13        0.12           1        0.12700        0.12000
init_Net                      0.07        0.08         547        0.00012        0.00015
expand_Node                 153.80      152.95         547        0.28116        0.27962
PIPE ROW                      0.21        0.21         547        0.00039        0.00038
r_nod cursor remainder        0.87        0.89       58227        0.00001        0.00002
(Other)                       0.00        0.00           1        0.00000        0.00000
----------------------  ----------  ----------  ----------  -------------  -------------
Total                       155.07      154.25       59870        0.00259        0.00258
----------------------  ----------  ----------  ----------  -------------  -------------
[Timer timed (per call in ms): Elapsed: 0.02590, CPU: 0.02564]

File: run_net_pipe_sum_brightkite.log

The results show 547 subnetworks, the same as shown by the Ins_Node_Roots procedure.

Code timing shows that almost all time is used in the recursive function expand_Node. In particular, the root node selection sections (r_nod cursor…) are not a significant contributor, unlike for the base version of Ins_Node_Roots where they take about half the time.

Run Statistics for Example Datasets - Ins_Node_Roots vs Net_Pipe

↑ Comparison with Net_Pipe Array-Based Package

Dataset	#Nodes	#Links	#Subnets	#Maxlev	Ins_Node_Roots(s)	Net_Pipe(s)
three_subnets	14	13	3	3	0.07	0.1
foreign_keys	289	319	43	5	0.2	0.1
brightkite	58,228	214,078	547	10	7	155
bacon/small	161	3,342	1	5	0.1	0.1
bacon/top250	12,466	583,993	15	6	1.9	23
bacon/pre1950	134,131	8,095,294	2,432	13	85	NA

Net_Pipe fails on the last example, with error:

  FROM TABLE(Net_Pipe.All_Nets) t
             *
ERROR at line 5:
ORA-00028: your session has been killed

This error arises from the PGA memory limit being exceeded by the arrays used.

The base log files are of form: ins_node_roots_base_ts_.log (Ins_Node_Roots, and : run_net_pipe_sum_.log (Net_Pipe), where * represents the dataset code (excluding the bacon prefix).

The comparison shows that the Ins_Node_Roots procedure is much faster than Net_Pipe once the network size gets larger, and Net_Pipe is impractical for networks of the size of bacon/pre1950

Tuning Subnetwork Grouper

↑ Contents
↓ SQL for Isolated Nodes
↓ SQL for Isolated Links
↓ SQL for Root Node Selector
↓ Results for Tuned Subnetwork Grouper

In this section we take account of the code timing and profiling results, and look to tune the areas consuming the most time.

First, we look at ways of inserting isolated nodes into the min_tree_links table in advance of the main algorithm by a single SQL statement.

Second, we do the same for isolated links, and third, we try to find alternative methods for selecting the root node for each subnetwork, which was by far the most time-consuming statement.

SQL for Isolated Nodes

↑ Tuning Subnetwork Grouper
↓ SQL 1 - Not Exists / Or
↓ SQL 2 - Outer Joins Unhinted
↓ SQL 3 - Outer Joins Hinted

As single node subnetworks are of special, very simple structure it’s likely to be more efficient to handle these separately ahead of the main algorithm.

SQL 1 - Not Exists / Or

↑ SQL for Isolated Nodes

Logically, these subnetworks consist of all the nodes that are present only in the nodes table but not in the links table. This can be expressed in a single SQL statement for the insert:

INSERT INTO node_roots
SELECT nod.id, nod.id, 0
  FROM nodes nod
 WHERE NOT EXISTS (SELECT 1
                     FROM links lnk
                    WHERE lnk.node_id_fr = nod.id OR lnk.node_id_to = nod.id);

If we run this on the bacon/only_tv_v dataset (744,374 nodes and 22,503,060 links), we get the following execution plan:

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |00:00:25.78 |     191K|  93127 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL | NODE_ROOTS   |      1 |        |      0 |00:00:25.78 |     191K|  93127 |       |       |          |
|*  2 |   HASH JOIN ANTI         |              |      1 |  53174 |   8659 |00:00:25.73 |     176K|  93122 |    37M|  6400K|   30M (0)|
|   3 |    INDEX FAST FULL SCAN  | SYS_C0018310 |      1 |    744K|    744K|00:00:00.08 |    1461 |      0 |       |       |          |
|   4 |    VIEW                  | VW_SQ_1      |      1 |     45M|     45M|00:00:22.86 |     174K|  93122 |       |       |          |
|   5 |     UNION-ALL            |              |      1 |        |     45M|00:00:15.93 |     174K|  93122 |       |       |          |
|   6 |      TABLE ACCESS FULL   | LINKS        |      1 |     22M|     22M|00:00:02.61 |   87315 |  46561 |       |       |          |
|   7 |      TABLE ACCESS FULL   | LINKS        |      1 |     22M|     22M|00:00:02.19 |   87315 |  46561 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("VW_COL_1"="NOD"."ID")

File: ins_isolated_nodes_links_only_tv_v_15.log

The CBO transforms the OR condition into a UNION ALL, and uses this as the probe input to a hash join (anti) with the unique index on nodes as the hash table. It obtains the 8,659 isolated nodes in 21 seconds.

SQL 2 - Outer Joins Unhinted

↑ SQL for Isolated Nodes

As is well known a simple NOT EXISTS condition that a master record does not have any correlated record in a linked table can generally be transformed into an outer join of the linked table, with a condition that the outer join fails. In this case, where we have an OR condition we can treat it as two NOT EXISTS, like this:

INSERT INTO node_roots
SELECT nod.id, nod.id, 0
  FROM nodes nod
  LEFT JOIN links lnk_f
    ON lnk_f.node_id_fr = nod.id
  LEFT JOIN links lnk_t
    ON lnk_t.node_id_to = nod.id
 WHERE lnk_f.node_id_fr IS NULL
   AND lnk_t.node_id_fr IS NULL;

Running this on the same dataset, we get the following execution plan:

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |00:00:12.48 |     191K|  93127 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL | NODE_ROOTS   |      1 |        |      0 |00:00:12.48 |     191K|  93127 |       |       |          |
|*  2 |   HASH JOIN ANTI         |              |      1 |    532 |   8659 |00:00:12.43 |     176K|  93122 |  4599K|  3201K| 3726K (0)|
|*  3 |    HASH JOIN ANTI        |              |      1 |  53174 |  57851 |00:00:08.41 |   88776 |  46561 |    37M|  6400K|   28M (0)|
|   4 |     INDEX FAST FULL SCAN | SYS_C0018310 |      1 |    744K|    744K|00:00:00.07 |    1461 |      0 |       |       |          |
|   5 |     TABLE ACCESS FULL    | LINKS        |      1 |     22M|     22M|00:00:02.58 |   87315 |  46561 |       |       |          |
|   6 |    TABLE ACCESS FULL     | LINKS        |      1 |     22M|     22M|00:00:01.83 |   87315 |  46561 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("LNK_T"."NODE_ID_TO"="NOD"."ID")
3 - access("LNK_F"."NODE_ID_FR"="NOD"."ID")

File: ins_isolated_nodes_links_only_tv_v_25.log

Now the CBO uses a full scan of the links table as the probe input to a hash join (anti) with an index scan of the unique index on nodes as the hash table, at the inner level. It then uses that as the hash table for another hash join (anti) with another full scan of the links table as the probe input.

It obtains the 8,659 isolated nodes in 12 seconds, which is a good deal faster than the first version. However, I wondered how this would compare with a plan using nested loop joins.

SQL 3 - Outer Joins Hinted

↑ SQL for Isolated Nodes

In this version we add hints to test whether nested loop joins might be more efficient:

INSERT INTO node_roots
SELECT /*+  gather_plan_statistics XPLAN_ISO_N USE_NL (lnk_f) USE_NL (lnk_t) */
       nod.id, nod.id, 0
  FROM nodes nod
  LEFT JOIN links lnk_f
    ON lnk_f.node_id_fr = nod.id
  LEFT JOIN links lnk_t
    ON lnk_t.node_id_to = nod.id
 WHERE lnk_f.node_id_fr IS NULL
   AND lnk_t.node_id_to IS NULL;

Running this on the same dataset, we get the following execution plan:

------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |00:00:01.27 |     624K|      5 |
|   1 |  LOAD TABLE CONVENTIONAL | NODE_ROOTS   |      1 |        |      0 |00:00:01.27 |     624K|      5 |
|   2 |   NESTED LOOPS ANTI      |              |      1 |    532 |   8659 |00:00:00.89 |     622K|      0 |
|   3 |    NESTED LOOPS ANTI     |              |      1 |  53174 |  57851 |00:00:01.04 |     506K|      0 |
|   4 |     INDEX FAST FULL SCAN | SYS_C0018310 |      1 |    744K|    744K|00:00:00.11 |    1461 |      0 |
|*  5 |     INDEX RANGE SCAN     | LINKS_FR_N1  |    744K|     20M|    686K|00:00:00.83 |     505K|      0 |
|*  6 |    INDEX RANGE SCAN      | LINKS_TO_N1  |  57851 |     22M|  49192 |00:00:00.15 |     115K|      0 |
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("LNK_F"."NODE_ID_FR"="NOD"."ID")
6 - access("LNK_T"."NODE_ID_TO"="NOD"."ID")

File: ins_isolated_nodes_links_only_tv_v_35.log

It obtains the 8,659 isolated nodes in 1.5 seconds, so this is the most efficient by quite a lot. We see that the query uses only indexes without table accesses. We note also that the estimated rows for the two range scans are about 29 and 447 times the actual rows returned.

It is interesting to consider the E-Rows and A-Rows here in some detail. I have added my own notes in place of the last three columns in the table below.

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows | Notes                                                           |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |                                                                 |
|   1 |  LOAD TABLE CONVENTIONAL | NODE_ROOTS   |      1 |        |      0 |                                                                 |
|   2 |   NESTED LOOPS ANTI      |              |      1 |    532 |   8659 | A-Rows = Starts(6) - A-Rows(6) ~ final result set cardinality   |
|   3 |    NESTED LOOPS ANTI     |              |      1 |  53174 |  57851 | A-Rows ~ A-Rows(4) - A-Rows(5) ~ #rows failing the from join    |
|   4 |     INDEX FAST FULL SCAN | SYS_C0018310 |      1 |    744K|    744K| E-Rows, A-Rows ~ #nodes                                         |
|*  5 |     INDEX RANGE SCAN     | LINKS_FR_N1  |    744K|     20M|    686K| Starts ~ #nodes; E-Rows ~ 0.91 x #links; A-Rows ~ 0.92 x #nodes |
|*  6 |    INDEX RANGE SCAN      | LINKS_TO_N1  |  57851 |     22M|  49192 | Starts = A-Rows(3); E-Rows ~ #links; A-Rows ~ 0.85 x Starts     |
----------------------------------------------------------------------------------------------------------------------------------------------

It seems clear from the numbers that the E-Rows and A-Rows in steps 5 and 6 are estimated and actual counts of different things. E-Rows is estimating the total number of links records matching the join condition for the input node set, while the A-Rows is the number of nodes matching it. The A-Rows in steps 2 and 3 are then the differences between the numbers of input nodes and the numbers of join-matching nodes.

It is almost as though the SQL engine is smart enough to know that, in the context of the anti-join, there is no point in bringing back all the joining records when these will all be eliminated later, but that the CBO is not, and chooses a bad plan, when unhinted, for that reason.

SQL for Isolated Links

↑ Tuning Subnetwork Grouper
↓ SQL 1 - Not Exists / 3 Ors
↓ SQL 2 - Four Not Exists Subqueries
↓ SQL 3 - Four Outer Joins
↓ SQL 4 - Four Outer Joins with 3 Nested Loop Hints
↓ SQL 5 - Group Counting

As isolated link (or two-node) subnetworks are of special, very simple structure it’s likely to be more efficient to handle these separately ahead of the main algorithm.

SQL 1 - Not Exists / 3 Ors

Logically, these subnetworks consist of all the links that do not connect to any other links. In other words a link is isolated if:

its from node is neither a from nor a to node in any other link, and
its to node is neither a from nor a to node in any other link

This can be expressed in a single SQL statement for the insert, with a driving instance of the links table and a NOT EXISTS subquery expressing the conditions above against an additional instance:

INSERT INTO node_roots
  WITH isolated_links AS (
    SELECT lnk.node_id_fr, lnk.node_id_to
      FROM links lnk
    WHERE NOT EXISTS (
                SELECT 1
                  FROM links lnk_1
                 WHERE (lnk_1.node_id_fr = lnk.node_id_to OR 
                        lnk_1.node_id_to = lnk.node_id_fr OR 
                        lnk_1.node_id_fr = lnk.node_id_fr OR 
                        lnk_1.node_id_to = lnk.node_id_to)
                   AND lnk_1.ROWID != lnk.ROWID
          )
  )
  SELECT node_id_fr, node_id_fr, 0
    FROM isolated_links
  UNION
  SELECT node_id_to, node_id_fr, 1
    FROM isolated_links;

If we run this on the bacon/pre1950 dataset (134,131 nodes and 8,095,294 links), we get the following execution plan:

------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                         |                            |      1 |        |      0 |00:41:31.41 |     113M|       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |      0 |00:41:31.41 |     113M|       |       |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C4F_4F65443 |      1 |        |      0 |00:41:31.37 |     113M|  1024 |  1024 |          |
|*  3 |    FILTER                                |                            |      1 |        |    425 |01:20:08.08 |     113M|       |       |          |
|   4 |     TABLE ACCESS FULL                    | LINKS                      |      1 |   8095K|   8095K|00:00:01.57 |   16747 |       |       |          |
|   5 |     TABLE ACCESS BY INDEX ROWID BATCHED  | LINKS                      |   8095K|      1 |   8094K|01:08:05.02 |     113M|       |       |          |
|*  6 |      BITMAP CONVERSION TO ROWIDS         |                            |   8095K|        |   8094K|01:07:45.55 |     105M|       |       |          |
|   7 |       BITMAP OR                          |                            |   8095K|        |   8094K|01:07:35.34 |     105M|       |       |          |
|*  8 |        BITMAP CONVERSION FROM ROWIDS     |                            |   8095K|        |   7978K|00:09:42.09 |      23M|       |       |          |
|*  9 |         INDEX RANGE SCAN                 | LINKS_TO_N1                |   8095K|        |   3076M|00:08:56.46 |      23M|       |       |          |
|* 10 |        BITMAP CONVERSION FROM ROWIDS     |                            |   8095K|        |   8086K|00:17:19.09 |      29M|       |       |          |
|* 11 |         INDEX RANGE SCAN                 | LINKS_TO_N1                |   8095K|        |   5926M|00:16:17.18 |      29M|       |       |          |
|* 12 |        BITMAP CONVERSION FROM ROWIDS     |                            |   8095K|        |   7974K|00:09:27.91 |      22M|       |       |          |
|* 13 |         INDEX RANGE SCAN                 | LINKS_FR_N1                |   8095K|        |   3076M|00:08:41.13 |      22M|       |       |          |
|* 14 |        BITMAP CONVERSION FROM ROWIDS     |                            |   8095K|        |   8086K|00:19:00.44 |      30M|       |       |          |
|* 15 |         INDEX RANGE SCAN                 | LINKS_FR_N1                |   8095K|        |   6232M|00:17:21.29 |      30M|       |       |          |
|  16 |   LOAD TABLE CONVENTIONAL                | NODE_ROOTS                 |      1 |        |      0 |00:00:00.04 |    1498 |       |       |          |
|  17 |    HASH UNIQUE                           |                            |      1 |     16M|    850 |00:00:00.03 |       0 |  1485K|  1485K|   81M (0)|
|  18 |     UNION-ALL                            |                            |      1 |        |    850 |00:00:00.01 |       0 |       |       |          |
|  19 |      VIEW                                |                            |      1 |   8095K|    425 |00:00:00.01 |       0 |       |       |          |
|  20 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6C4F_4F65443 |      1 |   8095K|    425 |00:00:00.01 |       0 |       |       |          |
|  21 |      VIEW                                |                            |      1 |   8095K|    425 |00:00:00.01 |       0 |       |       |          |
|  22 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6C4F_4F65443 |      1 |   8095K|    425 |00:00:00.01 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter( IS NULL)
6 - filter("LNK_1".ROWID<>:B1)
8 - filter("LNK_1".ROWID<>:B1)
9 - access("LNK_1"."NODE_ID_TO"=:B1)
filter("LNK_1".ROWID<>:B1)
10 - filter("LNK_1".ROWID<>:B1)
11 - access("LNK_1"."NODE_ID_TO"=:B1)
filter("LNK_1".ROWID<>:B1)
12 - filter("LNK_1".ROWID<>:B1)
13 - access("LNK_1"."NODE_ID_FR"=:B1)
filter("LNK_1".ROWID<>:B1)
14 - filter("LNK_1".ROWID<>:B1)
15 - access("LNK_1"."NODE_ID_FR"=:B1)
filter("LNK_1".ROWID<>:B1)

File: ins_isolated_nodes_links_pre1950_11.log

The CBO transforms the OR conditions into a 4-section BITMAP OR, followed by a BITMAP CONVERSION TO ROWIDS and a links table access to filter the driving instance.

There are 425 isolated links in this dataset and the result is obtained in 4,103 seconds. This is very slow, and pre1950 is far from the largest of the Bacon datasets. We will try different query formulations to see if we can improve on it. Later we’ll run the best against bacon/only_tv_v as for the isolated nodes queries.

SQL 2 - Four Not Exists Subqueries

Here we split the conditions into four subqueries:

INSERT INTO node_roots
WITH isolated_links AS (
  SELECT lnk.node_id_fr, lnk.node_id_to
    FROM links lnk
  WHERE NOT EXISTS (
              SELECT 1
                FROM links lnk_1
               WHERE lnk_1.node_id_fr = lnk.node_id_fr
                 AND lnk_1.ROWID != lnk.ROWID
        )
    AND NOT EXISTS (
              SELECT 1
                FROM links lnk_2
               WHERE lnk_2.node_id_to = lnk.node_id_to
                 AND lnk_2.ROWID != lnk.ROWID
        )
    AND NOT EXISTS (
              SELECT 1
                FROM links lnk_3
               WHERE (lnk_3.node_id_fr = lnk.node_id_to)
                 AND lnk_3.ROWID != lnk.ROWID
        )
    AND NOT EXISTS (
              SELECT 1
                FROM links lnk_4
               WHERE (lnk_4.node_id_to = lnk.node_id_fr)
                 AND lnk_4.ROWID != lnk.ROWID
        )
)
SELECT node_id_fr, node_id_fr, 0
  FROM isolated_links
 UNION
SELECT node_id_to, node_id_fr, 1
  FROM isolated_links;

Running this on the same dataset, we get the following execution plan:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                         |                            |      1 |        |      0 |00:00:12.78 |   85226 |    134K|    134K|       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |      0 |00:00:12.78 |   85226 |    134K|    134K|       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C1D_4F65443 |      1 |        |      0 |00:00:12.77 |   83733 |    134K|    134K|  1042K|  1042K|          |         |
|*  3 |    HASH JOIN RIGHT ANTI                  |                            |      1 |   8095K|    425 |00:00:13.60 |   83730 |    134K|    134K|   469M|    16M|   16M (1)|     211M|
|   4 |     TABLE ACCESS FULL                    | LINKS                      |      1 |   8095K|   8095K|00:00:00.69 |   16746 |      0 |      0 |       |       |          |         |
|*  5 |     HASH JOIN RIGHT ANTI                 |                            |      1 |   8095K|    484 |00:00:09.94 |   66984 |    108K|    109K|   469M|    16M|   16M (1)|     211M|
|   6 |      TABLE ACCESS FULL                   | LINKS                      |      1 |   8095K|   8095K|00:00:00.59 |   16746 |      0 |      0 |       |       |          |         |
|*  7 |      HASH JOIN RIGHT ANTI                |                            |      1 |   8095K|   4196 |00:00:05.59 |   50238 |  82305 |  82305 |   469M|    16M|   16M (1)|     211M|
|   8 |       TABLE ACCESS FULL                  | LINKS                      |      1 |   8095K|   8095K|00:00:00.58 |   16746 |      0 |      0 |       |       |          |         |
|*  9 |       HASH JOIN ANTI                     |                            |      1 |   8095K|    116K|00:00:04.99 |   33492 |  56203 |  56203 |   522M|    29M|   23M (1)|     454M|
|  10 |        TABLE ACCESS FULL                 | LINKS                      |      1 |   8095K|   8095K|00:00:00.59 |   16746 |      0 |      0 |       |       |          |         |
|  11 |        TABLE ACCESS FULL                 | LINKS                      |      1 |   8095K|   8095K|00:00:00.55 |   16746 |      0 |      0 |       |       |          |         |
|  12 |   LOAD TABLE CONVENTIONAL                | NODE_ROOTS                 |      1 |        |      0 |00:00:00.02 |    1489 |      3 |      0 |       |       |          |         |
|  13 |    HASH UNIQUE                           |                            |      1 |     16M|    850 |00:00:00.01 |       7 |      3 |      0 |  1485K|  1485K|   81M (0)|         |
|  14 |     UNION-ALL                            |                            |      1 |        |    850 |00:00:00.01 |       7 |      3 |      0 |       |       |          |         |
|  15 |      VIEW                                |                            |      1 |   8095K|    425 |00:00:00.01 |       5 |      1 |      0 |       |       |          |         |
|  16 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6C1D_4F65443 |      1 |   8095K|    425 |00:00:00.01 |       5 |      1 |      0 |       |       |          |         |
|  17 |      VIEW                                |                            |      1 |   8095K|    425 |00:00:00.01 |       2 |      2 |      0 |       |       |          |         |
|  18 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6C1D_4F65443 |      1 |   8095K|    425 |00:00:00.01 |       2 |      2 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("LNK_1"."NODE_ID_FR"="LNK"."NODE_ID_FR")
filter("LNK_1".ROWID<>"LNK".ROWID)
5 - access("LNK_2"."NODE_ID_TO"="LNK"."NODE_ID_TO")
filter("LNK_2".ROWID<>"LNK".ROWID)
7 - access("LNK_3"."NODE_ID_FR"="LNK"."NODE_ID_TO")
filter("LNK_3".ROWID<>"LNK".ROWID)
9 - access("LNK_4"."NODE_ID_TO"="LNK"."NODE_ID_FR")
filter("LNK_4".ROWID<>"LNK".ROWID)

File: ins_isolated_nodes_links_pre1950_12.log

The result is obtained in 20 seconds, a big improvement.

The plan shows that the NOT EXISTS subqueries are implemented as a sequence of hash joins. A full scan of the links table is used for the hash table in each case, with the driving rowset being the result from the previous join after the first one (step 9).

The ANTI keyword indicates an antijoin in which the result set consists of the records in the input rowset that fail to match the hash table.

It is notable that the E-Rows between steps 3 and 11 are all equal to the links table cardinality, while the A-Rows decrease as the sequence progresses (upwards), reaching 425 at step 3.

One might surmise that, with the A-Rows down to 116K in step 9 after the first hash join, nested loops might be favoured thereafter. Let’s start by manually translating the NOT EXISTS subqueries into outer joins.

SQL 3 - Four Outer Joins

Here we replace the four subqueries with outer joins with WHERE conditions that specify join fails:

INSERT INTO node_roots
WITH isolated_links AS (
  SELECT lnk.node_id_fr, lnk.node_id_to
    FROM links lnk
    LEFT JOIN links lnk_1
        ON (lnk_1.node_id_fr = lnk.node_id_fr AND lnk_1.ROWID != lnk.ROWID)
    LEFT JOIN links lnk_2
        ON (lnk_2.node_id_fr = lnk.node_id_to AND lnk_2.ROWID != lnk.ROWID)
    LEFT JOIN links lnk_3
        ON (lnk_3.node_id_to = lnk.node_id_fr AND lnk_3.ROWID != lnk.ROWID)
    LEFT JOIN links lnk_4
        ON (lnk_4.node_id_to = lnk.node_id_to AND lnk_4.ROWID != lnk.ROWID)
  WHERE lnk_1.node_id_fr IS NULL
    AND lnk_2.node_id_fr IS NULL
    AND lnk_3.node_id_to IS NULL
    AND lnk_4.node_id_to IS NULL
)
SELECT node_id_fr, node_id_fr, 0
  FROM isolated_links
 UNION
SELECT node_id_to, node_id_fr, 1
  FROM isolated_links;

Running this on the same dataset, we get the following execution plan:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                         |                            |      1 |        |      0 |00:17:46.54 |   85247 |    134K|    134K|       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |      0 |00:17:46.54 |   85247 |    134K|    134K|       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C1E_4F65443 |      1 |        |      0 |00:17:46.52 |   83733 |    134K|    134K|  1042K|  1042K|          |         |
|*  3 |    FILTER                                |                            |      1 |        |    425 |00:17:46.11 |   83730 |    134K|    134K|       |       |          |         |
|*  4 |     HASH JOIN RIGHT OUTER                |                            |      1 |   8095K|   1302 |00:17:46.21 |   83730 |    134K|    134K|   469M|    16M|   16M (1)|     211M|
|   5 |      TABLE ACCESS FULL                   | LINKS                      |      1 |   8095K|   8095K|00:00:00.73 |   16746 |      0 |      0 |       |       |          |         |
|*  6 |      FILTER                              |                            |      1 |        |    472 |00:15:41.39 |   66984 |    108K|    108K|       |       |          |         |
|*  7 |       HASH JOIN RIGHT OUTER              |                            |      1 |   8095K|  49224 |00:12:50.53 |   66984 |    108K|    108K|   469M|    16M|   16M (1)|     211M|
|   8 |        TABLE ACCESS FULL                 | LINKS                      |      1 |   8095K|   8095K|00:00:00.68 |   16746 |      0 |      0 |       |       |          |         |
|*  9 |        FILTER                            |                            |      1 |        |   3724 |00:15:31.22 |   50238 |  82274 |  82274 |       |       |          |         |
|* 10 |         HASH JOIN RIGHT OUTER            |                            |      1 |   8095K|    267K|00:15:16.91 |   50238 |  82274 |  82274 |   469M|    16M|   16M (1)|     211M|
|  11 |          TABLE ACCESS FULL               | LINKS                      |      1 |   8095K|   8095K|00:00:00.61 |   16746 |      0 |      0 |       |       |          |         |
|* 12 |          FILTER                          |                            |      1 |        |   8819 |00:14:59.94 |   33492 |  56172 |  56172 |       |       |          |         |
|* 13 |           HASH JOIN OUTER                |                            |      1 |   8095K|   6232M|00:18:35.29 |   33492 |  56172 |  56172 |   522M|    29M|   23M (1)|     453M|
|  14 |            TABLE ACCESS FULL             | LINKS                      |      1 |   8095K|   8095K|00:00:00.59 |   16746 |      0 |      0 |       |       |          |         |
|  15 |            TABLE ACCESS FULL             | LINKS                      |      1 |   8095K|   8095K|00:00:00.55 |   16746 |      0 |      0 |       |       |          |         |
|  16 |   LOAD TABLE CONVENTIONAL                | NODE_ROOTS                 |      1 |        |      0 |00:00:00.02 |    1510 |      3 |      0 |       |       |          |         |
|  17 |    HASH UNIQUE                           |                            |      1 |     16M|    850 |00:00:00.01 |       7 |      3 |      0 |  1485K|  1485K|   81M (0)|         |
|  18 |     UNION-ALL                            |                            |      1 |        |    850 |00:00:00.01 |       7 |      3 |      0 |       |       |          |         |
|  19 |      VIEW                                |                            |      1 |   8095K|    425 |00:00:00.01 |       5 |      1 |      0 |       |       |          |         |
|  20 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6C1E_4F65443 |      1 |   8095K|    425 |00:00:00.01 |       5 |      1 |      0 |       |       |          |         |
|  21 |      VIEW                                |                            |      1 |   8095K|    425 |00:00:00.01 |       2 |      2 |      0 |       |       |          |         |
|  22 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6C1E_4F65443 |      1 |   8095K|    425 |00:00:00.01 |       2 |      2 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("LNK_4"."NODE_ID_TO" IS NULL)
4 - access("LNK_4"."NODE_ID_TO"="LNK"."NODE_ID_TO")
filter("LNK_4".ROWID<>"LNK".ROWID)
6 - filter("LNK_3"."NODE_ID_TO" IS NULL)
7 - access("LNK_3"."NODE_ID_TO"="LNK"."NODE_ID_FR")
filter("LNK_3".ROWID<>"LNK".ROWID)
9 - filter("LNK_2"."NODE_ID_FR" IS NULL)
10 - access("LNK_2"."NODE_ID_FR"="LNK"."NODE_ID_TO")
filter("LNK_2".ROWID<>"LNK".ROWID)
12 - filter("LNK_1"."NODE_ID_FR" IS NULL)
13 - access("LNK_1"."NODE_ID_FR"="LNK"."NODE_ID_FR")
filter("LNK_1".ROWID<>"LNK".ROWID)

File: ins_isolated_nodes_links_pre1950_23.log

The result is obtained in 1,259 seconds, better than the first but much worse than the second.

At first sight, it seems surprising that it performs so much worse than the second method, since that also effectively converted the NOT EXISTS subqueries into outer joins. Reviewing the plan, we can see one important difference that might help to explain this: Where, in the earlier plan we had HASH JOIN ANTI / HASH JOIN RIGHT ANTI steps, these have been replaced by a pair of steps, HASH JOIN OUTER / HASH JOIN RIGHT OUTER followed by a FILTER step.

It seems that the plan requires the execution of the standard outer join before using the result set to filter out from the input rowset. For example, in step 13 we see an A-Rows of 6232M, while the A-Rows in the FILTER step, 12, is only 8819. By comparison, step 9, HASH JOIN ANTI, in the earlier plan had an A-Rows of only 116K.

In the isolated nodes query, the CBO was able to recognize the outer join with WHERE condition as an antijoin and avoid the expensive combination of equijoin plus FILTER step. In the slightly more complicated case for isolated links, it has failed to do so.

SQL 4 - Four Outer Joins with 3 Nested Loop Hints

We can force nested loop joins after the first hash join with the hint

	/*+ USE_NL (lnk_2) USE_NL (lnk_3) USE_NL (lnk_4) */

Running this on the same dataset, we get the following execution plan:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                         |                            |      1 |        |      0 |00:18:01.02 |   61806 |  53354 |  53321 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |      0 |00:18:01.02 |   61806 |  53354 |  53321 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C1F_4F65443 |      1 |        |      0 |00:18:01.00 |   60294 |  53351 |  53321 |  1042K|  1042K|          |         |
|*  3 |    FILTER                                |                            |      1 |        |    425 |00:07:17.60 |   60291 |  53351 |  53320 |       |       |          |         |
|   4 |     NESTED LOOPS OUTER                   |                            |      1 |   8095K|   1302 |00:06:08.63 |   60291 |  53351 |  53320 |       |       |          |         |
|*  5 |      FILTER                              |                            |      1 |        |    472 |00:24:39.93 |   59343 |  53351 |  53320 |       |       |          |         |
|   6 |       NESTED LOOPS OUTER                 |                            |      1 |   8095K|  49224 |00:17:06.17 |   59343 |  53351 |  53320 |       |       |          |         |
|*  7 |        FILTER                            |                            |      1 |        |   3724 |00:16:46.44 |   51774 |  53351 |  53320 |       |       |          |         |
|   8 |         NESTED LOOPS OUTER               |                            |      1 |   8095K|    267K|00:19:40.16 |   51774 |  53351 |  53320 |       |       |          |         |
|*  9 |          FILTER                          |                            |      1 |        |   8819 |00:14:03.84 |   33492 |  53351 |  53320 |       |       |          |         |
|* 10 |           HASH JOIN OUTER                |                            |      1 |   8095K|   6232M|00:19:02.80 |   33492 |  53351 |  53320 |   522M|    29M|   91M (1)|     431M|
|  11 |            TABLE ACCESS FULL             | LINKS                      |      1 |   8095K|   8095K|00:00:00.81 |   16746 |      0 |      0 |       |       |          |         |
|  12 |            TABLE ACCESS FULL             | LINKS                      |      1 |   8095K|   8095K|00:00:00.76 |   16746 |      0 |      0 |       |       |          |         |
|* 13 |          INDEX RANGE SCAN                | LINKS_FR_N1                |   8819 |      1 |    263K|00:00:00.27 |   18282 |      0 |      0 |       |       |          |         |
|* 14 |        INDEX RANGE SCAN                  | LINKS_TO_N1                |   3724 |      1 |  48752 |00:00:00.05 |    7569 |      0 |      0 |       |       |          |         |
|* 15 |      INDEX RANGE SCAN                    | LINKS_TO_N1                |    472 |      1 |    877 |00:00:00.01 |     948 |      0 |      0 |       |       |          |         |
|  16 |   LOAD TABLE CONVENTIONAL                | NODE_ROOTS                 |      1 |        |      0 |00:00:00.02 |    1508 |      3 |      0 |       |       |          |         |
|  17 |    HASH UNIQUE                           |                            |      1 |     16M|    850 |00:00:00.01 |       7 |      3 |      0 |  1485K|  1485K|   81M (0)|         |
|  18 |     UNION-ALL                            |                            |      1 |        |    850 |00:00:00.01 |       7 |      3 |      0 |       |       |          |         |
|  19 |      VIEW                                |                            |      1 |   8095K|    425 |00:00:00.01 |       5 |      1 |      0 |       |       |          |         |
|  20 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6C1F_4F65443 |      1 |   8095K|    425 |00:00:00.01 |       5 |      1 |      0 |       |       |          |         |
|  21 |      VIEW                                |                            |      1 |   8095K|    425 |00:00:00.01 |       2 |      2 |      0 |       |       |          |         |
|  22 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6C1F_4F65443 |      1 |   8095K|    425 |00:00:00.01 |       2 |      2 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("LNK_4"."NODE_ID_TO" IS NULL)
5 - filter("LNK_3"."NODE_ID_TO" IS NULL)
7 - filter("LNK_2"."NODE_ID_FR" IS NULL)
9 - filter("LNK_1"."NODE_ID_FR" IS NULL)
10 - access("LNK_1"."NODE_ID_FR"="LNK"."NODE_ID_FR")
filter("LNK_1".ROWID<>"LNK".ROWID)
13 - access("LNK_2"."NODE_ID_FR"="LNK"."NODE_ID_TO")
filter("LNK_2".ROWID<>"LNK".ROWID)
14 - access("LNK_3"."NODE_ID_TO"="LNK"."NODE_ID_FR")
filter("LNK_3".ROWID<>"LNK".ROWID)
15 - access("LNK_4"."NODE_ID_TO"="LNK"."NODE_ID_TO")
filter("LNK_4".ROWID<>"LNK".ROWID)

File: ins_isolated_nodes_links_pre1950_34.log

The result is obtained in 1,243 seconds, only slightly better than the third.

Both the inner hash join and the three nested loop joins, have the separate FILTER step slowing performance.

Let’s take a step back, and see if we can rethink the logic behind isolated links.

SQL 5 - Group Counting

↑ Tuning Subnetwork Grouper
↓ Method 0: Select from Unused Nodes (Unordered)
↓ Method 1: Select from Unused Nodes (Minimum Id)
↓ Method 2: Select from Unused Nodes (Ordered by Id, ROWNUM = 1)
↓ Method 3: Fetch from Cursor (Ordered by Id), Check Unused

At the start of this section we expressed the logic for a link being isolated in two conditions:

its from node is neither a from nor a to node in any other link, and
its to node is neither a from nor a to node in any other link

The queries formulated to express these conditions in SQL involve self-joins on the large links table, with best results coming where the execution plan explicitly recognizes the join logic as being of antijoin nature.

Another way of defining an isolated link is by the following two conditions:

its from node appears in exactly one link
its to node appears in exactly one link

This can be expressed in a single SQL statement for the insert, with a subquery structure:

all_nodes gets all node instances with a type of F or T
unique_nodes selects from all_nodes the nodes having exactly one instance, along with its type
isolated_links selects all links and inner-joins them to unique_nodes on both ends
the main query is similar to the earlier queries, just adding both nodes with from node as root

SQL

INSERT INTO node_roots
WITH all_nodes AS (
  SELECT node_id_fr node_id, 'F' tp
    FROM links
 UNION ALL
  SELECT node_id_to, 'T'
    FROM links
), unique_nodes AS (
  SELECT node_id, Max(tp) tp
    FROM all_nodes
   GROUP BY node_id
  HAVING COUNT(*) = 1
), isolated_links AS (
  SELECT lnk.node_id_fr, lnk.node_id_to
    FROM links lnk
    JOIN unique_nodes frn ON frn.node_id = lnk.node_id_fr AND frn.tp = 'F'
    JOIN unique_nodes ton ON ton.node_id = lnk.node_id_to AND ton.tp = 'T'
)
SELECT node_id_fr, node_id_fr, 0
  FROM isolated_links
 UNION ALL
SELECT node_id_to, node_id_fr, 1
  FROM isolated_links;

Running this on the same dataset, we get the following execution plan:

------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                         |                            |      1 |        |      0 |00:00:01.83 |   37699 |       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |      0 |00:00:01.83 |   37699 |       |       |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C20_4F65443 |      1 |        |      0 |00:00:01.82 |   33492 |  1024 |  1024 |          |
|*  3 |    FILTER                                |                            |      1 |        |   1797 |00:00:01.91 |   33492 |       |       |          |
|   4 |     HASH GROUP BY                        |                            |      1 |     26 |    132K|00:00:01.82 |   33492 |  9400K|  3717K| 8895K (0)|
|   5 |      VIEW                                |                            |      1 |     16M|     16M|00:00:00.36 |   33492 |       |       |          |
|   6 |       UNION-ALL                          |                            |      1 |        |     16M|00:00:00.34 |   33492 |       |       |          |
|   7 |        TABLE ACCESS FULL                 | LINKS                      |      1 |   8095K|   8095K|00:00:00.16 |   16746 |       |       |          |
|   8 |        TABLE ACCESS FULL                 | LINKS                      |      1 |   8095K|   8095K|00:00:00.15 |   16746 |       |       |          |
|   9 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6C21_4F65443 |      1 |        |      0 |00:00:00.01 |    2690 |  1024 |  1024 |          |
|* 10 |    HASH JOIN                             |                            |      1 |      1 |    425 |00:00:00.01 |    2690 |  2078K|  2078K| 1539K (0)|
|* 11 |     VIEW                                 |                            |      1 |     26 |    901 |00:00:00.01 |       0 |       |       |          |
|  12 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6C20_4F65443 |      1 |     26 |   1797 |00:00:00.01 |       0 |       |       |          |
|  13 |     NESTED LOOPS                         |                            |      1 |   1685 |    896 |00:00:00.01 |    2690 |       |       |          |
|  14 |      NESTED LOOPS                        |                            |      1 |   1690 |    896 |00:00:00.01 |    1794 |       |       |          |
|* 15 |       VIEW                               |                            |      1 |     26 |    896 |00:00:00.01 |       0 |       |       |          |
|  16 |        TABLE ACCESS FULL                 | SYS_TEMP_0FD9D6C20_4F65443 |      1 |     26 |   1797 |00:00:00.01 |       0 |       |       |          |
|* 17 |       INDEX RANGE SCAN                   | LINKS_TO_N1                |    896 |     65 |    896 |00:00:00.01 |    1794 |       |       |          |
|  18 |      TABLE ACCESS BY INDEX ROWID         | LINKS                      |    896 |     65 |    896 |00:00:00.01 |     896 |       |       |          |
|  19 |   LOAD TABLE CONVENTIONAL                | NODE_ROOTS                 |      1 |        |      0 |00:00:00.01 |    1517 |       |       |          |
|  20 |    UNION-ALL                             |                            |      1 |        |    850 |00:00:00.01 |       0 |       |       |          |
|  21 |     VIEW                                 |                            |      1 |      1 |    425 |00:00:00.01 |       0 |       |       |          |
|  22 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6C21_4F65443 |      1 |      1 |    425 |00:00:00.01 |       0 |       |       |          |
|  23 |     VIEW                                 |                            |      1 |      1 |    425 |00:00:00.01 |       0 |       |       |          |
|  24 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6C21_4F65443 |      1 |      1 |    425 |00:00:00.01 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter((COUNT(*)=1 AND (MAX("TP")='F' OR MAX("TP")='T')))
10 - access("FRN"."NODE_ID"="LNK"."NODE_ID_FR")
11 - filter("FRN"."TP"='F')
15 - filter("TON"."TP"='T')
17 - access("TON"."NODE_ID"="LNK"."NODE_ID_TO")

File: ins_isolated_nodes_links_pre1950_35.log

The result is obtained in under 2 seconds, much faster than any other method.

Most of the time is spent in step 4, HASH GROUP BY, and the A-Rows value of 1,797 for step 3 is the cardinality of the unique_nodes subquery.

The rowset from unique_nodes is joined via nested loops at step 14 to the links to nodes using index LINKS_TO_N1, with 896 rows as output. The fromnodes for those links are then obtained at step 18 via table access from the index rowids, and used in the hash join at step 10, with 425 rows as output.

SQL for Root Node Selector

Now we look at alternative methods for selecting the next root node in the Subnetwork Grouper algorithm.

In the code timing section we noted that 90% of the time taken comes from the timer named SELECT id INTO l_root_id. The SQL corresponding to this timer is:

BEGIN
  SELECT id INTO l_root_id 
   FROM nodes 
  WHERE id NOT IN (SELECT node_id FROM node_roots) 
    AND ROWNUM = 1;
EXCEPTION
  WHEN NO_DATA_FOUND THEN 
    l_root_id := NULL;
END;

The query takes as root node the first node in the nodes table that is not in the node_roots table, and does not take the nodes in any particular order.

We would like to reduce the time taken here, and we might also prefer to take nodes in some specific order. We have tried three alternative approaches, and in this section we’ll give the code and execution plans for each, and list the timings obtained by each on the bacon/only_tv_v dataset.

Method 0: Select from Unused Nodes (Unordered)

This is the baseline method, where at the start and after each subnetwork is obtained, a new root node is selected for the next subnetwork, until all nodes are exhausted.

The method just selects the first one from nodes that are not in the root_nodes table, without ordering, using a ROWNUM = 1 condition.

Execution Plan

The execution plan comes from running the query in a separate script after the algorithm has completed, so that all nodes are in node_roots, and no new root is returned (file: ins_node_roots_xplans_only_tv_v.log).

Select Code

SELECT id INTO l_root_id 
  FROM nodes 
 WHERE id NOT IN (SELECT node_id FROM node_roots) 
   AND ROWNUM = 1;

Select Plan

---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |               |      1 |        |      0 |00:00:00.37 |   42508 |
|*  1 |  COUNT STOPKEY          |               |      1 |        |      0 |00:00:00.37 |   42508 |
|*  2 |   FILTER                |               |      1 |        |      0 |00:00:00.37 |   42508 |
|   3 |    NESTED LOOPS ANTI SNA|               |      1 |     20 |      0 |00:00:00.36 |   40791 |
|   4 |     INDEX FAST FULL SCAN| SYS_C0018310  |      1 |    520 |    744K|00:00:00.09 |    1460 |
|*  5 |     INDEX UNIQUE SCAN   | NODE_ROOTS_N1 |    744K|    714K|    744K|00:00:00.23 |   39331 |
|*  6 |    TABLE ACCESS FULL    | NODE_ROOTS    |      1 |      1 |      0 |00:00:00.01 |    1717 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
2 - filter( IS NULL)
5 - access("ID"="NODE_ID")
6 - filter("NODE_ID" IS NULL)

Code Timing Results

Here is an extract from the code timing results for dataset bacon/only_tv_v (file: ins_node_roots_only_tv_v_350.log), listing the root selection line and the total line:

Timer                                Elapsed         CPU       Calls       Ela/Call       CPU/Call
--------------------------------  ----------  ----------  ----------  -------------  -------------
.
SELECT id INTO l_root_id              302.95      299.26        7444        0.04070        0.04020
.
--------------------------------  ----------  ----------  ----------  -------------  -------------
Total                                 524.41      507.86       22333        0.02348        0.02274

Elapsed times:

Root Selection	ms/Call	%Total	Non-Root Selection	%Total	Total
303	41	58	221	42	524

Method 1: Select from Unused Nodes (Minimum Id)

This method selects the minimum id from the unused nodes.

Execution Plan

Select Code

SELECT Min(id) INTO l_root_id 
  FROM nodes WHERE id NOT IN (SELECT node_id FROM node_roots);

Select Plan

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |              |      1 |        |      1 |00:00:00.50 |    3178 |       |       |          |
|   1 |  SORT AGGREGATE          |              |      1 |      1 |      1 |00:00:00.50 |    3178 |       |       |          |
|*  2 |   HASH JOIN RIGHT ANTI NA|              |      1 |  28678 |      0 |00:00:00.50 |    3178 |    37M|  6400K|   30M (0)|
|   3 |    TABLE ACCESS FULL     | NODE_ROOTS   |      1 |    715K|    744K|00:00:00.06 |    1717 |       |       |          |
|   4 |    INDEX FAST FULL SCAN  | SYS_C0018310 |      1 |    744K|    744K|00:00:00.08 |    1461 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"="NODE_ID")

Code Timing Results

Here is an extract from the code timing results for dataset bacon/only_tv_v (file: ins_node_roots_only_tv_v_351.log), listing the root selection line and the total line:

Timer                                Elapsed         CPU       Calls       Ela/Call       CPU/Call
--------------------------------  ----------  ----------  ----------  -------------  -------------
.
SELECT id INTO l_root_id             2046.22     2044.35        7444        0.27488        0.27463
.
--------------------------------  ----------  ----------  ----------  -------------  -------------
Total                                2232.96     2224.79       22333        0.09998        0.09962

Elapsed times:

Root Selection	ms/Call	%Total	Non-Root Selection	%Total	Total
2,046	275	92	208	8	2,233

Method 2: Select from Unused Nodes (Ordered by Id, ROWNUM = 1)

This method orders the unused nodes by id in a subquery, then selects the first one using a ROWNUM = 1 condition in the main section.

Execution Plan

Select Code

SELECT id INTO l_root_id 
  FROM (SELECT id FROM nodes WHERE id NOT IN (
            SELECT node_id FROM node_roots) ORDER BY 1
       ) 
 WHERE ROWNUM = 1;

Select Plan

----------------------------------------------------------------------------------------------------
| Id  | Operation                | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |               |      1 |        |      0 |00:00:00.42 |   26499 |
|*  1 |  COUNT STOPKEY           |               |      1 |        |      0 |00:00:00.42 |   26499 |
|   2 |   VIEW                   |               |      1 |      1 |      0 |00:00:00.42 |   26499 |
|*  3 |    FILTER                |               |      1 |        |      0 |00:00:00.42 |   26499 |
|   4 |     NESTED LOOPS ANTI SNA|               |      1 |     20 |      0 |00:00:00.41 |   24782 |
|   5 |      INDEX FULL SCAN     | SYS_C0018310  |      1 |    744K|    744K|00:00:00.10 |    1398 |
|*  6 |      INDEX UNIQUE SCAN   | NODE_ROOTS_N1 |    744K|    688K|    744K|00:00:00.26 |   23384 |
|*  7 |     TABLE ACCESS FULL    | NODE_ROOTS    |      1 |      1 |      0 |00:00:00.01 |    1717 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
3 - filter( IS NULL)
6 - access("ID"="NODE_ID")
7 - filter("NODE_ID" IS NULL)

Code Timing Results

Here is an extract from the code timing results for dataset bacon/only_tv_v (file: ins_node_roots_only_tv_v_352.log), listing the root selection line and the total line:

Timer                                Elapsed         CPU       Calls       Ela/Call       CPU/Call
--------------------------------  ----------  ----------  ----------  -------------  -------------
.
SELECT id INTO l_root_id              289.40      283.07        7444        0.03888        0.03803
.
--------------------------------  ----------  ----------  ----------  -------------  -------------
Total                                 482.44      475.33       22333        0.02160        0.02128

Elapsed times:

Root Selection	ms/Call	%Total	Non-Root Selection	%Total	Total
289	39	60	193	40	482

Method 3: Fetch from Cursor (Ordered by Id), Check Unused