Part 6 in a series on: Optimization Problems with Items and Categories in Oracle

The knapsack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

The knapsack problem and many other problems in combinatorial optimization require the selection of a subset of items to maximize an objective function subject to constraints. A common approach to solving these problems algorithmically involves recursively generating sequences of items of increasing length in a search for the best subset that meets the constraints.

I applied this kind of approach using SQL for a number of problems, starting in January 2013 with A Simple SQL Solution for the Knapsack Problem (SKP-1), and I wrote a summary article, Knapsacks and Networks in SQL, in December 2017 when I put the code onto GitHub, sql_demos - Brendan’s repo for interesting SQL.

This is the sixth in a series of eight articles that aim to provide a more formal treatment of algorithms for item sequence generation and optimization, together with practical implementations, examples and verification techniques in SQL and PL/SQL.

List of Articles

GitHub

Twitter

In the fifth article we used two larger test datasets to analyse the performance of our initial recursive query, with Value Filtering techniques included, as described in the fourth article, and looked at variations on the query designed to improve performance.

In the current article, we demonstrate how PL/SQL can be used to implement both recursive and iterative versions of the basic one-level algorithm with embedded SQL, and we go on to implement a two-level Iterative Refinement algorithm that proves to be highly efficient.

The performance results cited are from instance 3 of running the driver script, Run-All.ps1: Run-All_03.log (summary) and results_03 folder (detail) in the GitHub project


[Image by Eugenio Hansen, OFS from Pixabay]

Contents

↓ 1 Recursion and Iteration: Process Flows
↓ 2 PL/SQL-Driven Solution Methods: Arrays and Temporary Tables
↓ 3 PL/SQL-Driven Solution Methods: Performance
↓ 4 Tuning Memory Usage
↓ 5 Iterative Refinement Algorithms
↓ 6 Conclusion

1 Recursion and Iteration: Process Flows

↑ Contents
↓ Recursion
↓ Iteration

This section has diagrams showing the process flows for recursive and iterative versions of the one-level algorithm using PL/SQL to control the flow, with embedded SQL inserting to temporary table or array. In a later section we will show PL/SQL implementations of the algorithm using the four combinations of recursion / iteration and table / array.

Recursion

↑ 1 Recursion and Iteration: Process Flows

The diagram below shows the process flow for a recursive algorithm for item/category sequence optimization.

Iteration

↑ 1 Recursion and Iteration: Process Flows

The diagram below shows the process flow for an iterative algorithm for item/category sequence optimization.

2 PL/SQL-Driven Solution Methods: Arrays and Temporary Tables

↑ Contents
↓ Common Procedures
↓ Paths Table Solution Methods
↓ Paths Array Solution Methods

In the sections below we show the PL/SQL and SQL code for four solution methods implementing recursive and iterative algorithms using both a temporary table and an array for the storage of intermediate paths.

  • Table / Recursion - Pop_Table_Recurse
  • Table / Iteration - Pop_Table_Iterate
  • Array / Recursion - Array_Recurse
  • Array / Iteration - Array_Iterate

In each case the steps from the set of subsequences of a given length to the set longer by one item is controlled within PL/SQL, either by iteration or recursion, instead of by recursive SQL. This results in simpler SQL, and in particular we no longer need to join the item running sums either as a table or an array, because at a given step its values are constants.

We’ll see later that different approaches show differences in performance characteristics, and that all are faster than recursive SQL.

Common Procedures

↑ 2 PL/SQL-Driven Solution Methods: Arrays and Temporary Tables

There is a driving procedure, Init, that is called at the start of each solution method. It first sets some package globals from system context values to simplify access throughout the package, and calls two procedures to populate each of the tables. These procedures are described in the earlier article, OPICO 5: Tuning Recursive SQL for Item/Category Optimization:

  • Init: Driving procedure for the pre-calculations
  • pop_Item_Running_Sums: Local procedure to populate the g_items_running_sum_lis array (IRS)
  • pop_Items_Ranked: Local procedure to populate the ITEMS_RANKED index-organized table (IRK)

set_Sums

This procedure is called at the start of each iteration, and sets the globals, g_sum_price and g_sum_value, using the iteration number (or level) and the g_items_running_sum_lis array. These are referenced within the queries, avoiding the need for a table or array join.

PROCEDURE set_Sums(
            p_iter                         PLS_INTEGER) IS
BEGIN
  g_sum_price := 0;
  g_sum_value := 0;
  IF p_iter < g_seq_size then
    g_sum_price := g_items_running_sum_lis(g_seq_size - p_iter).sum_price;
    g_sum_value := g_items_running_sum_lis(g_seq_size - p_iter).sum_value;
  END IF;
END set_Sums;

Paths Table Solution Methods

↑ 2 PL/SQL-Driven Solution Methods: Arrays and Temporary Tables
↓ Common View - PATHS_RANKED_V
↓ Common Procedures - Paths
↓ Table / Recursion - Pop_Table_Recurse
↓ Table / Iteration - Pop_Table_Iterate

The two solution methods based on a temporary table, PATHS, use a common view, PATHS_RANKED_V, that ranks the records in the table, after its population by their respective procedures.

Both table solution methods populate the PATHS table via a call to their own procedure, which calls the common initialization procedure at the start to pre-populate the g_items_running_sum_lis array and the ITEMS_RANKED index-organized table.

Common View - PATHS_RANKED_V

↑ Paths Table Solution Methods

This view reads the temporary table PATHS, and ranks the records.

CREATE OR REPLACE VIEW paths_ranked_v AS
SELECT path,
       tot_price,
       tot_value,
       Row_Number() OVER (ORDER BY tot_value DESC, tot_price) rnk
  FROM paths

Common Procedures - Paths

↑ Paths Table Solution Methods
↓ insert_Initial_Path
↓ insert_Paths

The two solution methods based on a temporary table, PATHS, use common procedures to insert the initial, root, PATHS record, and the subsequent records.

insert_Initial_Path

↑ Common Procedures - Paths

This procedure inserts the root record with a next_cat_id value taken from the item ranking first in ITEMS_RANKED.

PROCEDURE insert_Initial_Path(
            p_timer_set                    PLS_INTEGER) IS
BEGIN
  DELETE paths;
  INSERT INTO paths (
     path_rnk, item_rnk, lev, tot_price, tot_value, cat_id, next_cat_id, same_cats, min_items, cats_path, path
  )
  SELECT 0, 0, 0, 0, 0, 'AL', cat_id, 0, 0, '',''
    FROM items_ranked
   WHERE item_rnk = 1;
END insert_Initial_Path;
insert_Paths

↑ Common Procedures - Paths

This procedure inserts the records at each subsequent iteration.

Notice that there is no join to an item running sums collection as there was in the recursive SQL solution methods of the previous article, because the values are now constants (g_sum_price, g_sum_value) during a single iteration.

PROCEDURE insert_Paths(
            p_timer_set                    PLS_INTEGER,
            p_iter                         PLS_INTEGER) IS
BEGIN
  set_Sums(p_iter => p_iter);
  INSERT INTO paths (
    path_rnk, item_rnk, lev, tot_price, tot_value, cat_id, next_cat_id, same_cats, min_items, cats_path, path
  )
  WITH path_join AS (
      SELECT Row_Number() OVER (PARTITION BY trw.cats_path || irk.cat_id ORDER BY trw.tot_value + irk.item_value DESC,  trw.tot_price + irk.item_price) path_rnk,
             irk.item_rnk,
             p_iter lev,
             trw.tot_price + irk.item_price tot_price,
             trw.tot_value + irk.item_value tot_value,
             irk.cat_id,
             irk.next_cat_id,
             CASE irk.cat_id WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END same_cats,
             irk.min_items,
             trw.cats_path || irk.cat_id cats_path,
             trw.path || irk.item_id path
        FROM paths trw
        JOIN items_ranked irk
          ON irk.item_rnk BETWEEN (trw.item_rnk + 1) AND (irk.n_items - (g_seq_size - trw.lev - 1))
       WHERE trw.tot_price + irk.item_price + g_sum_price <= g_max_price
         AND trw.tot_value + irk.item_value + g_sum_value >= g_min_value
         AND trw.lev < g_seq_size
         AND CASE irk.cat_id WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END <= irk.max_items
         AND g_seq_size - (trw.lev + 1) + Least(CASE irk.cat_id
                                                WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END,
                                                irk.min_items)
             >= irk.min_remain
         AND (irk.cat_id = trw.cat_id OR trw.same_cats >= trw.min_items)
         AND (irk.cat_id = trw.cat_id OR irk.cat_id = Nvl(trw.next_cat_id, irk.cat_id))
  )
  SELECT path_rnk,
         item_rnk,
         lev,
         tot_price,
         tot_value,
         cat_id,
         next_cat_id,
         same_cats,
         min_items,
         cats_path,
         path
    FROM path_join
   WHERE path_rnk <= g_keep_num OR g_keep_num = 0;
  DELETE paths WHERE lev = p_iter - 1;
END insert_Paths;

Table / Recursion - Pop_Table_Recurse

↑ Paths Table Solution Methods

Table / Recursion Driving Block

The table / recursion solution method uses a driving block to populate the PATHS table, via a call to the procedure Pop_Table_Recurse, which is then queried via the view PATHS_RANKED_V.

BEGIN
  Item_Cat_Seqs.Pop_Table_Recurse(p_keep_num      => &KEEP_NUM,
                                  p_min_value     => &MIN_VALUE);
END;
Procedure - Pop_Table_Recurse

This procedure implements the table / recursion solution, with the procedure calling itself recursively.

PROCEDURE Pop_Table_Recurse(
            p_keep_num                     PLS_INTEGER,
            p_min_value                    PLS_INTEGER,
            p_timer_set                    PLS_INTEGER := NULL,
            p_lev                          PLS_INTEGER := NULL) IS
  l_timer_set       PLS_INTEGER := p_timer_set;
BEGIN
  IF p_timer_set IS NULL THEN
    Init(p_keep_num    => p_keep_num,
         p_min_value   => p_min_value);
  END IF;
  IF p_lev = 0 THEN
    insert_Initial_Path(p_timer_set => l_timer_set);
    RETURN;
  END IF;
  Pop_Table_Recurse(p_keep_num    => p_keep_num,
                    p_min_value   => p_min_value,
                    p_timer_set   => l_timer_set,
                    p_lev         => Nvl(p_lev, g_seq_size) - 1);
  insert_Paths(p_timer_set => l_timer_set,
               p_iter      => Nvl(p_lev, g_seq_size));
END Pop_Table_Recurse;

Table / Iteration - Pop_Table_Iterate

↑ Paths Table Solution Methods

Table / Iteration Driving Block

The table / iteration solution method uses a driving block to populate the PATHS table, via a call to the procedure Pop_Table_Iterate, which is then queried via the view PATHS_RANKED_V.

BEGIN
  Item_Cat_Seqs.Pop_Table_Iterate(p_keep_num      => &KEEP_NUM,
                                  p_min_value     => &MIN_VALUE);
END;
Procedure - Pop_Table_Iterate

This procedure implements the table / iteration solution method, with the procedure having a loop over the sequence slots, with a call to the procedure insert_Paths inside the loop.

PROCEDURE Pop_Table_Iterate(
            p_keep_num                     PLS_INTEGER,
            p_min_value                    PLS_INTEGER) IS
BEGIN
  Init(p_keep_num    => p_keep_num,
       p_min_value   => p_min_value);
  insert_Initial_Path(p_timer_set => l_timer_set);
  FOR i IN 1..g_seq_size LOOP
    insert_Paths(p_timer_set => l_timer_set,
                 p_iter      => i);
  END LOOP;
END Pop_Table_Iterate;

Paths Array Solution Methods

↑ 2 PL/SQL-Driven Solution Methods: Arrays and Temporary Tables
↓ Array Driving Block
↓ Common Function - Initial Path
↓ Array / Recursion - Array_Recurse
↓ Array / Iteration - Array_Iterate

The two array solution methods call the Init driving procedure first, to populate the item temporary tables, then call pipelined functions from their respective views, since functions called from a query can’t effect any DML.

Array Driving Block

↑ Paths Array Solution Methods

Both array solution methods uses a common driving block to pre-populate the the array g_items_running_sum_lis and the index-organized table ITEMS_RANKED, using the common procedure Init.

BEGIN
  Item_Cat_Seqs.Init(p_keep_num      => &KEEP_NUM,
                     p_min_value     => &MIN_VALUE);
END;

Common Function - Initial Path

↑ Paths Array Solution Methods

Both array solution methods use a common function to return the initial root record. The function returns the root record with a next_cat_id value taken from the item ranking first in ITEMS_RANKED.

FUNCTION initial_Path
            RETURN                         paths%ROWTYPE IS
  l_paths_rec       paths%ROWTYPE;
BEGIN
  SELECT 0, 0, 0, 0, 0, 'AL', cat_id, 0, 0, '',''
    INTO l_paths_rec
    FROM items_ranked
   WHERE item_rnk = 1;
  RETURN l_paths_rec;
END initial_Path;

Array / Recursion - Array_Recurse

↑ Paths Array Solution Methods

View - ARRAY_RECURSE_V

This view calls the array / recursion function Array_Recurse, and ranks the records it returns.

CREATE OR REPLACE VIEW array_recurse_v AS
SELECT path,
       tot_value,
       tot_price,
       Row_Number() OVER (ORDER BY tot_value DESC, tot_price) rnk
  FROM TABLE(Item_Cat_Seqs.Array_Recurse)
Function - Array_Recurse

This function implements the array / recursion solution method, with the function having a recursive call to itself from the cursor within the loop that pipes out its result records in array form.

Notice that there is no join to an item running sums collection as there was in the recursive SQL solution methods of the previous article, because the values are now constants (g_sum_price, g_sum_value) during a single iteration.

FUNCTION Array_Recurse(
            p_timer_set                    PLS_INTEGER := NULL,
            p_lev                          PLS_INTEGER := NULL)
            RETURN                         paths_arr PIPELINED IS
BEGIN
  IF p_lev = 0 THEN
    PIPE ROW(initial_Path);
    RETURN;
  END IF;
  set_Sums(p_iter => p_lev);
  FOR rec IN (
    WITH path_join AS (
        SELECT Row_Number() OVER (PARTITION BY trw.cats_path || irk.cat_id ORDER BY trw.tot_value + irk.item_value DESC,  trw.tot_price + irk.item_price) path_rnk,
               irk.item_rnk,
               trw.lev + 1 lev,
               trw.tot_price + irk.item_price tot_price,
               trw.tot_value + irk.item_value tot_value,
               irk.cat_id,
               irk.next_cat_id,
               CASE irk.cat_id WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END same_cats,
               irk.min_items,
               trw.cats_path || irk.cat_id cats_path,
               trw.path || irk.item_id path
          FROM Array_Recurse(p_timer_set    => l_timer_set,
                             p_lev          => Nvl(p_lev, g_seq_size) - 1) trw
          JOIN items_ranked irk
            ON irk.item_rnk BETWEEN (trw.item_rnk + 1) AND (irk.n_items - (g_seq_size - trw.lev - 1))
         WHERE trw.tot_price + irk.item_price + g_sum_price <= g_max_price
           AND trw.tot_value + irk.item_value + g_sum_value >= g_min_value
           AND trw.lev < g_seq_size
           AND CASE irk.cat_id WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END <= irk.max_items
           AND g_seq_size - (trw.lev + 1) + Least(CASE irk.cat_id
                                                  WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END,
                                                  irk.min_items)
               >= irk.min_remain
           AND (irk.cat_id = trw.cat_id OR trw.same_cats >= trw.min_items)
           AND (irk.cat_id = trw.cat_id OR irk.cat_id = Nvl(trw.next_cat_id, irk.cat_id))
    )
    SELECT path_rnk,
           item_rnk,
           lev,
           tot_price,
           tot_value,
           cat_id,
           next_cat_id,
           same_cats,
           min_items,
           cats_path,
           path
      FROM path_join
     WHERE path_rnk <= g_keep_num OR g_keep_num = 0) LOOP
      PIPE ROW(rec);
  END LOOP;
END Array_Recurse;

Array / Iteration - Array_Iterate

↑ Paths Array Solution Methods

View - ARRAY_ITERATE_V

This view calls the array / iteration function Array_Iterate, and ranks the records it returns.

CREATE OR REPLACE VIEW array_iterate_v AS
SELECT path,
       tot_value,
       tot_price,
       Row_Number() OVER (ORDER BY tot_value DESC, tot_price) rnk
  FROM TABLE(Item_Cat_Seqs.Array_Iterate)
Function - Array_Iterate

This function implements the array / iteration solution method, with the function having a loop over the sequence slots, with a recursive call to itself within the loop BULK COLLECT query. The paths array collected replaces the previous array at each iteration.

Notice that there is no join to an item running sums collection as there was in the recursive SQL solution methods of the previous article, because the values are now constants (g_sum_price, g_sum_value) during a single iteration.

FUNCTION Array_Iterate
            RETURN                         paths_arr PIPELINED IS
  l_paths_lis       paths_arr := paths_arr(initial_Path);
  l_paths_new_lis   paths_arr;
BEGIN
  FOR i IN 1..g_seq_size LOOP
    set_Sums(p_iter => i);
    WITH path_join AS (
        SELECT Row_Number() OVER (PARTITION BY trw.cats_path || irk.cat_id ORDER BY trw.tot_value + irk.item_value DESC,  trw.tot_price + irk.item_price) path_rnk,
               irk.item_rnk,
               trw.lev + 1 lev,
               trw.tot_price + irk.item_price tot_price,
               trw.tot_value + irk.item_value tot_value,
               irk.cat_id,
               irk.next_cat_id,
               CASE irk.cat_id WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END same_cats,
               irk.min_items,
               trw.cats_path || irk.cat_id cats_path,
               trw.path || irk.item_id path
          FROM TABLE(l_paths_lis) trw
          JOIN items_ranked irk
            ON irk.item_rnk BETWEEN (trw.item_rnk + 1) AND (irk.n_items - (g_seq_size - trw.lev - 1))
         WHERE trw.tot_price + irk.item_price + g_sum_price <= g_max_price
           AND trw.tot_value + irk.item_value + g_sum_value >= g_min_value
           AND trw.lev < g_seq_size
           AND CASE irk.cat_id WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END <= irk.max_items
           AND g_seq_size - (trw.lev + 1) + Least(CASE irk.cat_id
                                                  WHEN trw.cat_id THEN trw.same_cats + 1 ELSE 1 END,
                                                  irk.min_items)
               >= irk.min_remain
           AND (irk.cat_id = trw.cat_id OR trw.same_cats >= trw.min_items)
           AND (irk.cat_id = trw.cat_id OR irk.cat_id = Nvl(trw.next_cat_id, irk.cat_id))
    )
    SELECT
           path_rnk,
           item_rnk,
           lev,
           tot_price,
           tot_value,
           cat_id,
           next_cat_id,
           same_cats,
           min_items,
           cats_path,
           path
      BULK COLLECT INTO l_paths_new_lis
      FROM path_join
     WHERE path_rnk <= g_keep_num OR g_keep_num = 0;
    IF i < g_seq_size THEN
      l_paths_lis := l_paths_new_lis;
    END IF;
  END LOOP;
  FOR i IN 1..l_paths_new_lis.COUNT LOOP
    PIPE ROW(l_paths_new_lis(i));
  END LOOP;
END Array_Iterate;

3 PL/SQL-Driven Solution Methods: Performance

↑ Contents
↓ Results Summary
↓ Code Timing
↓ Execution Plans

Results of testing are summarised for the smaller, Brazil, and larger, England, datasets. Then a table of code timing results is given showing detailed row counts and timings by iteration for one of the solution methods. Finally, execution plans are shown for each method.

Results Summary

↑ 3 PL/SQL-Driven Solution Methods: Performance
↓ Solution Methods
↓ Brazil Dataset
↓ England Dataset

Solution Methods

↑ Results Summary

This table shows the different solution methods tested, and assigns a short code to each one for convenience.

Code Procedure / Function Table / Array Recurse / Iterate
PTR Pop_Table_Recurse Table Recurse
PTI Pop_Table_Iterate Table Iterate
ARE Array_Recurse Array Recurse
AIT Array_Iterate Array Iterate

Brazil Dataset

↑ Results Summary

Solutions

Two solutions were found across the four pairs of parameters, one, B-B, suboptimal and the other, B-A, optimal:

Solution B-A (optimal)
Path                                 Total Value Total Price  Rank
------------------------------------ ----------- ----------- -----
078022098099058059060001002003038039       10923       18176     1
078023098099058059060001002003038039       10905       18795     2
078022098102058059060001002003038039       10833       18994     3
078022098099058059060001002003038040       10825       18190     4
078023098099058059060001002003038040       10807       18809     5
078022098099058059061001002003038039       10791       18915     6
078022098099058059060001002004038039       10790       17201     7
078021098099058059060001002004038039       10790       18052     8
078023098099058059060001002004038039       10772       17820     9
078022098099058059060001002003038041       10766       17948    10

Solution B-B (suboptimal)
Path                                 Total Value Total Price  Rank
------------------------------------ ----------- ----------- -----
078022098099058059060001002003038039       10923       18176     1
078023098099058059060001002003038039       10905       18795     2
078022098099058059060001002003038040       10825       18190     3
078023098099058059060001002003038040       10807       18809     4
078022098099058059060001002004038039       10790       17201     5
078021098099058059060001002004038039       10790       18052     6
078023098099058059060001002004038039       10772       17820     7
078022098099058059060001002003038041       10766       17948     8
078021098099058059060001002003038041       10766       18799     9
078022098099058059060061001002003038       10748       18209    10
Execution Times (Seconds)
KEEP_NUM MIN_VALUE Solution Set PTR PTI ARE AIT
10 0 B-B (suboptimal) 0.4 0.4 0.4 0.4
100 0 B-A (optimal) 0.6 0.6 0.6 0.6
100 10748 B-A (optimal) 0.4 0.4 0.4 0.4
0 10748 B-A (optimal) 0.4 0.4 0.4 0.4

None of the execution times exceed a second here, and there is little to distinguish the different solution methods on this smaller dataset.

England Dataset

↑ Results Summary

Solutions

Two solutions were found across the four pairs of parameters, one, E-B, suboptimal and the other, E-A, optimal:

Solution E-A (optimal)
Path                              Total Value Total Price  Rank
--------------------------------- ----------- ----------- -----
037024160463488298027452193344166        1965         889     1
037024160264488298045027452193166        1965         890     2
037024160463488298044027452193344        1963         890     3
037024160463488298044027452193166        1962         884     4
037024160264488298314027452193166        1962         885     5
037024160272488298044027452193166        1962         889     6
252024160264488298044027452193166        1959         889     7
037024160463488298045027452193344        1958         887     8
037024160463488298045027452193166        1957         881     9
037024160463488298314027452328166        1957         886    10

Solution E-B (suboptimal)
Path                              Total Value Total Price  Rank
--------------------------------- ----------- ----------- -----
037024160463488298027452193344166        1965         889     1
037024160264488298045027452193166        1965         890     2
037024160463488298044027452193344        1963         890     3
037024160463488298044027452193166        1962         884     4
037024160264488298314027452193166        1962         885     5
037024160272488298044027452193166        1962         889     6
037024160264488298027452193344478        1957         887     7
037024160264488298027452193166478        1956         881     8
037024160264488298044027452193478        1954         882     9
037024160264488298027452193166460        1952         886    10
Execution Times (Seconds)
KEEP_NUM MIN_VALUE Solution Set PTR PTI ARE AIT
50 0 E-B (suboptimal) 1 1 2 2
300 0 E-A (optimal) 7 7 9 9
300 1952 E-A (optimal) 1 1 1 1
0 1952 E-A (optimal) 83 81 140 116

On this larger dataset we can see performance differences across the solution methods, particularly on the hardest parameter pair, (0, 1952).

The two Table methods perform similarly, at around 82 seconds. The Array / Recurse method is much worse at 140 seconds, and the Array / Iterate method took 116 seconds.

Using arrays in SQL can lead to performance problems owing to lack of good cardinality estimates, and the execution plan for Array / Recurse, shown in the next section, is perhaps suboptimal for that reason, with the execution time being 70% longer on the hardest parameter pair, (0, 1952). However, the plan for Array / Iterate was similar to that of the Table methods and was still significantly slower on (0, 1952), with the execution time being 40% longer.

Code Timing

↑ 3 PL/SQL-Driven Solution Methods: Performance

Each solution method was instrumented using the author’s own code timing package Oracle PL/SQL code timing module. Here are the results for the Pop_Table_Iterate solution on the England dataset with KEEP_NUM = 0 and MIN_VALUE = 1952. The INSERT and DELETE statements are timed separately at each iteration. The code timing package allows the timer name to include the rows processed, as shown below, providing quite detailed statistics at each iteration within PL/SQL.

Timer Set: Pop_Table_Iterate, Constructed at 10 Jun 2024 23:39:03, written at 23:40:24
======================================================================================
Timer                        Elapsed         CPU       Calls       Ela/Call       CPU/Call
------------------------  ----------  ----------  ----------  -------------  -------------
Initial delete (0)              0.01        0.01           1        0.00900        0.01000
Initial insert (1)              0.00        0.00           1        0.00000        0.00000
Insert paths 1 (17)             0.00        0.00           1        0.00000        0.00000
Delete paths 1 (1)              0.00        0.00           1        0.00000        0.00000
Insert paths 2 (290)            0.00        0.01           1        0.00400        0.01000
Delete paths 2 (17)             0.00        0.00           1        0.00000        0.00000
Insert paths 3 (553)            0.03        0.02           1        0.03100        0.02000
Delete paths 3 (290)            0.00        0.00           1        0.00000        0.00000
Insert paths 4 (279)            0.06        0.06           1        0.05600        0.06000
Delete paths 4 (553)            0.00        0.00           1        0.00000        0.00000
Insert paths 5 (1252)           0.03        0.03           1        0.03100        0.03000
Delete paths 5 (279)            0.00        0.00           1        0.00000        0.00000
Insert paths 6 (15349)          0.12        0.11           1        0.12000        0.11000
Delete paths 6 (1252)           0.00        0.01           1        0.00200        0.01000
Insert paths 7 (110100)         1.19        1.12           1        1.18700        1.12000
Delete paths 7 (15349)          0.02        0.01           1        0.01500        0.01000
Insert paths 8 (460638)         6.32        6.11           1        6.31600        6.11000
Delete paths 8 (110100)         0.11        0.07           1        0.10700        0.07000
Insert paths 9 (864748)        21.31       20.49           1       21.31400       20.49000
Delete paths 9 (460638)         0.41        0.30           1        0.41000        0.30000
Insert paths 10 (389615)       35.84       35.50           1       35.83500       35.50000
Delete paths 10 (864748)        0.68        0.42           1        0.67900        0.42000
Insert paths 11 (50)           14.29       14.24           1       14.29100       14.24000
Delete paths 11 (389615)        0.31        0.22           1        0.30500        0.22000
(Other)                         0.00        0.00           1        0.00000        0.00000
------------------------  ----------  ----------  ----------  -------------  -------------
Total                          80.71       78.73          25        3.22848        3.14920
------------------------  ----------  ----------  ----------  -------------  -------------
[Timer timed (per call in ms): Elapsed: 0.00952, CPU: 0.01143]

The Insert rows and times have been extracted to an Excel file, and the times were normalized so that the maximum normalized time equals the maximum number of rows. The rows and normalized times have then been plotted on a graph against the iteration.

The graph shows that no significant time is taken until the rows exceeds about 100,000 in iteration 7, and shows that the times rise in proportion to the rows at the prior iteration. This makes sense because the driving row set at each iteration is the set of paths returned at the prior iteration.

Execution Plans

↑ 3 PL/SQL-Driven Solution Methods: Performance
↓ Pop_Table_Recurse
↓ Pop_Table_Iterate
↓ Array_Recurse
↓ Array_Iterate

Execution plans were taken from the SELECT statement at the 10’th iteration for the England dataset, where the code timing results showed most time was spent. The plans are generally simpler than those in the previous article from recursive SQL solution methods, with no join to an item running sums collection in particular.

The plans were gathered automatically within the code, by means of adding a hint to the query and then making a call to a wrapper utility function, Utils.Get_XPlan, Oracle PL/SQL General Utilities Module. For example:

Hint
/*+ gather_plan_statistics INSERT_PATHS */
Utility Function Calls

At the 10’th iteration the most recent plan with the marker INSERT_PATHS is retrieved to an array of strings by Utils.Get_XPlan, and written out by Utils.W.

  IF p_iter = 10 THEN
    Utils.W(Utils.Get_XPlan(p_sql_marker => 'INSERT_PATHS'));
  END IF;

Pop_Table_Recurse

↑ Execution Plans

The code timing line for the 10’th iteration on the England dataset with KEEP_NUM = 0 and MIN_VALUE = 1952 was:

Timer                        Elapsed         CPU       Calls       Ela/Call       CPU/Call
------------------------  ----------  ----------  ----------  -------------  -------------
Insert paths 10 (389615)       36.37       36.03           1       36.37100       36.03000

The plan shows a nested loops join between PATHS and SYS_IOT_TOP_165264, which is the index on the index-organized table, ITEMS_RANKED, and is accessed via an index range scan.

Plan hash value: 2684864324
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                    |      1 |        |      0 |00:00:36.16 |     922K|       |       |          |         |
|   1 |  LOAD TABLE CONVENTIONAL | PATHS              |      1 |        |      0 |00:00:36.16 |     922K|       |       |          |         |
|*  2 |   VIEW                   |                    |      1 |      1 |    389K|00:00:35.94 |     890K|       |       |          |         |
|   3 |    WINDOW SORT           |                    |      1 |      1 |    389K|00:00:35.76 |     890K|    82M|  3117K|   73M (0)|         |
|   4 |     NESTED LOOPS         |                    |      1 |      1 |    389K|00:00:12.54 |     890K|       |       |          |         |
|*  5 |      TABLE ACCESS FULL   | PATHS              |      1 |      1 |    864K|00:00:00.15 |   16167 |       |       |          |         |
|*  6 |      INDEX RANGE SCAN    | SYS_IOT_TOP_177213 |    864K|      1 |    389K|00:00:32.73 |     874K|       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("PATH_RNK"<=:B7 OR :B7=0))
5 - filter("TRW"."LEV"<:B1)
6 - access("IRK"."ITEM_RNK">="TRW"."ITEM_RNK"+1)
filter(("TRW"."TOT_PRICE"+"IRK"."ITEM_PRICE"+:B5<=:B4 AND "TRW"."TOT_VALUE"+"IRK"."ITEM_VALUE"+:B3>=:B2 AND
"IRK"."ITEM_RNK"<="IRK"."N_ITEMS"-(:B1-"TRW"."LEV"-1) AND "IRK"."MAX_ITEMS">=CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END  AND "IRK"."MIN_REMAIN"<=:B1-("TRW"."LEV"+1)+LEAST(CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END ,"IRK"."MIN_ITEMS") AND ("IRK"."CAT_ID"="TRW"."CAT_ID" OR "TRW"."SAME_CATS">="TRW"."MIN_ITEMS") AND
("IRK"."CAT_ID"="TRW"."CAT_ID" OR "IRK"."CAT_ID"=NVL("TRW"."NEXT_CAT_ID","IRK"."CAT_ID"))))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Pop_Table_Iterate

↑ Execution Plans

The code timing line for the 10’th iteration on the England dataset with KEEP_NUM = 0 and MIN_VALUE = 1952 was:

Timer                        Elapsed         CPU       Calls       Ela/Call       CPU/Call
------------------------  ----------  ----------  ----------  -------------  -------------
Insert paths 10 (389615)       35.84       35.50           1       35.83500       35.50000

The plan here is the same as for Pop_Table_Recurse.

Plan hash value: 2684864324
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                    |      1 |        |      0 |00:00:35.61 |     919K|       |       |          |         |
|   1 |  LOAD TABLE CONVENTIONAL | PATHS              |      1 |        |      0 |00:00:35.61 |     919K|       |       |          |         |
|*  2 |   VIEW                   |                    |      1 |      1 |    389K|00:00:35.38 |     889K|       |       |          |         |
|   3 |    WINDOW SORT           |                    |      1 |      1 |    389K|00:00:35.20 |     889K|    82M|  3117K|   73M (0)|         |
|   4 |     NESTED LOOPS         |                    |      1 |      1 |    389K|00:00:07.07 |     889K|       |       |          |         |
|*  5 |      TABLE ACCESS FULL   | PATHS              |      1 |      1 |    864K|00:00:00.13 |   15142 |       |       |          |         |
|*  6 |      INDEX RANGE SCAN    | SYS_IOT_TOP_177213 |    864K|      1 |    389K|00:00:32.22 |     874K|       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("PATH_RNK"<=:B7 OR :B7=0))
5 - filter("TRW"."LEV"<:B1)
6 - access("IRK"."ITEM_RNK">="TRW"."ITEM_RNK"+1)
filter(("TRW"."TOT_PRICE"+"IRK"."ITEM_PRICE"+:B5<=:B4 AND "TRW"."TOT_VALUE"+"IRK"."ITEM_VALUE"+:B3>=:B2 AND
"IRK"."ITEM_RNK"<="IRK"."N_ITEMS"-(:B1-"TRW"."LEV"-1) AND "IRK"."MAX_ITEMS">=CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END  AND "IRK"."MIN_REMAIN"<=:B1-("TRW"."LEV"+1)+LEAST(CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END ,"IRK"."MIN_ITEMS") AND ("IRK"."CAT_ID"="TRW"."CAT_ID" OR "TRW"."SAME_CATS">="TRW"."MIN_ITEMS") AND
("IRK"."CAT_ID"="TRW"."CAT_ID" OR "IRK"."CAT_ID"=NVL("TRW"."NEXT_CAT_ID","IRK"."CAT_ID"))))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Array_Recurse

↑ Execution Plans

The code timing line for the 10’th iteration on the England dataset with KEEP_NUM = 0 and MIN_VALUE = 1952 was:

Timer                          Elapsed         CPU       Calls       Ela/Call       CPU/Call
------------------------    ----------  ----------  ----------  -------------  -------------
Recurse level: 10 (389615)       63.21       62.56           1       63.21000       62.56000

The plan here does not include an INSERT, but is more complicated than for Pop_Table_* solution methods, with merge join and two sort joins, and the time used is dignificantly larger.

Plan hash value: 4226761357
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |      1 |        |    389K|00:01:51.12 |      75 |  18012 |  18012 |       |       |          |         |
|*  1 |  VIEW                                 |                    |      1 |      1 |    389K|00:01:51.12 |      75 |  18012 |  18012 |       |       |          |         |
|   2 |   WINDOW SORT                         |                    |      1 |      1 |    389K|00:01:50.88 |      75 |  18012 |  18012 |    80M|  3082K|   71M (0)|         |
|   3 |    MERGE JOIN                         |                    |      1 |      1 |    389K|00:00:54.06 |      75 |  18012 |  18012 |       |       |          |         |
|   4 |     SORT JOIN                         |                    |      1 |    408 |    864K|00:00:49.43 |      68 |  18012 |  18012 |   104M|  3482K|   92M (0)|         |
|*  5 |      COLLECTION ITERATOR PICKLER FETCH| ARRAY_RECURSE      |      1 |    408 |    864K|00:00:55.05 |      68 |  18012 |  18012 |       |       |          |         |
|*  6 |     FILTER                            |                    |    864K|        |    389K|00:00:57.04 |       7 |      0 |      0 |       |       |          |         |
|*  7 |      SORT JOIN                        |                    |    864K|    560 |    193M|00:00:34.17 |       7 |      0 |      0 |   124K|   124K|  110K (0)|         |
|   8 |       INDEX FAST FULL SCAN            | SYS_IOT_TOP_177213 |      1 |    560 |    560 |00:00:00.01 |       7 |      0 |      0 |       |       |          |         |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("PATH_RNK"<=:B8 OR :B8=0))
5 - filter(VALUE(KOKBF$)<:B3)
6 - filter((VALUE(KOKBF$)+"IRK"."ITEM_PRICE"+:B7<=:B6 AND VALUE(KOKBF$)+"IRK"."ITEM_VALUE"+:B5>=:B4 AND "IRK"."ITEM_RNK"<="IRK"."N_ITEMS"-(:B3-VALUE(KOKBF$)-1)
AND ("IRK"."CAT_ID"=VALUE(KOKBF$) OR "IRK"."CAT_ID"=NVL(VALUE(KOKBF$),"IRK"."CAT_ID")) AND "IRK"."MAX_ITEMS">=CASE "IRK"."CAT_ID" WHEN VALUE(KOKBF$) THEN
VALUE(KOKBF$)+1 ELSE 1 END  AND ("IRK"."CAT_ID"=VALUE(KOKBF$) OR VALUE(KOKBF$)>=VALUE(KOKBF$)) AND "IRK"."MIN_REMAIN"<=:B3-(VALUE(KOKBF$)+1)+LEAST(CASE
"IRK"."CAT_ID" WHEN VALUE(KOKBF$) THEN VALUE(KOKBF$)+1 ELSE 1 END ,"IRK"."MIN_ITEMS")))
7 - access("IRK"."ITEM_RNK">=VALUE(KOKBF$)+1)
filter("IRK"."ITEM_RNK">=VALUE(KOKBF$)+1)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Array_Iterate

↑ Execution Plans

The code timing line for the 10’th iteration on the England dataset with KEEP_NUM = 0 and MIN_VALUE = 1952 was:

Timer                    Elapsed         CPU       Calls       Ela/Call       CPU/Call
--------------------  ----------  ----------  ----------  -------------  -------------
Insert 10 (1)              51.58       50.55           1       51.57900       50.55000

The plan here is closer to that of the table methods, although somewhat slower, but faster than for Array_Recurse.

Plan hash value: 1740109178
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                    |      1 |        |    389K|00:00:51.12 |     874K|   9797 |   9797 |       |       |          |         |
|*  1 |  VIEW                                |                    |      1 |      1 |    389K|00:00:51.12 |     874K|   9797 |   9797 |       |       |          |         |
|   2 |   WINDOW SORT                        |                    |      1 |      1 |    389K|00:00:50.87 |     874K|   9797 |   9797 |    86M|  3186K| 5045K (1)|      77M|
|   3 |    NESTED LOOPS                      |                    |      1 |      1 |    389K|00:00:06.91 |     874K|      0 |      0 |       |       |          |         |
|*  4 |     COLLECTION ITERATOR PICKLER FETCH|                    |      1 |      1 |    864K|00:00:01.10 |       0 |      0 |      0 |       |       |          |         |
|*  5 |     INDEX RANGE SCAN                 | SYS_IOT_TOP_177213 |    864K|      1 |    389K|00:00:45.69 |     874K|      0 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("PATH_RNK"<=:B7 OR :B7=0))
4 - filter(VALUE(KOKBF$)<:B2)
5 - access("IRK"."ITEM_RNK">=VALUE(KOKBF$)+1)
filter((VALUE(KOKBF$)+"IRK"."ITEM_PRICE"+:B6<=:B5 AND VALUE(KOKBF$)+"IRK"."ITEM_VALUE"+:B4>=:B3 AND "IRK"."ITEM_RNK"<="IRK"."N_ITEMS"-(:B2-VALUE(KOKBF$)-1)
AND ("IRK"."CAT_ID"=VALUE(KOKBF$) OR "IRK"."CAT_ID"=NVL(VALUE(KOKBF$),"IRK"."CAT_ID")) AND "IRK"."MAX_ITEMS">=CASE "IRK"."CAT_ID" WHEN VALUE(KOKBF$) THEN
VALUE(KOKBF$)+1 ELSE 1 END  AND ("IRK"."CAT_ID"=VALUE(KOKBF$) OR VALUE(KOKBF$)>=VALUE(KOKBF$)) AND "IRK"."MIN_REMAIN"<=:B2-(VALUE(KOKBF$)+1)+LEAST(CASE
"IRK"."CAT_ID" WHEN VALUE(KOKBF$) THEN VALUE(KOKBF$)+1 ELSE 1 END ,"IRK"."MIN_ITEMS")))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

4 Tuning Memory Usage

↑ Contents
↓ Path Columns
↓ Category Path Compacting
↓ Item Path Compacting
↓ Item and Category Path Compacting Combined
↓ Results Comparison

In the section on performance we saw that the Table methods performed slightly better than the Array methods, and appear more robust in terms of cardinality estimation. Let’s take the best performing (by a small margin) Table / Iterate method and see whether we can tune memory usage and see what effect that may have on performance.

Path Columns

↑ 4 Tuning Memory Usage

Here is the execution plan steps for the England dataset on the parameter pair: KEEP_NUM = 0, MIN_VALUE = 1952.

Plan hash value: 2684864324
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                    |      1 |        |      0 |00:00:35.61 |     919K|       |       |          |         |
|   1 |  LOAD TABLE CONVENTIONAL | PATHS              |      1 |        |      0 |00:00:35.61 |     919K|       |       |          |         |
|*  2 |   VIEW                   |                    |      1 |      1 |    389K|00:00:35.38 |     889K|       |       |          |         |
|   3 |    WINDOW SORT           |                    |      1 |      1 |    389K|00:00:35.20 |     889K|    82M|  3117K|   73M (0)|         |
|   4 |     NESTED LOOPS         |                    |      1 |      1 |    389K|00:00:07.07 |     889K|       |       |          |         |
|*  5 |      TABLE ACCESS FULL   | PATHS              |      1 |      1 |    864K|00:00:00.13 |   15142 |       |       |          |         |
|*  6 |      INDEX RANGE SCAN    | SYS_IOT_TOP_177213 |    864K|      1 |    389K|00:00:32.22 |     874K|       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("PATH_RNK"<=:B7 OR :B7=0))
5 - filter("TRW"."LEV"<:B1)
6 - access("IRK"."ITEM_RNK">="TRW"."ITEM_RNK"+1)
filter(("TRW"."TOT_PRICE"+"IRK"."ITEM_PRICE"+:B5<=:B4 AND "TRW"."TOT_VALUE"+"IRK"."ITEM_VALUE"+:B3>=:B2 AND
"IRK"."ITEM_RNK"<="IRK"."N_ITEMS"-(:B1-"TRW"."LEV"-1) AND "IRK"."MAX_ITEMS">=CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END  AND "IRK"."MIN_REMAIN"<=:B1-("TRW"."LEV"+1)+LEAST(CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END ,"IRK"."MIN_ITEMS") AND ("IRK"."CAT_ID"="TRW"."CAT_ID" OR "TRW"."SAME_CATS">="TRW"."MIN_ITEMS") AND
("IRK"."CAT_ID"="TRW"."CAT_ID" OR "IRK"."CAT_ID"=NVL("TRW"."NEXT_CAT_ID","IRK"."CAT_ID"))))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

Much of the work appears to be done in step 6, the index range scan of SYS_IOT_TOP_165264, which is the index on the index-organised table, ITEMS_RANKED in the nested loops join from the PATHS table, where 864K starts results in 389K rows, with an A-Time of 00:00:32.22, and Buffers of 874K.

The index-organized table ITEMS_RANKED is created thus:

CREATE TABLE items_ranked (
        item_id                 VARCHAR2(3),
        cat_id                  VARCHAR2(5),
        item_price              INTEGER,
        item_value              INTEGER,
        min_items               INTEGER,
        max_items               INTEGER,
        min_remain              INTEGER,
        next_cat_id             VARCHAR2(5),
        item_rnk                INTEGER PRIMARY KEY,
        n_items                 INTEGER
)
ORGANIZATION INDEX

The temporary table PATHS is created thus:

CREATE GLOBAL TEMPORARY TABLE paths (
        path_rnk                INTEGER,
        item_rnk                INTEGER,
        lev                     INTEGER,
        tot_price               INTEGER,
        tot_value               INTEGER,
        cat_id                  VARCHAR2(5),
        next_cat_id             VARCHAR2(5),
        same_cats               INTEGER,
        min_items               INTEGER,
        cats_path               VARCHAR2(50),
        path                    VARCHAR2(50)
)
ON COMMIT DELETE ROWS

There are two columns, cats_path and path, in the PATHS table that accumulate different kinds of path, concatenating a string value to the current path values at each iteration. It seemed possible that we might be able to reduce the storage associated with these paths, and that processing smaller amounts of data could reduce the execution times.

Category Path Compacting

↑ 4 Tuning Memory Usage
↓ A Compact Method for Representing Category Path
↓ Procedure pop_Items_Ranked_Base
↓ Results

A Compact Method for Representing Category Path

↑ Category Path Compacting
↓ Category Numbers via Number Bases
↓ Revised Data Model

Category Numbers via Number Bases

↑ A Compact Method for Representing Category Path

The column CATS_PATH is used to store the category id path through the iterations. This path is used to partition the path ranking that is used to filter out lower valued paths according to the KEEP_NUM value:

   WHERE path_rnk <= g_keep_num OR g_keep_num = 0;

The path rank is calculated as follows:

SELECT Row_Number() OVER (PARTITION BY trw.cats_path || irk.cat_id
                              ORDER BY trw.tot_value + irk.item_value DESC,
                                       trw.tot_price + irk.item_price) path_rnk

The reason for this partitioning is that there are upper and lower limits on the total numbers of items in each category allowed, and if we calculate the ranks globally paths may be excluded early on value grounds that might be required to meet the category limits. This reduces the effectiveness of the approximative KEEP_NUM parameter as it would have to be kept higher than would otherwise be necessary.

However, although we need to maintain category based partitioning, there is another, more compact, way of achieving this. Firstly, note that we only need the numbers of items in each category, not the order they occur in the iteration scheme. Secondly, note that there are a relatively small number of categories, and each has a small maximum limit (which can’t of course be larger than the sequence length).

If we have M categories in a given order, then we can represent the category path by a sequence of integers indicating the numbers of items in each category:


If we have = the maximum number of items possible in any category, then we can consider the sequence to represent a single number in the number base . This can be represented as a decimal integer by the following sum:


Remember that our only use of the category path is in the partitioning clause, and this unique identifier for that path is sufficient for the purpose. Furthermore, we can replace the string category identifier with the integer value:


Then at each iteration we can simply add to the integer representing the category path this new integer category identifier, which we’ll call cat_base_id.

Now instead of accumulating the category path string, cats_path, as:

   trw.cats_path || irk.cat_id cats_path

we can calculate cat_base_sum as the integer value:

   trw.cat_base_sum + irk.cat_base_id cat_base_sum

and the path rank calculation becomes:

SELECT Row_Number() OVER (PARTITION BY trw.cat_base_sum + irk.cat_base_id
                              ORDER BY trw.tot_value + irk.item_value DESC,
                                       trw.tot_price + irk.item_price) path_rnk
Revised Data Model

↑ A Compact Method for Representing Category Path

The revised data model uses a sequence for the unique identifier for the records in the new links table, and the SQL for the new and updated objects is given below.

CREATE TABLE items_ranked_base (
        item_id                 VARCHAR2(3),
        cat_base_id             INTEGER,
        item_price              INTEGER,
        item_value              INTEGER,
        min_items               INTEGER,
        max_items               INTEGER,
        min_remain              INTEGER,
        next_cat_base_id        INTEGER,
        item_rnk                INTEGER PRIMARY KEY,
        n_items                 INTEGER
)
ORGANIZATION INDEX
/
CREATE GLOBAL TEMPORARY TABLE paths_base (
        path_rnk                INTEGER,
        item_rnk                INTEGER,
        lev                     INTEGER,
        tot_price               INTEGER,
        tot_value               INTEGER,
        cat_base_id             INTEGER,
        next_cat_base_id        INTEGER,
        same_cats               INTEGER,
        min_items               INTEGER,
        cat_base_sum            INTEGER,
        path                    VARCHAR2(50)
)
ON COMMIT DELETE ROWS
/

Procedure pop_Items_Ranked_Base

↑ Category Path Compacting

The procedure pop_Items_Ranked is replaced by a new version using the integer identifiers:

PROCEDURE pop_Items_Ranked_Base IS
BEGIN
  DELETE items_ranked_based;
  INSERT INTO items_ranked_base
  WITH  cat_base_ids AS (
    SELECT id, Power (max_max_items + 1, cat_rnk - 2) cat_base_id
      FROM category_rsums_v
  )
  SELECT itm.id,
         cbi.cat_base_id,
         itm.item_price,
         itm.item_value,
         crs.min_items,
         crs.max_items,
         crs.min_remain,
         cbi_n.cat_base_id next_cat_base_id,
         Row_Number() OVER (ORDER BY crs.cat_rnk, itm.item_value DESC, itm.id),
         Count(*) OVER ()
    FROM items itm
    JOIN category_rsums_v crs
      ON crs.id = itm.category_id
    JOIN cat_base_ids cbi
      ON cbi.id = itm.category_id
    LEFT JOIN cat_base_ids cbi_n
      ON cbi_n.id = crs.next_cat_id;
END pop_Items_Ranked_Base;

Results

↑ Category Path Compacting
↓ Timings
↓ Execution Plans

Timings

↑ Results

Here are the times taken on the England dataset with KEEP_NUM = 0 and MIN_VALUE = 1952:

Timer                      Elapsed         CPU
----------------------  ----------  ----------
Pop_Table_Iterate            80.87       78.89
Pop_Table_Iterate_Base       87.28       82.65

The new method has used slightly more time overall. The code timing line for the 10’th iteration were:

Timer                             Elapsed         CPU       Calls       Ela/Call       CPU/Call
-----------------------------  ----------  ----------  ----------  -------------  -------------
Insert paths 10 (389615)            35.84       35.50           1       35.83500       35.50000
Insert paths_Base 10 (389615)       42.01       39.00           1       42.00700       39.00000

The revised insert has taken a bit more elapsed time than the other one. Let’s see how the execution plan has changed.

Execution Plans

↑ Results
↓ Pop_Table_Iterate
↓ Pop_Table_Iterate_Base

Pop_Table_Iterate

↑ Execution Plans

Plan hash value: 2684864324
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                    |      1 |        |      0 |00:00:35.61 |     919K|       |       |          |         |
|   1 |  LOAD TABLE CONVENTIONAL | PATHS              |      1 |        |      0 |00:00:35.61 |     919K|       |       |          |         |
|*  2 |   VIEW                   |                    |      1 |      1 |    389K|00:00:35.38 |     889K|       |       |          |         |
|   3 |    WINDOW SORT           |                    |      1 |      1 |    389K|00:00:35.20 |     889K|    82M|  3117K|   73M (0)|         |
|   4 |     NESTED LOOPS         |                    |      1 |      1 |    389K|00:00:07.07 |     889K|       |       |          |         |
|*  5 |      TABLE ACCESS FULL   | PATHS              |      1 |      1 |    864K|00:00:00.13 |   15142 |       |       |          |         |
|*  6 |      INDEX RANGE SCAN    | SYS_IOT_TOP_177213 |    864K|      1 |    389K|00:00:32.22 |     874K|       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("PATH_RNK"<=:B7 OR :B7=0))
5 - filter("TRW"."LEV"<:B1)
6 - access("IRK"."ITEM_RNK">="TRW"."ITEM_RNK"+1)
filter(("TRW"."TOT_PRICE"+"IRK"."ITEM_PRICE"+:B5<=:B4 AND "TRW"."TOT_VALUE"+"IRK"."ITEM_VALUE"+:B3>=:B2 AND
"IRK"."ITEM_RNK"<="IRK"."N_ITEMS"-(:B1-"TRW"."LEV"-1) AND "IRK"."MAX_ITEMS">=CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END  AND "IRK"."MIN_REMAIN"<=:B1-("TRW"."LEV"+1)+LEAST(CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END ,"IRK"."MIN_ITEMS") AND ("IRK"."CAT_ID"="TRW"."CAT_ID" OR "TRW"."SAME_CATS">="TRW"."MIN_ITEMS") AND
("IRK"."CAT_ID"="TRW"."CAT_ID" OR "IRK"."CAT_ID"=NVL("TRW"."NEXT_CAT_ID","IRK"."CAT_ID"))))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Pop_Table_Iterate_Base

↑ Execution Plans

Plan hash value: 2131628056
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                    |      1 |        |      0 |00:00:41.77 |     911K|       |       |          |         |
|   1 |  LOAD TABLE CONVENTIONAL | PATHS_BASE         |      1 |        |      0 |00:00:41.77 |     911K|       |       |          |         |
|*  2 |   VIEW                   |                    |      1 |      1 |    389K|00:00:41.51 |     887K|       |       |          |         |
|   3 |    WINDOW SORT           |                    |      1 |      1 |    389K|00:00:41.32 |     887K|    68M|  2869K|   61M (0)|         |
|   4 |     NESTED LOOPS         |                    |      1 |      1 |    389K|00:00:08.29 |     887K|       |       |          |         |
|*  5 |      TABLE ACCESS FULL   | PATHS_BASE         |      1 |      1 |    864K|00:00:00.20 |   12680 |       |       |          |         |
|*  6 |      INDEX RANGE SCAN    | SYS_IOT_TOP_177215 |    864K|      1 |    389K|00:00:37.83 |     874K|       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("PATH_RNK"<=:B7 OR :B7=0))
5 - filter("TRW"."LEV"<:B1)
6 - access("IRK"."ITEM_RNK">="TRW"."ITEM_RNK"+1)
filter(("TRW"."TOT_PRICE"+"IRK"."ITEM_PRICE"+:B5<=:B4 AND "TRW"."TOT_VALUE"+"IRK"."ITEM_VALUE"+:B3>=:B2 AND
"IRK"."ITEM_RNK"<="IRK"."N_ITEMS"-(:B1-"TRW"."LEV"-1) AND "IRK"."MAX_ITEMS">=CASE "IRK"."CAT_BASE_ID" WHEN "TRW"."CAT_BASE_ID" THEN
"TRW"."SAME_CATS"+1 ELSE 1 END  AND "IRK"."MIN_REMAIN"<=:B1-("TRW"."LEV"+1)+LEAST(CASE "IRK"."CAT_BASE_ID" WHEN "TRW"."CAT_BASE_ID"
THEN "TRW"."SAME_CATS"+1 ELSE 1 END ,"IRK"."MIN_ITEMS") AND ("IRK"."CAT_BASE_ID"="TRW"."CAT_BASE_ID" OR
"TRW"."SAME_CATS">="TRW"."MIN_ITEMS") AND ("IRK"."CAT_BASE_ID"="TRW"."CAT_BASE_ID" OR
"IRK"."CAT_BASE_ID"=NVL("TRW"."NEXT_CAT_BASE_ID","IRK"."CAT_BASE_ID"))))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

The new plan shows the same steps and the same numbers of E-Rows and A-Rows at each step. The numbers of Buffers have changed slightly, and all the Mem values have reduced, the largest reducing from 82M to 68M, that is by about 17%.

Item Path Compacting

↑ 4 Tuning Memory Usage
↓ A Compact Method for Representing Item Path
↓ Procedure pop_Items_Ranked_Link
↓ View PATHS_LINK_RANKED_ITEM_V
↓ View PATHS_LINK_RANKED_PATH_V
↓ Results

A Compact Method for Representing Item Path

↑ Item Path Compacting

We would like to try to achieve a similar reduction in storage as with category paths for item paths, again through bypassing the concatenated path. However, this is more difficult for item path: The equivalent of the storage method used there would be to use a binary number for items of length equal to the number of items, and this would use much more storage than the original concatenated string.

Instead we have had to adopt a more complex approach, in which we create a new temporary table to represent paths as a chain of links from a current item to a prior item, using the integer rank as the item identifier. Initially we create this against the original data model with the concatenated category path, then subsequently we’ll try a combined model.

Revised Data Model

The revised data model uses a sequence number for the unique identifier for the records in the new links table, and the SQL for the new and updated objects is given below.

CREATE SEQUENCE iln_seq START WITH 1 CACHE 1000
/
CREATE GLOBAL TEMPORARY TABLE item_links (
  id                            INTEGER,
  item_rnk                      INTEGER,
  prior_iln_id                  INTEGER
)
ON COMMIT DELETE ROWS
/
CREATE TABLE items_ranked_link (
        cat_id                  VARCHAR2(5),
        item_price              INTEGER,
        item_value              INTEGER,
        min_items               INTEGER,
        max_items               INTEGER,
        min_remain              INTEGER,
        next_cat_id             VARCHAR2(5),
        item_rnk                INTEGER PRIMARY KEY
)
ORGANIZATION INDEX
/
CREATE GLOBAL TEMPORARY TABLE paths_link (
        path_rnk                INTEGER,
        item_rnk                INTEGER,
        lev                     INTEGER,
        tot_price               INTEGER,
        tot_value               INTEGER,
        cat_id                  VARCHAR2(5),
        next_cat_id             VARCHAR2(5),
        same_cats               INTEGER,
        min_items               INTEGER,
        cats_path               VARCHAR2(50),
        prior_iln_id            INTEGER,
        iln_id                  INTEGER
)
ON COMMIT DELETE ROWS
/

↑ Item Path Compacting

The procedure pop_Items_Ranked_Base is replaced by a new version using the integer item_rnk as item identifier:

PROCEDURE pop_Items_Ranked_Link IS
BEGIN
  DELETE items_ranked_link;
  INSERT INTO items_ranked_link
  SELECT itm.category_id,
         itm.item_price,
         itm.item_value,
         crs.min_items,
         crs.max_items,
         crs.min_remain,
         crs.next_cat_id,
         Row_Number() OVER (ORDER BY crs.cat_rnk, itm.item_value DESC, itm.id)
    FROM items itm
    JOIN category_rsums_v crs
      ON crs.id = itm.category_id;
  SELECT Count(*) INTO g_n_items FROM items;
END pop_Items_Ranked_Link;

↑ Item Path Compacting

The view PATHS_LINK_RANKED_ITEM_V uses a recursive query to retrieve the items in each of the top-ranked paths, using PATHS_LINK as the basis for the anchor branch, and joining to ITEM_LINKS in the recursive branch:

CREATE OR REPLACE VIEW paths_link_ranked_item_v AS
WITH paths_ranked AS (
    SELECT iln_id, item_rnk, prior_iln_id, tot_value, tot_price,
           Row_Number() OVER (ORDER BY tot_value DESC, tot_price) rnk
      FROM paths_link
), rsf (iln_id, item_rnk, prior_iln_id, lev, tot_value, tot_price, rnk) AS (
    SELECT iln_id, item_rnk, prior_iln_id, 0, tot_value, tot_price, rnk
      FROM paths_ranked
     WHERE rnk <= To_Number(SYS_Context('RECURSION_CTX', 'TOP_N'))
     UNION ALL
    SELECT iln.id, iln.item_rnk, iln.prior_iln_id, rsf.lev + 1,
           rsf.tot_value, rsf.tot_price, rsf.rnk
      FROM rsf
      JOIN item_links iln
        ON iln.id = rsf.prior_iln_id
     WHERE iln.id > 0
), item_map AS (
    SELECT itm.id item_id,
           itm.category_id,
           itm.item_name,
           itm.item_value,
           itm.item_price,
           Row_Number() OVER (ORDER BY crs.cat_rnk, itm.item_value DESC, itm.id) item_rnk
      FROM items itm
      JOIN category_rsums_v crs
        ON crs.id = itm.category_id
)
SELECT rsf.tot_value,
       rsf.tot_price,
       rsf.rnk,
       rsf.lev,
       imp.category_id,
       imp.item_id,
       imp.item_name,
       imp.item_value,
       imp.item_price
  FROM rsf
  JOIN items_ranked_link irk
    ON irk.item_rnk = rsf.item_rnk
  JOIN item_map imp
    ON imp.item_rnk = irk.item_rnk

↑ Item Path Compacting

The view PATHS_LINK_RANKED_PATH_V brings the results back to pahth level via the aggregated function ListAgg:

CREATE OR REPLACE VIEW paths_link_ranked_path_v AS
SELECT ListAgg(item_id) WITHIN GROUP (ORDER BY lev DESC) path,
       tot_value,
       tot_price,
       rnk
  FROM paths_link_ranked_item_v
 GROUP BY tot_value,
          tot_price,
          rnk

Results

↑ Item Path Compacting
↓ Timings
↓ Execution Plan

Timings

↑ Results

Here are the times taken on the England dataset with KEEP_NUM = 0 and MIN_VALUE = 1952:

Timer                      Elapsed         CPU
----------------------  ----------  ----------
Pop_Table_Iterate            80.87       78.89
Pop_Table_Iterate_Base       87.28       82.65
Pop_Table_Iterate_Link       86.37       84.77

The new method has used slightly more time overall. The code timing line for the 10’th iteration were:

Timer                             Elapsed         CPU       Calls       Ela/Call       CPU/Call
-----------------------------  ----------  ----------  ----------  -------------  -------------
Insert paths 10 (389615)            35.84       35.50           1       35.83500       35.50000
Insert paths_Base 10 (389615)       42.01       39.00           1       42.00700       39.00000
Insert item_links  10 (864748)       0.31        0.30           1        0.31400        0.30000
Insert paths_Link 10 (389615)       36.63       36.31           1       36.63100       36.31000

The revised insert, with the new second insert, has taken about the same elapsed time as the first one. Let’s see how the execution plan has changed.

Execution Plan

↑ Results

Plan hash value: 441424830
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                    |      1 |        |      0 |00:00:36.39 |     903K|       |       |          |         |
|   1 |  LOAD TABLE CONVENTIONAL | PATHS_LINK         |      1 |        |      0 |00:00:36.39 |     903K|       |       |          |         |
|   2 |   SEQUENCE               | ILN_SEQ            |      1 |        |    389K|00:00:36.13 |     879K|       |       |          |         |
|*  3 |    VIEW                  |                    |      1 |      1 |    389K|00:00:35.61 |     879K|       |       |          |         |
|   4 |     WINDOW SORT          |                    |      1 |      1 |    389K|00:00:35.41 |     879K|    68M|  2865K|   60M (0)|         |
|   5 |      NESTED LOOPS        |                    |      1 |      1 |    389K|00:00:07.89 |     879K|       |       |          |         |
|*  6 |       TABLE ACCESS FULL  | PATHS_LINK         |      1 |      1 |    864K|00:00:00.15 |   12088 |       |       |          |         |
|*  7 |       INDEX RANGE SCAN   | SYS_IOT_TOP_177217 |    864K|      1 |    389K|00:00:32.40 |     867K|       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(("PATH_RNK"<=:B8 OR :B8=0))
6 - filter("TRW"."LEV"<:B1)
7 - access("IRK"."ITEM_RNK">="TRW"."ITEM_RNK"+1 AND "IRK"."ITEM_RNK"<=:B2-(:B1-"TRW"."LEV"-1))
filter(("TRW"."TOT_PRICE"+"IRK"."ITEM_PRICE"+:B6<=:B5 AND "TRW"."TOT_VALUE"+"IRK"."ITEM_VALUE"+:B4>=:B3 AND
"IRK"."MAX_ITEMS">=CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN "TRW"."SAME_CATS"+1 ELSE 1 END  AND
"IRK"."MIN_REMAIN"<=:B1-("TRW"."LEV"+1)+LEAST(CASE "IRK"."CAT_ID" WHEN "TRW"."CAT_ID" THEN "TRW"."SAME_CATS"+1 ELSE 1 END
,"IRK"."MIN_ITEMS") AND ("IRK"."CAT_ID"="TRW"."CAT_ID" OR "TRW"."SAME_CATS">="TRW"."MIN_ITEMS") AND ("IRK"."CAT_ID"="TRW"."CAT_ID" OR
"IRK"."CAT_ID"=NVL("TRW"."NEXT_CAT_ID","IRK"."CAT_ID"))))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

The new plan shows the same steps and the same numbers of E-Rows and A-Rows at each step as for the original Pop_Table_Iterate. The numbers of Buffers have changed slightly, and all the Mem values have reduced, the largest reducing from 82M to 68M, that is by about 17%.

The changes are similar to those for Pop_Table_Iterate_Base.

Item and Category Path Compacting Combined

↑ 4 Tuning Memory Usage
↓ Affected Objects List
↓ Results

The final compacting method simply combines the changes made for the first two, so we will not describe them in detail, but just list the affected objects.

Affected Objects List

↑ Item and Category Path Compacting Combined

Tables
  • PATHS_BASE_LINK
  • ITEM_BASE_LINKS
  • ITEMS_RANKED_BASE_LINK
Views
  • PATHS_BASE_LINK_RANKED_ITEM_V
  • PATHS_BASE_LINK_RANKED_PATH_V
Procedures
  • pop_Items_Ranked_Base_Link
  • init_Base_Link
  • insert_Init_Path_Base_Link
  • insert_Paths_Base_Link
  • Pop_Table_Iterate_Base_Link

Results

↑ Item and Category Path Compacting Combined
↓ Timings
↓ Execution Plan

Timings

↑ Results

Here are the times taken on the England dataset with KEEP_NUM = 0 and MIN_VALUE = 1952:

Timer                      Elapsed         CPU
----------------------  ----------  ----------
Pop_Table_Iterate            80.87       78.89
Pop_Table_Iterate_Base       87.28       82.65
Pop_Table_Iterate_Link       86.37       84.77
Pop_Table_Iterate_Base_Link  84.90       83.68

The new method has used slightly more time overall. The code timing line for the 10’th iteration were:

Timer                                   Elapsed         CPU       Calls       Ela/Call       CPU/Call
-----------------------------------  ----------  ----------  ----------  -------------  -------------
Insert paths 10 (389615)                  35.84       35.50           1       35.83500       35.50000
Insert paths_Base 10 (389615)             42.01       39.00           1       42.00700       39.00000
Insert item_links  10 (864748)             0.31        0.30           1        0.31400        0.30000
Insert paths_Link 10 (389615)             36.63       36.31           1       36.63100       36.31000
Insert item_base_links  10 (864748)        0.30        0.30           1        0.29500        0.30000
Insert paths_base_Link 10 (389615)        37.15       36.83           1       37.14800       36.83000

The revised insert has taken about the same elapsed time as the third. Let’s see how the execution plan has changed.

Execution Plan

↑ Results

Plan hash value: 66749355
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |                    |      1 |        |      0 |00:00:36.92 |     895K|       |       |          |         |
|   1 |  LOAD TABLE CONVENTIONAL | PATHS_BASE_LINK    |      1 |        |      0 |00:00:36.92 |     895K|       |       |          |         |
|   2 |   SEQUENCE               | IBL_SEQ            |      1 |        |    389K|00:00:36.66 |     877K|       |       |          |         |
|*  3 |    VIEW                  |                    |      1 |      1 |    389K|00:00:36.10 |     877K|       |       |          |         |
|   4 |     WINDOW SORT          |                    |      1 |      1 |    389K|00:00:35.92 |     877K|    55M|  2592K|   48M (0)|         |
|   5 |      NESTED LOOPS        |                    |      1 |      1 |    389K|00:00:08.31 |     877K|       |       |          |         |
|*  6 |       TABLE ACCESS FULL  | PATHS_BASE_LINK    |      1 |      1 |    864K|00:00:00.13 |    9606 |       |       |          |         |
|*  7 |       INDEX RANGE SCAN   | SYS_IOT_TOP_177219 |    864K|      1 |    389K|00:00:33.01 |     867K|       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(("PATH_RNK"<=:B8 OR :B8=0))
6 - filter("TRW"."LEV"<:B1)
7 - access("IRK"."ITEM_RNK">="TRW"."ITEM_RNK"+1 AND "IRK"."ITEM_RNK"<=:B2-(:B1-"TRW"."LEV"-1))
filter(("TRW"."TOT_PRICE"+"IRK"."ITEM_PRICE"+:B6<=:B5 AND "TRW"."TOT_VALUE"+"IRK"."ITEM_VALUE"+:B4>=:B3 AND
"IRK"."MAX_ITEMS">=CASE "IRK"."CAT_BASE_ID" WHEN "TRW"."CAT_BASE_ID" THEN "TRW"."SAME_CATS"+1 ELSE 1 END  AND
"IRK"."MIN_REMAIN"<=:B1-("TRW"."LEV"+1)+LEAST(CASE "IRK"."CAT_BASE_ID" WHEN "TRW"."CAT_BASE_ID" THEN "TRW"."SAME_CATS"+1 ELSE 1 END
,"IRK"."MIN_ITEMS") AND ("IRK"."CAT_BASE_ID"="TRW"."CAT_BASE_ID" OR "TRW"."SAME_CATS">="TRW"."MIN_ITEMS") AND
("IRK"."CAT_BASE_ID"="TRW"."CAT_BASE_ID" OR "IRK"."CAT_BASE_ID"=NVL("TRW"."NEXT_CAT_BASE_ID","IRK"."CAT_BASE_ID"))))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

The new plan shows the same steps and the same numbers of E-Rows and A-Rows at each step as for the original Pop_Table_Iterate. The numbers of Buffers have changed slightly, and all the Mem values have reduced, the largest reducing from 82M to 55M, that is by about 33%.

Results Comparison

↑ 4 Tuning Memory Usage

We saw above that the elapsed time for the insert(s) at iteration 10 (with the largest row count) was about 36-37 seconds for all variants, except for Pop_Table_Iterate_Base, which took 42 seconds, so the compacting methods have not made a significant difference in timing. The execution plans were also very similar, with the two variants using item path compacting having an extra SEQUENCE step before the VIEW step. Only the WINDOW SORT step showed significant differences.

Here is a table extracting the right-most execution plan columns for the WINDOW SORT, using the paths table name to identify the variant:

------------------------------------------------------------------
| Paths Table     | Buffers |  OMem |  1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------
| PATHS           |     889K|    82M|  3117K|   73M (0)|         |
| PATHS_BASE      |     887K|    68M|  2869K|   61M (0)|         |
| PATHS_LINK      |     879K|    68M|  2865K|   60M (0)|         |
| PATHS_BASE_LINK |     877K|    55M|  2592K|   48M (0)|         |
------------------------------------------------------------------

The table shows that the variants with category path and item path compacted separately both show reductions in memory used, of about 17% for 0Mem. Combining the compacting methods increases the reduction to about 33%, suggesting the reductions are additive. The variants using item path compacting also show a small reduction in buffers.

As the path compacting variant solution methods did not reduce execution times, we’ll stick with the simpler original path concatenation approach in the next section.

5 Iterative Refinement Algorithms

↑ Contents
↓ Flow Diagram
↓ Iterative Refinement - Using Pop_Table_Iterate
↓ Iterative Refinement - Using Array_Recurse
↓ Iterative Refinement - Using RSF_IRK_IRS_TABS_V

The concept of iterative refinement algorithms was introduced in the third article in this series, OPICO 3: Algorithms for Item/Category Optimization: Using a combination of Value Rank Filtering and Value Bound Filtering techniques we can generate approximative solutions quickly, and refine them within a loop where the level of approximation is reduced steadily, to get the optimal solution set more quickly than with a one-level algorithm.

This section describes three implementations in PL/SQL, all using the same outer level iterative refinement algorithm, with the inner level solution method being each of:

  • Table / Iterate (Pop_Table_Iterate)
  • Array / Recurse (Array_Recurse)
  • Recursive SQL with pre-populated temporary tables for IRK and IRS, described in the previous article (RSF_IRK_IRS_TABS_V)

The code for each implementation is shown, results are given for the Brazil and England datasets, and a code timing report is included for the England dataset that breaks down times by iteration and step, in each case.

Flow Diagram

↑ 5 Iterative Refinement Algorithms

Iterative Refinement - Using Pop_Table_Iterate

↑ 5 Iterative Refinement Algorithms
↓ Code
↓ Results

The inner level solution method here was the best performing one-level method on the larger, England dataset.

Code

↑ Iterative Refinement - Using Pop_Table_Iterate
↓ PATHS Table Population and Query
↓ Procedure Iteratively_Refine_Iterate

PATHS Table Population and Query

↑ Code

For this solution method the PATHS temporary table is populated in a PL/SQL block, then the view that ranks the solutions is queried.

BEGIN
  Item_Cat_Seqs.Iteratively_Refine_Iterate(p_keep_start      => &KEEP_START,
                                           p_keep_factor     => &KEEP_FACTOR);
END;
/
PROMPT Iteratively_Refine_Iterate - path level
SELECT path,
       tot_value,
       tot_price,
       rnk
  FROM paths_ranked_v
 WHERE rnk <= :TOP_N
 ORDER BY rnk, tot_price
/
Procedure Iteratively_Refine_Iterate

↑ Code
The call to the procedure Pop_Table_Iterate has a boolean parameter p_do_pop_temp that is passed as TRUE only on the first iteration to ensure that the temporary tables are only populated once.

PROCEDURE Iteratively_Refine_Iterate(
            p_keep_start                   PLS_INTEGER,
            p_keep_factor                  PLS_INTEGER) IS
  l_top_n         PLS_INTEGER := recursion_Context('TOP_N');
  l_do_pop_temp   BOOLEAN := TRUE;
BEGIN
  g_min_value := 0;
  g_keep_num := p_keep_start;
  FOR i IN 1..3 LOOP
    Pop_Table_Iterate(p_keep_num      => g_keep_num,
                      p_min_value     => g_min_value,
                      p_do_pop_temp   => l_do_pop_temp);
    l_do_pop_temp := FALSE;
    BEGIN
      WITH ranking AS (
          SELECT tot_value,
                 Row_Number() OVER (ORDER BY tot_value DESC, tot_price) rnk
            FROM paths_Link
      )
      SELECT tot_value
        INTO g_min_value
        FROM ranking
       WHERE rnk = l_top_n;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        g_min_value := 0;
    END;
    g_keep_num := g_keep_num * p_keep_factor;
    EXIT WHEN g_min_value > 0;
  END LOOP;
  Pop_Table_Iterate(p_keep_num      => g_keep_num,
                    p_min_value     => g_min_value,
                    p_do_pop_temp   => FALSE);
END Iteratively_Refine_Iterate;

Results

↑ Iterative Refinement - Using Pop_Table_Iterate
↓ Datasets Summary
↓ Code Timing - England Dataset

Datasets Summary

↑ Results

Both datasets were run with a KEEP_FACTOR value of 3, and exited the loop after one iteration with a MIN_VALUE found, then a final iteration was run using the KEEP_NUM of KEEP_START * 3. The table below shows the total execution times in seconds.

Dataset                       Elapsed         CPU
-------------------------  ----------  ----------
Brazil (KEEP_START = 10)          0.8         0.8
England (KEEP_START = 50)         2.3         2.2

The solution set was verified as optimal.

The combination of the Value Rank Filtering and Value Bound Filtering techniques within the iterative refinement algorithm has produced extremely large performance improvements, especially on the larger dataset.

Code Timing - England Dataset

↑ Results

Timer Set: Iteratively_Refine_Iterate, Constructed at 11 Jun 2024 03:35:40, written at 03:35:42
===============================================================================================
Timer                                              Elapsed         CPU       Calls       Ela/Call       CPU/Call
----------------------------------------------  ----------  ----------  ----------  -------------  -------------
PTI 1: MIN_VALUE / KEEP_NUM = 0 / 50                  1.44        1.36           1        1.43800        1.36000
SELECT 1: MIN_VALUE / KEEP_NUM = 1952 / 50            0.00        0.00           1        0.00200        0.00000
PTI (final): MIN_VALUE / KEEP_NUM = 1952 / 150        0.72        0.72           1        0.71800        0.72000
(Other)                                               0.00        0.00           1        0.00000        0.00000
----------------------------------------------  ----------  ----------  ----------  -------------  -------------
Total                                                 2.16        2.08           4        0.53950        0.52000
----------------------------------------------  ----------  ----------  ----------  -------------  -------------
[Timer timed (per call in ms): Elapsed: 0.00877, CPU: 0.00877]

The code timing table shows that the initial iteration, with MIN_VALUE = 0 and KEEP_NUM = 50 takes 1.4 seconds, and finds a new MIN_VALUE of 1952, while the final iteration with MIN_VALUE = 1952 and KEEP_NUM = 150 takes 0.7 seconds.

The difference in times between the two iterations is due to the different values of the filtering parameters: The positive MIN_VALUE on the second allows faster execution even with the KEEP_NUM parameter 3 times larger.

Iterative Refinement - Using Array_Recurse

↑ 5 Iterative Refinement Algorithms
↓ Code
↓ Results

The inner level solution method here was the worst performing one-level PL/SQL-driven method on the larger, England dataset.

Code

↑ Iterative Refinement - Using Array_Recurse
↓ View ITERATIVELY_REFINE_RECURSE_V
↓ Initialization and Query
↓ Procedure Init_Loop
↓ Function Iteratively_Refine_Recurse

View ITERATIVELY_REFINE_RECURSE_V

↑ Code

This view returns the records from the PL/SQL function Iteratively_Refine_Recurse, with a ranking function.

CREATE OR REPLACE VIEW iteratively_refine_recurse_v AS
SELECT path,
       tot_value,
       tot_price,
       Row_Number() OVER (ORDER BY tot_value DESC, tot_price) rnk
  FROM TABLE(Item_Cat_Seqs.Iteratively_Refine_Recurse)
/
Initialization and Query

↑ Code

Since the view that ranks the solutions calls the PL/SQL function Iteratively_Refine_Recurse, we need to populate the ITEMS_RANKED temporary table in advance, to avoid the Oracle error:

ORA-14551: cannot perform a DML operation inside a query

The Init_Loop procedure is therefore called in a PL/SQL block to do this, as well as some other initialization steps, before querying the view.

BEGIN
     Item_Cat_Seqs.Init_Loop(p_keep_start      => &KEEP_START,
                             p_keep_factor     => &KEEP_FACTOR);
END;
/
PROMPT iteratively_refine_recurse_v - path level
SELECT path,
       tot_value,
       tot_price,
       rnk
  FROM iteratively_refine_recurse_v
 WHERE rnk <= :TOP_N
ORDER BY rnk, tot_price
/
Procedure Init_Loop

↑ Code

PROCEDURE Init_Loop(
            p_keep_start                   PLS_INTEGER,
            p_keep_factor                  PLS_INTEGER) IS
BEGIN
  g_max_price   := recursion_Context('MAX_PRICE');
  g_seq_size    := recursion_Context('SEQ_SIZE');
  g_keep_start  := p_keep_start;
  g_keep_factor := p_keep_factor;
  pop_Item_Running_Sums;
  pop_Items_Ranked;
END Init_Loop;
Function Iteratively_Refine_Recurse

↑ Code

FUNCTION Iteratively_Refine_Recurse
            RETURN                         paths_arr PIPELINED IS
  l_n_rows                      PLS_INTEGER := 0;
  l_top_n                       PLS_INTEGER := recursion_Context('TOP_N');
BEGIN
  g_min_value := 0;
  g_keep_num := g_keep_start;
  FOR i IN 1..3 LOOP
    set_Globals(p_keep_num      => g_keep_num,
                p_min_value     => g_min_value);
    BEGIN
      SELECT tot_value
        INTO g_min_value
        FROM array_recurse_v
       WHERE rnk = l_top_n;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        g_min_value := 0;
    END;
    g_keep_num := g_keep_num * g_keep_factor;
    EXIT WHEN g_min_value > 0;
  END LOOP;
  set_Globals(p_keep_num      => g_keep_num,
              p_min_value     => g_min_value);
  FOR rec IN (SELECT *
                FROM TABLE(Item_Cat_Seqs.Array_Recurse)
             ) LOOP
    PIPE ROW (rec);
    l_n_rows := l_n_rows + 1;
  END LOOP;
END Iteratively_Refine_Recurse;

Results

↑ Iterative Refinement - Using Array_Recurse
↓ Datasets Summary
↓ Code Timing - England Dataset

Datasets Summary

↑ Results

Both datasets were run with a KEEP_FACTOR value of 3, and exited the loop after one iteration with a MIN_VALUE found, then a final iteration was run using the KEEP_NUM of KEEP_START * 3. The table below shows the total execution times in seconds.

Dataset                       Elapsed         CPU
-------------------------  ----------  ----------
Brazil (KEEP_START = 10)          0.8         0.8
England (KEEP_START = 50)         2.8         2.7

The solution set was verified as optimal.

The combination of the Value Rank Filtering and Value Bound Filtering techniques within the iterative refinement algorithm has produced extremely large performance improvements, especially on the larger dataset.

Code Timing - England Dataset

↑ Results

Timer Set: Iteratively_Refine_Recurse, Constructed at 11 Jun 2024 03:35:37, written at 03:35:40
===============================================================================================
Timer                                               Elapsed         CPU       Calls       Ela/Call       CPU/Call
-----------------------------------------------  ----------  ----------  ----------  -------------  -------------
set_Globals 1: MIN_VALUE / KEEP_NUM = 0 / 50           0.00        0.00           1        0.00000        0.00000
SELECT 1: MIN_VALUE / KEEP_NUM = 1952 / 50             1.70        1.70           1        1.69800        1.70000
set_Globals: MIN_VALUE / KEEP_NUM = 1952 / 150         0.00        0.00           1        0.00000        0.00000
Pipe 21 rows: MIN_VALUE / KEEP_NUM = 1952 / 150        0.93        0.94           1        0.92500        0.94000
(Other)                                                0.00        0.00           1        0.00000        0.00000
-----------------------------------------------  ----------  ----------  ----------  -------------  -------------
Total                                                  2.62        2.64           5        0.52460        0.52800
-----------------------------------------------  ----------  ----------  ----------  -------------  -------------
[Timer timed (per call in ms): Elapsed: 0.00877, CPU: 0.00789]

The code timing table shows that the initial iteration, with MIN_VALUE = 0 and KEEP_NUM = 50 takes 1.8 seconds, and finds a new MIN_VALUE of 1952, while the final iteration with MIN_VALUE = 1952 and KEEP_NUM = 150 takes 0.9 seconds.

The difference in times between the two iterations is due to the different values of the filtering parameters: The positive MIN_VALUE on the second allows faster execution even with the KEEP_NUM parameter 3 times larger.

Iterative Refinement - Using RSF_IRK_IRS_TABS_V

↑ 5 Iterative Refinement Algorithms
↓ Code
↓ Results

The inner level solution method here was the best performing one-level recursive SQL method on the larger, England dataset.

Code

↑ Iterative Refinement - Using RSF_IRK_IRS_TABS_V
↓ View ITERATIVELY_REFINE_RSF_V
↓ Initialization and Query
↓ Function Iteratively_Refine_RSF

View ITERATIVELY_REFINE_RSF_V

↑ Code

This view returns the records from the PL/SQL function Iteratively_Refine_RSF, with a ranking function.

CREATE OR REPLACE VIEW iteratively_refine_rsf_v AS
SELECT path,
       tot_value,
       tot_price,
       Row_Number() OVER (ORDER BY tot_value DESC, tot_price) rnk
  FROM TABLE(Item_Cat_Seqs.Iteratively_Refine_RSF)
/
Initialization and Query

↑ Code

Since the view that ranks the solutions calls the PL/SQL function Iteratively_Refine_RSF, we need to populate the ITEMS_RANKED temporary table in advance, as for Iteratively_Refine_Recurse, using the Init_Loop procedure.

BEGIN
     Item_Cat_Seqs.Init_Loop(p_keep_start      => &KEEP_START,
                             p_keep_factor     => &KEEP_FACTOR);
END;
/
PROMPT iteratively_refine_rsf_v - path level
SELECT path,
       tot_value,
       tot_price,
       rnk
  FROM iteratively_refine_rsf_v
 WHERE rnk <= :TOP_N
ORDER BY rnk, tot_price
/
Function Iteratively_Refine_RSF

↑ Code

FUNCTION Iteratively_Refine_RSF
            RETURN                         paths_arr PIPELINED IS
  l_n_rows                      PLS_INTEGER := 0;
  l_top_n                       PLS_INTEGER := recursion_Context('TOP_N');
  l_rec                         paths%ROWTYPE;
BEGIN
  g_min_value := 0;
  g_keep_num := g_keep_start;
  FOR i IN 1..3 LOOP
    Set_Contexts(p_keep_num      => g_keep_num,
                 p_min_value     => g_min_value);
    BEGIN
      SELECT tot_value
        INTO g_min_value
        FROM rsf_irk_irs_tabs_v
       WHERE rnk = l_top_n;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        g_min_value := 0;
    END;
    g_keep_num := g_keep_num * g_keep_factor;
    EXIT WHEN g_min_value > 0;
  END LOOP;
  Set_Contexts(p_keep_num      => g_keep_num,
               p_min_value     => g_min_value);
  FOR rec IN (SELECT *
                FROM rsf_irk_irs_tabs_v
             ) LOOP
    l_rec.path := rec.path;
    l_rec.tot_value := rec.tot_value;
    l_rec.tot_price := rec.tot_price;
    PIPE ROW (l_rec);
    l_n_rows := l_n_rows + 1;
  END LOOP;
END Iteratively_Refine_RSF;

Results

↑ Iterative Refinement - Using RSF_IRK_IRS_TABS_V
↓ Datasets Summary
↓ Code Timing - England Dataset

Datasets Summary

↑ Results

Both datasets were run with a KEEP_FACTOR value of 3, and exited the loop after one iteration with a MIN_VALUE found, then a final iteration was run using the KEEP_NUM of KEEP_START * 3. The table below shows the total execution times in seconds.

Dataset                       Elapsed         CPU
-------------------------  ----------  ----------
Brazil (KEEP_START = 10)          0.5         0.4
England (KEEP_START = 50)        10.5        10.4

The solution set was verified as optimal.

The combination of the Value Rank Filtering and Value Bound Filtering techniques within the iterative refinement algorithm has produced extremely large performance improvements, especially on the larger dataset.

The run time is, however, considerably larger than for the PL/SQL-driven base methods.

Code Timing - England Dataset

↑ Results

Timer Set: Iteratively_Refine_RSF, Constructed at 11 Jun 2024 03:35:42, written at 03:35:53
===========================================================================================
Timer                                               Elapsed         CPU       Calls       Ela/Call       CPU/Call
-----------------------------------------------  ----------  ----------  ----------  -------------  -------------
SELECT 1: MIN_VALUE / KEEP_NUM = 1952 / 50             8.55        8.47           1        8.54800        8.47000
Set_Contexts: MIN_VALUE / KEEP_NUM = 1952 / 150        0.00        0.00           1        0.00000        0.00000
Pipe 20 rows: MIN_VALUE / KEEP_NUM = 1952 / 150        1.62        1.61           1        1.62100        1.61000
(Other)                                                0.22        0.21           1        0.21800        0.21000
-----------------------------------------------  ----------  ----------  ----------  -------------  -------------
Total                                                 10.39       10.29           4        2.59675        2.57250
-----------------------------------------------  ----------  ----------  ----------  -------------  -------------
[Timer timed (per call in ms): Elapsed: 0.00877, CPU: 0.00789]

The code timing table shows that the initial iteration, with MIN_VALUE = 0 and KEEP_NUM = 50 takes 8.6 seconds, and finds a new MIN_VALUE of 1952, while the final iteration with MIN_VALUE = 1952 and KEEP_NUM = 150 takes 1.6 seconds.

The difference in times between the two iterations is due to the different values of the filtering parameters: The positive MIN_VALUE on the second allows faster execution even with the KEEP_NUM parameter 3 times larger.

6 Conclusion

↑ Contents
↓ Results
↓ Lessons Learned

To conclude this article, we will summarise the results, first for the four base PL/SQL-driven solution methods developed, then for two of them used within the iterative refinement scheme, and we’ll include the fastest recursive SQL method from the previous article for comparison; then we will note some lessons learned.

Results

↑ 6 Conclusion
↓ Base Solution Methods
↓ Iterative Refinement Solution Methods

Base Solution Methods

↑ Results

This table shows the different solution methods tested, and assigns a short code to each one for convenience.

Code Procedure / Function Table / Array Recurse / Iterate
PTR Pop_Table_Recurse Table Recurse
PTI Pop_Table_Iterate Table Iterate
ARE Array_Recurse Array Recurse
AIT Array_Iterate Array Iterate

For comparison we will include the fastest recursive SQL solution method in the timings listed.

Code View IRK Access IRS Access IRK/TRW-Conditions IRS/TRW-Conditions
IIT RSF_IRK_IRS_TABS_V - temporary tables for IRK and IRS Table Table Inline Inline

Timings in the right hand columns are in seconds.

Brazil

KEEP_NUM MIN_VALUE Solution Set PTR PTI ARE AIT IIT
10 0 B-B (suboptimal) 0.4 0.4 0.4 0.4 0.1
100 0 B-A (optimal) 0.6 0.6 0.6 0.6 1.1
100 10748 B-A (optimal) 0.4 0.4 0.4 0.4 0.1
0 10748 B-A (optimal) 0.4 0.4 0.4 0.4 0.1

England

KEEP_NUM MIN_VALUE Solution Set PTR PTI ARE AIT IIT
50 0 E-B (suboptimal) 1 1 2 2 9
300 0 E-A (optimal) 7 7 9 9 72
300 1952 E-A (optimal) 1 1 1 1 3
0 1952 E-A (optimal) 83 81 140 116 214

Iterative Refinement Solution Methods

↑ Results

Two of the base PL/SQL methods, along with the best recursive SQL method, were used within the iterative refinement scheme, with a single pair of parameter values each, as shown below.

Timings in the right hand columns are in seconds.

Dataset KEEP_START KEEP_FACTOR Solution Set ARE PTI IIT
Brazil 10 3 B-A (optimal) 0.8 0.8 0.5
England 50 3 E-A (optimal) 2.8 2.3 10.5

Lessons Learned

↑ 6 Conclusion

We can summarise the principle points we have learned in relation to solving these problems using PL/SQL-driven solution methods, with the Value Filtering techniques:

  • All variants of the PL/SQL-driven solution methods are much faster than the fastest recursive SQL method on the larger, England, dataset, across all tested pairs of the filtering parameters
  • All methods return results in less than a second on the smaller, Brazil, dataset; remember that, without our Value Filtering techniques, we could not solve for even the smaller dataset at all
  • For these problems, the solution methods using a temporary table for the intermediate paths proved significantly faster than those using arrays
  • Recursion seemed to be slightly slower than iteration on both array and table based solution methods
  • Applying the filtering techniques within an Iterative Refinement algorithm has allowed us to solve for the exact optimal solution set on the larger dataset in just a couple of seconds
  • Finding more efficient storage methods for paths reduced reported memory usage, although not run times

In the next article, we discuss how to verify correctness of solution methods for algorithmic problems of this kind, using a number of approaches, including unit testing.