Query Structure Diagramming
Last bank holiday Monday I posted a solution to an SQL problem on OTN, and I later thought that the SQL would make a nice example to illustrate my Query Structure Diagramming (QSD) technique. I published my first example of this in May 2009 on scribd,
A Structured Approach to SQL Query Design by Brendan Furey
and have continued to develop it in subsequent articles. I use the technique here to illustrate the SQL structure for the OTN example mentioned and also for a second OTN example that I posted shortly after. Both examples structure the queries using subquery factors.
SQL Subquery Factors
Subquery factors were introduced in Oracle Database v9.2 and have since become a key technique in developing queries of any complexity. They generalise the v8 inline view technique, allowing subqueries now to be declared with an alias (using the ‘WITH’ clause), then referenced as often as desired later in the query. When referenced multiple times, Oracle’s Cost Based Optimiser (CBO) normally executes the subquery once and writes the results to temporary space to aid performance (this can be seen in the Explain Plan as a LOAD AS SELECT action). Using subquery factors can make queries much easier to read, even when they are referenced only once, in which case CBO normally restructures them internally to incorporate them within another subquery or the main query. It is important to note, though, that subquery factors, when retained by CBO, will be joined by full scans when referenced later in the query and in some cases it is more efficient to retain the table references to allow indexed joins (my next blog post will include an example of this).
Subquery factors, along with inline views, are the building blocks of modern SQL, as subroutines are of other languages. My QSDs are intended to show how they allow a procedural flow at the structural level, while retaining the set-based logic within the subqueries.
Leave and Attendance Query
The problem here is that the poster has a daily attendance table and a leave table, where leave is stored as date ranges, but wants a single query that outputs data in daily form. The keys to this are:
- Realising that you cannot drive from the event tables but must generate a continuous set of days to drive from, for each employee (assuming you don’t have them in a separate reference table)
- Converting the leave ranges to leave days by joining to the generated days rowset
- Joining the leave daily rowset with the attendance table by a union
The original post, “Attendance and Leave table Join”, is no longer available [13 July 2025].
ERD
Note that tables were not provided for Employee and Day in the OTN post, but it is useful to include them as entities nonetheless.
SQL
Note that in the query below, both the date range and the employee set are generated from the transactional data, which is obviously an artificial feature arising from this being for a problem on a forum, but it’s no harm in terms of the purpose of this article.
WITH ext AS (
SELECT Min (att_date) min_date, Max (att_date) - Min (att_date) + 1 n_days
FROM attendance
), dys AS (
SELECT min_date + LEVEL - 1 day
FROM ext
CONNECT BY LEVEL < n_days + 1
), ems AS (
SELECT emp_id
FROM attendance
UNION
SELECT employee_number
FROM leave
), edy AS (
SELECT
dys.day,
ems.emp_id
FROM dys
CROSS JOIN ems
), ldy AS (
SELECT
edy.emp_id,
edy.day,
lve.leave_reason
FROM edy
JOIN leave lve
ON edy.day BETWEEN lve.date_start AND lve.date_end
AND lve.employee_number = edy.emp_id
), uni AS (
SELECT
emp_id,
att_date,
timein,
timeout,
late_in,
early_out,
reason
FROM attendance
UNION
SELECT
emp_id,
day,
NULL,
NULL,
NULL,
NULL,
leave_reason
FROM ldy
)
SELECT
edy.emp_id,
edy.day,
uni.timein,
uni.timeout,
uni.late_in,
uni.early_out,
uni.reason
FROM edy
LEFT JOIN uni
ON uni.att_date = edy.day
AND uni.emp_id = edy.emp_id
ORDER BY 1, 2
QSD
Counting Flight Statistics Query
The problem here is that the poster has three tables with data on events for frequent fliers and wants to show aggregate counts by year, but wants all years within a range to be included in the output, including years with no events. The key to this is realising that you cannot drive from the event tables but must generate a continuous set of years to drive from (assuming you don’t have them in a separate reference table). The original post, “Multiple Count aggregates from different sources grouped by Year”, is no longer available [13 July 2025].
ERD
Note that a table was not provided for Year in the OTN post, but it is useful to include it as an entity nonetheless.
SQL
WITH yrs AS (
SELECT Add_Months (To_Date ('01011989', 'ddmmyyyy'), 12*LEVEL) YEAR
FROM DUAL
CONNECT BY LEVEL < 24
), ffe AS (
SELECT Trunc (start_date, 'YEAR') year, Count(*) n_enr
FROM freq_flyer_enrollment
GROUP BY Trunc (start_date, 'YEAR')
), ffs AS (
SELECT Trunc (survey_date, 'YEAR') year, Count(*) n_sur
FROM freq_flyer_survey
GROUP BY Trunc (survey_date, 'YEAR')
), fff AS (
SELECT Trunc (flt_date, 'YEAR') year, Count(*) n_fly
FROM freq_flyer_flights
GROUP BY Trunc (flt_date, 'YEAR')
)
SELECT yrs.year,
Nvl (ffe.n_enr, 0) n_enr,
Nvl (ffs.n_sur, 0) n_sur,
Nvl (fff.n_fly, 0) n_fly
FROM yrs
LEFT JOIN ffe
ON ffe.year = yrs.year
LEFT JOIN ffs
ON ffs.year = yrs.year
LEFT JOIN fff
ON fff.year = yrs.year
ORDER BY 1