The objective of this SQL Server tutorial is to teach you how to use CUBE to aggregate different groups and the complete details about them.
What is CUBE in SQL Server?
CUBE is an extension of GROUPING SETS. It is almost the same as ROLLUP with one subtle difference. The difference is that it does an exhaustive grouping of the specified columns using all possible permutations.
For e.g. if we specify 3 table columns or fields (X, Y, Z) in rollup it will group as below. The total number of groups created and reported will be n+1 where n is the number of columns specified in the ROLLUP clause.
(X, Y, Z), (X, Y), (X), ()
But CUBE will group as below. The total number of groups created and reported in the resultset will be 2n where n is the number of columns specified in the CUBE clause.
(X, Y, Z), (X, Z), (X, Y), (X), (Y), ()
So, we can see it produces a larger resultset containing more rows of information.
SQL Server CUBE Syntax
The basic syntax of the CUBE is given below.
SELECT
column1, column2, aggregate_function (column 3)
FROM
table
GROUP BY CUBE (column1, column2);
In this syntax,
column – a column from the table in the SELECT list.
aggregate_function (column) – column on which aggregate function (i.e. SUM, COUNT, AVG etc.) is used.
GROUP BY – SQL keyword combination to specify a column or multiple columns to create a group on which the to apply the aggregate.
CUBE – SQL keyword combination which creates an exhaustive grouping of the specified columns and aggregates them.
SQL Server CUBE with Example
Let us try to understand when and how to use the CUBE operator in a SQL query. Suppose we have an employee table containing the information about employees in the company. The table is represented below. We will use this sample table for reference in our examples.
id
name
gender
salary
department
1
David Jackson
Male
5000
IT
2
Jim Jameson
Female
6000
HR
3
Kate Johnson
Female
7500
IT
4
Will Ray
Male
6500
Marketing
5
Shane Mathews
Female
5500
Finance
6
Shed Price
Male
8000
Marketing
7
Viktor Smith
Male
7200
HR
8
Vincent Smithson
Female
6600
IT
9
Janice Streep
Female
5400
Marketing
10
Laura Wells
Female
6300
Finance
11
Mac Bull
Male
5700
Marketing
12
Patrick Patterson
Male
7000
HR
13
Julie Orbison
Female
7100
IT
14
Elice Hemingway
Female
6800
Marketing
15
Wayne Johnson
Male
5000
Finance
Table: employee
Let us start by doing a ROLLUP on department and gender for employee salary. The following query does the same. The sum of employee salaries is aliased as ‘employee cost’.
SELECT
department, gender, sum(salary) AS 'employee cost'
FROM employee
GROUP BY
ROLLUP (department, gender);
The resultset of the above query is the following. It has 13 rows of information or records. They include the output of the following set of groups – (department, gender) with 8 records, (department) with 4 records and () 1 record which is the total cost of company of all employees of all genders and all departments i.e. 2+1=3 as mentioned in introduction above.
department
gender
employee cost
Finance
Female
11800
Finance
Male
5000
Finance
NULL
16800
HR
Female
6000
HR
Male
14200
HR
NULL
20200
IT
Female
21200
IT
Male
5000
IT
NULL
26200
Marketing
Female
12200
Marketing
Male
20200
Marketing
NULL
32400
NULL
NULL
95600
Now we will apply the CUBE operator on the same query and see the difference. The following query does the same.
SELECT
department, gender, sum(salary) AS 'employee cost'
FROM employee
GROUP BY
CUBE (department, gender);
The resultset of the above query is the following. It has 15 records i.e. 2 more than the output from ROLLUP. The additional records are highlighted in pink and it is the output of the (gender) group which was missing in the ROLLUP resultset i.e. the total cost to company of the male employees and female employees separately. The total number of groups created and reported is 2n as mentioned above in the introduction i.e. (department, gender), (gender), (department) and ().
department
gender
employee cost
Finance
Female
11800
HR
Female
6000
IT
Female
21200
Marketing
Female
12200
NULL
Female
51200
Finance
Male
5000
HR
Male
14200
IT
Male
5000
Marketing
Male
20200
NULL
Male
44400
NULL
NULL
95600
Finance
NULL
16800
HR
NULL
20200
IT
NULL
26200
Marketing
NULL
32400
The output can be formatted with the help of coalesce to address gap in values i.e. NULL values. The following query does the same making the resultset more readable and comprehensible.
SELECT
coalesce (department, 'All Department') AS department,
coalesce (gender, 'Both Genders') AS gender,
sum(salary) AS 'employee cost'
FROM employee
GROUP BY
CUBE (department, gender);
The formatted output is the following. The NULL values have been replaced with meaningful values.