SQL Server CUBE

Learning Objective

The objective of this SQL Server tutorial is to teach you how to use CUBE to aggregate different groups and the complete details about them.

What is CUBE in SQL Server?

CUBE is an extension of GROUPING SETS. It is almost the same as ROLLUP with one subtle difference. The difference is that it does an exhaustive grouping of the specified columns using all possible permutations. For e.g.  if we specify 3 table columns or fields (X, Y, Z) in rollup it will group as below. The total number of groups created and reported will be n+1 where n is the number of columns specified in the ROLLUP clause. (X, Y, Z), (X, Y), (X), () But CUBE will group as below. The total number of groups created and reported in the resultset will be 2n where n is the number of columns specified in the CUBE clause. (X, Y, Z), (X, Z), (X, Y), (X), (Y), () So, we can see it produces a larger resultset containing more rows of information.

SQL Server CUBE Syntax

The basic syntax of the CUBE is given below.
SELECT
column1, column2, aggregate_function (column 3)
FROM
table
GROUP BY CUBE (column1, column2);
In this syntax,
  • column – a column from the table in the SELECT list.
  • aggregate_function (column) – column on which aggregate function (i.e. SUM, COUNT, AVG etc.) is used.
  • GROUP BY – SQL keyword combination to specify a column or multiple columns to create a group on which the to apply the aggregate.
  • CUBE – SQL keyword combination which creates an exhaustive grouping of the specified columns and aggregates them.

SQL Server CUBE with Example

Let us try to understand when and how to use the CUBE operator in a SQL query. Suppose we have an employee table containing the information about employees in the company. The table is represented below. We will use this sample table for reference in our examples.
id name gender salary department
1 David Jackson Male 5000 IT
2 Jim Jameson Female 6000 HR
3 Kate Johnson Female 7500 IT
4 Will Ray Male 6500 Marketing
5 Shane Mathews Female 5500 Finance
6 Shed Price Male 8000 Marketing
7 Viktor Smith Male 7200 HR
8 Vincent Smithson Female 6600 IT
9 Janice Streep Female 5400 Marketing
10 Laura Wells Female 6300 Finance
11 Mac Bull Male 5700 Marketing
12 Patrick Patterson Male 7000 HR
13 Julie Orbison Female 7100 IT
14 Elice Hemingway Female 6800 Marketing
15 Wayne Johnson Male 5000 Finance
Table: employee
Let us start by doing a ROLLUP on department and gender for employee salary. The following query does the same. The sum of employee salaries is aliased as ‘employee cost’.
SELECT
department, gender, sum(salary) AS 'employee cost'
FROM employee
GROUP BY
ROLLUP (department, gender);
The resultset of the above query is the following. It has 13 rows of information or records. They include the output of the following set of groups – (department, gender) with 8 records, (department) with 4 records and () 1 record which is the total cost of company of all employees of all genders and all departments i.e. 2+1=3 as mentioned in introduction above.
department gender employee cost
Finance Female 11800
Finance Male 5000
Finance NULL 16800
HR Female 6000
HR Male 14200
HR NULL 20200
IT Female 21200
IT Male 5000
IT NULL 26200
Marketing Female 12200
Marketing Male 20200
Marketing NULL 32400
NULL NULL 95600
Now we will apply the CUBE operator on the same query and see the difference. The following query does the same.
SELECT
department, gender, sum(salary) AS 'employee cost'
FROM employee
GROUP BY
CUBE (department, gender);
The resultset of the above query is the following. It has 15 records i.e. 2 more than the output from ROLLUP. The additional records are highlighted in pink and it is the output of the (gender) group which was missing in the ROLLUP resultset i.e. the total cost to company of the male employees and female employees separately. The total number of groups created and reported is 2n as mentioned above in the introduction i.e. (department, gender), (gender), (department) and ().
department gender employee cost
Finance Female 11800
HR Female 6000
IT Female 21200
Marketing Female 12200
NULL Female 51200
Finance Male 5000
HR Male 14200
IT Male 5000
Marketing Male 20200
NULL Male 44400
NULL NULL 95600
Finance NULL 16800
HR NULL 20200
IT NULL 26200
Marketing NULL 32400
The output can be formatted with the help of coalesce to address gap in values i.e. NULL values. The following query does the same making the resultset more readable and comprehensible.
SELECT
coalesce (department, 'All Department') AS department,
coalesce (gender, 'Both Genders') AS gender,
sum(salary) AS 'employee cost'
FROM employee
GROUP BY
CUBE (department, gender);
The formatted output is the following. The NULL values have been replaced with meaningful values.
department gender employee cost
Finance Female 11800
HR Female 6000
IT Female 21200
Marketing Female 12200
All Department Female 51200
Finance Male 5000
HR Male 14200
IT Male 5000
Marketing Male 20200
All Department Male 44400
All Department Both Genders 95600
Finance Both Genders 16800
HR Both Genders 20200
IT Both Genders 26200
Marketing Both Genders 32400

Please get connected & share!

Advertisement