Thursday, June 23, 2016

MS SQL Server - Window Functions - Aggregate Functions

Aggregate functions perform a calculation on a set of values and return a single value, frequently used with the GROUP BY clause of the SELECT statement.

In this post, we will try using aggregate functions with and without OVER clause. I recommend you to read the previous post on Window Ranking Functions

Let create a sample table and insert some records.

declare @tblCustomerOrders table (
 CustomerName varchar(50),
 OrderAmount decimal(18, 2)
)

insert into @tblCustomerOrders (CustomerName, OrderAmount)
values 
('Imran', 100),
('Imran', 40),
('Kamran', 500),
('Salman', 350),
('Kamran', 430),
('Salman', 210),
('Imran', 70),
('Salman', 105),
('Imran', 750)

In this table, if we want to see the sum of OrderAmount for each customer, we use the following query.

SELECT 
  CustomerName
 ,SUM(OrderAmount) AS 'Sum'
FROM @tblCustomerOrders

But when you run this query you will get error:

Column '@tblCustomerOrders.CustomerName' is invalid in the select list 
because it is not contained in either an aggregate function or the 
GROUP BY clause.

Since we are missing the GROUP BY clause, we got this error. To work it as expected, we have to add GROUP BY clause with CustomerName field.

SELECT
  CustomerName
 ,SUM(OrderAmount) AS 'Sum'
FROM @tblCustomerOrders
GROUP BY CustomerName

Now you will get the expected result.

Lets see another scenario, if we want to list all customers with their SUM() of OrderAmounts, but we also need to list individual order amounts in separate column. Try this query:

SELECT
  CustomerName
 ,OrderAmount -- (individual OrderAmount)
 ,SUM(OrderAmount) AS 'Sum'
FROM @tblCustomerOrders
GROUP BY CustomerName

Since, OrderAmount field is not mentioned in GROUP BY clause, so you will get error message display:

Column '@tblCustomerOrders.OrderAmount' is invalid in the select list 
because it is not contained in either an aggregate function or the 
GROUP BY clause.

Now add OrderAmount field in GROUP BY clause along-with CustomerName field.

SELECT 
  CustomerName
 ,OrderAmount -- (individual OrderAmount)
 ,SUM(OrderAmount) AS 'Sum'
FROM @tblCustomerOrders
GROUP BY CustomerName, OrderAmount

This query will not generate error, but you get the following output, which is not the desired one.

Because each row is also grouped by OrderAmount, so the SUM(OrderAmount) is same as of OrderAmount.

How to fix this, there comes the OVER clause. Although there may be some other workarounds you can use to get the desired output, but the simpler one is using the OVER clause. You can put your aggregate functions in conjunction with OVER clause and there is no need to add GROUP BY clause.

Try getting SUM with OVER clause:

SELECT
  CustomerName
 ,SUM(OrderAmount) OVER() AS 'Sum'
FROM @tblCustomerOrders

This will generate the output:

But this is not the expected result, it is calculating the SUM of all records and not calculating SUM for individual Customers. As I also mentioned in my last post on Window Functions, add PARTITION BY clause in OVER clause.

The output will be:

As you might noticed, unlike the working query with GROUP BY clause discussed above, this query is displaying the list of all records along-with SUM, i.e. CustomerNames are repeating with no of occurrences exists in the table. So you can also add OrderAmount field to display along-with SUM filed.

SELECT
  CustomerName
 ,OrderAmount
 ,SUM(OrderAmount) OVER(PARTITION BY CustomerName) AS 'Sum'

FROM @tblCustomerOrders

Generates the output:

This gives us the solution that we can list down all customer records with SUM of OrderAmount and also showing individual OrderAmounts.

Similarly we can use other aggregate functions like COUNT(), AVG(), MIN(), MAX()

SELECT
  CustomerName
 ,OrderAmount
 ,COUNT(OrderAmount) OVER(PARTITION BY CustomerName) AS 'Count'   
 ,SUM(OrderAmount) OVER(PARTITION BY CustomerName) AS 'Sum'
 ,AVG(OrderAmount) OVER(PARTITION BY CustomerName) AS 'Avg'
 ,MIN(OrderAmount) OVER(PARTITION BY CustomerName) AS 'Min'
 ,MAX(OrderAmount) OVER(PARTITION BY CustomerName) AS 'Max'
 
FROM @tblCustomerOrders

Will generate the output as expected:

References:

No comments:

Post a Comment