Getting started with analytic functions in SQL Server 2012 Part-1

SQL Server 2012 comes with a lot of new feature. Window analytic function is one of them. Well as usual in my free time I generally active on stackoverflow.com. Well, a few days ago there was one question asked by one of the user in SO. I’m not going to explain this question here. Again on the same day another question was asked which was related to one of the feature that i covered in this post. After answering both the question, I thought why should not write the blog post for these analytic functions introduced in SQL server 2012.

Why Analytic functions

Analytic functions help you to get other information like access immediate next record, or next-to-next record or previous records etc. from the results without any self-joins, derived tables and subqueries and all. For example, let’s started with library management example. As a librarian in my college library, if I want to know that – That particular book was given to which student for the first time to read,  then who was the second student to read that book and who is the 3rd student and who is the last and so on. Well you can get all of these information in single SQL statement without any self join and all the stuff. Analytic functions really ease the developer’s life. We’ll see it one-by-one.

SQL Server 2012, comes with 8 analytic functions and its — PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DISC, LEAD, LAG, FIRST_VALUE and LAST_VALUE. In this first part we’ll understand LEAD, LAG, FIRST_VALUE and LAST_VALUE functions. Okay guys let’s start our journey.

FIRST_VALUE / LAST_VALUE:

First_Value() returns the first value of the result or partition. Last_Value() function does the opposite, it returns the last value of the result or partition. Below is the skeleton or syntax of it.

FIRST_VALUE / LAST_VALUE (Column name)
    OVER ( (optional)partition by   order by   (optional)rows / range clause )

Now, Let’s understand each part.

COLUMN NAME:  which column value you want to return.

PARTITION BY:  Divide the results into partition by specified criteria. This is optional.

ROWS / RANGE Clause:  This clause limits the partition result by specifying the start and end point. For that new keyword introduced in sql server like – UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING and CURRENT ROW.

Before we go ahead, let’s take a small overview of each keyword because it’s also equally important to understand and how analytic functions use these keywords to limit the result.

  • UNBOUNDED PRECEDING: The range start at the first row of the partition. i.e. All previous rows within the result or partition.
  • UNBOUNDED FOLLOWING: The range ends at the last row of the partition. i.e. All subsequent rows within the result or partition.
  • CURRENT ROW: The range begins at the current row or ends at the current row.
  • n PRECEDING or n FOLLOWING: The range starts or ends n rows before or after the current row.

Okay, now let’s play with FIRST_VALUE() function. Below query gives you the first value from the result set.

SELECT DISTINCT Books.BookID,
BookName ,
FIRST_VALUE(st.StudentID) OVER(PARTITION BY st.BookID ORDER BY st. StudentID) AS FirstReader
FROM [BookAssignedToStudent] st
INNER JOIN dbo.Books ON Books.BookID = st.BookID
ORDER BY BookID;

First_Value

And below is the query for Last_Value() function.

SELECT DISTINCT Books.BookID,
BookName ,
LAST_VALUE (st.StudentID) OVER(PARTITION BY st.BookID ORDER BY st. StudentID) AS LastReader
FROM [BookAssignedToStudent] st
INNER JOIN dbo.Books ON Books.BookID = st.BookID
ORDER BY BookID; 

Last_Value

Now let’s try to limit the result by setting start and end point like below –

So first we’ll use – “ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING” like this –

SELECT DISTINCT Books.BookID,
BookName ,
FIRST_VALUE(st.StudentID) OVER(PARTITION BY st.BookID ORDER BY st.StudentID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CurRow_UnbFoll
FROM [BookAssignedToStudent] st
INNER JOIN dbo.Books ON Books.BookID = st.BookID
ORDER BY BookID; 

First_Value with Current row and Unbounded Following.

Now let’s use – “ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW” like this –

SELECT DISTINCT Books.BookID,
BookName ,
FIRST_VALUE(st.StudentID) OVER(PARTITION BY st.BookID ORDER BY st.StudentID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS UnbPrec_CurRow
FROM [BookAssignedToStudent] st
INNER JOIN dbo.Books ON Books.BookID = st.BookID
ORDER BY BookID; 

In the below image, above query’s result is at left side. I tried to compare left hand side result with the original records (right side) just for the sake of explanation.

First_Value with UnbPrece. and Current Row

Now let’s use – “ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW” with LAST_VALUE() function like this –

SELECT DISTINCT Books.BookID,
BookName ,
LAST_VALUE (st.StudentID) OVER(PARTITION BY st.BookID ORDER BY st.StudentID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS UnbPrec_CurRow
FROM [BookAssignedToStudent] st
INNER JOIN dbo.Books ON Books.BookID = st.BookID
ORDER BY BookID; 

Query result of Last_Value with UnbPrec. and CurRow.

Well, you can do all these experiment in [ sql fiddle ] I have created for you. Just open [ this link ] and do the experiment with “RANGE” keyword also.

LAG / LEAD

The LEAD function is used to read a value from the next row / subsequent row and The LAG function is used to read a value from the previous row of the result or partition. If there is no next / previous row exists, then this function return NULL. Other thing is LAG and LEAD functions allow you to specify the offset it means how many rows you want to look forward or backward. It also supports default value if null is returned.  In our case let say if you want to know that who is the second or third student to whom the particular book was assigned then these functions really helpful for us.

Below is the skeleton or syntax of it –

LEAD / LAG (Column name,  (optional) Offset,  (opt.) Default Val.)
    OVER ( (optional)partition by   order by …)

Okay now let’s play with it.

SELECT DISTINCT Books.BookID,
BookName,
AssignDate,
st.StudentID,
LEAD(st.StudentID) OVER(PARTITION BY st.BookID ORDER BY st.StudentID) AS NextStudentID,
LEAD(st.StudentID, 2, '-1') OVER(PARTITION BY st.BookID ORDER BY st.StudentID) AS NextToNextStudentIDWithDefaultValue,
LAG(st.StudentID) OVER(PARTITION BY st.BookID ORDER BY st.StudentID) AS PreviousStudentID,
LAG(st.StudentID, 2, '-1') OVER(PARTITION BY st.BookID ORDER BY st.StudentID) AS PreviousToPreviousStudentIDWithDefaultValue
FROM [BookAssignedToStudent] st
INNER JOIN dbo.Books ON Books.BookID = st.BookID
ORDER BY BookID; 

Just for the sake of explanation, I have divided the above query result into two parts in the below image.

Lead

and for LAG function –

Lag

 

That’s it for now. Hope you enjoyed this post. Leave your comments regarding this post. Thank you.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s