You are on page 1of 9

Featured Database Articles

MS SQL
October 24, 2008

Exploring SQL 2005s Ranking Functions NTILE() and ROW_NUMBER()


By Rob Garrison This is part 20 of a series. If you have not read part 19, I suggest starting here. That article is the first of this two-part coverage of ranking functions.

Table and Data for Illustration


This article will use the same table and data as Part 19, so I wont cover that again. The code to build and populate the table is here.

NTILE()
NTILE() breaks the results into groups. In the simplest case, if we pass the integer 2 to NTILE() using Weight, it will rank each entry as in either the first or second half based on Weight.

NTILE() with ORDER BY Only


Query: SELECT Category, Weight, Entrant, NTILE(2) OVER ( ORDER BY Weight DESC ) AS [Ntile] FROM dbo.ContestResults Partial Results: Category ---------Pumpkin Pumpkin ... Watermelon Watermelon Weight -----716 679 132 129 Entrant --------------Chad Johnson George Kopsell Meg Giry Joseph Buquet Ntile ----1 1 2 2

This is valid, but its not terribly interesting or valuable. Lets add a PARTITION BY clause so we get top- and bottom-half entries for each category.

NTILE() with PARTITION BY and ORDER BY


Query: SELECT Category, Weight, Entrant, NTILE(2) OVER ( PARTITION BY Category ORDER BY Weight DESC ) AS [Ntile] FROM dbo.ContestResults Partial Results: Category ---------Pumpkin ... Pumpkin Squash ... Squash Watermelon ... Watermelon Weight Entrant Ntile ------ --------------- -------------------716 Chad Johnson 1 229 462 405 146 129 Harvey Zale Dan Gardner Gary Spiel Mark Bardin Joseph Buquet 2 1 2 1 2

Now we have something useful.

Passing Different int Values to NTILE()


Passing 1 to NTILE() is worthless; it gives us 1 for each entry. But what if we pass a larger number to NTILE() than the number of result set rows? In our test table, we have four entries in the squash category. Lets try passing different values to NTILE(), starting with 2. Query: SELECT ... NTILE(2) OVER ( ORDER BY Weight DESC ) AS [Ntile] FROM dbo.ContestResults WHERE Category = 'Squash' Results:

Category -------Squash Squash Squash Squash

Weight -----462 462 435 405

Entrant -----------Dan Gardner Harvey Zale Terry Helmer Gary Spiel

Ntile ----1 1 2 2

Sure, this splits the entries into two halves. Now what about 3? Category -------Squash Squash Squash Squash Weight -----462 462 435 405 Entrant -----------Dan Gardner Harvey Zale Terry Helmer Gary Spiel Ntile ----1 1 2 3

Notice that the first two entries get tile 1 and have the same weight. This is interesting, but its not consistent. I will explain the uneven distribution shortly. If we switch to 4, we get one tile per entry as expected. Any integer above 4 will return the same results for this set as 4. Category -------Squash Squash Squash Squash Weight -----462 462 435 405 Entrant -----------Dan Gardner Harvey Zale Terry Helmer Gary Spiel Ntile ----1 2 3 4

The tiling will split records that have the same weight into different tiles. Here, we are using 7 as the tile integer and pulling the thirteen pumpkin entries. Category -------Pumpkin Pumpkin Pumpkin Pumpkin .... Pumpkin Weight -----716 679 679 481 229 Entrant -------------Chad Johnson George Kopsell Dan Gardner John Suydam Harvey Zale Ntile ----1 1 2 2 7

Note that two entries have the exact same weight (679) but are split between tile 1 and 2. I tried rebuilding the table and putting Dan Gardners entry in first and, as you might expect, the tiles were swapped. So, for entries with equal weight, the order of the base records determines the tiling. Another pattern is that if you have n records in your result set and n-1 as your ntile integer, the first two entries always have a tile of 1. Dealing Cards The distribution is actually very simple; its just like dealing cards. Think of the tiles like card players. If you dealt five cards to four players, the first player would have

an extra card. If you dealt one more card, it would go to the second player. If you think of the distribution of tiles like that, it will match the results. Using a Variable for the NTILE() int Value You can use a variable for the NTILE int value. This has clear value. Query: DECLARE @i SET @i = 5 int

SELECT ... NTILE(@i) OVER ( PARTITION BY Category ORDER BY Weight DESC ) AS [Ntile] FROM dbo.ContestResults

ROW_NUMBER()
ROW_NUMBER() numbers the rows of the result set. Here is the simplest case.

ROW_NUMBER() with ORDER BY Only


Query: SELECT ... ROW_NUMBER() OVER ( ORDER BY Weight DESC ) AS [RowNumber] FROM dbo.ContestResults Partial Results: Category ---------Pumpkin Pumpkin ... Watermelon Watermelon Weight -----716 679 132 129 Entrant ------------Chad Johnson Dan Gardner RowNumber --------1 2

Meg Giry 22 Joseph Buquet 23

Valid, yes, but fairly worthless. Adding a PARTITION BY clause starts to show the value.

ROW_NUMBER() with PARTITION BY and ORDER BY


Query:

SELECT ... ROW_NUMBER() OVER ( PARTITION BY Category ORDER BY Weight DESC ) AS [RowNumber] FROM dbo.ContestResults Partial Results: Category ---------Pumpkin Pumpkin ... Pumpkin Pumpkin Squash Squash Squash Squash Watermelon Watermelon ... Watermelon Watermelon Weight -----716 679 247 229 462 462 435 405 146 139 132 129 Entrant -------------Chad Johnson Dan Gardner Harvey Zale Harvey Zale Dan Gardner Harvey Zale Terry Helmer Gary Spiel Mark Bardin Christine Daa Meg Giry Joseph Buquet RowNumber -------------------1 2 12 13 1 2 3 4 1 2 5 6

Now we have row numbers starting at 1 for each category. This will allow us, for instance, to pull the top three entries from each category. This is really the best choice to fulfill our original task of picking who receives prizes.

Using ROW_NUMBER() with PARTITION BY and ORDER BY with a Common Table Expression
Just like RANK() and DENSE_RANK(), you cant do this: SELECT ... ROW_NUMBER() OVER ( PARTITION BY Category ORDER BY Weight DESC ) AS RowNumber FROM dbo.ContestResults WHERE RowNumber <= 3

You get Invalid column name 'RowNumber'. So, we turn again to Common Table Expressions (CTEs) or table variables. I think the CTEs are cleaner, so I will only illustrate that and not table variables. Common Table Expression: WITH t

AS (

SELECT ... ROW_NUMBER() OVER ( PARTITION BY Category ORDER BY Weight DESC ) AS RowNumber FROM dbo.ContestResults

) SELECT ... RowNumber FROM t WHERE RowNumber <= 3 ORDER BY Category, RowNumber Results: Category ---------Pumpkin Pumpkin Pumpkin Squash Squash Squash Watermelon Watermelon Watermelon Weight -----716 679 679 462 462 435 146 139 139 Entrant ------------------Chad Johnson Dan Gardner George Kopsell Dan Gardner Harvey Zale Terry Helmer Mark Bardin Christine Daa Carlotta Giudicelli RowNumber --------1 2 3 1 2 3 1 2 3

A Real-World ROW_NUMBER() Example: Most Recent Entry by Person


I have had to use ROW_NUMBER() multiple times recently for a specific requirement. We have a user action that happens multiple times, and we have to pull only the most recent entry for each person. ROW_NUMBER() is a huge winner here. Say were storing data for a pediatrician and need to pull information for the most recent visits by patients. This is a simple example, but you can see the power of using ROW_NUMBER() here. Query: DECLARE @records TABLE ( Person int, VisitDate datetime, Purpose varchar(10) ) INSERT INTO @records (Person, VisitDate, Purpose) VALUES ... (100, '1992-12-31', 'Birth') (100, '1993-01-12', 'CheckUp') (100, '1993-03-22', 'Shots') (100, '1993-04-19', 'EarInf')

(101, '2003-01-29', 'Birth') (101, '2003-02-03', 'CheckUp') (102, '2008-10-13', 'Birth') SELECT * FROM @records ;WITH t AS ( SELECT Person, VisitDate, Purpose, ROW_NUMBER() OVER ( PARTITION BY Person ORDER BY VisitDate DESC ) AS RowNumber FROM @records ) SELECT Person, VisitDate, Purpose FROM t WHERE RowNumber = 1 ORDER BY Person Result: Person -----100 100 100 100 101 101 102 Person -----100 101 102 VisitDate Purpose ---------- ------1992-12-31 Birth 1993-01-12 CheckUp 1993-03-22 Shots 1993-04-19 EarInf 2003-01-29 Birth 2003-02-03 CheckUp 2008-10-13 Birth VisitDate Purpose ---------- ------1993-04-19 EarInf 2003-02-03 CheckUp 2008-10-13 Birth

This gives us exactly what we want with very readable code.

Conclusion
NTILE() may have a sweet-spot in some applications, but I dont see it being as useful as ROW_NUMBER(). I think ROW_NUMBER() is very useful.

If you are a big fan of NTILE() and can explain where it is useful, please drop a comment in the forum. You can also add comments there about ROW_NUMBER() or any of the ranking functions. So far, no one has answered the challenge I posed in Part 19. Check it out and see if you can be the one that answers it. See All Articles by Columnist Rob Garrison

SqlCredit Developing a Complete SQL Server OLTP Database Project


Part 24: Performance Testing SQL 2008s Transparent Data Encryption Part 23: SQL Server 2008s Change Data Capture Tracking the Moving Parts Part 22: Performance Testing SQL Server 2008 versus SQL Server 2005 Part 21: Exploring SQL Server's Index INCLUDEs Part 20: Exploring SQL 2005s Ranking Functions NTILE() and ROW_NUMBER() Part 19: Exploring SQL 2005s Ranking Functions RANK() and DENSE_RANK() Part 18: Exploring the Performance of SQL 2005s OUTPUT Clause Part 17: Exploring SQL 2005s OUTPUT Clause Part 16: The Cost of Bloat Part 15: The Cost of Distribution Part 14: The Cost of Translation Part 13: More on Indexed Persisted Computed Columns Part 12: Exploring Indexed Persisted Computed Columns Part 11: Change Tracking Using History Records Part 10: MAC Performance and Updating SqlCredit Part 9: Message Authentication Codes Part 8: Comparing Encrypt/DecryptByCert and Encrypt/DecryptByKey Part 7: Performance Impact of EncryptByCert and DecryptByCert Part 6: Exploring EncryptByCert and DecryptByCert Part 5: Adding Card, Vendor, and Purchase, Plus Much Refactoring Part 4: Schema and Procedure Security Part 3: Schema/Proc Updates and Automated Unit Testing Part 2: Creating the Database, Tables, CRUD Procedures Introduction

How do I get How do I use the started with the SQL Server Row MDX query Number function? language? MS SQL How do I mirror FAQ my SQL Server How do I calculate database? running total How can I learn T- subtotals and grand SQL totals without a programming? cursor?

How do I store images and BLOB files in SQL Server? Where can I learn more about SQL scripts and get sample scripts? Where can I learn more about Microsoft SQL Server and stay

on?

You might also like