While working with data, many times I’ve had to extract unique rows out of a table, based on a subset of column values. Here I will show you a quick way to do that using the ROW_NUMBER() window function.
This time, the table I was working on stored files using the varbinary(max) data type in a column called [Attachment]. For some reason, this table had records that were duplicated on the file name (perhaps because of a bug in the application used to load them or because users where loading the same file from different modules in the application). I needed to extract a unique list of the files based on their names.
The table, called dbo.Comment, had several columns but [FileName] and [Attachment] are the only ones of concern to this example. My solution was this: group the rows based on their file names, number them, and then from each group choose the one that was numbered 1. The ROW_NUMBER() function can do this when the PARTITION BY clause is used. Here is my inner query:
And here are some partial results:
As you can see, the second and third rows have the same file name, The ROW_NUMBER function placed them in the same partition and numbered them 1 and 2. By the way, if I had to use more columns for uniqueness, the PARTITION BY clause allows a comma-separated list of column names.
Thus my final query is simple, using a common table expression: