lunedì 27 luglio 2015

SQL: How to select and eliminate duplicate records

This article will explain how to select (and then possibly delete) duplicate records in a SQL Server table.
It may happen that in a table there are duplicate records. Find them to delete them is always something boring.
To do this we can use line numbering function ROW_NUMBER () of MS SQL Server.
For example, this query

SELECT ROW_NUMBER() OVER(PARTITION BY VAT_NUMBER ORDER BY CUSTOMER_CODE) AS [RowNumber]
      , VAT_NUMBER
      , CUSTOMER_CODE
FROM  CUSTOMERS

will list all customers, numbering them progressively when have the same Vat ( … PARTITION BY VAT_NUMBER  … ).

To highlight duplicates, simply add a WHERE condition in this way:

SELECT * FROM (
 SELECT ROW_NUMBER() OVER(PARTITION BY VAT_NUMBER ORDER BY CUSTOMER_CODE) AS [RowNumber]
      , VAT_NUMBER
      , CUSTOMER_CODE
FROM  CUSTOMERS
   ) AS T1
WHERE RowNumber>1

Likewise, we can use the same trick to delete all duplicates:

 DELETE T1 FROM (
  SELECT ROW_NUMBER() OVER(PARTITION BY VAT_NUMBER ORDER BY CUSTOMER_CODE) AS [RowNumber]
      , VAT_NUMBER
      , CUSTOMER_CODE
FROM  CUSTOMERS
   ) AS T1
  WHERE T1.RowNumber >1