lunedì 27 luglio 2015

SQL: How to select and eliminate duplicate records

This article will explain how to select (and then possibly delete) duplicate records in a SQL Server table.
It may happen that in a table there are duplicate records. Find them to delete them is always something boring.
To do this we can use line numbering function ROW_NUMBER () of MS SQL Server.
For example, this query

SELECT ROW_NUMBER() OVER(PARTITION BY VAT_NUMBER ORDER BY CUSTOMER_CODE) AS [RowNumber]
      , VAT_NUMBER
      , CUSTOMER_CODE
FROM  CUSTOMERS

will list all customers, numbering them progressively when have the same Vat ( … PARTITION BY VAT_NUMBER  … ).

To highlight duplicates, simply add a WHERE condition in this way:

SELECT * FROM (
 SELECT ROW_NUMBER() OVER(PARTITION BY VAT_NUMBER ORDER BY CUSTOMER_CODE) AS [RowNumber]
      , VAT_NUMBER
      , CUSTOMER_CODE
FROM  CUSTOMERS
   ) AS T1
WHERE RowNumber>1

Likewise, we can use the same trick to delete all duplicates:

 DELETE T1 FROM (
  SELECT ROW_NUMBER() OVER(PARTITION BY VAT_NUMBER ORDER BY CUSTOMER_CODE) AS [RowNumber]
      , VAT_NUMBER
      , CUSTOMER_CODE
FROM  CUSTOMERS
   ) AS T1
  WHERE T1.RowNumber >1

venerdì 12 giugno 2015

Select rows with no matching entries on another table

Script:
select distinct T1.[Field1]
from [Table1] T1
left outer join [Table2] T2 on T1.[Field1] = T2.[Field2]
where T2.[Field2] is null

Key points:

  • Use Left Join to return all rows from Table1, regardless of whether or not there is a matching row in Table2.
  • The WHERE T2.Field2 IS NULL clause will restrict the results returned to only those rows where the Field2 returned from Table2 is null: in other words there is NO record in Table2 for that particular value from Table1.

How to create a Time Table

Key points

  • Time is a dimension that never has to miss in a Data Warehouse. 
  • This script generate a table with a daily granularity.
  • The key of the table must have the int format yyyymmdd.

Script:

-- Create dimDate table
CREATE TABLE [DWH].[DimDate]
     (
        DateId  INT NOT NULL,
Date DATETIME NOT NULL,
        Year VARCHAR(4) NULL,
        Quarter VARCHAR(20) NULL,
        Month VARCHAR(2) NULL,
        MonthName VARCHAR(20) NULL,
Day VARCHAR(2) NULL
        CONSTRAINT PK_Date PRIMARY KEY CLUSTERED (DateId ASC)
        ON [PRIMARY]
     )
GO

-- Creation of variables to set start and end period

DECLARE @StartDate datetime='01/01/2011'
DECLARE @EndDate datetime='01/01/2020'

--Loop to insert date in table

DECLARE @DateInProcess datetime
SET @DateInProcess=@StartDate

WHILE @DateInProcess<=@EndDate

BEGIN
-- Add a record in table for this date
INSERT INTO [DWH].[DimDate] 
(
[DateId]
, [Date]
, [Year]
, [Quarter]
, [Month]
, [MonthName]
, [Day]
)
VALUES (
-- DateId
   CAST(CONVERT(VARCHAR(8),@DateInProcess, 112) AS INT)
-- Date
,   @DateInProcess
-- Year
, Year ( @DateInProcess )
-- Quarter
, DateName ( quarter, @DateInProcess )
-- Month
, Month ( @DateInProcess )
-- MonthName
, Cast( Year( @DateInProcess ) as varchar(4) )+' - '
+ DateName( month, @DateInProcess )
-- Day
, Day ( @DateInProcess)
)

-- Add day to execute a new loop

SET @DateInProcess=DATEADD(d, 1, @DateInProcess)
END