David Edelstein's Blog


Teradata: Delete Duplicate Rows using a VOLATILE TABLE

Published: 2012-10-09
dave@edelsteinautomotive.com
David Edelstein

Recently we ran into a Teradata problem where we couldn't select individual rows uniquely, because rows had somehow been inserted with duplicate primary keys. The table's uniqueness constraint should have rejected each of those inserts, yet whatever process put the rows in the database did not fail. Stuck with duplicate rows that we couldn't delete individually, we needed a fix: no simple UPDATE or DELETE would do, since every predicate matched more than one row.

Eventually, using info from this link: [forums.teradata.com] Delete duplicate rows from Table in Teradata database, we learned about VOLATILE TABLEs. A volatile table exists only for the duration of your session and lets you build up an intermediate result. With such a table populated with the real data, we can delete everything from the original table and then repopulate it from the volatile copy.

On top of the volatile table, we use ROW_NUMBER() to number each row within a group of rows that share the same unique key. When inserting back into the original table from the volatile table, we keep only the first row of each group.


CREATE SET VOLATILE TABLE fix_duplicate6004 AS
(
    SELECT ROW_NUMBER() OVER (
        PARTITION BY
            UNIQUEKEY_COLUMN
        ORDER BY
            SOMEOTHER_COLUMN) AS rown, tbl.*
    FROM TBL tbl          -- the original table containing the duplicates
)
WITH DATA ON COMMIT PRESERVE ROWS
;


SELECT 'volatile' AS isvolatile, dpl.*
FROM fix_duplicate6004 dpl
ORDER BY UNIQUEKEY_COLUMN, SOMEOTHER_COLUMN
;


DELETE FROM TBL
;

INSERT INTO TBL
SELECT UNIQUEKEY_COLUMN, SOMEOTHER_COLUMN   -- list every original column here; do NOT include rown
FROM fix_duplicate6004 dpl
WHERE dpl.rown = 1
;
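You can exercise the same pattern end-to-end outside Teradata. Here is a minimal sketch using SQLite via Python's built-in `sqlite3` module (SQLite 3.25+ also supports `ROW_NUMBER`), with a temporary table standing in for the volatile table; the table and column names are made up for the demo:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tbl (id INTEGER, val TEXT);      -- no enforced PK, so dups can exist
    INSERT INTO tbl VALUES (1, 'a'), (1, 'a'), (2, 'b'), (2, 'c'), (3, 'd');

    -- stand-in for the volatile table: number each row within its id group
    CREATE TEMP TABLE fix_dupes AS
    SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY val) AS rown, id, val
    FROM tbl;

    DELETE FROM tbl;                              -- wipe the original

    INSERT INTO tbl (id, val)                     -- keep only the first of each group
    SELECT id, val FROM fix_dupes WHERE rown = 1;
""")
print(con.execute("SELECT id, val FROM tbl ORDER BY id").fetchall())
# → [(1, 'a'), (2, 'b'), (3, 'd')]
```

Note that the ORDER BY inside the window decides which duplicate survives, so pick a column that makes the "first" row the one you actually want to keep.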