Delete Duplicate Rows in MySQL

by damonp on January 17, 2006

in Snippets

1.5 million rows in a products_categories table. That’s 1,500,000 products to categories relations. I needed a quick way to remove products from multiple categories to test the performance hit of the HUGE products_categories table. A quick and dirty is to remove duplicate productids from the table. Here’s a quick howto.

Create a temporary table with only one row for each productid (the GROUP BY clause). Change this GROUP BY to be the field with duplicates to remove. You can also add some logic here to select which records to keep.

CREATE TABLE products_categories_temp AS SELECT * FROM products_categories GROUP BY productid

Empty the original table (you created a backup already right?).

TRUNCATE products_categories

Copy the ‘clean’ records back to the original table.

INSERT INTO products_categories SELECT * FROM products_categories_temp

This can be used to normalize a table to add a primary key, data mining, cleaning a database or just for testing.

Popularity: 22%

Most Popular Posts

Damon Parker is a freelance sysadmin and web developer in Texas. He specializes in server setup, server security and high performance server configurations. Need help setting up a web server or getting a server back online after a crash or hack? Email Damon

{ 7 comments… read them below or add one }

Arpit September 30, 2008 at 10:08 am

Its a good solution…for removing duplicates

Reply

viral August 28, 2009 at 7:47 am

other simple way is

create table temp as select * from t1 group by fields;

delete from t1 where id not in (select id from temp )

Reply

Agnel September 30, 2009 at 8:27 am

Its always has been a trouble to keep up with the database expecially when they have duplicate rows. Nice explanation to solve it.

Reply

m4 January 1, 2010 at 1:48 pm

thinks for this tip
it’s really useful
think’s for sharing it

Reply

snork3ls April 30, 2011 at 7:37 am

YOU ARE THE BEST!!!! I was looking for this code and i only obtain a full table delete.

I LOVE YOU!!!!

Reply

Bharathiraja December 22, 2011 at 12:45 am

Another solution is we can add UNIQUE KEY for the field productid.

When we are adding UNIQUE KEY, all the duplicate records will be automatically deleted.

“ALTER IGNORE TABLE products_categories ADD UNIQUE KEY(productid)”

Reply

jim.operion January 21, 2012 at 2:11 pm

Useful solution here… How to delete duplicate rows with SQL
http://www.xaprb.com/blog/2006/10/11/how-to-delete-duplicate-rows-with-sql/

Reply

Leave a Comment

Previous post:

Next post: