Thread: Removing duplicate rows in table

Removing duplicate rows in table

From
Rich Shepard
Date:
I've no idea how I entered multiple, identical rows in a table but I want to
delete all but one of these rows.

Here's an example:

bustrac=# select * from projects where proj_nbr = '4242.01';
  proj_nbr |   proj_name    | start_date |  end_date  |  description  | notes 
----------+----------------+------------+------------+---------------+-------
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy | 
(4 rows)

How do I clean this up so there's only a single row for this project number?

TIA,

Rich



Re: Removing duplicate rows in table

From
Christophe Pettus
Date:
If you don't mind taking the time to swap tables, you can always do an INSERT ... SELECT DISTINCT <fields> into a new
table,and then swap it with the existing table. 


> On Sep 10, 2024, at 08:07, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> I've no idea how I entered multiple, identical rows in a table but I want to
> delete all but one of these rows.
>
> Here's an example:
>
> bustrac=# select * from projects where proj_nbr = '4242.01';
> proj_nbr |   proj_name    | start_date |  end_date  |  description  | notes
----------+----------------+------------+------------+---------------+-------
> 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
> 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
> 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
> 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy | (4 rows)
>
> How do I clean this up so there's only a single row for this project number?
>
> TIA,
>
> Rich
>
>




Re: Removing duplicate rows in table

From
Ron Johnson
Date:
On Tue, Sep 10, 2024 at 11:07 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
I've no idea how I entered multiple, identical rows in a table but I want to
delete all but one of these rows.

Here's an example:

bustrac=# select * from projects where proj_nbr = '4242.01';
  proj_nbr |   proj_name    | start_date |  end_date  |  description  | notes
----------+----------------+------------+------------+---------------+-------
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
(4 rows)

How do I clean this up so there's only a single row for this project number?

 
I'd use row_number to delete records where row_number > 1.

--
Death to America, and butter sauce.
Iraq lobster!

Re: Removing duplicate rows in table

From
Adrian Klaver
Date:
On 9/10/24 08:07, Rich Shepard wrote:
> I've no idea how I entered multiple, identical rows in a table but I 
> want to
> delete all but one of these rows.

Is there a Primary Key or Unique index on this table?

> 
> Here's an example:
> 
> bustrac=# select * from projects where proj_nbr = '4242.01';
>   proj_nbr |   proj_name    | start_date |  end_date  |  description  | 
> notes 
> ----------+----------------+------------+------------+---------------+-------
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy | 
> (4 rows)
> 
> How do I clean this up so there's only a single row for this project 
> number?
> 
> TIA,
> 
> Rich
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Removing duplicate rows in table

From
Andreas Kretschmer
Date:

Am 10.09.24 um 17:07 schrieb Rich Shepard:
> I've no idea how I entered multiple, identical rows in a table but I 
> want to
> delete all but one of these rows.
>
> Here's an example:
>
> bustrac=# select * from projects where proj_nbr = '4242.01';
>  proj_nbr |   proj_name    | start_date |  end_date  | description  | 
> notes 
> ----------+----------------+------------+------------+---------------+-------
>  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy | 
> (4 rows)
>
> How do I clean this up so there's only a single row for this project 
> number?
>
> TIA,
>
> Rich
>
>

you can use the hidden ctid-column:

postgres=# create table demo (id int, val text);
CREATE TABLE
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# select ctid, * from demo;
  ctid  | id |  val
-------+----+-------
  (0,1) |  1 | test1
  (0,2) |  1 | test1
  (0,3) |  1 | test1
  (0,4) |  1 | test1
(4 rows)

postgres=# with my_ctid as (select min(ctid) from demo where id = 1 and 
val = 'test1') delete from demo using my_ctid where id=1 and val='test1' 
and ctid != my_ctid.min;
DELETE 3
postgres=# select ctid, * from demo;
  ctid  | id |  val
-------+----+-------
  (0,1) |  1 | test1
(1 row)

postgres=#


-- 
Andreas Kretschmer
CYBERTEC PostgreSQL Services and Support




Re: Removing duplicate rows in table

From
Rich Shepard
Date:
On Tue, 10 Sep 2024, Adrian Klaver wrote:

> Is there a Primary Key or Unique index on this table?

Adrian,

No. It didn't occur to me to make the project number a PK as this table is
not related to others in the database.

But, yesterday it occurred to me to make the proj_nbr a PK to eliminate
future issues.

Thanks,

Rich



Re: Removing duplicate rows in table

From
Rich Shepard
Date:
On Tue, 10 Sep 2024, Francisco Olarte wrote:

> Do you have any kid of corruption (i.e, unique index violation) or is
> it just a duplicate problem?

Francisco,

Only a duplicate problem because when I created this table I didn't make the
proj_nbr column a PK.

> Also, if you do not have any uniqueness criteria consider adding an
> "id identity" column, it is useful when shit hits the fan.

Yep. that's what I will do.

Thanks,

Rich



Re: Removing duplicate rows in table

From
Rich Shepard
Date:
On Tue, 10 Sep 2024, Christophe Pettus wrote:

> If you don't mind taking the time to swap tables, you can always do an
> INSERT ... SELECT DISTINCT <fields> into a new table, and then swap it
> with the existing table.

Christophe,

I'll make the proj_nbr table the PK then do as you recommend.

Thank you,

Rich



Re: Removing duplicate rows in table

From
Rich Shepard
Date:
On Tue, 10 Sep 2024, Andreas Kretschmer wrote:

> you can use the hidden ctid-column:
>
> postgres=# create table demo (id int, val text);
> CREATE TABLE
> postgres=# insert into demo values (1, 'test1');
> INSERT 0 1
> postgres=# insert into demo values (1, 'test1');
> INSERT 0 1
> postgres=# insert into demo values (1, 'test1');
> INSERT 0 1
> postgres=# insert into demo values (1, 'test1');
> INSERT 0 1
> postgres=# select ctid, * from demo;
>  ctid  | id |  val
> -------+----+-------
>  (0,1) |  1 | test1
>  (0,2) |  1 | test1
>  (0,3) |  1 | test1
>  (0,4) |  1 | test1
> (4 rows)
>
> postgres=# with my_ctid as (select min(ctid) from demo where id = 1 and val = 
> 'test1') delete from demo using my_ctid where id=1 and val='test1' and ctid 
> != my_ctid.min;
> DELETE 3
> postgres=# select ctid, * from demo;
>  ctid  | id |  val
> -------+----+-------
>  (0,1) |  1 | test1
> (1 row)

Thanks, Andreas.

Rich



Re: Removing duplicate rows in table

From
Adrian Klaver
Date:
On 9/10/24 08:29, Rich Shepard wrote:
> On Tue, 10 Sep 2024, Adrian Klaver wrote:
> 
>> Is there a Primary Key or Unique index on this table?
> 
> Adrian,
> 
> No. It didn't occur to me to make the project number a PK as this table is
> not related to others in the database.
> 
> But, yesterday it occurred to me to make the proj_nbr a PK to eliminate
> future issues.
> 

You might want to do something like:

select proj_nbr, count(*) as ct from projects group by proj_nbr;

to see how big a problem it is. If it is only a few projects it could 
just a matter of manually deleting the extras.

Whatever you do:

1) Make sure you have a backup of at least that table.

2) Do the data changes as BEGIN; <changes> COMMIT; or ROLLBACK;

> Thanks,
> 
> Rich
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Removing duplicate rows in table

From
Francisco Olarte
Date:
Rich:

On Tue, 10 Sept 2024 at 17:32, Rich Shepard <rshepard@appl-ecosys.com> wrote:
> Only a duplicate problem because when I created this table I didn't make the
> proj_nbr column a PK.

Always report this is if you have future problems, so people know if
it is a case of pilot error or corruption, solutions differ and the
good for ones may harm others.


> > Also, if you do not have any uniqueness criteria consider adding an
> > "id identity" column, it is useful when shit hits the fan.
> Yep. that's what I will do.

Bear in mind the relational model on which relational databases are
modeled needs unique rows ( i.e., no two full rows should be equal on
all fields ). It is not enforced in SQL, but now having it normally
leads to problems. When it cannot be done in any other way, normally
adding an identity column is a cheap way to make them unique. I had
that problem with CDR tables ( it means call detail record, and until
I managed to add circuit identification I had the problem that you can
have two calls from and two the same two numbers with equal start and
end times ) and solved it using an identity column ( added just for
this purpose, after doing it a couple times using ctid in testing, it
is slightly more expensive, but a lot more civilized ).

Francisco Olarte.



Re: Removing duplicate rows in table

From
Rich Shepard
Date:
On Tue, 10 Sep 2024, Adrian Klaver wrote:

> You might want to do something like:
>
> select proj_nbr, count(*) as ct from projects group by proj_nbr;
>
> to see how big a problem it is. If it is only a few projects it could just a 
> matter of manually deleting the extras.

Adrian,

It's a small table, not updated in a while. Looking at the example I sent
how do I delete the extras while keeping one when each row has the same
content? Not knowing how to do that is why I wrote.

> Whatever you do:
> 1) Make sure you have a backup of at least that table.
> 2) Do the data changes as BEGIN; <changes> COMMIT; or ROLLBACK;

Yep. Learned that lesson.

Thanks,

Rich

P.S. Please reply to the mail list so I receive only one copy of your
message, not two.



Re: Removing duplicate rows in table

From
Rich Shepard
Date:
On Tue, 10 Sep 2024, Rich Shepard wrote:

>> to see how big a problem it is. If it is only a few projects it could just 
>> a matter of manually deleting the extras.

> Not knowing how to do that is why I wrote.

A web search (which I should have done before posting this thread) shows me
how to do this:
<https://www.postgresqltutorial.com/postgresql-tutorial/how-to-delete-duplicate-rows-in-postgresql/>

Rich



Re: Removing duplicate rows in table

From
Erik Wienhold
Date:
On 2024-09-10 18:38 +0200, Rich Shepard wrote:
> P.S. Please reply to the mail list so I receive only one copy of your
> message, not two.

You can configure your list subscription to not receive an extra copy.
That setting is under "Global configuration" on
https://lists.postgresql.org/manage/.

-- 
Erik



Re: Removing duplicate rows in table

From
Muhammad Usman Khan
Date:
Hi,
You can try the following CTE which removes all the identical rows and only leave single row

WITH CTE AS (
  SELECT ctid, ROW_NUMBER() OVER (PARTITION BY proj_nbr, proj_name, start_date, end_date, description, notes ORDER BY proj_nbr) AS rn
  FROM projects
  WHERE proj_nbr = '4242.02'
)
DELETE FROM projects
WHERE ctid IN (
  SELECT ctid FROM CTE WHERE rn > 1
);


On Tue, 10 Sept 2024 at 20:07, Rich Shepard <rshepard@appl-ecosys.com> wrote:
I've no idea how I entered multiple, identical rows in a table but I want to
delete all but one of these rows.

Here's an example:

bustrac=# select * from projects where proj_nbr = '4242.01';
  proj_nbr |   proj_name    | start_date |  end_date  |  description  | notes
----------+----------------+------------+------------+---------------+-------
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
(4 rows)

How do I clean this up so there's only a single row for this project number?

TIA,

Rich


Re: Removing duplicate rows in table

From
Rich Shepard
Date:
On Thu, 12 Sep 2024, Muhammad Usman Khan wrote:

> You can try the following CTE which removes all the identical rows and only
> leave single row

Thank you, Muhammed.

Rich