Thread: Removing duplicate rows in table
I've no idea how I entered multiple, identical rows in a table but I want to
delete all but one of these rows.

Here's an example:

bustrac=# select * from projects where proj_nbr = '4242.01';
 proj_nbr |   proj_name    | start_date |  end_date  |  description  | notes
----------+----------------+------------+------------+---------------+-------
 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
(4 rows)

How do I clean this up so there's only a single row for this project number?

TIA,

Rich
If you don't mind taking the time to swap tables, you can always do an
INSERT ... SELECT DISTINCT <fields> into a new table, and then swap it with
the existing table.

> On Sep 10, 2024, at 08:07, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> I've no idea how I entered multiple, identical rows in a table but I want
> to delete all but one of these rows.
> [example table snipped]
> How do I clean this up so there's only a single row for this project
> number?
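The swap Christophe describes might look like the following sketch, assuming the projects table from this thread (the scratch-table name projects_dedup is a placeholder):

```sql
BEGIN;

-- Keep exactly one copy of each fully-identical row.
CREATE TABLE projects_dedup AS
SELECT DISTINCT proj_nbr, proj_name, start_date, end_date, description, notes
FROM projects;

-- Swap the deduplicated table in for the original.
DROP TABLE projects;
ALTER TABLE projects_dedup RENAME TO projects;

COMMIT;
```

Running it inside a transaction means a mistake can still be undone with ROLLBACK before the COMMIT. Note that indexes, constraints, and permissions on the original table would have to be recreated on the new one.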
On Tue, Sep 10, 2024 at 11:07 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
> I've no idea how I entered multiple, identical rows in a table but I want
> to delete all but one of these rows.
> [example table snipped]
> How do I clean this up so there's only a single row for this project
> number?

I'd use row_number() to delete records where row_number > 1.
On 9/10/24 08:07, Rich Shepard wrote:
> I've no idea how I entered multiple, identical rows in a table but I
> want to delete all but one of these rows.

Is there a Primary Key or Unique index on this table?

> [example table snipped]
> How do I clean this up so there's only a single row for this project
> number?

--
Adrian Klaver
adrian.klaver@aklaver.com
On 10.09.24 at 17:07, Rich Shepard wrote:
> I've no idea how I entered multiple, identical rows in a table but I
> want to delete all but one of these rows.
> [example table snipped]
> How do I clean this up so there's only a single row for this project
> number?

You can use the hidden ctid column:

postgres=# create table demo (id int, val text);
CREATE TABLE
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# select ctid, * from demo;
 ctid  | id |  val
-------+----+-------
 (0,1) |  1 | test1
 (0,2) |  1 | test1
 (0,3) |  1 | test1
 (0,4) |  1 | test1
(4 rows)

postgres=# with my_ctid as (select min(ctid) from demo where id = 1 and val = 'test1')
           delete from demo using my_ctid
           where id = 1 and val = 'test1' and ctid != my_ctid.min;
DELETE 3
postgres=# select ctid, * from demo;
 ctid  | id |  val
-------+----+-------
 (0,1) |  1 | test1
(1 row)

--
Andreas Kretschmer
CYBERTEC PostgreSQL Services and Support
On Tue, 10 Sep 2024, Adrian Klaver wrote:
> Is there a Primary Key or Unique index on this table?

Adrian,

No. It didn't occur to me to make the project number a PK as this table is
not related to others in the database.

But yesterday it occurred to me to make proj_nbr a PK to eliminate future
issues.

Thanks,

Rich
On Tue, 10 Sep 2024, Francisco Olarte wrote:
> Do you have any kind of corruption (i.e., unique index violation) or is
> it just a duplicate problem?

Francisco,

Only a duplicate problem, because when I created this table I didn't make
the proj_nbr column a PK.

> Also, if you do not have any uniqueness criteria consider adding an
> "id identity" column, it is useful when shit hits the fan.

Yep, that's what I will do.

Thanks,

Rich
On Tue, 10 Sep 2024, Christophe Pettus wrote:
> If you don't mind taking the time to swap tables, you can always do an
> INSERT ... SELECT DISTINCT <fields> into a new table, and then swap it
> with the existing table.

Christophe,

I'll make the proj_nbr column the PK, then do as you recommend.

Thank you,

Rich
On Tue, 10 Sep 2024, Andreas Kretschmer wrote:
> You can use the hidden ctid column:
> [ctid demo snipped]

Thanks, Andreas.

Rich
On 9/10/24 08:29, Rich Shepard wrote:
> No. It didn't occur to me to make the project number a PK as this table
> is not related to others in the database.
>
> But yesterday it occurred to me to make proj_nbr a PK to eliminate
> future issues.

You might want to do something like:

select proj_nbr, count(*) as ct from projects group by proj_nbr;

to see how big a problem it is. If it is only a few projects it could just
be a matter of manually deleting the extras.

Whatever you do:

1) Make sure you have a backup of at least that table.
2) Do the data changes as BEGIN; <changes> COMMIT; or ROLLBACK;

--
Adrian Klaver
adrian.klaver@aklaver.com
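To see only the problem rows, Adrian's query can be narrowed with a HAVING clause (a sketch against the same projects table):

```sql
-- List each duplicated proj_nbr and how many copies it has.
SELECT proj_nbr, count(*) AS ct
FROM projects
GROUP BY proj_nbr
HAVING count(*) > 1
ORDER BY ct DESC;
```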
Rich:

On Tue, 10 Sept 2024 at 17:32, Rich Shepard <rshepard@appl-ecosys.com> wrote:
> Only a duplicate problem, because when I created this table I didn't make
> the proj_nbr column a PK.

Always report this if you have future problems, so people know whether it
is a case of pilot error or corruption; the solutions differ, and ones that
are good for one case may harm the other.

> > Also, if you do not have any uniqueness criteria consider adding an
> > "id identity" column, it is useful when shit hits the fan.
> Yep, that's what I will do.

Bear in mind that the relational model, on which relational databases are
based, needs unique rows (i.e., no two full rows should be equal on all
fields). This is not enforced in SQL, but not having it normally leads to
problems. When it cannot be achieved in any other way, adding an identity
column is a cheap way to make rows unique.

I had that problem with CDR tables (CDR means call detail record; until I
managed to add circuit identification, two calls between the same two
numbers could have equal start and end times) and solved it with an
identity column added just for this purpose, after doing it a couple of
times using ctid in testing. The identity column is slightly more
expensive, but a lot more civilized.

Francisco Olarte.
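Francisco's identity-column approach could be sketched like this, assuming the projects table from this thread (the column name id is a placeholder, and UNIQUE is used rather than PRIMARY KEY to leave room for proj_nbr as the eventual PK):

```sql
-- Give every existing and future row a unique surrogate key.
ALTER TABLE projects
    ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY UNIQUE;

-- Since the duplicate rows are identical in every original column,
-- keeping the lowest id per proj_nbr group removes the extras.
DELETE FROM projects p
WHERE EXISTS (
    SELECT 1
    FROM projects q
    WHERE q.proj_nbr = p.proj_nbr
      AND q.id < p.id
);
```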
On Tue, 10 Sep 2024, Adrian Klaver wrote:
> You might want to do something like:
>
> select proj_nbr, count(*) as ct from projects group by proj_nbr;
>
> to see how big a problem it is. If it is only a few projects it could
> just be a matter of manually deleting the extras.

Adrian,

It's a small table, not updated in a while. Looking at the example I sent,
how do I delete the extras while keeping one when each row has the same
content? Not knowing how to do that is why I wrote.

> Whatever you do:
> 1) Make sure you have a backup of at least that table.
> 2) Do the data changes as BEGIN; <changes> COMMIT; or ROLLBACK;

Yep. Learned that lesson.

Thanks,

Rich

P.S. Please reply to the mail list so I receive only one copy of your
message, not two.
On Tue, 10 Sep 2024, Rich Shepard wrote:
>> to see how big a problem it is. If it is only a few projects it could
>> just be a matter of manually deleting the extras.
> Not knowing how to do that is why I wrote.

A web search (which I should have done before posting this thread) shows me
how to do this:

<https://www.postgresqltutorial.com/postgresql-tutorial/how-to-delete-duplicate-rows-in-postgresql/>

Rich
On 2024-09-10 18:38 +0200, Rich Shepard wrote:
> P.S. Please reply to the mail list so I receive only one copy of your
> message, not two.

You can configure your list subscription to not receive an extra copy. That
setting is under "Global configuration" on
https://lists.postgresql.org/manage/.

--
Erik
Hi,
You can try the following CTE, which removes all the duplicate rows and
leaves only a single one:

WITH cte AS (
    SELECT ctid,
           ROW_NUMBER() OVER (PARTITION BY proj_nbr, proj_name, start_date,
                                           end_date, description, notes
                              ORDER BY proj_nbr) AS rn
    FROM projects
    WHERE proj_nbr = '4242.01'
)
DELETE FROM projects
WHERE ctid IN (
    SELECT ctid FROM cte WHERE rn > 1
);
On Tue, 10 Sept 2024 at 20:07, Rich Shepard <rshepard@appl-ecosys.com> wrote:
> I've no idea how I entered multiple, identical rows in a table but I want
> to delete all but one of these rows.
> [example table snipped]
> How do I clean this up so there's only a single row for this project
> number?
On Thu, 12 Sep 2024, Muhammad Usman Khan wrote:
> You can try the following CTE, which removes all the duplicate rows and
> leaves only a single one

Thank you, Muhammad.

Rich