BUG #18166: 100 Gb 18000000 records table update

From: PG Bug reporting form
Subject: BUG #18166: 100 Gb 18000000 records table update
Date:
Msg-id: 18166-8d0f3718572475ef@postgresql.org
Responses: Re: BUG #18166: 100 Gb 18000000 records table update (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18166
Logged by:          Ruslan Ganeev
Email address:      ruslan.ganeev@list.ru
PostgreSQL version: 12.6
Operating system:   CentOS 8
Description:

Greetings,
Could you help us with the following problem.

We have a table named «svjul»; it contains 18 million records and occupies
100 GB of disk space.

The table «svjul» contains 3 columns:
- ID (type: integer), which is the key
- enddate (type: date)
- Data (type: jsonb), which contains the fields «OGRN» and «DataVip». The
  «OGRN» value is not unique: several records may contain the same OGRN.
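
For reference, a minimal sketch of what such a table might look like (the
exact DDL is an assumption based on the description above; the «code» column
is not listed there but is implied by the UPDATE in the script below):

-- Hedged sketch: the real DDL is not shown in the report. Column names and
-- types follow the description above; «code» is implied by the UPDATE below.
create table svjul (
    id      integer primary key,
    enddate date,
    data    jsonb,
    code    text
);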

We wrote a script that sets enddate = '2022-12-31' for all records whose
«DataVip» value is not the maximum; for the other records the script sets
enddate = null.
The problem is that the script runs for 6 hours, and most of that time is
spent rebuilding the indexes.
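
One hedged way to confirm that index maintenance dominates (illustrative,
not from the report) is to list the table's indexes and check how many
updates were HOT, i.e. heap-only updates that need no index work:

-- Illustrative diagnostics, assuming the table is named svjul.
select indexname, indexdef
from pg_indexes
where tablename = 'svjul';   -- every non-HOT row update touches each of these

select n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 'svjul';     -- HOT updates skip index maintenance entirely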

Would you be so kind as to tell us whether there is any way to update the
records without rebuilding the indexes, or whether there is some programming
technique for solving a problem of this kind?
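
For illustration, a pattern commonly used for one-off bulk updates of this
size is to drop the secondary indexes, run the update, and rebuild the
indexes once afterwards: a single bulk index build is usually far cheaper
than 18 million incremental index updates. A hedged sketch, where
«idx_svjul_code» is a hypothetical index standing in for whatever secondary
indexes exist on the table:

-- Hedged sketch; idx_svjul_code is a hypothetical secondary index.
drop index if exists idx_svjul_code;          -- avoid per-row index maintenance
-- ... run the bulk UPDATE here ...
create index idx_svjul_code on svjul (code);  -- rebuild once, in bulk

This helps only with indexes that can safely be dropped; a primary key that
is referenced by foreign keys would have to stay.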

Sorry for my English, I’m not a native speaker.

Best regards

--Our script--

-- 0 minutes
drop table if exists tempsvjul1;
create table tempsvjul1 (
    id bigint not null,
    code text not null,
    datapostuch text,
    datavip text,
    constraint tempsvjul1_pkey primary key (id)
);

-- 30 minutes: extract the JSONB fields once into a flat working table
insert into tempsvjul1 (id, code, datapostuch, datavip)
select id,
       data ->> 'OGRN',
       data -> 'SvUchetNO' ->> 'DataPostUch',
       data ->> 'DataVip'
from svjul;

-- 1 minute
create index idx_deduplicate_tempsvjul1_btree_codedpostdvipid
    on tempsvjul1 using btree (code, datapostuch desc, datavip desc, id desc);

-- 0 minutes
drop table if exists tempsvjul2;
create table tempsvjul2 (
    id bigint not null,
    constraint tempsvjul2_pkey primary key (id)
);

-- 2 minutes: keep one row per code, the one with the latest dates
insert into tempsvjul2 (id)
select distinct on (code) id
from tempsvjul1
order by code, datapostuch desc, datavip desc, id desc;

-- 4 minutes without indexes, 6 hours with indexes
update svjul set
    code = coalesce((select tempsvjul1.code
                     from tempsvjul1
                     where tempsvjul1.id = svjul.id), '???'),
    enddate = case when exists (select 1 from tempsvjul2
                                where tempsvjul2.id = svjul.id)
                   then null
                   else '2023-10-05'::date
              end;

-- 0 minutes
drop table tempsvjul1;
drop table tempsvjul2;
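
An alternative sometimes used for updates of this size (illustrative, not
from this thread): run the UPDATE in id-range batches from a procedure,
committing per batch. This does not by itself avoid index maintenance, but
it keeps each transaction small and lets autovacuum reclaim dead tuples
between batches. The procedure name and batch size are hypothetical, only
the enddate part of the script's UPDATE is shown, and tempsvjul2 from the
script above is assumed to still exist:

-- Hedged sketch; batched_enddate_update and the batch size are illustrative.
-- COMMIT inside a procedure body works on PostgreSQL 11+ when invoked with
-- CALL outside an explicit transaction block.
create procedure batched_enddate_update()
language plpgsql
as $$
declare
    lo   bigint;
    hi   bigint;
    step constant bigint := 100000;   -- illustrative batch size
begin
    select min(id), max(id) into lo, hi from svjul;
    while lo <= hi loop
        update svjul
           set enddate = case when exists (select 1 from tempsvjul2 t
                                           where t.id = svjul.id)
                              then null
                              else '2023-10-05'::date
                         end
         where svjul.id >= lo and svjul.id < lo + step;
        commit;                        -- end the batch; lets vacuum work
        lo := lo + step;
    end loop;
end
$$;

call batched_enddate_update();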

