Thread: Moving data from huge table slow, min() query on indexed column taking 38s

Moving data from huge table slow, min() query on indexed column taking 38s

From
Dimitrios Apostolou
Date:
I have a huge (10G rows) table, test_runs (referred to as "tbl_legacy" below),
that I want to PARTITION BY RANGE on an indexed integer column.

I have created the new partitioned table "tbl" with 1000 partitions in it,
so that it can take my existing data and also has partitions ready for the
next year (most of the new partitions will be empty for now).
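
For illustration, the rough shape of that setup is sketched below; the column
and partition names are placeholders rather than the real schema, and the
ranges are made up:

     -- Hypothetical column names; the real table has many more columns.
     CREATE TABLE tbl (
         partition_key_column  integer NOT NULL,
         payload               text
     ) PARTITION BY RANGE (partition_key_column);

     CREATE TABLE tbl_p0000 PARTITION OF tbl
         FOR VALUES FROM (0) TO (1000000);
     CREATE TABLE tbl_p0001 PARTITION OF tbl
         FOR VALUES FROM (1000000) TO (2000000);
     -- ... and so on, up to ~1000 partitions, most of them empty for now.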

On the topic of bulk-transferring the data from tbl_legacy to tbl:

+ First I chose the method to DELETE...INSERT everything.

   I noticed that the postgres process was growing without bounds, up to
   the point that it consumed almost all of swap space and I had to kill
   the command. Any idea why that happens?

+ Then I broke the command above in many small chunks

       WITH rows AS (
             DELETE FROM tbl_legacy AS t
                 WHERE (partition_key_column < $1)
                 RETURNING t.*
         ) INSERT INTO tbl SELECT * FROM rows;
       COMMIT;

   I increase the parameter $1 and keep going in a loop.  At first this
   goes OK, but after one day I notice that it has slowed down
   significantly. My investigation shows that the simple query

     SELECT min(partition_key_column) from tbl_legacy;

   takes 38s, despite having an index on the column! A VACUUM fixes that,
   so I guess the index has a ton of dead tuples. I guess autovacuum does
   not do its job because the table is constantly busy. (A way to check the
   dead-tuple counts is sketched after this list.)

   Unfortunately VACUUM takes a long time (several hours) on this huge
   table, so I can't add it to the loop after the DELETE command.

   Is there a better way to avoid the bloat in the loop above?  Why can't
   the DELETE command update the index so that the minimum points past the
   dead tuples?

+ Finally I resorted to just copying the data:

     INSERT INTO tbl SELECT * FROM tbl_legacy;

   This took half a day or so but worked without blowing up the memory
   usage like the 1st query.  Why is that?

   By the way, I don't consider the last solution ideal, since the data is
   duplicated between the two tables and it's harder to work around that
   without taking downtime. But it worked for me this time. Any other ideas
   for partitioning an existing huge table?
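
Regarding the dead tuples mentioned above: a quick way to watch them
accumulate is the standard statistics view, with something along these lines
(the columns are standard, the table name is the one from this thread):

     SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
     FROM pg_stat_user_tables
     WHERE relname = 'tbl_legacy';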


Thanks in advance,
Dimitris



Re: Moving data from huge table slow, min() query on indexed column taking 38s

From
Dimitrios Apostolou
Date:
On Thu, 6 Jul 2023, Dimitrios Apostolou wrote:

> + First I chose the method to DELETE...INSERT everything.

Just to clarify, the query looked more like

   WITH rows AS ( DELETE FROM tbl_legacy RETURNING * )
     INSERT INTO tbl
       SELECT * FROM rows;

>   I noticed that the postgres process was growing without bounds, up to
>   the point that it consumed almost all of swap space and I had to kill
>   the command. Any idea why that happens?

Also note that my settings for work_mem, temp_buffers, shared_buffers etc.
are all well below the RAM size, and postgres has never shown unbounded
growth again. The Postgres version is 15.2.
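
For completeness, the relevant values can be inspected with something like:

     SELECT name, setting, unit
     FROM pg_settings
     WHERE name IN ('work_mem', 'temp_buffers', 'shared_buffers');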


Dimitris




Re: Moving data from huge table slow, min() query on indexed column taking 38s

From
Lorusso Domenico
Date:
Hello,
this is a standard problem during bulk copies.

Here are some suggestions; for example, disable the indexes.

The main issues are related to indexes, lock escalation and log writing.
In other DBMSs you would turn logging off on the table, but PostgreSQL does not seem to have this feature.

Anyway, taking an explicit exclusive lock on the tables should prevent lock escalation.

So: disable the indexes on the target table,
lock both tables exclusively,
insert the data,
truncate the old table.

If this doesn't work, you can consider using the COPY command.
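
In PostgreSQL terms, a rough sketch of those steps could look like the
following (table and index names are taken from the earlier posts or made up;
note that PostgreSQL cannot really "disable" an index, so the closest
equivalent is to drop it and recreate it afterwards):

     BEGIN;
     LOCK TABLE tbl_legacy, tbl IN ACCESS EXCLUSIVE MODE;
     DROP INDEX IF EXISTS tbl_partition_key_idx;   -- hypothetical index name
     INSERT INTO tbl SELECT * FROM tbl_legacy;
     TRUNCATE tbl_legacy;
     CREATE INDEX tbl_partition_key_idx ON tbl (partition_key_column);
     COMMIT;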







--
Domenico L.

to amaze for half an hour, a history book is enough;
I tried to learn the Treccani by heart... [F.d.A.]

Re: Moving data from huge table slow, min() query on indexed column taking 38s

From
Dimitrios Apostolou
Date:
Thanks! I have completed the transfer by taking down the database and
working exclusively on it, but I still wonder how one could do it in
production without exclusive locks. The loop with small DELETE...INSERT
transactions I posted in the parent post bloats the table fast.

The link you posted contains very useful info; I was not expecting that
the constraints could blow up the memory like that. Comments from me:

Disabling and then re-enabling the foreign key constraints is easily done
with ALTER TABLE.
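
For example, something along these lines (the constraint and column names are
made up):

     ALTER TABLE tbl DROP CONSTRAINT tbl_run_id_fkey;
     -- ... bulk load ...
     ALTER TABLE tbl ADD CONSTRAINT tbl_run_id_fkey
         FOREIGN KEY (run_id) REFERENCES other_table (id) NOT VALID;
     ALTER TABLE tbl VALIDATE CONSTRAINT tbl_run_id_fkey;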

Unfortunately it doesn't seem to be the same case for indices. One has to
create the table without indices, and then create the indices separately.
With such a process there is a risk of ending up with non-identical
table schemas.

By the way, with COPY one must use an intermediate file, right? There is
no way to COPY from table to table directly?
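
As far as I can tell it has to go through a file on the server (or through
the client with psql's \copy), e.g.:

     COPY tbl_legacy TO '/tmp/tbl_legacy.dat';   -- needs file-access privileges
     COPY tbl FROM '/tmp/tbl_legacy.dat';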


Thanks,
Dimitris


Re: Moving data from huge table slow, min() query on indexed column taking 38s

From
Lorusso Domenico
Date:
Hello Dimitris,
in the end, a table is a file plus many other functionalities, and those functionalities consume resources.

Some DBMSs (like Oracle or DB2) let you disable those functionalities so that you can copy between tables; otherwise (and often for Oracle and DB2 too) the best approach is to use an export, because export/import utilities are highly optimized for this job.

Anyway, when you work as a DBA you must block the database, or at least the table.
Don't try to reorganize a schema or database with connected users.



--
Domenico L.

to amaze for half an hour, a history book is enough;
I tried to learn the Treccani by heart... [F.d.A.]

Re: Moving data from huge table slow, min() query on indexed column taking 38s

From
Dimitrios Apostolou
Date:
On Thu, 6 Jul 2023, Dimitrios Apostolou wrote:
> + Then I broke the command above in many small chunks
>
>       WITH rows AS (
>             DELETE FROM tbl_legacy AS t
>                 WHERE (partition_key_column < $1)
>                 RETURNING t.*
>        )  INSERT INTO tbl SELECT * FROM rows;
>       COMMIT;
>
>   I increase the parameter $1 and keep going in a loop.  At first this
>   goes OK, but after one day I notice that it has slowed down
>   significantly. My investigation shows that the simple query
>
>    SELECT min(partition_key_column) from tbl_legacy;
>
>   takes 38s, despite having an index on the column! A VACUUM fixes that,
>   so I guess the index has a ton of dead tuples. I guess autovacuum does
>   not do its job because the table is constantly busy.
>
>   Unfortunately VACUUM takes a long time (several hours) on this huge
>   table, so I can't add it to the loop after the DELETE command.
>
>   Is there a better way to avoid the bloat in the loop above?  Why can't
>   the DELETE command update the index so that the minimum points past the
>   dead tuples?

Any comments on this one? It annoys me that a simple loop deteriorated so
much and kept filling the table with bloat. What is it that VACUUM does, that
DELETE can't, to keep the index fresh?