Home > mailing lists

Moving data from huge table slow, min() query on indexed column taking 38s - Mailing list pgsql-general

From	Dimitrios Apostolou
Subject	Moving data from huge table slow, min() query on indexed column taking 38s
Date	July 6, 2023 15:00:19
Msg-id	096aa4b3-892e-7fa5-7370-3a1895e7d6a0@gmx.net Whole thread Raw
Responses	Re: Moving data from huge table slow, min() query on indexed column taking 38s Re: Moving data from huge table slow, min() query on indexed column taking 38s
List	pgsql-general

Tree view

I have a huge (10G rows) table "tbl_legacy" named test_runs that I want to
PARTITION BY RANGE on an indexed integer column.

I have created the new partitioned table "tbl" and 1000 partitions in it,
so that it can take my existing data and have the partitions ready for the
next year (so most of the new partitions will be empty for now).

On the topic of bulk-transferring the data from tbl_legacy to tbl:

+ First I chose the method to DELETE...INSERT everything.

   I noticed that the postgres process was growing without bounds, up to
   the point that it consumed almost all of swap space and I had to kill
   the command. Any idea why that happens?

+ Then I broke the command above in many small chunks

       WITH rows AS (
             DELETE FROM tbl_legacy AS t
                 WHERE (partition_key_column < $1)
                 RETURNING t.*
         ) INSERT INTO tbl SELECT * FROM rows;
       COMMIT;

   I increase the parameter $1 and keep going in a loop.  At first this
   goes OK, after one day though I notice that it has slowed down
   significantly. My investigation shows that the simple query

     SELECT min(partition_key_column) from tbl_legacy;

   takes 38s, despite having an index on the column! A VACUUM fixes that,
   so I guess the index has a ton of dead tuples. I guess autovacuum does
   not do its job because the table is constantly busy.

   Unfortunately VACUUM takes long (several hours) on this huge table, so I
   can't add in the loop after the DELETE command.

   Is there a better way to avoid the bloat in the loop above?  Why can't
   the DELETE command update the index by pointing the minimum element
   beyond the dead tuples?

+ Finally I resorted to just copying the data:

     INSERT INTO tbl SELECT * FROM tbl_legacy;

   This took half a day or so but worked without blowing up the memory
   usage like the 1st query.  Why is that?

   By the way, I don't consider the last solution ideal, since the data is
   duplicated between the two tables and it's harder to implement
   workarounds to do it without taking downtime. But it worked for me for
   this time. Any other ideas for partitioning an existing huge table?


Thanks in advance,
Dimitris

pgsql-general by date:

From: Pavel Stehule
Date: 06 July 2023, 14:42:38
Subject: Re: function to_char(iso-8859-1) is not unique at character 8

From: Dimitrios Apostolou
Date: 06 July 2023, 16:11:53
Subject: Re: Moving data from huge table slow, min() query on indexed column taking 38s

Moving data from huge table slow, min() query on indexed column taking 38s - Mailing list pgsql-general

Previous

Next