Thread: memory problems and crash of db when deleting data from table with thousands of partitions
memory problems and crash of db when deleting data from table with thousands of partitions
From
Josef Machytka
Date:
Hi people,
I know this is actually a known problem (https://stackoverflow.com/questions/49291451/postgres-10-3-heavily-partitioned-table-and-cannot-delete-any-records/58521850#58521850).
I would just like to add my voice and a description of our use case to this topic. If this could be fixed it would be amazing, because we use the new native partitioning heavily in our company and we like it - well, apart from this problem....
I have this problem on PostgreSQL 11 (PostgreSQL 11.5 (Debian 11.5-3.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit) with a partitioned table that has multiple levels of partitioning. The main table is partitioned by shop, and each shop by month (year-month) over several past years - several thousand partitions in total, and the number is still growing.
When we simply insert new data (our usual operation) everything is fine. But recently we needed to delete some bad data across all partitions, and PostgreSQL started to crash during this operation.
The crashes are always the same - PostgreSQL uses more and more memory and is eventually killed by the OOM killer. I tried to fiddle with work_mem and other settings - nothing helped, the database just crashes a bit later, but crashes anyway.
Of course, a workaround works - I can use a script to run the delete or update over each shop partition separately. There are only a few dozen monthly partitions per shop, so that works perfectly. Still, if the problem were fixed so that a simple delete/update against the top-level main table were possible, that would be much better. A sketch of such a script follows below.
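For illustration, a minimal sketch of such a per-shop script, assuming the table and column names from the reproducer later in this thread (bi.test_multilevel, period_name); the '2019-01' value is a placeholder, not the real filter:

    -- Sketch only: delete from each first-level (per-shop) partition
    -- separately, so the planner only has to deal with one shop's few
    -- dozen monthly partitions at a time instead of all of them at once.
    DO $$
    DECLARE
        part regclass;
    BEGIN
        FOR part IN
            SELECT inhrelid::regclass
            FROM pg_inherits
            WHERE inhparent = 'bi.test_multilevel'::regclass
        LOOP
            EXECUTE format('DELETE FROM %s WHERE period_name = %L',
                           part, '2019-01');  -- placeholder value
        END LOOP;
    END
    $$;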
Best regards
Josef Machytka
Fit Analytics, Berlin
Re: memory problems and crash of db when deleting data from table with thousands of partitions
From
Tomas Vondra
Date:
On Wed, Oct 23, 2019 at 01:46:23PM +0200, Josef Machytka wrote:
>Hi people,
>
>I know this is actually known problem (
>https://stackoverflow.com/questions/49291451/postgres-10-3-heavily-partitioned-table-and-cannot-delete-any-records/58521850#58521850).
>
>I would just like to add my voice and description of use case to this
>topic. If this could be repaired it would be amazing because we use new
>native partitioning really a lot in our company and we like it - well, not
>counting this problem into it....
>
>I have this problem on PostgreSQL 11 (PostgreSQL 11.5 (Debian
>11.5-3.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian
>6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit) with partitioned table having
>more levels of partitioning. Main table is partitioned by shops and each
>shops months (year-month) over several past years. Together several
>thousands of partitions and number is still growing.
>
>When we simply insert new data (which is usual operation we do) all is
>perfect. But lately we needed to delete some wrong data over all partitions
>and PostgreSQL started to crash during this operation.
>
>Crashes are always the same - PostgreSQL starts to use more and more memory
>and eventually is killed by OOM killer. I tried to fiddle with work_mem and
>other settings - nothing, database just crashes a bit later but crashes
>anyway.
>
Yeah, I think this is a known issue - there are cases where we're not
smart enough and end up opening/locking all the partitions, resulting in
excessive memory consumption (and OOM). I suppose this is one of those
cases, but I'd have to see memory context stats to know for sure.

Unfortunately, that's a design issue, and it's not going to be fixed in
backbranches. We're improving this - perhaps PostgreSQL 12 would
improve the behavior in your case, and hopefully 13 will do even better.

>Of course workaround works - I can use script to do deletion or update over
>each shop partition separately. There are only several dozens of monthly
>partitions for each shop so it work perfectly. But anyway if problem would
>be repaired and simple delete/update over top main table would be possible
>this would be much better.
>
Right, that's a reasonable workaround.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: memory problems and crash of db when deleting data from table with thousands of partitions
From
Josef Machytka
Date:
Thank you for your email. FYI - we have now tested PostgreSQL 12 and unfortunately it also cannot handle our case. The only difference is that PG 12 is not killed by the OOM killer and does not crash - which is good. But it reports an "out of memory" error and aborts the statement. So at least it looks much more stable than PG 11.
Regards
Josef Machytka
From the PostgreSQL log:
TopMemoryContext: 6685688 total in 213 blocks; 1229064 free (32 chunks); 5456624 used
pgstat TabStatusArray lookup hash table: 1048576 total in 8 blocks; 230568 free (17 chunks); 818008 used
TopTransactionContext: 8192 total in 1 blocks; 7744 free (2 chunks); 448 used
Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
HandleParallelMessages: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
Operator lookup cache: 24576 total in 2 blocks; 10760 free (3 chunks); 13816 used
RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
MessageContext: 52197329840 total in 8274 blocks; 75904 free (9 chunks); 52197253936 used
partition directory: 32768 total in 3 blocks; 8544 free (7 chunks); 24224 used
Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
smgr relation table: 4194304 total in 10 blocks; 1597192 free (37 chunks); 2597112 used
TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used
Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
TopPortalContext: 8192 total in 1 blocks; 7936 free (1 chunks); 256 used
Relcache by OID: 1048576 total in 8 blocks; 208888 free (16 chunks); 839688 used
CacheMemoryContext: 136577192 total in 29 blocks; 5037352 free (18 chunks); 131539840 used
index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used: test_out_of_stock_default_v2__start_date_end_date_country_g_idx
index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used: test_out_of_stock_zoot_defaul_start_date_end_date_country_g_idx
....
index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used: test_out_of_stock_zalora_2018_start_date_end_date_country__idx1
19998 more child contexts containing 51415472 total in 39616 blocks; 12386152 free (10423 chunks); 39029320 used
WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks); 43400 used
PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used
MdSmgr: 524288 total in 7 blocks; 42464 free (2 chunks); 481824 used
LOCALLOCK hash: 2097152 total in 9 blocks; 100416 free (30 chunks); 1996736 used
Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used
ErrorContext: 8192 total in 1 blocks; 7936 free (3 chunks); 256 used
Grand total: 52401570368 bytes in 48395 blocks; 21121960 free (10700 chunks); 52380448408 used
2019-10-28 15:11:44.276 UTC [2746] upcload@queries ERROR: out of memory
2019-10-28 15:11:44.276 UTC [2746] upcload@queries DETAIL: Failed on request of size 16 in memory context "MessageContext".
2019-10-28 15:11:44.276 UTC [2746] upcload@queries STATEMENT: ......
On Wed, 23 Oct 2019 at 14:47, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Wed, Oct 23, 2019 at 01:46:23PM +0200, Josef Machytka wrote:
>Hi people,
>
>I know this is actually known problem (
>https://stackoverflow.com/questions/49291451/postgres-10-3-heavily-partitioned-table-and-cannot-delete-any-records/58521850#58521850).
>
>
>I would just like to add my voice and description of use case to this
>topic. If this could be repaired it would be amazing because we use new
>native partitioning really a lot in our company and we like it - well, not
>counting this problem into it....
>
>I have this problem on PostgreSQL 11 (PostgreSQL 11.5 (Debian
>11.5-3.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian
>6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit) with partitioned table having
>more levels of partitioning. Main table is partitioned by shops and each
>shops months (year-month) over several past years. Together several
>thousands of partitions and number is still growing.
>
>When we simply insert new data (which is usual operation we do) all is
>perfect. But lately we needed to delete some wrong data over all partitions
>and PostgreSQL started to crash during this operation.
>
>Crashes are always the same - PostgreSQL starts to use more and more memory
>and eventually is killed by OOM killer. I tried to fiddle with work_mem and
>other settings - nothing, database just crashes a bit later but crashes
>anyway.
>
Yeah, I think this is a known issue - there are cases where we're not
smart enough and end up opening/locking all the partitions, resulting in
excessive memory consumption (and OOM). I suppose this is one of those
cases, but I'd have to see memory context stats to know for sure.
Unfortunately, that's a design issue, and it's not going to be fixed in
backbranches. We're improving this - perhaps PostgreSQL 12 would
improve the behavior in your case, and hopefully 13 will do even better.
>Of course workaround works - I can use script to do deletion or update over
>each shop partition separately. There are only several dozens of monthly
>partitions for each shop so it work perfectly. But anyway if problem would
>be repaired and simple delete/update over top main table would be possible
>this would be much better.
>
Right, that's a reasonable workaround.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: memory problems and crash of db when deleting data from table with thousands of partitions
From
Tomas Vondra
Date:
On Mon, Oct 28, 2019 at 04:18:59PM +0100, Josef Machytka wrote:
>Thank you for your email, FYI - we now did tests with PostgreSQL 12 and
>unfortunately it is also not able to handle to our case. Only difference is
>that PG 12 is not killed by OOM killer and even does not crash - which is
>good. But it reports error "out of memory" and stops the statement. So at
>least it looks like much more stable then PG 11.
>
Hmmm, this seems a bit weird to me:

    MessageContext: 52197329840 total in 8274 blocks; 75904 free (9 chunks);
    52197253936 used

That context is generally meant for parse trees and other long-lived
stuff, and I wouldn't expect it to grow to 52GB of data, even if there
are many many partitions.

I wonder if this might be just another manifestation of the memory leak
from [1]. Can you provide a self-contained reproducer, i.e. a script I
could use to reproduce the issue?

[1] https://www.postgresql.org/message-id/20191024221758.vfv2enubnwmy3deu@development

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: memory problems and crash of db when deleting data from table with thousands of partitions
From
Josef Machytka
Date:
Here are scripts you can use to simulate the problem:
- create_tables.sql - creates all the partitions
- generate_data.sql - generates some data (technically you need only a few records; the delete command will fail anyway)
and then try this command:
DELETE FROM bi.test_multilevel WHERE period_name = '....';
PostgreSQL 12 will use more and more memory and will then abort the operation with "out of memory" (PostgreSQL 11 would crash). A sketch of the schema shape is below.
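The attached scripts are not inlined in the archive, but a minimal sketch of the kind of two-level layout they create, assuming partitioning by shop and then by month as described upthread (the shop/payload columns, partition names and bounds are illustrative guesses, not the attached DDL):

    -- Illustrative sketch only - not the attached create_tables.sql.
    CREATE SCHEMA IF NOT EXISTS bi;

    CREATE TABLE bi.test_multilevel (
        shop        text NOT NULL,
        period_name text NOT NULL,   -- e.g. '2019-01'
        payload     text
    ) PARTITION BY LIST (shop);

    -- One first-level partition per shop...
    CREATE TABLE bi.test_multilevel_shop1
        PARTITION OF bi.test_multilevel FOR VALUES IN ('shop1')
        PARTITION BY RANGE (period_name);

    -- ...and one monthly partition per shop and month underneath.
    CREATE TABLE bi.test_multilevel_shop1_2019_01
        PARTITION OF bi.test_multilevel_shop1
        FOR VALUES FROM ('2019-01') TO ('2019-02');

Repeated over thousands of shop/month combinations, this reaches the scale at which the delete fails.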
Regards
Josef Machytka
On Mon, 28 Oct 2019 at 17:24, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Mon, Oct 28, 2019 at 04:18:59PM +0100, Josef Machytka wrote:
>Thank you for your email, FYI - we now did tests with PostgreSQL 12 and
>unfortunately it is also not able to handle to our case. Only difference is
>that PG 12 is not killed by OOM killer and even does not crash - which is
>good. But it reports error "out of memory" and stops the statement. So at
>least it looks like much more stable then PG 11.
>
Hmmm, this seems a bit weird to me:
MessageContext: 52197329840 total in 8274 blocks; 75904 free (9 chunks);
52197253936 used
That context is generally meant for parse trees and other long-lived
stuff, and I wouldn't expect it to grow to 52GB of data, even if there
are many many partitions.
I wonder if this might be just another manifestation of the memory leak
from [1]. Can you provide a self-contained reproducer, i.e. a script I
could use to reproduce the issue?
[1] https://www.postgresql.org/message-id/20191024221758.vfv2enubnwmy3deu@development
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: memory problems and crash of db when deleting data from table with thousands of partitions
From
Tomas Vondra
Date:
On Wed, Oct 30, 2019 at 03:35:24PM +0100, Josef Machytka wrote:
>Here are scripts which you can use to simulate problem:
>
>- create_tables.sql - creates all partitions
>- generate_data.sql - generates some data (technically you need only a few
>records, delete command will fail anyway)
>
>and try command:
>DELETE FROM bi.test_multilevel WHERE period_name = '....';
>
>PostgreSQL 12 will start to use more and more memory and will stop
>operation with "out of memory" (PostgreSQL 11 would crash)
>
Thanks for the scripts, I'm able to reproduce the issue. It does seem
most of the memory is allocated in inheritance_planner, where we do
this (around line 1500):

    foreach(lc, child_appinfos)
    {
        ...

        /*
         * Generate modified query with this rel as target. We first apply
         * adjust_appendrel_attrs, which copies the Query and changes
         * references to the parent RTE to refer to the current child RTE,
         * then fool around with subquery RTEs.
         */
        subroot->parse = (Query *)
            adjust_appendrel_attrs(subroot,
                                   (Node *) parent_parse,
                                   1, &appinfo);

        ...
    }

This unfortunately allocates a Query struct that is about ~4.3MB *per
partition*, and you have ~10000 of them, so 43GB in total.

Unfortunately, this does not seem like a memory leak - we actually do
need the structure, and it happens to be pretty large :-(

So IMHO it's working as designed, it just wasn't optimized for cases
with many partitions yet :-(

Chances are we'll improve this in future releases (13+), but I very
much doubt we can do much in existing releases - we tend not to make
big changes there, and I don't see a simple change addressing this.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
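As a rough way to gauge whether a given parent table is at risk, one can combine the leaf partition count with the ~4.3MB-per-partition figure above. A back-of-the-envelope query (the 4.3MB figure is specific to this reproducer's query shape, so treat the result as a ballpark; pg_partition_tree() requires PostgreSQL 12):

    -- Ballpark planner memory for a DELETE/UPDATE on the parent table:
    -- leaf partition count times the ~4.3MB per-partition Query copy
    -- observed above.
    SELECT count(*) AS leaf_partitions,
           pg_size_pretty(count(*) * 4.3 * 1024 * 1024) AS est_planner_memory
    FROM pg_partition_tree('bi.test_multilevel')
    WHERE isleaf;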