Thread: memory problems and database crash when deleting data from a table with thousands of partitions

Hi people,

I know this is actually a known problem (https://stackoverflow.com/questions/49291451/postgres-10-3-heavily-partitioned-table-and-cannot-delete-any-records/58521850#58521850).

I would just like to add my voice and a description of our use case. If this could be fixed it would be amazing, because we use the new native partitioning really a lot in our company and we like it - well, apart from this problem....

I have this problem on PostgreSQL 11 (PostgreSQL 11.5 (Debian 11.5-3.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit) with a partitioned table that has multiple levels of partitioning. The main table is partitioned by shop, and each shop is partitioned by month (year-month) over several past years. Together that makes several thousand partitions, and the number is still growing.
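
To give an idea, the layout looks roughly like this (a simplified sketch - the column names, shop values and the single numeric column are illustrative, not our real schema):

    CREATE SCHEMA IF NOT EXISTS bi;

    -- Top level: one partition per shop.
    CREATE TABLE bi.test_multilevel (
        shop_id     text NOT NULL,
        period_name text NOT NULL,
        value       numeric
    ) PARTITION BY LIST (shop_id);

    -- Second level: each shop partition is itself partitioned by month.
    CREATE TABLE bi.test_multilevel_shop1
        PARTITION OF bi.test_multilevel
        FOR VALUES IN ('shop1')
        PARTITION BY LIST (period_name);

    -- Leaves: one partition per shop per year-month.
    CREATE TABLE bi.test_multilevel_shop1_2019_10
        PARTITION OF bi.test_multilevel_shop1
        FOR VALUES IN ('2019-10');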

When we simply insert new data (our usual operation), everything is perfect. But lately we needed to delete some wrong data across all partitions, and PostgreSQL started to crash during this operation.

The crashes are always the same - PostgreSQL uses more and more memory and is eventually killed by the OOM killer. I tried to fiddle with work_mem and other settings - nothing helped; the database just crashes a bit later, but it crashes anyway.

Of course a workaround works - I can use a script to run the deletion or update over each shop partition separately. There are only a few dozen monthly partitions per shop, so that works perfectly. But if the problem were fixed so that a simple DELETE/UPDATE against the top-level table were possible, that would be much better.
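
The workaround is basically a loop like this sketch (assuming the layout above; each EXECUTE is planned against a single shop partition, so the planner only ever sees that shop's few dozen monthly partitions - the '2019-10' value is just an example):

    DO $$
    DECLARE
        child regclass;
    BEGIN
        -- Loop over the direct children (shop partitions) of the top table.
        FOR child IN
            SELECT inhrelid::regclass
            FROM pg_inherits
            WHERE inhparent = 'bi.test_multilevel'::regclass
        LOOP
            -- Each DELETE targets one shop partition only.
            EXECUTE format('DELETE FROM %s WHERE period_name = %L',
                           child, '2019-10');
        END LOOP;
    END $$;

(Running the same DELETEs one by one from a shell script works too, with the advantage that each one commits in its own transaction.)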

Best regards

Josef Machytka
Fit Analytics, Berlin

On Wed, Oct 23, 2019 at 01:46:23PM +0200, Josef Machytka wrote:
>Hi people,
>
>I know this is actually a known problem
>(https://stackoverflow.com/questions/49291451/postgres-10-3-heavily-partitioned-table-and-cannot-delete-any-records/58521850#58521850).
>
>I would just like to add my voice and a description of our use case. If
>this could be fixed it would be amazing, because we use the new native
>partitioning really a lot in our company and we like it - well, apart
>from this problem....
>
>I have this problem on PostgreSQL 11 (PostgreSQL 11.5 (Debian
>11.5-3.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian
>6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit) with a partitioned table that
>has multiple levels of partitioning. The main table is partitioned by
>shop, and each shop is partitioned by month (year-month) over several
>past years. Together that makes several thousand partitions, and the
>number is still growing.
>
>When we simply insert new data (our usual operation), everything is
>perfect. But lately we needed to delete some wrong data across all
>partitions, and PostgreSQL started to crash during this operation.
>
>The crashes are always the same - PostgreSQL uses more and more memory
>and is eventually killed by the OOM killer. I tried to fiddle with
>work_mem and other settings - nothing helped; the database just crashes
>a bit later, but it crashes anyway.
>

Yeah, I think this is a known issue - there are cases where we're not
smart enough and end up opening/locking all the partitions, resulting in
excessive memory consumption (and OOM). I suppose this is one of those
cases, but I'd have to see memory context stats to know for sure.
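
One way to watch it happen from another session is to count the locks the backend running the DELETE has accumulated (12345 stands for that backend's PID, hypothetical here):

    SELECT count(*)
    FROM pg_locks
    WHERE pid = 12345;  -- hypothetical PID of the backend running the DELETE

With thousands of partitions that count climbs into the thousands as well.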

Unfortunately, that's a design issue, and it's not going to be fixed in
back branches. We're improving this - perhaps PostgreSQL 12 will
improve the behavior in your case, and hopefully 13 will do even better.

>Of course a workaround works - I can use a script to run the deletion
>or update over each shop partition separately. There are only a few
>dozen monthly partitions per shop, so that works perfectly. But if the
>problem were fixed so that a simple DELETE/UPDATE against the top-level
>table were possible, that would be much better.
>

Right, that's a reasonable workaround.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Thank you for your email. FYI - we have now tested with PostgreSQL 12, and unfortunately it is also not able to handle our case. The only difference is that PG 12 is not killed by the OOM killer and does not even crash - which is good. But it reports an "out of memory" error and stops the statement. So at least it looks much more stable than PG 11.

Regards

Josef Machytka

From the PostgreSQL log:

TopMemoryContext: 6685688 total in 213 blocks; 1229064 free (32 chunks); 5456624 used
  pgstat TabStatusArray lookup hash table: 1048576 total in 8 blocks; 230568 free (17 chunks); 818008 used
  TopTransactionContext: 8192 total in 1 blocks; 7744 free (2 chunks); 448 used
  Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
  HandleParallelMessages: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
  TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
  Operator lookup cache: 24576 total in 2 blocks; 10760 free (3 chunks); 13816 used
  RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
  MessageContext: 52197329840 total in 8274 blocks; 75904 free (9 chunks); 52197253936 used
    partition directory: 32768 total in 3 blocks; 8544 free (7 chunks); 24224 used
  Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  smgr relation table: 4194304 total in 10 blocks; 1597192 free (37 chunks); 2597112 used
  TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used
  Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  TopPortalContext: 8192 total in 1 blocks; 7936 free (1 chunks); 256 used
  Relcache by OID: 1048576 total in 8 blocks; 208888 free (16 chunks); 839688 used
  CacheMemoryContext: 136577192 total in 29 blocks; 5037352 free (18 chunks); 131539840 used
    index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used: test_out_of_stock_default_v2__start_date_end_date_country_g_idx
    index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used: test_out_of_stock_zoot_defaul_start_date_end_date_country_g_idx
....
    index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used: test_out_of_stock_zalora_2018_start_date_end_date_country__idx1
    19998 more child contexts containing 51415472 total in 39616 blocks; 12386152 free (10423 chunks); 39029320 used
  WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks); 43400 used
  PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used
  MdSmgr: 524288 total in 7 blocks; 42464 free (2 chunks); 481824 used
  LOCALLOCK hash: 2097152 total in 9 blocks; 100416 free (30 chunks); 1996736 used
  Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used
  ErrorContext: 8192 total in 1 blocks; 7936 free (3 chunks); 256 used
Grand total: 52401570368 bytes in 48395 blocks; 21121960 free (10700 chunks); 52380448408 used
2019-10-28 15:11:44.276 UTC [2746] upcload@queries ERROR:  out of memory
2019-10-28 15:11:44.276 UTC [2746] upcload@queries DETAIL:  Failed on request of size 16 in memory context "MessageContext".
2019-10-28 15:11:44.276 UTC [2746] upcload@queries STATEMENT:  ......

On Mon, Oct 28, 2019 at 04:18:59PM +0100, Josef Machytka wrote:
>Thank you for your email. FYI - we have now tested with PostgreSQL 12,
>and unfortunately it is also not able to handle our case. The only
>difference is that PG 12 is not killed by the OOM killer and does not
>even crash - which is good. But it reports an "out of memory" error and
>stops the statement. So at least it looks much more stable than PG 11.
>

Hmmm, this seems a bit weird to me:

  MessageContext: 52197329840 total in 8274 blocks; 75904 free (9 chunks); 52197253936 used

That context is generally meant for parse trees and other long-lived
stuff, and I wouldn't expect it to grow to 52GB of data, even if there
are many many partitions.

I wonder if this might be just another manifestation of the memory leak
from [1]. Can you provide a self-contained reproducer, i.e. a script I
could use to reproduce the issue?


[1] https://www.postgresql.org/message-id/20191024221758.vfv2enubnwmy3deu@development


regards


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Here are the scripts you can use to simulate the problem:

- create_tables.sql - creates all the partitions
- generate_data.sql - generates some data (technically you need only a few records; the delete command will fail anyway)

Then try the command:
DELETE FROM bi.test_multilevel WHERE period_name = '....';

PostgreSQL 12 will start to use more and more memory and will abort the operation with "out of memory" (PostgreSQL 11 would crash).
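
For reference, create_tables.sql boils down to something like this sketch (the exact shop and month counts in the attachment may differ):

    -- Generate ~100 shop partitions x ~100 monthly subpartitions each,
    -- i.e. roughly 10000 leaf partitions in total.
    DO $$
    DECLARE
        s int;
        m int;
        period text;
    BEGIN
        FOR s IN 1..100 LOOP
            EXECUTE format(
                'CREATE TABLE bi.test_multilevel_shop%s PARTITION OF bi.test_multilevel
                 FOR VALUES IN (%L) PARTITION BY LIST (period_name)',
                s, 'shop' || s);
            FOR m IN 0..99 LOOP
                period := to_char(date '2012-01-01' + (m || ' months')::interval, 'YYYY-MM');
                EXECUTE format(
                    'CREATE TABLE bi.test_multilevel_shop%s_%s PARTITION OF bi.test_multilevel_shop%s
                     FOR VALUES IN (%L)',
                    s, replace(period, '-', '_'), s, period);
            END LOOP;
        END LOOP;
    END $$;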

Regards

Josef Machytka


On Wed, Oct 30, 2019 at 03:35:24PM +0100, Josef Machytka wrote:
>Here are the scripts you can use to simulate the problem:
>
>- create_tables.sql - creates all the partitions
>- generate_data.sql - generates some data (technically you need only a
>few records; the delete command will fail anyway)
>
>Then try the command:
>DELETE FROM bi.test_multilevel WHERE period_name = '....';
>
>PostgreSQL 12 will start to use more and more memory and will abort the
>operation with "out of memory" (PostgreSQL 11 would crash).
>

Thanks for the scripts, I'm able to reproduce the issue. It does seem
most of the memory is allocated in inheritance_planner, where we do this
(around line 1500):

    foreach(lc, child_appinfos)
    {
        ...

        /*
         * Generate modified query with this rel as target.  We first apply
         * adjust_appendrel_attrs, which copies the Query and changes
         * references to the parent RTE to refer to the current child RTE,
         * then fool around with subquery RTEs.
         */
        subroot->parse = (Query *)
            adjust_appendrel_attrs(subroot,
                                   (Node *) parent_parse,
                                   1, &appinfo);

        ...
    }


This unfortunately allocates a Query struct that is roughly 4.3MB *per
partition*, and you have ~10000 of them, so ~43GB in total.
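
For anyone wanting to ballpark this on their own tables: on PostgreSQL 12 you can count the leaf partitions under the target table (the ~4.3MB per child is specific to this query shape, not a general constant):

    -- pg_partition_tree() is available since PostgreSQL 12.
    SELECT count(*) AS leaf_partitions
    FROM pg_partition_tree('bi.test_multilevel')
    WHERE isleaf;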

Unfortunately, this does not seem like a memory leak - we actually do
need the structure, and it happens to be pretty large :-( So IMHO it's
working as designed, it just wasn't optimized for cases with many
partitions yet :-(

Chances are we'll improve this in future releases (13+), but I very much
doubt we can do much in existing releases - we tend not to make big
changes there, and I don't see a simple change addressing this.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services