Thread: Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records

At Tue, 4 Oct 2022 17:15:31 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> On Tue, Oct 04, 2022 at 07:53:11PM -0400, Tom Lane wrote:
> > I wrote:
> >> PFA a quick-hack fix that solves this issue by making per-transaction
> >> subsidiary hash tables.  That's overkill perhaps; I'm a little worried
> >> about whether this slows down normal cases more than it's worth.
> >> But we ought to do something about this, because aside from the
> >> duplication aspect the current storage of these lists seems mighty
> >> space-inefficient.
> > 
> > After further thought, maybe it'd be better to do it as attached,
> > with one long-lived hash table for all the locks.  This is a shade
> > less space-efficient than the current code once you account for
> > dynahash overhead, but the per-transaction overhead should be lower
> > than the previous patch since we only need to create/destroy a hash
> > table entry not a whole hash table.

The first one is the straightforward outcome of the current
implementation, but I like the new one.  I agree that it is natural and
that the expected overhead per (typical) transaction is lower than with
both the first patch and doing the same operations on a list.  I don't
think space inefficiency of that extent matters, since it is only the
startup process.

> This feels like a natural way to solve this problem.  I saw several cases
> of the issue that was fixed with 6301c3a, so I'm inclined to believe this
> usage pattern is actually somewhat common.

So releasing locks becomes somewhat slower?  Even so, it should still be
far faster than the massively repetitive head removal in a list.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Tue, 4 Oct 2022 17:15:31 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in
>> On Tue, Oct 04, 2022 at 07:53:11PM -0400, Tom Lane wrote:
>>> After further thought, maybe it'd be better to do it as attached,
>>> with one long-lived hash table for all the locks.

> The first one is the straightforward outcome of the current
> implementation, but I like the new one.  I agree that it is natural and
> that the expected overhead per (typical) transaction is lower than with
> both the first patch and doing the same operations on a list.  I don't
> think space inefficiency of that extent matters, since it is only the
> startup process.

To get some hard numbers about this, I made a quick hack to collect
getrusage() numbers for the startup process (patch attached for
documentation purposes).  I then ran the recovery/t/027_stream_regress.pl
test a few times and collected the stats (also attached).  This seems
like a reasonably decent baseline test, since the core regression tests
certainly take lots of AccessExclusiveLocks what with all the DDL
involved, though they shouldn't ever take large numbers at once.  Also
they don't run long enough for any lock list bloat to occur, so these
numbers don't reflect a case where the patches would provide benefit.

If you look hard, there's maybe about a 1% user-CPU penalty for patch 2,
although that's well below the run-to-run variation so it's hard to be
sure that it's real.  The same comments apply to the max resident size
stats.  So I'm comforted that there's not a significant penalty here.

I'll go ahead with patch 2 if there's not objection.

One other point to discuss: should we consider back-patching?  I've
got mixed feelings about that myself.  I don't think that cases where
this helps significantly are at all mainstream, so I'm kind of leaning
to "patch HEAD only".

            regards, tom lane

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index f99186eab7..8fcbf898f9 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -32,6 +32,7 @@
 #include "storage/pmsignal.h"
 #include "storage/procsignal.h"
 #include "storage/standby.h"
+#include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/timeout.h"
@@ -219,6 +220,8 @@ StartupProcExit(int code, Datum arg)
     /* Shutdown the recovery environment */
     if (standbyState != STANDBY_DISABLED)
         ShutdownRecoveryTransactionEnvironment();
+
+    ShowUsage("STARTUP PROCESS STATISTICS");
 }


@@ -229,6 +232,8 @@ StartupProcExit(int code, Datum arg)
 void
 StartupProcessMain(void)
 {
+    ResetUsage();
+
     /* Arrange to clean up at startup process exit */
     on_shmem_exit(StartupProcExit, 0);

Startup process stats from 027_stream_regress_standby_1.log

HEAD, --disable-cassert:

2022-10-05 10:57:15.561 EDT [2390709] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 10:57:15.561 EDT [2390709] DETAIL:  ! system usage stats:
        !       0.935814 s user, 1.019952 s system, 7.226729 s elapsed
        !       [0.936104 s user, 1.019952 s system total]
        !       9088 kB max resident size
        !       0/430608 [0/430608] filesystem blocks in/out
        !       0/916 [0/1026] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       9925/180 [9925/180] voluntary/involuntary context switches

2022-10-05 11:02:58.542 EDT [2406908] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 11:02:58.542 EDT [2406908] DETAIL:  ! system usage stats:
        !       0.957075 s user, 1.029002 s system, 7.206675 s elapsed
        !       [0.957075 s user, 1.029260 s system total]
        !       9124 kB max resident size
        !       0/430512 [0/430512] filesystem blocks in/out
        !       0/919 [0/1028] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       9575/174 [9575/175] voluntary/involuntary context switches

2022-10-05 11:03:35.436 EDT [2409026] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 11:03:35.436 EDT [2409026] DETAIL:  ! system usage stats:
        !       0.936716 s user, 1.117313 s system, 7.208555 s elapsed
        !       [0.936716 s user, 1.117606 s system total]
        !       9220 kB max resident size
        !       0/429712 [0/429712] filesystem blocks in/out
        !       0/917 [0/1028] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       9509/178 [9509/178] voluntary/involuntary context switches

with Patch 1:

2022-10-05 10:58:50.088 EDT [2393477] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 10:58:50.088 EDT [2393477] DETAIL:  ! system usage stats:
        !       0.938248 s user, 1.103261 s system, 7.269850 s elapsed
        !       [0.938498 s user, 1.103261 s system total]
        !       9288 kB max resident size
        !       0/430912 [0/430912] filesystem blocks in/out
        !       0/936 [0/1046] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       9649/176 [9649/176] voluntary/involuntary context switches

2022-10-05 10:59:29.687 EDT [2395617] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 10:59:29.687 EDT [2395617] DETAIL:  ! system usage stats:
        !       0.887786 s user, 1.147573 s system, 7.205528 s elapsed
        !       [0.888080 s user, 1.147573 s system total]
        !       9172 kB max resident size
        !       0/430704 [0/430704] filesystem blocks in/out
        !       0/931 [0/1041] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       9366/188 [9366/188] voluntary/involuntary context switches

2022-10-05 10:59:54.584 EDT [2397710] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 10:59:54.584 EDT [2397710] DETAIL:  ! system usage stats:
        !       0.966489 s user, 0.903671 s system, 7.148393 s elapsed
        !       [0.966489 s user, 0.903919 s system total]
        !       9064 kB max resident size
        !       0/429376 [0/429376] filesystem blocks in/out
        !       0/932 [0/1042] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       10067/164 [10067/164] voluntary/involuntary context switches

with Patch 2:

2022-10-05 11:01:10.405 EDT [2400381] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 11:01:10.405 EDT [2400381] DETAIL:  ! system usage stats:
        !       0.964738 s user, 1.111669 s system, 7.258135 s elapsed
        !       [0.965012 s user, 1.111669 s system total]
        !       9232 kB max resident size
        !       0/430752 [0/430752] filesystem blocks in/out
        !       0/923 [0/1034] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       9443/206 [9443/206] voluntary/involuntary context switches

2022-10-05 11:01:41.423 EDT [2402517] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 11:01:41.423 EDT [2402517] DETAIL:  ! system usage stats:
        !       0.941057 s user, 1.069866 s system, 7.194619 s elapsed
        !       [0.941318 s user, 1.069866 s system total]
        !       9140 kB max resident size
        !       0/430240 [0/430240] filesystem blocks in/out
        !       0/918 [0/1029] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       9627/194 [9627/195] voluntary/involuntary context switches

2022-10-05 11:02:11.136 EDT [2404608] LOG:  STARTUP PROCESS STATISTICS
2022-10-05 11:02:11.136 EDT [2404608] DETAIL:  ! system usage stats:
        !       0.956157 s user, 1.041602 s system, 7.187873 s elapsed
        !       [0.956424 s user, 1.041602 s system total]
        !       9224 kB max resident size
        !       0/430448 [0/430448] filesystem blocks in/out
        !       0/923 [0/1033] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       9583/171 [9583/171] voluntary/involuntary context switches

On Wed, Oct 05, 2022 at 11:30:22AM -0400, Tom Lane wrote:
> One other point to discuss: should we consider back-patching?  I've
> got mixed feelings about that myself.  I don't think that cases where
> this helps significantly are at all mainstream, so I'm kind of leaning
> to "patch HEAD only".

+1.  It can always be back-patched in the future if there are additional
reports.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Wed, 5 Oct 2022 at 16:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > At Tue, 4 Oct 2022 17:15:31 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in
> >> On Tue, Oct 04, 2022 at 07:53:11PM -0400, Tom Lane wrote:
> >>> After further thought, maybe it'd be better to do it as attached,
> >>> with one long-lived hash table for all the locks.
>
> > The first one is the straightforward outcome of the current
> > implementation, but I like the new one.  I agree that it is natural and
> > that the expected overhead per (typical) transaction is lower than with
> > both the first patch and doing the same operations on a list.  I don't
> > think space inefficiency of that extent matters, since it is only the
> > startup process.
>
> To get some hard numbers about this, I made a quick hack to collect
> getrusage() numbers for the startup process (patch attached for
> documentation purposes).  I then ran the recovery/t/027_stream_regress.pl
> test a few times and collected the stats (also attached).  This seems
> like a reasonably decent baseline test, since the core regression tests
> certainly take lots of AccessExclusiveLocks what with all the DDL
> involved, though they shouldn't ever take large numbers at once.  Also
> they don't run long enough for any lock list bloat to occur, so these
> numbers don't reflect a case where the patches would provide benefit.
>
> If you look hard, there's maybe about a 1% user-CPU penalty for patch 2,
> although that's well below the run-to-run variation so it's hard to be
> sure that it's real.  The same comments apply to the max resident size
> stats.  So I'm comforted that there's not a significant penalty here.
>
> I'll go ahead with patch 2 if there's not objection.

Happy to see this change.

> One other point to discuss: should we consider back-patching?  I've
> got mixed feelings about that myself.  I don't think that cases where
> this helps significantly are at all mainstream, so I'm kind of leaning
> to "patch HEAD only".

It looks fine to eventually back-patch, since StandbyReleaseLockTree()
was optimized to be called only when the transaction had actually taken
some AccessExclusiveLocks.

The performance loss is therefore minor and isolated to the users of
such locks, so I see no problem with it.

-- 
Simon Riggs                http://www.EnterpriseDB.com/