Thread: error "can only drop stats once" brings down database

error "can only drop stats once" brings down database

From
Floris Van Nee
Date:

Hi,

 

On a database we have we’ve recently seen a fatal error occur twice. The error happened on two different physical replicas (of the same cluster) during a WAL redo action in the recovery process. They’re running Postgres 15.5.

 

Occurrence 1:

2024-02-01 06:55:54.476 CET,,,70290,,65a29b60.11292,6,,2024-01-13 15:17:04 CET,1/0,0,FATAL,XX000,"can only drop stats once",,,,,"WAL redo at A7BD1/D6F9B6C0 for Transaction/COMMIT: 2024-02-01 06:55:54.395851+01; dropped stats: 2/16405/2991517839 2/16405/2991517838 2/16405/2991517835; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache 55 catcache 54 catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 snapshot 2608 relcache 2991517835 snapshot 1214 relcache 2991517838 relcache 2991517839 relcache 2991517838 snapshot 2608 relcache 2991517838 relcache 2991517839 relcache 2991517835 snapshot 2608 relcache 2991517839 relcache 2991517838 snapshot 2608 relcache 2991517838 snapshot 2608 snapshot 2608 snapshot 2608 relcache 2991517835 snapshot 2608 snapshot 1214",,,,"","startup",,0

2024-02-01 06:55:56.793 CET,,,70282,,65a29b5f.1128a,8,,2024-01-13 15:17:03 CET,,0,LOG,00000,"startup process (PID 70290) exited with exit code 1",,,,,,,,,"","postmaster",,0 2024-02-01 06:55:56.793 CET,,,70282,,65a29b5f.1128a,9,,2024-01-13 15:17:03 CET,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,"","postmaster",,0 2024-02-01 06:55:57.145 CET,,,26624,"100.104.20.59:37478",65bb326d.6800,1,"",2024-02-01 06:55:57

 

Occurrence 2:

2024-02-09 19:15:41.353 CET,,,88714,,65a29b5e.15a8a,6,,2024-01-13 15:17:02 CET,1/0,0,FATAL,XX000,"can only drop stats once",,,,,"WAL redo at A95F1/3C9D3D88 for Transaction/COMMIT: 2024-02-09 19:15:41.33755+01; dropped stats: 2/16405/3843195112 2/16405/3843195111 2/16405/3843195107; inval msgs: catcache 80 catcache 79 catcache 80 catcache 79 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 55 catcache 54 catcache 55 catcache 54 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 catcache 80 catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 snapshot 2608 relcache 3843195107 snapshot 1214 relcache 3843195111 relcache 3843195112 relcache 3843195111 snapshot 2608 relcache 3843195111 relcache 3843195112 relcache 3843195107 snapshot 2608 relcache 3843195107 snapshot 2608 relcache 3843195107 snapshot 2608 relcache 3843195112 relcache 3843195111 snapshot 2608 relcache 3843195111 snapshot 2608 snapshot 2608 snapshot 2608 relcache 3843195107 snapshot 2608 snapshot 1214",,,,"","startup",,0

 

Googling the error only shows one conversation from last year: https://www.postgresql.org/message-id/flat/17947-b9554521ad963c9c%40postgresql.org

However, in that thread it is caused by logical replication slots, which this cluster does not have.

 

Does anyone have an idea what may be causing this?

 

Separately, it seems like Postgres did not restart by itself when this error occurred (normally if a backend crashes Postgres would kill all connections and initiate a restart), but in this case because it’s the recovery process that stops, the whole database just shuts down and needs to be manually restarted. Is that intended?

 

After manually starting the database again, recovery would continue as normal. The database would be in normal operation again.

 

-Floris

 

Re: error "can only drop stats once" brings down database

From
Tom Lane
Date:
Floris Van Nee <florisvannee@Optiver.com> writes:
> Hi,
> On a database we have we've recently seen a fatal error occur twice. The error happened on two different physical
replicas(of the same cluster) during a WAL redo action in the recovery process. They're running Postgres 15.5. 

> Occurrence 1:
> 2024-02-01 06:55:54.476 CET,,,70290,,65a29b60.11292,6,,2024-01-13 15:17:04 CET,1/0,0,FATAL,XX000,"can only drop stats
once",,,,,"WALredo at A7BD1/D6F9B6C0 for Transaction/COMMIT: 2024-02-01 06:55:54.395851+01; ... 

Hmm.  This must be coming from pgstat_drop_entry_internal.
I suspect the correct fix is in pgstat_drop_entry, along
the lines of

-    if (shent)
+    if (shent && !shent->dropped)

but it's not clear to me how the already-dropped case ought to affect
the function's bool result.  Also, how are we getting into a
concurrent-drop situation in recovery?

            regards, tom lane



RE: error "can only drop stats once" brings down database

From
Floris Van Nee
Date:
> Floris Van Nee <florisvannee@Optiver.com> writes:
> > Hi,
> > On a database we have we've recently seen a fatal error occur twice. The
> error happened on two different physical replicas (of the same cluster)
> during a WAL redo action in the recovery process. They're running Postgres
> 15.5.
>
> > Occurrence 1:
> > 2024-02-01 06:55:54.476 CET,,,70290,,65a29b60.11292,6,,2024-01-13 15:17:04
> CET,1/0,0,FATAL,XX000,"can only drop stats once",,,,,"WAL redo at
> A7BD1/D6F9B6C0 for Transaction/COMMIT: 2024-02-01 06:55:54.395851+01;
> ...
>
> Hmm.  This must be coming from pgstat_drop_entry_internal.
> I suspect the correct fix is in pgstat_drop_entry, along the lines of
>
> -    if (shent)
> +    if (shent && !shent->dropped)
>
> but it's not clear to me how the already-dropped case ought to affect the
> function's bool result.  Also, how are we getting into a concurrent-drop
> situation in recovery?

Anyone has further thoughts on this? This still happens occasionally.

-Floris




Re: error "can only drop stats once" brings down database

From
Tom Lane
Date:
Floris Van Nee <florisvannee@Optiver.com> writes:
>> Hmm.  This must be coming from pgstat_drop_entry_internal.
>> I suspect the correct fix is in pgstat_drop_entry, along the lines of
>> 
>> -    if (shent)
>> +    if (shent && !shent->dropped)
>> 
>> but it's not clear to me how the already-dropped case ought to affect the
>> function's bool result.  Also, how are we getting into a concurrent-drop
>> situation in recovery?

> Anyone has further thoughts on this? This still happens occasionally.

I was expecting Andres to comment on it eventually.  I don't know
that code well enough to want to modify it on my own authority,
especially not just before a release.

One thing that would be safe enough and perhaps useful is to
modify the "can only drop stats once" message to provide a
little more detail, like the hash key of the problematic
entry.  That might help us understand what's triggering this.
The OIDs would be opaque perhaps, but the PgStat_Kind value
seems useful to know.

            regards, tom lane



Re: error "can only drop stats once" brings down database

From
Alexander Lakhin
Date:
Hello Floris,

03.05.2024 21:10, Floris Van Nee wrote:
>> Floris Van Nee <florisvannee@Optiver.com> writes:
>>> Hi,
>>> On a database we have we've recently seen a fatal error occur twice. The
>> error happened on two different physical replicas (of the same cluster)
>> during a WAL redo action in the recovery process. They're running Postgres
>> 15.5.
>>
>>> Occurrence 1:
>>> 2024-02-01 06:55:54.476 CET,,,70290,,65a29b60.11292,6,,2024-01-13 15:17:04
>> CET,1/0,0,FATAL,XX000,"can only drop stats once",,,,,"WAL redo at
>> A7BD1/D6F9B6C0 for Transaction/COMMIT: 2024-02-01 06:55:54.395851+01;
>> ...
>>
>> Hmm.  This must be coming from pgstat_drop_entry_internal.
>> I suspect the correct fix is in pgstat_drop_entry, along the lines of
>>
>> -    if (shent)
>> +    if (shent && !shent->dropped)
>>
>> but it's not clear to me how the already-dropped case ought to affect the
>> function's bool result.  Also, how are we getting into a concurrent-drop
>> situation in recovery?
> Anyone has further thoughts on this? This still happens occasionally.
>

Please take a look at bug #17947 (maybe you encounter the same?):
https://www.postgresql.org/message-id/17947-b9554521ad963c9c%40postgresql.org

Best regards,
Alexander



Re: error "can only drop stats once" brings down database

From
Andres Freund
Date:
Hi,

On 2024-05-03 18:10:05 +0000, Floris Van Nee wrote:
> > Floris Van Nee <florisvannee@Optiver.com> writes:
> > > Hi,
> > > On a database we have we've recently seen a fatal error occur twice. The
> > error happened on two different physical replicas (of the same cluster)
> > during a WAL redo action in the recovery process. They're running Postgres
> > 15.5.
> >
> > > Occurrence 1:
> > > 2024-02-01 06:55:54.476 CET,,,70290,,65a29b60.11292,6,,2024-01-13 15:17:04
> > CET,1/0,0,FATAL,XX000,"can only drop stats once",,,,,"WAL redo at
> > A7BD1/D6F9B6C0 for Transaction/COMMIT: 2024-02-01 06:55:54.395851+01;
> > ...
> >
> > Hmm.  This must be coming from pgstat_drop_entry_internal.
> > I suspect the correct fix is in pgstat_drop_entry, along the lines of
> >
> > -    if (shent)
> > +    if (shent && !shent->dropped)
> >
> > but it's not clear to me how the already-dropped case ought to affect the
> > function's bool result.

I don't think that'd be quite right - just ignoring that we're confused about
tracking "stats object" liveliness seems likely to hide bugs.

Elsewhere in this thread you suggested adding more details about the error -
let's do that. Something like the attached might already be an improvement?


> Also, how are we getting into a concurrent-drop situation in recovery?

I'd like to know how we get into the situation too. It's perhaps worth noting
that stats can be generated on a standby, albeit not by the replay
process. But locking should prevent active use of the stats entry when it's
being dropped...


> Anyone has further thoughts on this? This still happens occasionally.

Do you have any more details about the workload leading to this issue? Is the
standby used for queries?  Given the "high value" your oids/relfilenodes have,
I assume there are a lot of created/dropped/truncated relations?

Greetings,

Andres Freund

Attachment

Re: error "can only drop stats once" brings down database

From
Andres Freund
Date:
Hi,

On 2024-05-05 09:09:15 -0700, Andres Freund wrote:
> Do you have any more details about the workload leading to this issue? Is the
> standby used for queries?  Given the "high value" your oids/relfilenodes have,
> I assume there are a lot of created/dropped/truncated relations?

I suspect that the "high value" oids are related to the issue. I bet you're
running into OID wraparound for objects at a much higher frequency than most
users.  I suspect the reason this causes problems is the problem I just found
and described here:
https://www.postgresql.org/message-id/20240505183741.5tefii3razzefvtc%40awork3.anarazel.de

Could you confirm that you have a) a lot of oid assignments b) your startup
process was running for a long time by the time of the crash?

The fact that we're not triggering "stats object garbage collection" for most
drops also explains why you're much more likely to see this on a standby than
on a primary. An a primary it's going to be rare to have a single backend live
long enough to observe an oid wraparound leading to one backend accessing
stats for the same object type with the same oid after that object previously
having been dropped. But most stats accesses on a standby are going to be by
the same process, the startup process.  You'd still need some other accesses
to prevent the object from being dropped "immediately", but that could be due
to shorter lived processes.

Greetings,

Andres Freund



RE: error "can only drop stats once" brings down database

From
Floris Van Nee
Date:
>
> Could you confirm that you have a) a lot of oid assignments b) your startup
> process was running for a long time by the time of the crash?
>

Thanks Andres. Both higher than average I guess, although it depends on what is considered 'a lot' and 'a long time'.
Thestartup process was running for a few months. There are ~500k entries in pg_class, of which most are (Timescale)
partitions.However, even with this number of items in pg_class I would't expect wraparound to happen frequently? These
arenot dropped/recreated. I've monitored "select count(*) from pg_class" for a while to see if it changes often, and
whilethere are changes during the day (likely temporary tables being created), it also doesn't nearly happen at a
frequencythat will get us to a wraparound quickly. 

Oids aren't just consumed by pg_class though. And I do see system oid growing quickly (when doing CREATE TABLE twice
witha minute in between, then checking the oid difference of the table). I don't know how to investigate the cause of
this.What would be the best way to check what could be consuming these oids so quickly? 

-Floris




RE: error "can only drop stats once" brings down database

From
Floris Van Nee
Date:
> I suspect that the "high value" oids are related to the issue. I bet you're
> running into OID wraparound for objects at a much higher frequency than
> most users.  I suspect the reason this causes problems is the problem I just
> found and described here:

I definitely suspect oid wraparound has something to do with it. However, given what I found on the other thread (
https://www.postgresql.org/message-id/flat/17947-b9554521ad963c9c%40postgresql.org ), I don't think the two have the
exactsame root cause. The other thread seems a clear case of 'forgot to call gc' when it should. 

>
> The fact that we're not triggering "stats object garbage collection" for most
> drops also explains why you're much more likely to see this on a standby
> than on a primary. An a primary it's going to be rare to have a single backend
> live long enough to observe an oid wraparound leading to one backend
> accessing stats for the same object type with the same oid after that object
> previously having been dropped. But most stats accesses on a standby are
> going to be by the same process, the startup process.  You'd still need some
> other accesses to prevent the object from being dropped "immediately", but
> that could be due to shorter lived processes.
>

As mentioned on other thread, I do think garbage collection gets called as part of pgstat_execute_transactional_drops
inrecovery. 

There's still the case of the "skip gc of entry if pending" though, however I struggle to understand how this can lead
tosuch an error in the recovery process after wraparound. I'd expect the gc to happen relatively frequently (at least
muchmore frequently than a wraparound) as tables get created/dropped quite frequently. Next to that, because the
recoveryprocess is the only one creating/dropping relations on standby, it should be impossible to get in a state where
therecovery "local" stats cache has a non-gc'd entry that should be dropped? Because when dropping, at least it removes
thelocal cache entry (even if it cannot remove the shared one). So later, when creating it again in
pgstat_get_entry_ref,it should always hit the pgstat_reinit_entry path? 

Would it make sense to at least commit your patch to enhance the error message a bit?


-Floris




Re: error "can only drop stats once" brings down database

From
Michael Paquier
Date:
On Sun, Jun 02, 2024 at 10:01:53PM +0000, Floris Van Nee wrote:
> Would it make sense to at least commit your patch to enhance the
> error message a bit?

FWIW, I am +1 for the improvement of this error message.

Note that there is a very similar proposal here:
https://www.postgresql.org/message-id/ZkM30paAD8Cr/Bix@ip-10-97-1-34.eu-west-3.compute.internal

The only reason why I did not do one or the other yet is that this is
an improvement, hence I was waiting for v18 to open.  If somebody
feels differently, feel free to proceed before that with the format
you like.  Still I'd vote for the addition of the refcount to the
patch proposed on this thread, to provide more debugging info to the
information reported.
--
Michael

Attachment

RE: error "can only drop stats once" brings down database

From
Floris Van Nee
Date:
>
> The only reason why I did not do one or the other yet is that this is an
> improvement, hence I was waiting for v18 to open.  If somebody feels
> differently, feel free to proceed before that with the format you like.  Still I'd
> vote for the addition of the refcount to the patch proposed on this thread, to
> provide more debugging info to the information reported.

This patch would help in getting more info about this specific issue if it reoccurs, but in that case it's only helpful
ifit's backpatched. 
Agree the refcount addition makes sense.

-Floris




Re: error "can only drop stats once" brings down database

From
Michael Paquier
Date:
On Wed, Jun 05, 2024 at 10:31:04AM +0000, Floris Van Nee wrote:
> This patch would help in getting more info about this specific issue
> if it reoccurs, but in that case it's only helpful if it's
> backpatched.
> Agree the refcount addition makes sense.

Adding Bertrand in CC as he was similarly playing with this area.

What would both of you think about the attached.  This is a mix of
what's been proposed on the two threads, with the addition of the
refcount.
--
Michael

Attachment

Re: error "can only drop stats once" brings down database

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jun 06, 2024 at 01:44:26PM +0900, Michael Paquier wrote:
> On Wed, Jun 05, 2024 at 10:31:04AM +0000, Floris Van Nee wrote:
> > This patch would help in getting more info about this specific issue
> > if it reoccurs, but in that case it's only helpful if it's
> > backpatched.
> > Agree the refcount addition makes sense.
> 
> Adding Bertrand in CC as he was similarly playing with this area.
> 
> What would both of you think about the attached.  This is a mix of
> what's been proposed on the two threads, with the addition of the
> refcount.

I like the idea of adding refcount in the error message. OTOH I think it's more
appropriate to make use of errdetail_internal() as proposed in [1], but that's
a nit.

[1]: https://www.postgresql.org/message-id/ZkRsoOMrSrNI945i%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: error "can only drop stats once" brings down database

From
Kyotaro Horiguchi
Date:
At Thu, 6 Jun 2024 06:05:11 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in 
> > What would both of you think about the attached.  This is a mix of
> > what's been proposed on the two threads, with the addition of the
> > refcount.
> 
> I like the idea of adding refcount in the error message. OTOH I think it's more
> appropriate to make use of errdetail_internal() as proposed in [1], but that's
> a nit.

To me, this is something that is not expected to happen, but if it
does, we would want detailed information. In that sense, it might be
better not to hide it in the DETAILS field.

regards.

> [1]: https://www.postgresql.org/message-id/ZkRsoOMrSrNI945i%40ip-10-97-1-34.eu-west-3.compute.internal

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: error "can only drop stats once" brings down database

From
Michael Paquier
Date:
On Fri, Jun 07, 2024 at 10:15:40AM +0900, Kyotaro Horiguchi wrote:
> To me, this is something that is not expected to happen, but if it
> does, we would want detailed information. In that sense, it might be
> better not to hide it in the DETAILS field.

Same arguments here.  A simple elog() makes easier to get to this
data.
--
Michael

Attachment

Re: error "can only drop stats once" brings down database

From
Andres Freund
Date:
On 2024-06-07 11:46:04 +0900, Michael Paquier wrote:
> On Fri, Jun 07, 2024 at 10:15:40AM +0900, Kyotaro Horiguchi wrote:
> > To me, this is something that is not expected to happen, but if it
> > does, we would want detailed information. In that sense, it might be
> > better not to hide it in the DETAILS field.
> 
> Same arguments here.  A simple elog() makes easier to get to this
> data.

+1



Re: error "can only drop stats once" brings down database

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jun 06, 2024 at 08:12:58PM -0700, Andres Freund wrote:
> On 2024-06-07 11:46:04 +0900, Michael Paquier wrote:
> > On Fri, Jun 07, 2024 at 10:15:40AM +0900, Kyotaro Horiguchi wrote:
> > > To me, this is something that is not expected to happen, but if it
> > > does, we would want detailed information. In that sense, it might be
> > > better not to hide it in the DETAILS field.
> > 
> > Same arguments here.  A simple elog() makes easier to get to this
> > data.
> 
> +1

That was a nit, so fine by me to move on with a simple elog().

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: error "can only drop stats once" brings down database

From
Michael Paquier
Date:
On Fri, Jun 07, 2024 at 09:32:13AM +0000, Bertrand Drouvot wrote:
> That was a nit, so fine by me to move on with a simple elog().

Okay, I've just applied the elog() version down to 15 then.  Now,
about the entry getting dropped twice..
--
Michael

Attachment

RE: error "can only drop stats once" brings down database

From
Floris Van Nee
Date:
> Okay, I've just applied the elog() version down to 15 then.  Now, about the
> entry getting dropped twice..

Thanks Michael.

I've got an update about the bug. I managed to reproduce it locally after a lot of digging.

How to repro:
- Setup primary + replica
- Open a psql session on both
- On primary session: create table t (a int); select 't'::regclass::oid;
- On replica session: select * from t;
- On primary session: drop table t; vacuum pg_class; checkpoint;
- Gdb attach to the backend for your primary, set a breakpoint for
catalog.c:GetNewOidWithIndex, just before it calls GetNewObjectId()
- On primary session: create table t (a int);
- When it hits breakpoint, simulate oid wraparound by setting:
ShmemVariableCache->nextOid = <the output value of the select earlier>
This will make pg create the new table with the same oid as the previous one.
- On primary session: drop table t; -- this triggers the replica to go down

The reason it crashes on replica is that the recovery process is responsible for dropping
stats on commit, but it's not creating them on table creation. Thus, on the second create
table call, the old shared stats entry still exists (due to a backend still have a ref to it),
but it is never reinitialized by the logic in pgstat_reinit_entry(). On primary it's not possible
to reach this state, because heap_create() creates the stats entry immediately when the
table is created.

I wonder what's the best way to fix this though.
Should redo process call pgstat_create_relation somewhere, just like heap_create does?
Should we just ignore this 'drop stats twice' error on standby?

-Floris



Re: error "can only drop stats once" brings down database

From
Michael Paquier
Date:
On Sat, Jun 08, 2024 at 11:52:43AM +0000, Floris Van Nee wrote:
> I've got an update about the bug. I managed to reproduce it locally
> after a lot of digging.
>
> How to repro:
> - Setup primary + replica
> - Open a psql session on both
> - On primary session: create table t (a int); select 't'::regclass::oid;
> - On replica session: select * from t;
> - On primary session: drop table t; vacuum pg_class; checkpoint;
> - Gdb attach to the backend for your primary, set a breakpoint for
> catalog.c:GetNewOidWithIndex, just before it calls GetNewObjectId()
> - On primary session: create table t (a int);
> - When it hits breakpoint, simulate oid wraparound by setting:
> ShmemVariableCache->nextOid = <the output value of the select earlier>
> This will make pg create the new table with the same oid as the previous one.
> - On primary session: drop table t; -- this triggers the replica to go down

Okay, this stuff makes the beginning of a week fun.

> The reason it crashes on replica is that the recovery process is responsible for dropping
> stats on commit, but it's not creating them on table creation. Thus, on the second create
> table call, the old shared stats entry still exists (due to a backend still have a ref to it),
> but it is never reinitialized by the logic in pgstat_reinit_entry(). On primary it's not possible
> to reach this state, because heap_create() creates the stats entry immediately when the
> table is created.
>
> I wonder what's the best way to fix this though.
> Should redo process call pgstat_create_relation somewhere, just like heap_create does?
> Should we just ignore this 'drop stats twice' error on standby?

Nah, ignoring the double-drop error does not seem right to me.
Wouldn't it make the most sense to ensure that the stats are dropped
on the standby instead on the first DROP replayed even if there are
still references to it hold, making sure that the stats entry with
this OID is gone before reusing it after wraparound?
--
Michael

Attachment

RE: error "can only drop stats once" brings down database

From
Floris Van Nee
Date:
>
> Nah, ignoring the double-drop error does not seem right to me.
> Wouldn't it make the most sense to ensure that the stats are dropped on the
> standby instead on the first DROP replayed even if there are still references
> to it hold, making sure that the stats entry with this OID is gone before
> reusing it after wraparound?

Agree it doesn't seem right to just ignore. However I don't believe we can just
drop the stats entry on replay of the first drop, right? That's why all this logic
with pgstat_request_entry_refs_gc was invented in the first place, to get
backends to clean up their refs first. If we free it, backends can't see the
'dropped' attribute anymore. This can only be changed if the whole gc
mechanism gets revamped to use a different mechanism to notify backends
that their local reference in pgStatEntryRefHash is suddenly invalid. I doubt
that's backpatchable.

On primary this doesn't happen because it's guaranteed that 'drop stats' can
only ever happen after a 'create stats' call. This is the thing that's different on
a replica: it is possible to get 'drop stats' twice, without a 'create stats' in between.
Thing is that even on primary there are already some edge cases that it handles
explicitly (eg. the create-drop-create case where on the second 'create' the old stats
entry hasn't been removed yet). If the create-drop-create has special logic to handle
the case that the old entry still exists, why wouldn't the create-drop-drop case have
the special handling as well?

Andres, would you be able to chime in on this?

-Floris




Re: error "can only drop stats once" brings down database

From
Kyotaro Horiguchi
Date:
At Mon, 10 Jun 2024 15:13:16 +0000, Floris Van Nee <florisvannee@Optiver.com> wrote in 
> > 
> > Nah, ignoring the double-drop error does not seem right to me.
> > Wouldn't it make the most sense to ensure that the stats are dropped on the
> > standby instead on the first DROP replayed even if there are still references
> > to it hold, making sure that the stats entry with this OID is gone before
> > reusing it after wraparound?
> 
> Agree it doesn't seem right to just ignore. However I don't believe we can just
> drop the stats entry on replay of the first drop, right? That's why all this logic
> with pgstat_request_entry_refs_gc was invented in the first place, to get
> backends to clean up their refs first. If we free it, backends can't see the
> 'dropped' attribute anymore. This can only be changed if the whole gc
> mechanism gets revamped to use a different mechanism to notify backends
> that their local reference in pgStatEntryRefHash is suddenly invalid. I doubt
> that's backpatchable. 
> 
> On primary this doesn't happen because it's guaranteed that 'drop stats' can
> only ever happen after a 'create stats' call. This is the thing that's different on
> a replica: it is possible to get 'drop stats' twice, without a 'create stats' in between.

We don't simply allow double-drop; instead, we permit this kind of
behavior only during recovery by signaling the redo-execution
functions with the parameter isRedo, like smgr_unlink().

> Thing is that even on primary there are already some edge cases that it handles
> explicitly (eg. the create-drop-create case where on the second 'create' the old stats
> entry hasn't been removed yet). If the create-drop-create has special logic to handle
> the case that the old entry still exists, why wouldn't the create-drop-drop case have
> the special handling as well?

The two things look completely different to me. The former is an
indisputably legit operation, but the latter is indisputably
illegitimate as internal logic.

> Andres, would you be able to chime in on this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: error "can only drop stats once" brings down database

From
Michael Paquier
Date:
On Tue, Jun 11, 2024 at 04:44:57PM +0900, Kyotaro Horiguchi wrote:
> We don't simply allow double-drop; instead, we permit this kind of
> behavior only during recovery by signaling the redo-execution
> functions with the parameter isRedo, like smgr_unlink().

Still, the case is different here, no?  Recovery forgives unlinks as
long as we are in crash recovery, or consistency has not been reached
on a standby.  This case is different: we are in a consistent state
and a backend on the standby still holds references to the stats entry
that is transparently reused after a wraparound.

I am wondering if we should tackle this problem by adding a notion of
"age" in the stats entries for what's stored in the central dshash and
in the hold copies in the backends, so as we dump the "age" of an
existing stats entry if we find out that it gets reused in a create
path.  That requires some design and would not be backpatchable, but
it does not seem that invasive at quick glance (I had my eyes on this
code the full week).  Other thoughts are welcome.
--
Michael

Attachment

Re: error "can only drop stats once" brings down database

From
Michael Paquier
Date:
On Fri, Jun 14, 2024 at 09:37:35AM +0900, Michael Paquier wrote:
> I am wondering if we should tackle this problem by adding a notion of
> "age" in the stats entries for what's stored in the central dshash and
> in the hold copies in the backends, so as we dump the "age" of an
> existing stats entry if we find out that it gets reused in a create
> path.  That requires some design and would not be backpatchable, but
> it does not seem that invasive at quick glance (I had my eyes on this
> code the full week).  Other thoughts are welcome.

This still sounds like a good idea after sleeping on it over the
weekend.  Please note that I am moving on with the problem on this
thread where this has been reported, and that is the point of origin:
https://www.postgresql.org/message-id/ZmvVudpT8z6hbY9t@paquier.xyz

Let's continue the discussion there.
--
Michael

Attachment

Re: error "can only drop stats once" brings down database

From
Kyotaro Horiguchi
Date:
At Mon, 17 Jun 2024 08:28:10 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Fri, Jun 14, 2024 at 09:37:35AM +0900, Michael Paquier wrote:
> > I am wondering if we should tackle this problem by adding a notion of
> > "age" in the stats entries for what's stored in the central dshash and
> > in the hold copies in the backends, so as we dump the "age" of an
> > existing stats entry if we find out that it gets reused in a create
> > path.  That requires some design and would not be backpatchable, but
> > it does not seem that invasive at quick glance (I had my eyes on this
> > code the full week).  Other thoughts are welcome.
> 
> This still sounds like a good idea after sleeping on it over the
> weekend.  Please note that I am moving on with the problem on this
> thread where this has been reported, and that is the point of origin:
> https://www.postgresql.org/message-id/ZmvVudpT8z6hbY9t@paquier.xyz
> 
> Let's continue the discussion there.

Is it something like this? I forgot the reason why this was excluded
from the vinal version, though.

https://www.postgresql.org/message-id/20201001.090722.322196928507744460.horikyota.ntt%40gmail.com

> Dshash scans are used for dropping and resetting stats entries. Entry
> dropping is performed in the following steps.
> 
> (delete_current_stats_entry())
> - Drop the dshash entry (needs exlock of dshash partition).
> 
> - If refcount of the stats entry body is already zero, free the memory
>    immediately .
> 
> - If not, set the "dropped" flag of the body. No lock is required
>   because the "dropped" flag won't be even referred to by other
>   backends until the next step is done.
> 
> - Increment deletion count of the shared hash. (this is used as the
>   "age" of local pointer cache hash (pgstat_cache).
> 
> (get_stat_entry())
> 
> - If dshash deletion count is different from the local cache age, scan
>   over the local cache hash to find "dropped" entries.
> 
> - Decrements refcount of the dropped entry and free the shared entry
>   if it is no longer referenced. Apparently no lock is required.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: error "can only drop stats once" brings down database

From
Michael Paquier
Date:
On Tue, Jun 25, 2024 at 09:22:48AM +0900, Kyotaro Horiguchi wrote:
> Is it something like this? I forgot the reason why this was excluded
> from the vinal version, though.
>
> https://www.postgresql.org/message-id/20201001.090722.322196928507744460.horikyota.ntt%40gmail.com

That's a bit different than the gc_count presented in 0004, because
there is no relationship to autovacuum with the cleanup of the
entries, still there is some overlap in the thought process: this
suggestion just makes sure that any backend holding local references
to a shared entry with a different age does not attempt to manipulate
it after switching locally to a dropped state.
--
Michael

Attachment