Re: VM corruption on standby - Mailing list pgsql-hackers

From	Aleksander Alekseev
Subject	Re: VM corruption on standby
Date	August 9 23:54:42
Msg-id	CAJ7c6TMpt9Cr+M2_G97iKp_-TfLNm7ZOtHWyTVpdQKmocxchHw@mail.gmail.com
In response to	Re: VM corruption on standby (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses	Re: VM corruption on standby Re: VM corruption on standby
List	pgsql-hackers

Hi Andrey,

> 0. checkpointer is going to flush a heap buffer but waits on content lock
> 1. client is resetting PD_ALL_VISIBLE from page
> 2. postmaster is killed and commands the client to go down
> 3. client calls LWLockReleaseAll() at ProcKill() (?)
> 4. checkpointer flushes buffer with reset PD_ALL_VISIBLE that is not WAL-logged to standby
> 5. subsequent deletes do not log resetting this bit
> 6. deleted data is observable on standby with IndexOnlyScan

Thanks for investigating this in more detail. If this is indeed what
happens, it is a violation of the "log before changing" approach. This
is why we have PageHeaderData.pd_lsn, for instance: to make sure a
page is evicted only *after* the WAL record that changed it is written
to disk (because WAL records can't be applied to pages from the
future).
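
To make the rule concrete, here is a toy Python model of it. This is
only a sketch, not PostgreSQL code: `Page`, `Wal` and `can_write_page`
are made-up names, LSNs are simplified to record positions, and the
boolean field merely stands in for PD_ALL_VISIBLE. It shows that a
check based on the page LSN holds back a logged change until the WAL
is flushed, but cannot notice a change that never got a WAL record.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int = 0              # stands in for PageHeaderData.pd_lsn
    all_visible: bool = True  # stands in for the PD_ALL_VISIBLE flag


class Wal:
    """Toy WAL: an LSN is just the 1-based position of a record."""

    def __init__(self):
        self.records = []       # records inserted so far (in memory)
        self.flushed_upto = 0   # highest LSN considered durable

    def insert(self, payload):
        self.records.append(payload)
        return len(self.records)

    def flush(self, upto):
        self.flushed_upto = max(self.flushed_upto, upto)


def can_write_page(page, wal):
    # The page may go to disk only once the WAL is durable up to its LSN.
    return page.lsn <= wal.flushed_upto


wal = Wal()

# Logged change: insert a record and stamp the page with its LSN.
logged = Page()
logged.all_visible = False
logged.lsn = wal.insert("clear all-visible")
logged_blocked = not can_write_page(logged, wal)   # WAL not flushed yet
wal.flush(logged.lsn)
logged_allowed = can_write_page(logged, wal)       # now safe to write

# Unlogged change: the page is modified but its LSN never advances,
# so the same check lets the page through.
unlogged = Page()
unlogged.all_visible = False
unlogged_allowed = can_write_page(unlogged, wal)
```

In this model the last case is the problematic one: the page reaches
disk with the flag cleared, while nothing in the WAL tells a replica
about it.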

I guess the intent here could have been an optimization of some sort,
but two facts were not considered: 1. the instance can be killed at
any time, and 2. there might be replicas.

> Any idea how to fix this?

IMHO: log the change first, and only then allow the page to be evicted.
