Re: Incorrect handling of OOM in WAL replay leading to data loss - Mailing list pgsql-hackers

From: Kyotaro Horiguchi
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss
Date:
Msg-id: 20230809.161353.1075613419067875207.horikyota.ntt@gmail.com
In response to: Re: Incorrect handling of OOM in WAL replay leading to data loss (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Incorrect handling of OOM in WAL replay leading to data loss
List: pgsql-hackers
At Wed, 9 Aug 2023 15:03:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> I have spent more time on 0001, polishing it and fixing a few bugs
> that I have found while reviewing the whole.  Most of them were
> related to mistakes in resetting the error state when expected.  I
> have also expanded DecodeXLogRecord() to use an error structure
> instead of only an errmsg, giving more consistency.  The error state
> now relies on two structures:
> +typedef enum XLogReaderErrorCode
> +{
> +   XLOG_READER_NONE = 0,
> +   XLOG_READER_OOM,            /* out-of-memory */
> +   XLOG_READER_INVALID_DATA,   /* record data */
> +} XLogReaderErrorCode;
> +typedef struct XLogReaderError
> +{
> +   /* Buffer to hold error message */
> +   char       *message;
> +   bool        message_deferred;
> +   /* Error code when filling *message */
> +   XLogReaderErrorCode code;
> +} XLogReaderError;
> 
> I'm kind of happy with this layer, now.

I'm not sure message_deferred belongs in the error struct; callers
don't seem to need that information.
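For example, I would expect a caller to look only at the code and the
message, roughly like the following (the field name "errordata" and
the overall shape here are just my guess, not code from the patch):

    record = XLogReadRecord(xlogreader, &errormsg);
    if (record == NULL)
    {
        if (xlogreader->errordata.code == XLOG_READER_OOM)
        {
            /* transient failure; retry the same record later */
        }
        else
        {
            /* invalid data; report xlogreader->errordata.message */
        }
    }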

The name "XLOG_RADER_NONE" seems too generic. XLOG_READER_NOERROR will
be clearer.

> I have also spent some time on finding a more elegant solution for the
> WAL replay, relying on the new facility from 0001.  And it happens
> that it is easy enough to loop if facing an out-of-memory failure when
> reading a record when we are in crash recovery, as the state is
> actually close to what a standby does.  The trick is that we should
> not change the state and avoid tracking a continuation record.  This
> is done in 0002, making replay more robust.  With the addition of the

0002 changes the behavior in the OOM case from ending recovery to
retrying the same record.  If the last record is actually corrupted,
the server will never be able to finish recovery.  I doubt we are
okay with this behavior change.
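
For reference, my reading of what 0002 makes the startup process do
is roughly the following (just a sketch of my understanding; the
actual names and details in the patch may differ):

    for (;;)
    {
        record = XLogReadRecord(xlogreader, &errormsg);
        if (record != NULL)
            break;

        if (xlogreader->errordata.code == XLOG_READER_OOM)
        {
            /*
             * Assume the failure is transient; wait a bit and retry
             * the same record without advancing the reader state.
             */
            pg_usleep(1000000L);    /* 1s */
            continue;
        }

        /* invalid data: fall through to the existing error paths */
        break;
    }

So if the allocation keeps failing because of a bogus length in the
last record rather than real memory pressure, we never get out of
this loop.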

> error injection tweak in 0003, I am able to finish recovery while the
> startup process loops if under memory pressure.  As mentioned
> previously, there are more code paths to consider, but that's a start
> to fix the data loss problems.

(The file name "xlogreader_oom" is a bit trickier to type than "hoge"
 or "foo" X( )

> Comments are welcome.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


