Re: Incorrect handling of OOM in WAL replay leading to data loss - Mailing list pgsql-hackers

From: Kyotaro Horiguchi
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss
Date:
Msg-id: 20230809.161353.1075613419067875207.horikyota.ntt@gmail.com
In response to: Re: Incorrect handling of OOM in WAL replay leading to data loss (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Incorrect handling of OOM in WAL replay leading to data loss
List: pgsql-hackers
At Wed, 9 Aug 2023 15:03:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> I have spent more time on 0001, polishing it and fixing a few bugs
> that I have found while reviewing the whole.  Most of them were
> related to mistakes in resetting the error state when expected.  I
> have also expanded DecodeXLogRecord() to use an error structure
> instead of only an errmsg, giving more consistency.  The error state
> now relies on two structures:
> +typedef enum XLogReaderErrorCode
> +{
> +   XLOG_READER_NONE = 0,
> +   XLOG_READER_OOM,            /* out-of-memory */
> +   XLOG_READER_INVALID_DATA,   /* record data */
> +} XLogReaderErrorCode;
> +typedef struct XLogReaderError
> +{
> +   /* Buffer to hold error message */
> +   char       *message;
> +   bool        message_deferred;
> +   /* Error code when filling *message */
> +   XLogReaderErrorCode code;
> +} XLogReaderError;
> 
> I'm kind of happy with this layer, now.

I'm not sure message_deferred belongs in the error struct; callers
don't seem to need that information.
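For example, I would expect a caller to look only at the code and the
message, roughly like the following (the field name "errordata" and
the overall shape here are just my guess, not code from the patch):

    record = XLogReadRecord(xlogreader, &errormsg);
    if (record == NULL)
    {
        if (xlogreader->errordata.code == XLOG_READER_OOM)
        {
            /* transient failure; retry the same record later */
        }
        else
        {
            /* invalid data; report xlogreader->errordata.message */
        }
    }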

The name "XLOG_RADER_NONE" seems too generic. XLOG_READER_NOERROR will
be clearer.

> I have also spent some time on finding a more elegant solution for the
> WAL replay, relying on the new facility from 0001.  And it happens
> that it is easy enough to loop if facing an out-of-memory failure when
> reading a record when we are in crash recovery, as the state is
> actually close to what a standby does.  The trick is that we should
> not change the state and avoid tracking a continuation record.  This
> is done in 0002, making replay more robust.  With the addition of the

0002 changes the behavior in the OOM case from ending recovery to
retrying the same record.  If the last record is actually corrupted,
the server will never be able to finish recovery.  I doubt we are
okay with this behavior change.
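
For reference, my reading of what 0002 makes the startup process do
is roughly the following (just a sketch of my understanding; the
actual names and details in the patch may differ):

    for (;;)
    {
        record = XLogReadRecord(xlogreader, &errormsg);
        if (record != NULL)
            break;

        if (xlogreader->errordata.code == XLOG_READER_OOM)
        {
            /*
             * Assume the failure is transient; wait a bit and retry
             * the same record without advancing the reader state.
             */
            pg_usleep(1000000L);    /* 1s */
            continue;
        }

        /* invalid data: fall through to the existing error paths */
        break;
    }

So if the allocation keeps failing because of a bogus length in the
last record rather than real memory pressure, we never get out of
this loop.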

> error injection tweak in 0003, I am able to finish recovery while the
> startup process loops if under memory pressure.  As mentioned
> previously, there are more code paths to consider, but that's a start
> to fix the data loss problems.

(The file name "xlogreader_oom" is a bit trickier to type than "hoge"
 or "foo" X( )

> Comments are welcome.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


