Re: Error "initial slot snapshot too large" in create replication slot - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Error "initial slot snapshot too large" in create replication slot
Date
Msg-id 20220912215156.zi52zu6n7od3k3v3@awork3.anarazel.de
Whole thread Raw
In response to Re: Error "initial slot snapshot too large" in create replication slot  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Error "initial slot snapshot too large" in create replication slot
List pgsql-hackers
Hi,

Thanks for working on this!


I think this should include a test that fails without this change and succeeds
with it...


On 2022-07-19 11:55:06 +0900, Kyotaro Horiguchi wrote:
> From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> Date: Tue, 19 Jul 2022 11:50:29 +0900
> Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT

This sees a tad misleading - the previous snapshot wasn't borken, right?


> +/*
> + * ReorderBufferXidIsKnownSubXact
> + *        Returns true if the xid is a known subtransaction.
> + */
> +bool
> +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
> +{
> +    ReorderBufferTXN *txn;
> +
> +    txn = ReorderBufferTXNByXid(rb, xid, false,
> +                                NULL, InvalidXLogRecPtr, false);
> +
> +    /* a known subtxn? */
> +    if (txn && rbtxn_is_known_subxact(txn))
> +        return true;
> +
> +    return false;
> +}

The comments here just seem to restate the code....


It's not obvious to me that it's the right design (or even correct) to ask
reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
why would reorderbuffer even be guaranteed to know about all these subxids?


> @@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
>
>      MyProc->xmin = snap->xmin;
>
> -    /* allocate in transaction context */
> +    /*
> +     * Allocate in transaction context.
> +     *
> +     * We could use only subxip to store all xids (takenduringrecovery
> +     * snapshot) but that causes useless visibility checks later so we hasle to
> +     * create a normal snapshot.
> +     */

I can't really parse this comment at this point, and I seriously doubt I could
later on.


> @@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
>
>          if (test == NULL)
>          {
> -            if (newxcnt >= GetMaxSnapshotXidCount())
> -                ereport(ERROR,
> -                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> -                         errmsg("initial slot snapshot too large")));
> -
> -            newxip[newxcnt++] = xid;
> +            /* Store the xid to the appropriate xid array */
> +            if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
> +            {
> +                if (!overflowed)
> +                {
> +                    if (newsubxcnt >= GetMaxSnapshotSubxidCount())
> +                        overflowed = true;
> +                    else
> +                        newsubxip[newsubxcnt++] = xid;
> +                }
> +            }
> +            else
> +            {
> +                if (newxcnt >= GetMaxSnapshotXidCount())
> +                    elog(ERROR,
> +                         "too many transactions while creating snapshot");
> +                newxip[newxcnt++] = xid;
> +            }
>          }

Hm, this is starting to be pretty deeply nested...


I wonder if a better fix here wouldn't be to allow importing a snapshot with a
larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
need to be in a transactional snapshot anyway, which is copied anyway?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: pgsql: Prefetch data referenced by the WAL, take II.
Next
From: Andres Freund
Date:
Subject: Re: Error "initial slot snapshot too large" in create replication slot