Re: [HACKERS] snapbuild woes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] snapbuild woes
Date
Msg-id 20170501005921.qcqd7w2njpv3av6s@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] snapbuild woes  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Responses Re: [HACKERS] snapbuild woes  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
> +    /*
> +     * c) we already seen the xl_running_xacts and tried to do the above.
> +     * However because of race condition in LogStandbySnapshot() there might
> +     * have been transaction reported as running but in reality has already
> +     * written commit record before the xl_running_xacts so decoding has
> +     * missed it. We now see xl_running_xacts that suggests all transactions
> +     * from the original one were closed but the consistent state wasn't
> +     * reached which means the race condition has indeed happened.
> +     *
> +     * Start tracking again as if this was the first xl_running_xacts we've
> +     * seen, with the advantage that because decoding was already running,
> +     * any transactions committed before the xl_running_xacts record will be
> +     * known to us so we won't hit with the same issue again.
> +     */

Unfortunately I don't think that's true, as coded.  You're using
information about committed transactions:

> +    else if (TransactionIdFollows(running->oldestRunningXid,
> +                                  builder->running.xmax))
> +    {
> +        int off;
> +
> +        SnapBuildStartXactTracking(builder, running);
> +
>          /*
> +         * Nark any transactions that are known to have committed before the
> +         * xl_running_xacts as finished to avoid the race condition in
> +         * LogStandbySnapshot().
>           *
> +         * We can use SnapBuildEndTxn directly as it only does the
> +         * transaction running check and handling without any additional
> +         * side effects.
>           */
> +        for (off = 0; off < builder->committed.xcnt; off++)
> +            SnapBuildEndTxn(builder, lsn, builder->committed.xip[off]);

but a transaction might just have *aborted* before the new snapshot, no?
Since we don't keep track of those, I don't think this guarantees anything?

ISTM, we need a xip_status array in SnapBuild->running.  Then, whenever
a xl_running_xacts is encountered while builder->running.xcnt > 0, loop
for i in 0 .. xcnt_space, and end SnapBuildEndTxn() if xip_status[i] but
the xid is not xl_running_xacts?  Does that make sense?

- Andres



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [HACKERS] Re: logical replication and PANIC during shutdowncheckpoint in publisher
Next
From: Amit Langote
Date:
Subject: Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table