Re: Logical decoding & exported base snapshot - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Logical decoding & exported base snapshot
Date
Msg-id 20121212110228.GA8027@awork2.anarazel.de
Whole thread Raw
In response to Re: Logical decoding & exported base snapshot  (Joachim Wieland <joe@mcknight.de>)
List pgsql-hackers
On 2012-12-11 21:05:51 -0500, Joachim Wieland wrote:
> On Tue, Dec 11, 2012 at 6:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > One problem I see is that while exporting a snapshot solves the
> > visibility issues of the table's contents it does not protect against
> > schema changes. I am not sure whether thats a problem.
> >
> > If somebody runs a CLUSTER or something like that, the table's contents
> > will be preserved including MVCC semantics. That's fine.
> > The more problematic cases I see are TRUNCATE, DROP and ALTER
> > TABLE.
>
> This is why the pg_dump master process executes a
>
> lock table <table> in access share mode
>
> for every table, so your commands would all block.
>
> In fact it's even more complicated because the pg_dump worker
> processes also need to lock the table. They try to get a similar lock
> in "NOWAIT" mode right before dumping the table. If they don't get the
> lock that means that somebody else is waiting for an exclusive lock
> (this is the case you describe) and the backup will fail.

[Tom explains why this is problematic way better than I do, see his
email]

A trivial example - you can play nastier games than that though:

S1: CREATE TABLE test(id int);
S1: INSERT INTO test VALUES(1), (2);
S1: BEGIN;
S1: ALTER TABLE test RENAME COLUMN id TO id2;
S2: pg_dump
S2: waits
S1: COMMIT;
pg_dump: [archiver (db)] query failed: ERROR:  column "id" of relation "test" does not exist
pg_dump: [archiver (db)] query was: COPY public.test (id) TO stdout;

> > Problem 2:
> >
> > To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction
> > needs to be alive. That's currently solved by exporting the snapshot in
> > the walsender connection that did the INIT_LOGICAL_REPLICATION. The
> > question is how long should we preserve that snapshot?
>
> You lost me on this one after the first sentence... But in general the
> snapshot isn't so much a magical thing: As long the exporting
> transaction is open, it guarantees that there is a transaction out
> there that is holding off vacuum from removing data and it's also
> guaranteeing that the snapshot as is has existed at some time in the
> past.
>
> Once it is applied to another transaction you now have two
> transactions that will hold off vacuum because they share the same
> xmin,xmax values. You could also just end the exporting transaction at
> that point.

I should probably have given a bit more context here... All this is in
the context of decoding WAL back into something useful for the purpose
of replication. This is done in the already existing walsender process,
using the commands I explained in the original post.
What we do there is walk the WAL from some point until we could assemble
a snapshot for exactly that LSN. Several preconditions need to be met
there (like a low enough xmin horizon, so the old catalog entries are
still arround). Using that snapshot we can now decode the entries stored
in the WAL.
If you setup a new replica though, getting all changes from one point
where you're consistent isn't all that interesting if you cannot also
get a backup from exactly that point in time because otherwise its very
hard to start with something sensible on the standby without locking
everything down. So the idea is that once we found that consistent
snapshot we also export it. Only that we are in a walsender connection
which doesn't expose transactional semantics, so the snapshot cannot be
"deallocated" by COMMIT; ing.

So the question is basically about how to design the user interface of
that deallocation.

Makes slightly more sense now?

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [WIP PATCH] for Performance Improvement in Buffer Management
Next
From: David Gould
Date:
Subject: bulk_multi_insert infinite loops with large rows and small fill factors