On Thursday, June 21, 2012 04:05:54 PM Florian Pflug wrote:
> On Jun21, 2012, at 13:41 , Andres Freund wrote:
> > 5.)
> > The actually good idea. Yours?
>
> What about a mixture of (3b) and (4), which writes the data not to the WAL
> but to a separate logical replication log. More specifically:
>
> There's a per-backend queue of change notifications.
>
> Whenever a non-catalog tuple is modified, we queue a TUPLE_MODIFIED
> record containing (xid, databaseoid, tableoid, old xmin, old ctid, new
> ctid)
>
> Whenever a table (or something that a table depends on) is modified we
> wait until all references to that table's oid have vanished from the queue,
> then queue a DDL record containing (xid, databaseoid, tableoid, text).
> Other backends cannot concurrently add further TUPLE_MODIFIED records since
> we already hold an exclusive lock on the table at that point.
>
> A background process continually processes these queues. If the front of
> the queue is a TUPLE_MODIFIED record, it fetches the old and the new tuple
> based on their ctids and writes the old tuple's PK and the full new tuple
> to the logical replication log. Since table modifications always wait for
> all previously queued TUPLE_MODIFIED records referencing that table to be
> processed *before* altering the catalog, tuples can always be interpreted
> according to the current (SnapshotNow) catalog contents.
>
> Upon transaction COMMIT and ROLLBACK, we queue COMMIT and ROLLBACK records,
> which are also written to the log by the background process. The background
> process may decide to wait until a backend commits before processing that
> backend's log. In that case, rolled-back transactions don't leave a trace in
> the logical replication log. Should a backend, however, issue a DDL
> statement, the background process *must* process that backend's queue
> immediately, since otherwise there's a deadlock.
>
> The background process also maintains a value in shared memory which
> contains the oldest value in any of the queues' xid or "old xmin" fields.
> VACUUM and the like must not remove tuples whose xmin is >= that value.
> Hint bits *may* be set for newest tuples though, provided that the
> background process ignores hint bits when fetching the old and new tuples.
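
To make the proposal above concrete, here is a rough C sketch of the
per-backend queue as I read it. All names (ChangeRecord, ChangeQueue,
queue_push, queue_references_table, queue_oldest_xid, ...) are made up for
illustration and come from neither a patch nor the PostgreSQL tree. It is a
plain in-process ring buffer: no shared memory, no locking, and xids are
compared as plain integers, ignoring wraparound.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's TransactionId and Oid. */
typedef uint32_t Xid;
typedef uint32_t Oid;

typedef enum { REC_TUPLE_MODIFIED, REC_DDL, REC_COMMIT, REC_ROLLBACK } RecType;

typedef struct { uint32_t block; uint16_t offset; } Ctid;

/* One queue entry; field use depends on the record type. */
typedef struct {
    RecType     type;
    Xid         xid;
    Oid         dboid;
    Oid         tableoid;   /* TUPLE_MODIFIED and DDL only */
    Xid         old_xmin;   /* TUPLE_MODIFIED only */
    Ctid        old_ctid;   /* TUPLE_MODIFIED only */
    Ctid        new_ctid;   /* TUPLE_MODIFIED only */
    const char *ddl_text;   /* DDL only */
} ChangeRecord;

#define QUEUE_CAP 64        /* arbitrary size, assumption for the sketch */

/* Per-backend FIFO of change notifications (fixed-size ring buffer). */
typedef struct {
    ChangeRecord recs[QUEUE_CAP];
    size_t       head;      /* index of the oldest pending record */
    size_t       count;     /* number of pending records */
} ChangeQueue;

/* Backend side: append a record; returns false if the queue is full. */
bool queue_push(ChangeQueue *q, ChangeRecord rec)
{
    if (q->count == QUEUE_CAP)
        return false;
    q->recs[(q->head + q->count) % QUEUE_CAP] = rec;
    q->count++;
    return true;
}

/* Background process side: take the record at the front of the queue. */
bool queue_pop(ChangeQueue *q, ChangeRecord *out)
{
    if (q->count == 0)
        return false;
    *out = q->recs[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}

/* A DDL record may only be queued once this returns false for its table. */
bool queue_references_table(const ChangeQueue *q, Oid tableoid)
{
    for (size_t i = 0; i < q->count; i++) {
        const ChangeRecord *r = &q->recs[(q->head + i) % QUEUE_CAP];
        if ((r->type == REC_TUPLE_MODIFIED || r->type == REC_DDL) &&
            r->tableoid == tableoid)
            return true;
    }
    return false;
}

/* Oldest xid or old xmin still pending; this is the VACUUM horizon.
 * Returns false if the queue is empty. */
bool queue_oldest_xid(const ChangeQueue *q, Xid *oldest)
{
    bool found = false;
    for (size_t i = 0; i < q->count; i++) {
        const ChangeRecord *r = &q->recs[(q->head + i) % QUEUE_CAP];
        if (!found || r->xid < *oldest) {
            *oldest = r->xid;
            found = true;
        }
        if (r->type == REC_TUPLE_MODIFIED && r->old_xmin < *oldest)
            *oldest = r->old_xmin;
    }
    return found;
}
```

Note that everything here lives in volatile memory, so none of the queue
contents would survive a crash.
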
I think that's too complicated to fly. Getting that to recover cleanly in case
of a crash would mean you'd need another WAL.
I think if it comes to that, going for 1) is more realistic...
Andres
-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services