Re: changeset generation v5-01 - Patches & git tree - Mailing list pgsql-hackers

From Andres Freund
Subject Re: changeset generation v5-01 - Patches & git tree
Date
Msg-id 20130827175834.GB9061@awork2.anarazel.de
Whole thread Raw
In response to Re: changeset generation v5-01 - Patches & git tree  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On 2013-08-27 11:32:30 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > The git tree is at:
> > git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4
> >
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4
> 
> I gave this recently rebased branch a skim.  In general, the separation 
> between decode.c/reorderbuffer.c/snapbuild.c seems a lot nicer now than
> on previous iterations -- good job there.

Thanks for having a look!

> Here are some quick notes I took while reading the patch itself.  I
> haven't gone through it really carefully, yet.
> 
> 
> - I wonder whether DecodeCommit and DecodeAbort should really be a single
>   routine.  Right now, the former might call the later; and the latter is
>   aware of this.  Seems awkward.

Yes, I am not happy with that either. I'll play with combining them and
check whether that looks beter.

> - We skip insert/update/delete if not my database Id; however, we don't skip
>   commit in the same case.  If there are two walrecvrs on a cluster, on
>   different databases, does this lead to us trying to remove files
>   twice, if a xact commits which deleted some files?  Is this a problem?
>   Should we try to skip such database-specific actions in global
>   WAL records?

Hm. We should be able to skip it for long commit records at least. I
think I lost that code along the unification.

There's no danger of removing anything global afaics since we're not
replaying using the original replay routines and all the slot/sender
specific stuff has unique names.

> - There's rmgr-specific knowledge in decode.c.  I wonder if, similar to
>   redo and desc routines, that shouldn't instead be pluggable functions
>   for each rmgr.

I don't think that's a good idea. I've quickly played with it before and
it doesn't seem to end happy. It would require opening up more
semi-public interfaces and in the end, we're only interested of in-core
stuff. Even if it were possible to add new indexes by plugging new
rmgrs, we wouldn't care.

> - What's with ReorderBufferRestoreCleanup()?  Shouldn't it be in logical.c?

No, that's just for removing ondisk data at the end of a
transaction. I'll improve the comment.

> - reorderbuffer.c does several different things.  Can it be split?
>   Perhaps in pieces such as
>   * stuff to manage memory (slab cache thingies)
>   * TXN iterator
>   * other logically separate parts?
>   * the rest

Hm. I don't really see much point in splitting it along those
lines. None of those really makes sense without the other parts and the
file isn't *that* huge.

> - Having to expose LocalExecuteInvalidationMessage() looks awkward.  Is there
>   another way?

Hm. I don't immediately see any way. We need to execute invalidation
messages just within one backend. There just is no exposed functionality
for that yet since it wasn't needed so far. We could expose something
like LocalExecuteInvalidationMessage*s*() instead of doing the loop in
reorderbuffer.c, but that's about it.

> - I think we need a better name for "treat_as_catalog_table" (and
>   RelationIsTreatedAsCatalogTable).  Maybe replication_catalog or
>   something similar?

I think we're going to end up needing that for more than just
replication, so I'd like to keep replication out of the name. I don't
like the current name either though, so any other ideas?

> - Don't do this:
>   + * RecentGlobal(Data)?Xmin is initialized to InvalidTransactionId, to ensure that no
>   because later greps for RecentGlobalDataXmin and RecentGlobalXmin will
>   fail to find it.  It seems better to spell both names, so
>   "RecentGlobalDataXmin and RecentGlobalXmin are initialized to ..."

Ok.

> - the pg_receivellog command line is strange.  Apparently I need one or
>   more of --start,--init,--stop, but if stop, then the other two must
>   not be present; and if startpos, then init and stop cannot be
>   specified.  (There's a typo there that says "cannot combine with
>   --start" when it really means "cannot combine with --stop", BTW).  I
>   think this would make more sense to have init,start,stop be commands,
>   in pg_ctl's spirit; so there would be no double-dash.  IOW
>     SOMEPATH/pg_receivellog --startpos=123 start
>   and so on.

The reasoning here is somewhat complex and I am not happy with the
status quo, so I like getting input here.

The individual verbs mean:
* init: create a replication slot
* start: continue streaming in an existing replication slot
* stop: remove replication slot

The reason you cannot specify anything with --stop is that a) --start
streams until you abort the utility. So there's no chance of running
--stop after it. b) --init and --stop seems like a pointless combination
since you cannot actually do anything with the slot.
--init and --start combined, on the other hand are useful for testing,
which is why I allow them so far, but I wouldn't have problems removing
that capability.

The reason you cannot combine --init or --init --start with --startpos
is that --startpos has to refer to a location that could have actually
streamed to the client. Before a replication slot is established the
client doesn't know anything about such an address, so --init --start
cannot know any useful --startpos, that's why it's forbidden to pass
one.

The idea behind startpos is that you can tell the server "I have
replayed transactions up to this LSN" and the server will only give you
only transactions that have commited after this.

>  Also, we need SGML docs for this new utility.

And a lot more than only for this utility :(

> Any particular reason for removing this line:
> -/* Get a new XLogReader */
> +
>  extern XLogReaderState *XLogReaderAllocate(XLogPageReadCB pagereadfunc,
>                    void *private_data);

Hrmpf. Merge error. I've integrated too many different versions of too
different xlogreaders ;)

> I don't see the point of XLogRecordBuffer.record_data; we already have a
> pointer to the XLogRecord, and the data can readily be obtained using
> XLogRecGetData.  So why provide the same thing twice?  It seems to me
> that if instead of passing the XLogRecordBuffer we just provide the
> XLogRecord, and separately the "origptr" where needed, we could avoid
> having to expose the XLogRecordBuffer stuff unnecessarily.

By now we also need the end location of a wal record. So we have to pass
three addresses around for everything which isn't very convenient. If
you vastly prefer passing around three parameters I can do that, but I'd
rather not.
The original reason for doing so was, to be honest, that my own
xlogreader's API was different...

> In this comment:
> + * FIXME: We need something resembling the real SnapshotNow to handle things
> + * like enum lookups from indices correctly.
> what do we need consider in light of the new comment proposed by Robert
> CA+TgmobvTjRj_doXxQ0wgA1a1JLYPVYqtR3m+Cou_ousabnmXg@mail.gmail.com

I did most of the code changes for this, but this made me realize that
there are quite some more comments and even a function name to be
adapted. Will work on that.

Thanks!

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: pg_restore multiple --function options
Next
From: David Fetter
Date:
Subject: Re: pg_restore multiple --function options