Home > mailing lists
Re: Changeset Extraction v7.7 - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Changeset Extraction v7.7
Date	February 25, 2014 01:07:00
Msg-id	CA+TgmoZkqSdQ=pN3NHr07t_U8WV99ieDULAOcjfHYE7nK=i=5w@mail.gmail.com Whole thread Raw
In response to	Re: Changeset Extraction v7.7 (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: Changeset Extraction v7.7 (Andres Freund <andres@2ndquadrant.com>) Re: Changeset Extraction v7.7 (Andres Freund <andres@2ndquadrant.com>)
List	pgsql-hackers
Tree view
On Mon, Feb 24, 2014 at 10:11 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Changes in this version include:
> * changed slot error handling log by introducing "ephermal" slots which
>   get dropped on errors. This is the biggest change.
> * added quoting in the test_decoding output plugin
> * closing of a tight race condition during slot creation where WAL could
>   have been removed
> * comment and other adjustments, many of them noticed by robert

I did another read-through of this this afternoon, focusing on the
stuff you changed and parts I hadn't looked at carefully yet.
Comments below.

Documentation needs to be updated for pg_stat_replication view.

I still think pg_create_logical_replication_slot should be in slotfuncs.c.
/* Size of an indirect datum that contains an indirect TOAST pointer */#define INDIRECT_POINTER_SIZE (VARHDRSZ_EXTERNAL
+sizeof(struct
 
varatt_indirect))

+/* Size of an indirect datum that contains a standard TOAST pointer */
+#define INDIRECT_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(struct
varatt_indirect))

Isn't the new hunk a duplicate of the existing definition, except for
a one-word change to the comment?

I don't think the completely-unsecured directory operations in
test_decoding_regsupport.c are acceptable.  Tom fought tooth and nail
to make sure that similar capabilities in adminpack carried meaningful
security restrictions.
       /*
+        * Check whether there are, possibly unconnected, logical
slots that refer
+        * to the to-be-dropped database. The database lock we are holding
+        * prevents the creation of new slots using the database.
+        */
+       if (ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active))
+               ereport(ERROR,
+                               (errcode(ERRCODE_OBJECT_IN_USE),
+                                errmsg("database \"%s\" is used in a
logical decoding slot",
+                                               dbname),
+                                errdetail("There are %d slot(s), %d
of them active",
+                                                  nslots, nslots_active)));

What are you going to do when we get around to supporting this on a
standby?  Whatever the answer is, maybe add a TODO comment.

+ * loop for now..
+                * more than twice..

Extra periods.

+        * The replicatio slot mechanism is used to prevent removal of required

Typo.

+
+       /*
+        * GetRunningTransactionData() acquired ProcArrayLock, we must release
+        * it. We can do that before inserting the WAL record because
+        * ProcArrayApplyRecoveryInfo can recheck the commit status using the
+        * clog. If we're doing logical replication we can't do that though, so
+        * hold the lock for a moment longer.
+        */
+       if (wal_level < WAL_LEVEL_LOGICAL)
+               LWLockRelease(ProcArrayLock);
+       recptr = LogCurrentRunningXacts(running);

+       /* Release lock if we kept it longer ... */
+       if (wal_level >= WAL_LEVEL_LOGICAL)
+               LWLockRelease(ProcArrayLock);
+

This seems unfortunate.  The comment should clearly explain why it's necessary.

+       /*
+        * Startup logical state, needs to be setup now so we have proper data
+        * during restore.
+        */
+       StartupReorderBuffer();

Should add blank line before this.

+       CheckPointSnapBuild();
+       CheckpointLogicalRewriteHeap();

Shouldn't the capitalization be consistent?

-       heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+       if (IsSystemRelation(scan->rs_rd)
+               || RelationIsAccessibleInLogicalDecoding(scan->rs_rd))
+               heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalXmin);
+       else
+               heap_page_prune_opt(scan->rs_rd, buffer, RecentGlobalDataXmin);

Instead of changing the callers of heap_page_prune_opt() in this way,
I think it might be better to change heap_page_prune_opt() to take
only the first two of its current three parameters; everybody's just
passing RecentGlobalXmin right now anyway. Then, we could change the
first check in heap_page_prune_opt() to check first whether
PageIsPrunable(page, RecentGlobalDataXmin).  If not, give up.  If so,
then check that (!IsSystemRelation(scan->rs_rd) &&
!RelationIsAccessibleInLogicalDecoding(scan->rs_rd)) ||
PageIsPrunable(page, RecentGlobalXmin)).  The advantage of this is
that we avoid code duplication, and we avoid checking a couple of
conditions if pd_prune_xmin is very recent.

-               if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval ||
forceSyncCommit)
+               if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval ||
forceSyncCommit ||
+                       XLogLogicalInfoActive())

Mmph.  Is this really necessary?  If so, why?  The comments could elucidate.

+               bool fail_softly = slot->data.persistency == RS_EPHEMERAL;

This should be contingent on whether we're being called in the error
pathway, not the slot type.  I think you should pass a bool.

There are a bunch of places where you're testing IsSystemRelation() ||
RelationIsAccessibleInLogicalDecoding().  Maybe we need a macro
encapsulating that test, with a name chose to explain the point of it.It seems to be indicating, roughly, whether the
relationshould
 
participate in RecentGlobalXmin or RecentGlobalDataXmin.  But is there
any point at all of separating those when !XLogLogicalInfoActive()?
The test expands to:

IsSystemRelation() || (XLogLogicalInfoActive() &&
RelationNeedsWAL(relation) && (IsCatalogRelation(relation) ||
RelationIsUsedAsCatalogTable(relation)))

So basically this is all tables created in pg_catalog during initdb
plus all TOAST tables in the system.  If wal_level=logical, then we
also include tables marked with the reloption user_catalog_table=true,
unless they're unlogged.  This all seems a bit complex.  Why not this:

IsSystemRelation() || || RelationIsUsedAsCatalogTable(relation)

And why not this?

IsCatalogRelation() || || RelationIsUsedAsCatalogTable(relation)

i.e. is it really necessary to include all TOAST tables, or does it
suffice to include TOAST tables of system catalogs?  I bet you're
going to tell me that we don't know which TOAST tables pertain to
user-catalog tables, and thus must include them all.  Ugh.

+       /*
+        * It's important *not* to track decoding tasks here because
+        * snapbuild.c uses ->oldestRunningXid to manage its xmin. If it
+        * were to be included here the initial value could never
+        * increase.
+        */

This is not clear, and it uses the " ->member" syntax which I find
confusing and inelegant.

lazy_vacuum_rel() takes the relation as an argument, so why does it
need the to caller to compute IsSystemRelation(onerel) ||
RelationIsAccessibleInLogicalDecoding(onerel)?

Header comment for ReplicationSlotDropAcquired() is bogus.

ReplicationSlotDropAcquired() can easily avoid using a "goto" with a
short else block.  I'd suggest if rename() == 0 then fsync() else
ereport().

+ * GetOldestSafeDecodingTransactionId -- lowest xid not affected by vacuum

It seems to me that this is the lowest XID known not to have been
pruned, whether by vacuum or otherwise.

+       /* ----
+        * This is a bit tricky: We need to determine a safe xmin
horizon to start
+        * decoding from, to avoid starting from a running xacts
record referring
+        * to xids whose rows have been vacuumed or pruned
+        * already. GetOldestSafeDecodingTransactionId() returns such
a value, but
+        * without further interlock it's return value might
immediately be out of
+        * date.
+        *
+        * So we have to acquire the ProcArrayLock to prevent computation of new
+        * xmin horizons by other backends, get the safe decoding xid,
and inform
+        * the slot machinery about the new limit. Once that's done the
+        * ProcArrayLock can be be released as the slot machinery now is
+        * protecting against vacuum.
+        * ----
+        */

I can't claim to be very excited about this.  I'm assuming you've
spent a lot of time thinking about ways to avoid this and utterly
failed to come up with any reasonable alternative, but let me take a
shot.  Suppose we take ProcArrayLock in exclusive mode and compute the
oldest running XID, install it as our xmin, and then release
ProcArrayLock.  At that point, nobody else can compute an oldest-xmin
value that precedes that value, so we can take our time installing
that value as the slot's xmin, without needing to hold a lock
meanwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pgsql-hackers by date:
From: Andres Freund
Date: 25 February 2014, 00:31:11
Subject: Re: Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?
From: Peter Geoghegan
Date: 25 February 2014, 02:05:43
Subject: Re: Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?
Re: Changeset Extraction v7.7 - Mailing list pgsql-hackers

Previous

Next