Thread: Operator families vs. casts

Operator families vs. casts

From

Noah Misch

Date:

24 May 2011, 07:40:56

PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a
binary coercion cast without rewriting the table or unrelated indexes. It
will always rewrite any indexes and recheck any foreign key constraints that
depend on a changing column. This is unnecessary for 100% of core binary
coercion casts. In my original design[1], I planned to detect this by
comparing the operator families of the old and would-be-new indexes. (This
still yields some unnecessary rewrites; oid_ops and int4_ops are actually
compatible, for example.) When I implemented[2] it, I found that the
contracts[3] for operator families are not strong enough to prove that the
existing indexes and constraints remain valid. Specifically, I wished assume
val0 = val1 iff val0::a = val1::b for any val0, val1, a, b such that we
resolve both equality operators in the same operator family. The operator
family contracts say nothing about consistency with casts. Is there a
credible use case for violating that assumption? If not, I'd like to document
it as a requirement for operator family implementors.

The above covers B-tree and hash operator families. GIN and GiST have no
operator family contracts. Here was the comment in my first patch intended to
sweep that under the table:

! * We do not document a contract for GIN or GiST operator families. Only the
! * GIN operator family "array_ops" has more than one constituent operator class,
! * and only typmod-only changes to arrays can avoid a rewrite. Preserving a GIN
! * index across such a change is safe. We therefore support GiST and GIN here
! * using the same rules as for B-tree and hash indexes, but that is mostly
! * academic. Any forthcoming contract for GiST or GIN operator families should,
! * all other things being equal, bolster the validity of this assumption.
! *
! * Exclusion constraints raise the question: can we trust that the operator has
! * the same semantics with the new type? The operator will fall in the index's
! * operator family. For B-tree or hash, the operator will be "=" or "<>",
! * yielding an affirmative answer from contractual requirements. For GiST and
! * GIN, we assume that a similar requirement would fall out of any contract for
! * their operator families, should one arise. We therefore support exclusion
! * constraints without any special treatment, but this is again mostly academic.

Any thoughts on what to do here? We could just add basic operator family
contracts requiring what we need. Perhaps, instead, the ALTER TABLE code
should require an operator family match for B-tree and hash but an operator
class match for other access methods.

For now, I plan to always rewrite indexes on expressions or having predicates.
With effort, we could detect compatible changes there, too.

I also had a more mundane design question in the second paragraph of [2]. It
can probably wait for the review of the next version of the patch. However,
given that it affects a large percentage of the patch, I'd appreciate any
early feedback on it.

Thanks,
nm

[1] http://archives.postgresql.org/message-id/20101229125625.GA27643@tornado.gateway.2wire.net
[2] http://archives.postgresql.org/message-id/20110113230124.GA18733@tornado.gateway.2wire.net
[3] http://www.postgresql.org/docs/9.0/interactive/xindex.html#XINDEX-OPFAMILY

Re: Operator families vs. casts

From

Tom Lane

Date:

24 May 2011, 11:10:43

Noah Misch <noah@leadboat.com> writes:
> PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a
> binary coercion cast without rewriting the table or unrelated indexes.  It
> will always rewrite any indexes and recheck any foreign key constraints that
> depend on a changing column.  This is unnecessary for 100% of core binary
> coercion casts.  In my original design[1], I planned to detect this by
> comparing the operator families of the old and would-be-new indexes.  (This
> still yields some unnecessary rewrites; oid_ops and int4_ops are actually
> compatible, for example.)

No, they aren't: signed and unsigned comparisons do not yield the same
sort order.  I think that example may destroy the rest of your argument.
        regards, tom lane

Re: Operator families vs. casts

From

Noah Misch

Date:

24 May 2011, 11:27:13

On Tue, May 24, 2011 at 10:10:34AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a
> > binary coercion cast without rewriting the table or unrelated indexes.  It
> > will always rewrite any indexes and recheck any foreign key constraints that
> > depend on a changing column.  This is unnecessary for 100% of core binary
> > coercion casts.  In my original design[1], I planned to detect this by
> > comparing the operator families of the old and would-be-new indexes.  (This
> > still yields some unnecessary rewrites; oid_ops and int4_ops are actually
> > compatible, for example.)
> 
> No, they aren't: signed and unsigned comparisons do not yield the same
> sort order.

True; scratch the parenthetical comment.

> I think that example may destroy the rest of your argument.

Not that I'm aware of.

Re: Operator families vs. casts

From

Alexey Klyukin

Date:

10 June 2011, 11:46:55

Noah,

Providing my thoughts on the 'mundane' question first.

On Tue, May 24, 2011 at 1:40 PM, Noah Misch <noah@leadboat.com> wrote:
>
> I also had a more mundane design question in the second paragraph of [2].  It
> can probably wait for the review of the next version of the patch.  However,
> given that it affects a large percentage of the patch, I'd appreciate any
> early feedback on it.

Here's the relevant part of the original post:

> ATPostAlterTypeCleanup has this comment:
>         /*
>          * Re-parse the index and constraint definitions, and attach them to the
>          * appropriate work queue entries. We do this before dropping because in
>          * the case of a FOREIGN KEY constraint, we might not yet have exclusive
>          * lock on the table the constraint is attached to, and we need to get
>          * that before dropping.  It's safe because the parser won't actually look
>          * at the catalogs to detect the existing entry.
>          */
> Is the second sentence true?  I don't think so, so I deleted it for now.  Here
> is the sequence of lock requests against the table possessing the FOREIGN KEY
> constraint when we alter the parent/upstream column:
>
> transformAlterTableStmt - ShareRowExclusiveLock
> ATPostAlterTypeParse - lockmode of original ALTER TABLE
> RemoveTriggerById() for update trigger - ShareRowExclusiveLock
> RemoveTriggerById() for insert trigger - ShareRowExclusiveLock
> RemoveConstraintById() - AccessExclusiveLock
> CreateTrigger() for insert trigger - ShareRowExclusiveLock
> CreateTrigger() for update trigger - ShareRowExclusiveLock
> RI_Initial_Check() - AccessShareLock (3x)

I think the statement in the second sentence of the comment is true.
ATPostAlterTypeParse is called only from ATPostAlterTypeCleanup. It has to
grab the lock on the table the constraint is attached to before dropping the
constraint. What it does is it opens that relation with the same lock that is
grabbed for AT_AlterColumnType subtype, i.e. AccessExclusiveLock. I think
there is no preceding place in AlterTable chain, that grabs stronger lock on
this relation. ShareRowExclusiveLock is taken in transformAlterTableStmt (1st
function in your sequence), but this function is ultimately called from
ATPostAlterTypeParse, just before the latter grabs the AccessExclusiveLock, so
ultimately at the point where this comment is located no locks are taken for
the table with a FOREIGN KEY constraint.

Alexey.

Re: Operator families vs. casts

From

Noah Misch

Date:

10 June 2011, 14:05:41

Alexey,

On Fri, Jun 10, 2011 at 05:46:42PM +0300, Alexey Klyukin wrote:
> Providing my thoughts on the 'mundane' question first.

I was actually referring to this paragraph:
 The standing code handled index/constraint dependencies of changing columns by extracting the SQL definition using
pg_get_indexdefor pg_get_constraintdef, dropping the object, and recreating it afresh.  To implement this optimization
forindexes and index-backed constraints, we need to update the index definition without perturbing its storage.  I
foundtwo major ways to do this, and I'm unsure which will be preferable, so I'm attaching both as alternate
implementationsof the same outcome.  I'd appreciate feedback on which is preferable.  The first skips the drop and
updatespg_index.indclass, pg_attribute, and pg_constraint.conexclop.  The alternate patch retains the drop and create,
thenresurrects the old relfilenode and assigns it to the new object.  The second approach is significantly simpler and
smaller,but it seems less-like anything else we do.  As a further variation on the second approach, I also considered
drillingholes through the performDeletion() and DefineIndex() stacks to avoid the drop-and-later-preserve dynamic, but
thatseemed uglier.

However, while we're on the topic you looked at:

> Here's the relevant part of the original post:
> 
> > ATPostAlterTypeCleanup has this comment:
> >         /*
> >          * Re-parse the index and constraint definitions, and attach them to the
> >          * appropriate work queue entries. We do this before dropping because in
> >          * the case of a FOREIGN KEY constraint, we might not yet have exclusive
> >          * lock on the table the constraint is attached to, and we need to get
> >          * that before dropping.  It's safe because the parser won't actually look
> >          * at the catalogs to detect the existing entry.
> >          */
> > Is the second sentence true?  I don't think so, so I deleted it for now.  Here
> > is the sequence of lock requests against the table possessing the FOREIGN KEY
> > constraint when we alter the parent/upstream column:
> >
> > transformAlterTableStmt - ShareRowExclusiveLock
> > ATPostAlterTypeParse - lockmode of original ALTER TABLE
> > RemoveTriggerById() for update trigger - ShareRowExclusiveLock
> > RemoveTriggerById() for insert trigger - ShareRowExclusiveLock
> > RemoveConstraintById() - AccessExclusiveLock
> > CreateTrigger() for insert trigger - ShareRowExclusiveLock
> > CreateTrigger() for update trigger - ShareRowExclusiveLock
> > RI_Initial_Check() - AccessShareLock (3x)
> 
> I think the statement in the second sentence of the comment is true.
> ATPostAlterTypeParse is called only from ATPostAlterTypeCleanup. It has to
> grab the lock on the table the constraint is attached to before dropping the
> constraint. What it does is it opens that relation with the same lock that is
> grabbed for AT_AlterColumnType subtype, i.e. AccessExclusiveLock. I think
> there is no preceding place in AlterTable chain, that grabs stronger lock on
> this relation. ShareRowExclusiveLock is taken in transformAlterTableStmt (1st
> function in your sequence), but this function is ultimately called from
> ATPostAlterTypeParse, just before the latter grabs the AccessExclusiveLock, so
> ultimately at the point where this comment is located no locks are taken for
> the table with a FOREIGN KEY constraint.

The comment is correct that we don't yet have a lock on the remote table at that
point.  But why do we need a lock before RemoveTriggerById() acquires one?
True, it is bad form to get a ShareRowExclusiveLock followed by upgrading to an
AccessExclusiveLock, but the solution there is that RemoveConstraintById() only
needs a ShareRowExclusiveLock.

Granted, in retrospect, I had no business editorializing on this matter.  It's
proximate to the patch's changes but unrelated to them.

Thanks,
nm