Thread: Disallow unique index on system columns

Disallow unique index on system columns

From
David Rowley
Date:
Hi,

Over in [1], while I was aiming to fix some incorrect formatting in an
error message, Tom noticed that the code in that area was much more
broken than I had thought.

Basically, if you do;

postgres=# create table t (a int) with oids;
CREATE TABLE
postgres=# create unique index t_oid_idx on t(oid);
CREATE INDEX
postgres=# alter table t replica identity using index t_oid_idx;

You get;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

I proposed a fix over there, but it didn't go anywhere, probably
because Tom and Andres discussed just disallowing unique indexes on
system columns altogether. So, the attached patch does just that, and
also fixes up the replica identity bugs too, as it's still possible
that someone could create a unique index on a system column with an
old version, upgrade, then try to set the replica identity to that
index. We'd need to handle that correctly, so I fixed that too.

I hope there's still time to get this backpacked before the releases.

[1] http://www.postgresql.org/message-id/26659.1459485031@sss.pgh.pa.us

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Disallow unique index on system columns

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> I proposed a fix over there, but it didn't go anywhere, probably
> because Tom and Andres discussed just disallowing unique indexes on
> system columns altogether. So, the attached patch does just that, and
> also fixes up the replica identity bugs too, as it's still possible
> that someone could create a unique index on a system column with an
> old version, upgrade, then try to set the replica identity to that
> index. We'd need to handle that correctly, so I fixed that too.

AFAIR, what we were discussing was disallowing any index on a system
column (other than OID).  I do not see why only unique indexes are
problematic for them; the semantic issues are independent of that.
        regards, tom lane



Re: Disallow unique index on system columns

From
Andres Freund
Date:
On 2016-04-14 21:02:26 -0400, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > I proposed a fix over there, but it didn't go anywhere, probably
> > because Tom and Andres discussed just disallowing unique indexes on
> > system columns altogether. So, the attached patch does just that, and
> > also fixes up the replica identity bugs too, as it's still possible
> > that someone could create a unique index on a system column with an
> > old version, upgrade, then try to set the replica identity to that
> > index. We'd need to handle that correctly, so I fixed that too.
> 
> AFAIR, what we were discussing was disallowing any index on a system
> column (other than OID).  I do not see why only unique indexes are
> problematic for them; the semantic issues are independent of that.

+1



Re: Disallow unique index on system columns

From
David Rowley
Date:
On 15 April 2016 at 13:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> I proposed a fix over there, but it didn't go anywhere, probably
>> because Tom and Andres discussed just disallowing unique indexes on
>> system columns altogether. So, the attached patch does just that, and
>> also fixes up the replica identity bugs too, as it's still possible
>> that someone could create a unique index on a system column with an
>> old version, upgrade, then try to set the replica identity to that
>> index. We'd need to handle that correctly, so I fixed that too.
>
> AFAIR, what we were discussing was disallowing any index on a system
> column (other than OID).  I do not see why only unique indexes are
> problematic for them; the semantic issues are independent of that.

I have to admit that my thoughts only considered ctid, which I
imagined would have been OK to have an index on. As for the other
system columns (apart from OID), I agree.


-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Disallow unique index on system columns

From
Andres Freund
Date:
On 2016-04-15 13:26:35 +1200, David Rowley wrote:
> On 15 April 2016 at 13:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > David Rowley <david.rowley@2ndquadrant.com> writes:
> >> I proposed a fix over there, but it didn't go anywhere, probably
> >> because Tom and Andres discussed just disallowing unique indexes on
> >> system columns altogether. So, the attached patch does just that, and
> >> also fixes up the replica identity bugs too, as it's still possible
> >> that someone could create a unique index on a system column with an
> >> old version, upgrade, then try to set the replica identity to that
> >> index. We'd need to handle that correctly, so I fixed that too.
> >
> > AFAIR, what we were discussing was disallowing any index on a system
> > column (other than OID).  I do not see why only unique indexes are
> > problematic for them; the semantic issues are independent of that.
> 
> I have to admit that my thoughts only considered ctid, which I
> imagined would have been OK to have an index on. As for the other
> system columns (apart from OID), I agree.

What'd be the point of indexing ctid, and why would it be correct?
Wouldn't, hm, HOT break it?

Andres



Re: Disallow unique index on system columns

From
David Rowley
Date:
On 15 April 2016 at 13:30, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-15 13:26:35 +1200, David Rowley wrote:
>> On 15 April 2016 at 13:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > David Rowley <david.rowley@2ndquadrant.com> writes:
>> >> I proposed a fix over there, but it didn't go anywhere, probably
>> >> because Tom and Andres discussed just disallowing unique indexes on
>> >> system columns altogether. So, the attached patch does just that, and
>> >> also fixes up the replica identity bugs too, as it's still possible
>> >> that someone could create a unique index on a system column with an
>> >> old version, upgrade, then try to set the replica identity to that
>> >> index. We'd need to handle that correctly, so I fixed that too.
>> >
>> > AFAIR, what we were discussing was disallowing any index on a system
>> > column (other than OID).  I do not see why only unique indexes are
>> > problematic for them; the semantic issues are independent of that.
>>
>> I have to admit that my thoughts only considered ctid, which I
>> imagined would have been OK to have an index on. As for the other
>> system columns (apart from OID), I agree.
>
> What'd be the point of indexing ctid, and why would it be correct?
> Wouldn't, hm, HOT break it?

I don't personally see the point. I was just concerned that if there's
a slight chance that it's useful, then someone might have one
somewhere in the wild, and they might have problems restoring pg_dump
backups, once we disallow it.

The attached patch just disallows any index containing a system
column, apart from OID.

Is it worth making some changes to pg_dump to skip such indexes?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Disallow unique index on system columns

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 15 April 2016 at 13:30, Andres Freund <andres@anarazel.de> wrote:
>> What'd be the point of indexing ctid, and why would it be correct?
>> Wouldn't, hm, HOT break it?

> I don't personally see the point.

An index on ctid is useless by definition: if you know the ctid of
a tuple, you can just go get it, never mind the index.

> Is it worth making some changes to pg_dump to skip such indexes?

No.
        regards, tom lane



Re: Disallow unique index on system columns

From
David Rowley
Date:
On 15 April 2016 at 13:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 15 April 2016 at 13:30, Andres Freund <andres@anarazel.de> wrote:
>>> What'd be the point of indexing ctid, and why would it be correct?
>>> Wouldn't, hm, HOT break it?
>
>> I don't personally see the point.
>
> An index on ctid is useless by definition: if you know the ctid of
> a tuple, you can just go get it, never mind the index.

I'm not sure that's 100% accurate, and perhaps it's not worth arguing,
as they're likely broken because of HOT anyway, but it does seem like
you've totally disregarded the fact that a TIDscan does not support
range scanning, where an index scan on ctid would.

E.g; how many live tuples are on page 0?

select count(*) from t where ctid between '(0,0)' and '(0,10000)';

I'm not saying it's going to be a common case. I just want to ensure
we've considered all semi realistic use cases before we go and turn
this off.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Disallow unique index on system columns

From
David Rowley
Date:
On 15 April 2016 at 13:43, David Rowley <david.rowley@2ndquadrant.com> wrote:
> The attached patch just disallows any index containing a system
> column, apart from OID.

Seems I only did half the job as I forgot to think to check for system
columns that are hidden away inside expressions or predicates.

The attached fixes that.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Disallow unique index on system columns

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 15 April 2016 at 13:43, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> The attached patch just disallows any index containing a system
>> column, apart from OID.

> Seems I only did half the job as I forgot to think to check for system
> columns that are hidden away inside expressions or predicates.
> The attached fixes that.

I think what we should do with this is split it into two patches, one
to fix ALTER REPLICA IDENTITY's crash (which we'd back-patch to 9.4)
and one to forbid indexes on system columns (which IMO should be HEAD
only).  The first of those should be pretty uncontroversial, so I'll
go ahead with pushing it --- anyone have a problem with the second?
        regards, tom lane



Re: Disallow unique index on system columns

From
Andres Freund
Date:
On 2016-04-15 11:49:27 -0400, Tom Lane wrote:
> I think what we should do with this is split it into two patches, one
> to fix ALTER REPLICA IDENTITY's crash (which we'd back-patch to 9.4)
> and one to forbid indexes on system columns (which IMO should be HEAD
> only).  The first of those should be pretty uncontroversial, so I'll
> go ahead with pushing it --- anyone have a problem with the second?

Only in that I'm wondering whether we shouldn't be more aggressive about
#2.

Andres



Re: Disallow unique index on system columns

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-04-15 11:49:27 -0400, Tom Lane wrote:
>> I think what we should do with this is split it into two patches, one
>> to fix ALTER REPLICA IDENTITY's crash (which we'd back-patch to 9.4)
>> and one to forbid indexes on system columns (which IMO should be HEAD
>> only).  The first of those should be pretty uncontroversial, so I'll
>> go ahead with pushing it --- anyone have a problem with the second?

> Only in that I'm wondering whether we shouldn't be more aggressive about
> #2.

Well, if there *is* someone out there somehow making use of an index on
one of these columns, I think we shouldn't break it in a minor release.

A more likely scenario is that someone's created such an index but it's
lying around unused.  In that case, with the patch as proposed, they'd
have to drop it before they could upgrade to 9.6.  That doesn't bother
me ... but again, possibly causing problems in a minor release does.
        regards, tom lane



Re: Disallow unique index on system columns

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 15 April 2016 at 13:43, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> The attached patch just disallows any index containing a system
>> column, apart from OID.
> Seems I only did half the job as I forgot to think to check for system
> columns that are hidden away inside expressions or predicates.
> The attached fixes that.

Pushed.  I moved the check into DefineIndex, as that's where user-facing
complaints about indexes generally ought to be.
        regards, tom lane



Re: Disallow unique index on system columns

From
Eric Ridge
Date:
On Sat, Apr 16, 2016 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pushed.  I moved the check into DefineIndex, as that's where user-facing
complaints about indexes generally ought to be.

If you're planning on back-patching this, please don't.  :)  It'll literally ruin my life.

I've got an extension that's actually a custom Access Method, and for reasons that are probably too boring to go into here, it requires that the first column in the index be a function that takes the ctid.  Ie, something akin to:

   CREATE INDEX idx ON table (my_func('table', ctid), other_func(table));

The AM implementation itself doesn't actually use the result of my_func(), but that construct is necessary so I can detect certain queries that look like:
    SELECT FROM table WHERE my_func('table', ctid) ==> 'index condition'

I don't mind that you're changing this for 9.6... 9.6 is going to change so much other stuff around custom AMs that I'll deal with it when the time comes, but back-patching this into 9.3/4/5 would make life very difficult.

Thanks for listening!

eric

Re: Disallow unique index on system columns

From
Tom Lane
Date:
Eric Ridge <eebbrr@gmail.com> writes:
> I've got an extension that's actually a custom Access Method, and for
> reasons that are probably too boring to go into here, it requires that the
> first column in the index be a function that takes the ctid.  Ie, something
> akin to:
>    CREATE INDEX idx ON table (my_func('table', ctid), other_func(table));

That's ... creative.

> The AM implementation itself doesn't actually use the result of my_func(),
> but that construct is necessary so I can detect certain queries that look
> like:
>     SELECT FROM table WHERE my_func('table', ctid) ==> 'index condition'

Um, why's the ctid important here, or perhaps more directly, what is
it you're really trying to do?

> I don't mind that you're changing this for 9.6... 9.6 is going to change so
> much other stuff around custom AMs that I'll deal with it when the time
> comes, but back-patching this into 9.3/4/5 would make life very difficult.

We weren't planning to do that.
        regards, tom lane



Re: Disallow unique index on system columns

From
Eric Ridge
Date:
On Wed, Apr 20, 2016 at 9:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>     SELECT FROM table WHERE my_func('table', ctid) ==> 'index condition'

Um, why's the ctid important here, or perhaps more directly, what is
it you're really trying to do?

This function is defined as my_func(regclass, tid) and simply returns the tid value passed in.  The operator is defined as ==>(tid, text).

Behind the scenes, the AM is actually backed by Elasticsearch, and the tuple's ctid value is used as the "_id" in ES.

When Postgres decides to plan a sequential scan (or filter) to answer WHERE clause conditions that use the ==>(tid, text) operator the AM isn't involved but I still need to use the remote Elasticsearch server to answer that condition.  

So I came up with this "creative" approach to provide enough context in the query plan for me to figure out a) which table is being used and b) which physical row is being evaluated in the seqscan or filter.

When the operator's procedure is called, I notice that it's the first time I've seen the FuncExpr on the LHS, go query Elasticsearch with the text query from the RHS, build a hashtable of the matching ctids and lookup the LHS's value in the hashtable.  If it exists, the row matches.

There just didn't seem to be enough context in the FunctionCallInfo of the the operator's procedure to figure this out without something in the query that's basically statically determined at parse time.

I suspect what I should be doing for this particular problem is taking advantage of the Custom Scan API, but I'm trying to support PG 9.3.

We weren't planning to do that.

Great!

eric