Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: WIP: Covering + unique indexes.
Date
Msg-id CA+TgmoaV+nTh=EFKRbHS1-4onOQ-bXOZtON7cuqn2WpNLeUeTg@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Covering + unique indexes.  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: WIP: Covering + unique indexes.  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Oct 4, 2016 at 9:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> I'd say that the reason for not using included columns in any
>>> operations which require comparisons, is that they don't have opclass.
>>> Let's go back to the example of points.
>>> This data type don't have any opclass for B-tree, because of fundamental
>>> reasons.
>>> And we can not apply _bt_compare() and others to this attribute, so
>>> we don't include it to scan key.
>>>
>>> create table t (i int, i2 int, p point);
>>> create index idx1 on (i) including (i2);
>>> create index idx2 on (i) including (p);
>>> create index idx3 on (i) including (i2, p);
>>> create index idx4 on (i) including (p, i2);
>>>
>>> You can keep tuples ordered in idx1, but not for idx2, partially ordered for
>>> idx3, but not for idx4.
>>
>> Yeah, I think we shouldn't go there.  I mean, once you start ordering
>> by INCLUDING columns, then you're going to need to include them in
>> leaf pages because otherwise you can't actually guarantee that they
>> are in the right order.
>
> I am not sure what you mean by above, because patch already stores
> INCLUDING columns in leaf pages.

Sorry, I meant non-leaf pages.

>>  And then you have to wonder why an INCLUDING
>> column is any different from a non-INCLUDING column.  It seems best to
>> make a firm rule that INCLUDING columns are there only for the values,
>> not for ordering.  That rule is simple and clear, which is a good
>> thing.
>
> Okay, we can make that firm rule, but I think reasoning behind that
> should be clear.  As far as I get it by reading some of the mails in
> this thread, it is because some of the other databases doesn't seem to
> support ordering for included columns or supporting the same can
> complicate the code.  One point, we should keep in mind that
> suggestion for including many other columns in INCLUDING clause to use
> Index Only scans by other databases might not hold equally good for
> PostgreSQL because it can lead to many HOT updates as non-HOT updates.

Right.  Looking back, the originally articulated rationale for this
patch was that you might want a single index that is UNIQUE ON (a) but
also INCLUDING (b) rather than two indexes, a unique index on (a) and
a non-unique index on (a, b).  In that case, the patch is a
straight-up win: you get the same number of HOT updates either way,
but you don't use as much disk space, or spend as much CPU time and
WAL updating your indexes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Rename max_parallel_degree?
Next
From: Tom Lane
Date:
Subject: Misidentification of Python shared library