Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

From David Rowley
Subject Re: WIP: Covering + unique indexes.
Date
Msg-id CAKJS1f9W0aB-g7H6yYgNBq7hJsOKF3UwHU7-Q5jobbaTyK9f4g@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Covering + unique indexes.  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: WIP: Covering + unique indexes.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
On 7 January 2016 at 06:36, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jan 5, 2016 at 11:55 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> create table ab (a int,b int);
> insert into ab select x,y from generate_series(1,20) x(x),
> generate_series(10,1,-1) y(y);
> create index on ab (a) including (b);
> explain select * from ab order by a,b;
>                         QUERY PLAN
> ----------------------------------------------------------
>  Sort  (cost=10.64..11.14 rows=200 width=8)
>    Sort Key: a, b
>    ->  Seq Scan on ab  (cost=0.00..3.00 rows=200 width=8)
> (3 rows)

If you set enable_sort=off, then you get the index-only scan with no
sort.  So it believes the index can be used for ordering (correctly, I
think), just sometimes it thinks it is not faster to do it that way.

I'm not sure why this would be a correctness problem.  The covered
column does not participate in uniqueness checks, but it still usually
participates in index ordering.  (That is why dummy op-classes are
needed if you want to include non-sortable-type columns as being
covered.)

If that's the case, then it appears that I've misunderstood INCLUDING. From reading _bt_doinsert() it appeared that it'll ignore the INCLUDING columns and just find the insert position based on the key columns. Yet that's not the way that it appears to work. I was also a bit confused, as from working with another database which has very similar syntax to this, that one only includes the columns to allow index only scans, and the included columns are not indexed, therefore can't be part of index quals and the index only provides a sorted path for the indexed columns, and not the included columns.

Saying that, I'm now a bit confused to why the following does not produce 2 indexes which are the same size:

create table t1 (a int, b text);
insert into t1 select x,md5(random()::text) from generate_series(1,1000000) x(x);
create index t1_a_inc_b_idx on t1 (a) including (b);
create index t1_a_b_idx on t1 (a,b);
select pg_relation_Size('t1_a_b_idx'),pg_relation_size('t1_a_inc_b_idx');
 pg_relation_size | pg_relation_size 
------------------+------------------
         59064320 |         58744832
(1 row)

Also, if we want INCLUDING() to mean "uniqueness is not enforced on these columns, but they're still in the index", then I don't really think allowing types without a btree opclass is a good idea. It's likely too surprised filled and might not be what the user actually wants. I'd suggest that these non-indexed columns would be better defined by further expanding the syntax, the first (perhaps not very good) thing that comes to mind is:

create unique index idx_name on table (unique_col) also index (other,idx,cols) including (leaf,onlycols);

Looking up thread, I don't think I was the first to be confused by this.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: No Issue Tracker - Say it Ain't So!
Next
From: Fabien COELHO
Date:
Subject: Re: checkpointer continuous flushing