Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

From Thom Brown
Subject Re: WIP: Covering + unique indexes.
Date
Msg-id CAA-aLv7mWdu5WCrUH0sw79abDsq4tdZEpkvV8SfgNyLp92+z6Q@mail.gmail.com
Whole thread Raw
In response to WIP: Covering + unique indexes.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: WIP: Covering + unique indexes.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
On 8 October 2015 at 16:18, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
>
> Hi hackers,
>
> I'm working on a patch that allows to combine covering and unique
> functionality for btree indexes.
>
> Previous discussion was here:
> 1) Proposal thread
> 2) Message with proposal clarification
>
> In a nutshell, the feature allows to create index with "key" columns and
> "included" columns.
> "key" columns can be used as scan keys. Unique constraint relates only to
> "key" columns.
> "included" columns may be used as scan keys if they have suitable opclass.
> Both "key" and "included" columns can be returned from index by
> IndexOnlyScan.
>
> Btree is the default index and it's used everywhere. So it requires properly
> testing.  Volunteers are welcome)
>
> Use case:
> - We have a table (c1, c2, c3, c4);
> - We need to have an unique index on (c1, c2).
> - We would like to have a covering index on all columns to avoid reading of
> heap pages.
>
> Old way:
> CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
> CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);
>
> What's wrong?
> Two indexes contain repeated data. Overhead to data manipulation operations
> and database size.
>
> New way:
> CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
>
> The patch is attached.
> In 'test.sql' you can find a test with detailed comments on each step, and
> comparison of old and new indexes.
>
> New feature has following syntax:
> CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
> Keyword INCLUDING defines the "included" columns of index. These columns
> aren't concern to unique constraint.
> Also, them are not stored in index inner pages. It allows to decrease index
> size.
>
> Results:
> 1) Additional covering index is not required anymore.
> 2) New index can use IndexOnlyScan on queries, where old index can't.
>
> For example,
> explain analyze select c1, c2 from newt where c1<10000 and c3<20;
>
> *more examples in 'test.sql'
>
> Future work:
> To do opclasses for "included" columns optional.
>
> CREATE TABLE tbl (c1 int, c4 box);
> CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);
>
> If we don't need c4 as an index scankey, we don't need any btree opclass on
> it.
> But we still want to have it in covering index for queries like
>
> SELECT c4 FROM tbl WHERE c1=1000;
> SELECT * FROM tbl WHERE c1=1000;

The definition output needs a space after "INCLUDING":

# SELECT pg_get_indexdef('people_first_name_last_name_email_idx'::regclass::oid);
            pg_get_indexdef
 

--------------------------------------------------------------------------------------------------------------------------CREATE
UNIQUEINDEX people_first_name_last_name_email_idx ON people
 
USING btree (first_name, last_name) INCLUDING(email)
(1 row)


There is also no collation output:

# CREATE UNIQUE INDEX test_idx ON people (first_name COLLATE "en_GB",
last_name) INCLUDING (email COLLATE "pl_PL");
CREATE INDEX

# SELECT pg_get_indexdef('test_idx'::regclass::oid);                                              pg_get_indexdef
-------------------------------------------------------------------------------------------------------------CREATE
UNIQUEINDEX test_idx ON people USING btree (first_name
 
COLLATE "en_GB", last_name) INCLUDING(email)
(1 row)


As for functioning, it works as described:

# EXPLAIN SELECT email FROM people WHERE (first_name,last_name) =
('Paul','Freeman');                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------Index Only
Scanusing people_first_name_last_name_email_idx on people(cost=0.28..1.40 rows=1 width=21)  Index Cond: ((first_name =
'Paul'::text)AND (last_name = 'Freeman'::text))
 
(2 rows)


Typo:

"included columns must not intersects with key columns"

should be:

"included columns must not intersect with key columns"


One thing I've noticed you can do with your patch, which you haven't
mentioned, is have a non-unique covering index:

# CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);
CREATE INDEX

# EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul';                                  QUERY
PLAN
---------------------------------------------------------------------------------Index Only Scan using covering_idx on
people (cost=0.28..1.44 rows=4 width=13)  Index Cond: (first_name = 'Paul'::text)
 
(2 rows)

But this appears to behave as if it were a regular multi-column index,
in that it will use the index for ordering rather than sort after
fetching from the index.  So is this really stored the same as a
multi-column index?  The index sizes aren't identical, so something is
different.

Thom



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [PATCH] Comment fixes
Next
From: Robert Haas
Date:
Subject: Re: [PATCH] minor doc patch