Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: WIP: Covering + unique indexes.
Date
Msg-id CAA4eK1JQ3=Fur6NofmrDXnxm90s39GAxJUCFY3ORAQd9aPozvQ@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Covering + unique indexes.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: WIP: Covering + unique indexes.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
On Wed, Sep 21, 2016 at 6:51 PM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> 20.09.2016 08:21, Amit Kapila:
>
> On Tue, Sep 6, 2016 at 10:18 PM, Anastasia Lubennikova
> <a.lubennikova@postgrespro.ru> wrote:
>
> 28.08.2016 09:13, Amit Kapila:
>
>
> The problem seems really tricky, but the answer is simple.
> We store included columns unordered. It was mentioned somewhere in
> this thread.
>

Is there any fundamental problem in storing them in ordered way?  I
mean to say, you need to anyway store all the column values on leaf
page, so why can't we find the exact location for the complete key.
Basically use truncated key to reach to leaf level and then use the
complete key to find the exact location to store the key.  I might be
missing some thing here, but if we can store them in ordered fashion,
we can use them even for queries containing ORDER BY (where ORDER BY
contains included columns).

> Let me give you an example:
>
> create table t (i int, p point);
> create index on (i) including (p);
> "point" data type doesn't have any opclass for btree.
> Should we insert (0, '(0,2)') before (0, '(1,1)') or after?
> We have no idea what is the "correct order" for this attribute.
> So the answer is "it doesn't matter". When searching in index,
> we know that only key attrs are ordered, so only them can be used
> in scankey. Other columns are filtered after retrieving data.
>
> explain select i,p from t where i =0 and p <@ circle '((0,0),2)';
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Index Only Scan using idx on t  (cost=0.14..4.20 rows=1 width=20)
>    Index Cond: (i = 0)
>    Filter: (p <@ '<(0,0),2>'::circle)
>

I think here reason for using Filter is that because we don't keep
included columns in scan keys, can't we think of having them in scan
keys, but use only key columns in scan key to reach till leaf level
and then use complete scan key at leaf level.

>
> The same approach is used for included columns of any type, even if
> their data types have opclass.
>
> Is this truncation concept of high key needed for correctness of patch
> or is it just to save space in index?   If you need this, then I think
> nbtree/Readme needs to be updated.
>
>
> Now it's done only for space saving. We never check included attributes
> in non-leaf pages, so why store them? Especially if we assume that included
> attributes can be quite long.
> There is already a note in documentation:
>
> +        It's the same with other constraints (PRIMARY KEY and EXCLUDE).
> This can
> +        also can be used for non-unique indexes as any columns which are
> not required
> +        for the searching or ordering of records can be included in the
> +        <literal>INCLUDING</> clause, which can slightly reduce the size of
> the index,
> +        due to storing included attributes only in leaf index pages.
>

Okay, thanks for clarification.

> What should I add to README (or to documentation),
> to make it more understandable?
>

May be add the data representation like only leaf pages contains all
the columns and how the scan works.  I think you can see if you can
extend "Notes About Data Representation" and or "Other Things That Are
Handy to Know" sections in existing README.

> -- I am getting Assertion failure when I use this patch with database
> created with a build before this patch.  However, if I create a fresh
> database it works fine.  Assertion failure details are as below:
>
> LOG:  database system is ready to accept connections
> LOG:  autovacuum launcher started
> TRAP: unrecognized TOAST vartag("((bool) 1)", File:
> "src/backend/access/common/h
> eaptuple.c", Line: 532)
> LOG:  server process (PID 1404) was terminated by exception 0x80000003
> HINT:  See C include file "ntstatus.h" for a description of the hexadecimal
> valu
> e.
> LOG:  terminating any other active server processes
>
>
> That is expected behavior, because catalog versions are not compatible.
> But I wonder why there was no message about that?
> I suppose, that's because CATALOG_VERSION_NO was outdated in my
> patch. As well as I know, committer will change it before the commit.
> Try new patch with updated value. It should fail with a message about
> incompatible versions.
>

Yeah, that must be reason, but lets not change it now, otherwise we
will face conflicts while applying patch.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Refactor StartupXLOG?
Next
From: Amit Kapila
Date:
Subject: Re: raised checkpoint limit & manual checkpoint