Home > mailing lists

Re: Use unique index for longer pathkeys. - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Use unique index for longer pathkeys.
Date	July 14, 2014 05:31:58
Msg-id	CAA4eK1+6b6Wjwf51oZMrL+mKFH8xUp9J-pEhQvoR8SE7sWyTWw@mail.gmail.com Whole thread Raw
In response to	Use unique index for longer pathkeys. (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses	Re: Use unique index for longer pathkeys.
List	pgsql-hackers

Tree view

On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello, This is the continuation from the last CF.
>
> This patch intends to make PG to use index for longer pathkeys
> than index columns when,
>
> - The index is a unique index.
> - All index columns are NOT NULL.
> - The index column list is a subset of query_pathkeys.
>
> ====
>
> This patch consists of mainly three parts.
>
> 1. Mark index as 'Uniquely orders the table' if the conditions
> are satisfied. This is the same from the previous patch.
>
> 2. Collect the "common primary pathkeys", which is pathkeys that
> can perfectly sort all involved
> RTE's. (collect_common_primary_pathkeys())
>
> 3. Trim the query pathkeys using the common primary pathkeys.
> (trim_query_pathkeys())

I have spent some time on this patch and would like to share my

findings as below with you.

1.
It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in

this patch is not getting set properly. Please refer below test:

create table t (a int, b int, c int, d text);

create unique index i_t_pkey on t(a, b);

insert into t (select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);

analyze;

explain (costs off, analyze on) select distinct * from t;

Unptached -

postgres=# explain (costs off, analyze on) select distinct * from t;

QUERY PLAN

---------------------------------------------------------------------------

Unique (actual time=331.624..597.278 rows=100001 loops=1)

-> Sort (actual time=330.889..430.449 rows=100001 loops=1)

Sort Key: a, b, c, d

Sort Method: external merge Disk: 2344kB

-> Seq Scan on t (actual time=0.013..74.202 rows=100001 loops=1)

Execution time: 665.332 ms

(6 rows)

Patch (pathkey_and_uniqueindx_typ2_v1) -

postgres=# explain (costs off, analyze on) select distinct * from t;

QUERY PLAN

---------------------------------------------------------------------------

Unique (actual time=336.033..601.144 rows=100001 loops=1)

-> Sort (actual time=336.027..436.621 rows=100001 loops=1)

Sort Key: a, b, c, d

Sort Method: external merge Disk: 2344kB

-> Seq Scan on t (actual time=0.014..78.826 rows=100001 loops=1)

Execution time: 667.387 ms

(6 rows)

Shouldn't above query use index scan after patch?

typedef struct IndexOptInfo

bool predOK; /* true if predicate matches query */

bool unique; /* true if a unique index */

bool immediate; /* is uniqueness enforced immediately? */

+ bool full_ordered; /* don't key columns duplicate? */

It seems you have forgotten to update out function _outIndexOptInfo().

+ root->distinct_pathkeys =

+ shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide,

+ true);

+ root->query_pathkeys =

+ shorten_pathkeys_following(root,root->query_pathkeys,pk_guide,

+ true);

Add a space after each parameter in function call.

+PathKeysComparison

+compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)

a. This function return 4 different type of values (PATHKEYS_EQUAL,

PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),

however the caller just checks for PATHKEYS_EQUAL, isn't it better to just

return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.

b. Another idea could be that instead of writting separate function,

pass an additional parameter to compare_pathkeys() to distinguish

for ignore strategy case.

c. In any case, I think it is good to add comments on top of newly

introduced function (compare_pathkeys_ignoring_strategy) or extend

the comments of compare_pathkeys() if you want to add a new parameter

to it.

> Finally, the new patch has grown up to twice in size.. Although
> it seems a bit large, I think that side-effects are clearly
> avoided.

I am bit worried about the extra cycles added by this patch as compare

to your previous patch; example

During trimming of paths, it will build index paths (build_index_pathkeys())

which it will anyway do again during creation of index paths:

create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()

I am not sure if this has direct impact, however I am suspecting trimming

logic can add quite a few instructions even for cases when it is not beneficial.

At this moment, I also don't have better idea, so lets proceed with review of this

patch unless there is any other better way to achieve it.

With Regards,

Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Stephen Frost
Date: 14 July 2014, 04:29:16
Subject: Re: tweaking NTUP_PER_BUCKET

From: Christoph Moench-Tegeder
Date: 14 July 2014, 05:33:43
Subject: Re: 9.4 documentation: duplicate paragraph in logical decoding example

Re: Use unique index for longer pathkeys. - Mailing list pgsql-hackers

Previous

Next