Re: Use unique index for longer pathkeys. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Use unique index for longer pathkeys.
Date
Msg-id CAA4eK1+6b6Wjwf51oZMrL+mKFH8xUp9J-pEhQvoR8SE7sWyTWw@mail.gmail.com
Whole thread Raw
In response to Use unique index for longer pathkeys.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Use unique index for longer pathkeys.
List pgsql-hackers
On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello, This is the continuation from the last CF.
>
> This patch intends to make PG to use index for longer pathkeys
> than index columns when,
>
>  - The index is a unique index.
>  - All index columns are NOT NULL.
>  - The index column list is a subset of query_pathkeys.
>
> ====
>
> This patch consists of mainly three parts.
>
>  1. Mark index as 'Uniquely orders the table' if the conditions
>     are satisfied. This is the same from the previous patch.
>
>  2. Collect the "common primary pathkeys", which is pathkeys that
>     can perfectly sort all involved
>     RTE's. (collect_common_primary_pathkeys())
>
>  3. Trim the query pathkeys using the common primary pathkeys.
>     (trim_query_pathkeys())

I have spent some time on this patch and would like to share my
findings as below with you.

1.
It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
this patch is not getting set properly.  Please refer below test:

create table t (a int, b int, c int, d text);
create unique index i_t_pkey on t(a, b);
insert into t (select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);
analyze;
explain (costs off, analyze on) select distinct * from t;

Unptached -
postgres=# explain (costs off, analyze on) select distinct * from t;
                                QUERY PLAN
---------------------------------------------------------------------------
 Unique (actual time=331.624..597.278 rows=100001 loops=1)
   ->  Sort (actual time=330.889..430.449 rows=100001 loops=1)
         Sort Key: a, b, c, d
         Sort Method: external merge  Disk: 2344kB
         ->  Seq Scan on t (actual time=0.013..74.202 rows=100001 loops=1)
 Execution time: 665.332 ms
(6 rows)

Patch (pathkey_and_uniqueindx_typ2_v1) -
postgres=# explain (costs off, analyze on) select distinct * from t;
                                QUERY PLAN
---------------------------------------------------------------------------
 Unique (actual time=336.033..601.144 rows=100001 loops=1)
   ->  Sort (actual time=336.027..436.621 rows=100001 loops=1)
         Sort Key: a, b, c, d
         Sort Method: external merge  Disk: 2344kB
         ->  Seq Scan on t (actual time=0.014..78.826 rows=100001 loops=1)
 Execution time: 667.387 ms
(6 rows)

Shouldn't above query use index scan after patch?

2.
typedef struct IndexOptInfo
  bool predOK; /* true if predicate matches query */
  bool unique; /* true if a unique index */
  bool immediate; /* is uniqueness enforced immediately? */
+ bool full_ordered; /* don't key columns duplicate? */

It seems you have forgotten to update out function _outIndexOptInfo().

3.
+ root->distinct_pathkeys =
+ shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide,
+   true);
+
+ root->query_pathkeys =
+ shorten_pathkeys_following(root,root->query_pathkeys,pk_guide,
+   true);

Add a space after each parameter in function call.

4.
+PathKeysComparison
+compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)

a. This function return 4 different type of values (PATHKEYS_EQUAL,
   PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
   however the caller just checks for PATHKEYS_EQUAL, isn't it better to just
   return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.
b. Another idea could be that instead of writting separate function,
   pass an additional parameter to compare_pathkeys() to distinguish
   for ignore strategy case.
c. In any case, I think it is good to add comments on top of newly
   introduced function (compare_pathkeys_ignoring_strategy) or extend
   the comments of compare_pathkeys() if you want to add a new parameter
   to it.

> Finally, the new patch has grown up to twice in size.. Although
> it seems a bit large, I think that side-effects are clearly
> avoided.

I am bit worried about the extra cycles added by this patch as compare
to your previous patch; example
During trimming of paths, it will build index paths (build_index_pathkeys())
which it will anyway do again during creation of index paths:
create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()

I am not sure if this has direct impact, however I am suspecting trimming
logic can add quite a few instructions even for cases when it is not beneficial.
At this moment, I also don't have better idea, so lets proceed with review of this
patch unless there is any other better way to achieve it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: tweaking NTUP_PER_BUCKET
Next
From: Christoph Moench-Tegeder
Date:
Subject: Re: 9.4 documentation: duplicate paragraph in logical decoding example