Re: Use unique index for longer pathkeys. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Use unique index for longer pathkeys.
Date
Msg-id 20140725.161807.192277291.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Use unique index for longer pathkeys.  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Use unique index for longer pathkeys.
List pgsql-hackers
Hello. 

I've been cooled off and convinced that this patch has become
quite useless by itself. It improves almost only UNIONs with
ORDER BY on tables that have unioform primary keys, and needs my
another patch to work.

I'll try to reintegrate this patch into my 'another patch' as
mentioned below and try next CF with it. I'll close this patch as
'Return with Feedback' by myself. Thank you for your thoughtful
commnets. I learned a lot.

=====
Thank you for the answers, Robert, Amit :)

> > # By the way, this style of calling a person is quite common
> > # among Japanese since the first-name basis implies very close
> > # relationship or it frequently conveys offensive shade.
> 
> I don't know if I have ever offended you or any other Japanese
> community member while interaction, but my intention was never
> so.  The reason for using this style for calling you is during my initial 4
> years of work, I have worked very closely with Japanese, so I have
> learned few things during that time and it was quite common to refer
> the way I used above, however I am not sure I have always used
> during communication, so if something I have used which is not common
> in your culture, please feel free to let me know, I will surely not do that
> again.

I also don't mind at all and that's why I was anxious about
it. Thank you for your answers.

> > Sorry, my memory had been worn down. After some reconfirmation,
> > this description found to be a bit (quite?) wrong.
> >
> > collect_common_primary_pathkeys needs root->eq_classes to be set
> > up beforehand, not appendrels. Bacause build_index_pathkeys
> > doesn't work before every EC member for all relation involved is
> > already registered.
> >
> >
> > standard_qp_callback registers EC members in the following path
> > but only for the primary(?) tlist of the subquery, so EC members
> > for the child relations are not registered at the time.
> >
> > .->make_pathekeys_sortclauses->make_pathkey_from_sortop
> >    ->make_pathkey_from_sortinfo->get_eclass_for_sort_expr
> >
> > EC members for the child rels are registered in
> >
> >  set_base_rel_sizes->set_rel_size->set_append_rel_size
> >    ->add_child_rel_equivalences
> >
> > So trim_query_pathkeys (collect_common...) doesn't work before
> > set_base_rel_sizes(). These EC members needed to be registered at
> > least if root->query_pathkeys exists so this seems to be done in
> > standard_qp_callback adding a separate inheritance tree walk.
> 
> Do we really need that information to eliminate useless keys from
> query_pathkeys?
> 
> 
> * We have to make child entries in the EquivalenceClass data
> * structures as well.  This is needed either if the parent
> * participates in some eclass joins (because we will want to consider
> * inner-indexscan joins on the individual children) or if the parent
> * has useful pathkeys (because we should try to build MergeAppend
> * paths that produce those sort orderings).
> 
> Referring to above comment, I think we don't need it to eliminate
> useless keys based on primary key info from query_pathkeys, but I
> might be missing some case, could you please elaborate more to
> let me know the case/cases where this would be useful.

As I said some time ago, It's not so useful for wide situation,
especially for no transformation is done on the given query. If
my memory is correct, there was more wider application before the
EC fix for nested inheritance, but now this has rather narrow
application:( Anyway,

The most and almost only target of this patch is the situation
that a DISTINCT is implicitly added for the query, that is UNION
on tables with primary key, which is generated by my another
patch which was proposed on last CF4 (or earlier). This patch is
originally a part of it.

https://commitfest.postgresql.org/action/patch_view?id=1279

Come to think of it, maybe I changed my mind. It's better to add
a trimmed (minimum) DISTINCT ON clause directly for flattened
simple UNION. Other examples are seemingly odd or unlikely as
given queries.

> I think there is one more disadvantage in the way current patch is
> done which is that you need to collect index path keys for all relations
> irrespective of whether they will be of any use to eliminate useless
> pathkeys from query_pathkeys.  One trivial case that comes to mind is
> when there are multiple relations involved in query and ORDER BY is
> base on columns of only part of the tables involved in query.

Like this?

select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b;

Equivalent class consists of (x.a=y.a) and (x.b), so index
pathkeys for i_y is (y.a.=x.a). As a result, no common primary
pathkeys found. There seem to be no problem if I understand you
correctly.

> > # rel->has_elcass_joins seems not working but it is not the topic
> > # here.
> >
> > > Could you please explain me why the index information built in above
> > > path is not sufficient or is there any other case which I am missing?
> >
> > Is the description above enough to be the explaination for the
> > place? Sorry for the inaccurate description.
> 
> No issues, above description is sufficient to explain why you have
> written patch in its current form.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Devrim Gündüz
Date:
Subject: Re: PDF builds broken again
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: WAL replay bugs