Re: Use unique index for longer pathkeys. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Use unique index for longer pathkeys.
Date
Msg-id CAA4eK1LeXMGrtqZ-ux+Hr-CaLM_TB=OWOY+Bm98=40UnJ3U-_g@mail.gmail.com
Whole thread Raw
In response to Re: Use unique index for longer pathkeys.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Use unique index for longer pathkeys.
List pgsql-hackers
On Tue, Jul 22, 2014 at 2:49 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Sorry , previous version has bugs. It stamps over the stack and
> crashesX( The attached is the bug fixed version, with no
> substantial changes.
>
> > On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI <
> > horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > >
> > > Hi, the attached is the revised version.
> >
> > Thanks Horiguchi-San for the updated patch.
>
> # By the way, this style of calling a person is quite common
> # among Japanese since the first-name basis implies very close
> # relationship or it frequently conveys offensive shade. 

I don't know if I have ever offended you or any other Japanese
community member while interaction, but my intention was never
so.  The reason for using this style for calling you is during my initial 4
years of work, I have worked very closely with Japanese, so I have
learned few things during that time and it was quite common to refer
the way I used above, however I am not sure I have always used
during communication, so if something I have used which is not common
in your culture, please feel free to let me know, I will surely not do that
again.  

> > Today while looking into updated patch, I was wondering why can't
> > we eliminate useless keys in query_pathkeys when we actually build
> > the same in function standard_qp_callback(), basically somewhat
> > similar to what we do in
> > standard_qp_callback->make_pathkeys_for_sortclauses->pathkey_is_redundant().
>
> I agree that standard_qp_callback is basically more preferable.
>
> > We already have index information related to base_rels before building
> > query pathkeys.  I noticed that you mentioned the below in your original
> > mail which indicates that information related to inheritance tables is built
> > only after set_base_rel_sizes()
> >
> > "These steps take place between set_base_rel_sizes() and
> > set_base_rel_pathlists() in make_one_rel(). The reason for the
> > position is that the step 2 above needs all inheritence tables to
> > be extended in PlannerInfo and set_base_rel_sizes (currently)
> > does that".
>
> Sorry, my memory had been worn down. After some reconfirmation,
> this description found to be a bit (quite?) wrong.
>
> collect_common_primary_pathkeys needs root->eq_classes to be set
> up beforehand, not appendrels. Bacause build_index_pathkeys
> doesn't work before every EC member for all relation involved is
> already registered.
>
>
> standard_qp_callback registers EC members in the following path
> but only for the primary(?) tlist of the subquery, so EC members
> for the child relations are not registered at the time.
>
> .->make_pathekeys_sortclauses->make_pathkey_from_sortop
>    ->make_pathkey_from_sortinfo->get_eclass_for_sort_expr
>
> EC members for the child rels are registered in
>
>  set_base_rel_sizes->set_rel_size->set_append_rel_size
>    ->add_child_rel_equivalences
>
> So trim_query_pathkeys (collect_common...) doesn't work before
> set_base_rel_sizes(). These EC members needed to be registered at
> least if root->query_pathkeys exists so this seems to be done in
> standard_qp_callback adding a separate inheritance tree walk.

Do we really need that information to eliminate useless keys from
query_pathkeys?


* We have to make child entries in the EquivalenceClass data
* structures as well.  This is needed either if the parent
* participates in some eclass joins (because we will want to consider
* inner-indexscan joins on the individual children) or if the parent
* has useful pathkeys (because we should try to build MergeAppend
* paths that produce those sort orderings).

Referring to above comment, I think we don't need it to eliminate
useless keys based on primary key info from query_pathkeys, but I
might be missing some case, could you please elaborate more to
let me know the case/cases where this would be useful.

I think there is one more disadvantage in the way current patch is
done which is that you need to collect index path keys for all relations
irrespective of whether they will be of any use to eliminate useless
pathkeys from query_pathkeys.  One trivial case that comes to mind is
when there are multiple relations involved in query and ORDER BY is
base on columns of only part of the tables involved in query.
 
> # rel->has_elcass_joins seems not working but it is not the topic
> # here.
>
> > Could you please explain me why the index information built in above
> > path is not sufficient or is there any other case which I am missing?
>
> Is the description above enough to be the explaination for the
> place? Sorry for the inaccurate description.

No issues, above description is sufficient to explain why you have
written patch in its current form.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: SKIP LOCKED DATA (work in progress)
Next
From: Devrim Gündüz
Date:
Subject: Re: PDF builds broken again