Re: [Proposal] Table partition + join pushdown - Mailing list pgsql-hackers

From Taiki Kondo
Subject Re: [Proposal] Table partition + join pushdown
Date
Msg-id 12A9442FBAE80D4E8953883E0B84E0886004F3@BPXM01GP.gisp.nec.co.jp
Whole thread Raw
In response to Re: [Proposal] Table partition + join pushdown  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Hello, Horiguchi-san.

Sorry for late reply.

> explain analyze
> select data_x, data_y, num from check_test_div join inner_t on check_test_div.id + 1 = inner_t.id;
>
> Makes the mutation fail then result in an assertion failure.
>
> | TRAP: FailedAssertion("!(list_length(check_constr) ==
> | list_length(result))", File: "joinpath.c", Line: 1608)
>
> This is because both 'check_test_div.id + 1' and inner_t.id don't
> match the var-side of the constraints.

Thank you for picking up the failure example.
This is exactly a bug. I'll fix it.

> I don't see clearly what to do for the situation for now but this
> is the one of the most important function for this feature and
> should be cleanly designed.

Yes, this function is one of the important features of this patch.

This function makes new filtering conditions from CHECK() constraints.
This is to reduce number of rows for making hash table smaller (or
making sorting faster for MergeJoin) to fit to smaller work_mem environment.

Maybe, I must collect realistic examples of CHECK() constraints,
which are used for table partitioning, for designing more cleanly.


Best regards,

--
Taiki Kondo

NEC Solution Innovators, Ltd.



-----Original Message-----
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
Sent: Thursday, October 08, 2015 7:04 PM
To: tai-kondo@yk.jp.nec.com
Cc: kaigai@ak.jp.nec.com; aki-iwaasa@vt.jp.nec.com; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Proposal] Table partition + join pushdown

Hello, thank you for the example.

I could see this patch working with it.

> > In make_restrictinfos_from_check_constr, the function returns
> > modified constraint predicates correspond to vars under hashjoinable
> > join clauses. I don't think expression_tree_mutator is suitable to
> > do that since it could allow unwanted result when constraint
> > predicates or join clauses are not simple OpExpr's.
>
> Do you have any example of this situation?

As a rather simple case on the test environment made by the provided script, the following query,

explain analyze
select data_x, data_y, num from check_test_div join inner_t on check_test_div.id + 1 = inner_t.id;

Makes the mutation fail then result in an assertion failure.

| TRAP: FailedAssertion("!(list_length(check_constr) ==
| list_length(result))", File: "joinpath.c", Line: 1608)

This is because both 'check_test_div.id + 1' and inner_t.id don't match the var-side of the constraints.

I don't see clearly what to do for the situation for now but this is the one of the most important function for this
featureand should be cleanly designed. 


At Thu, 8 Oct 2015 08:28:04 +0000, Taiki Kondo <tai-kondo@yk.jp.nec.com> wrote in
<12A9442FBAE80D4E8953883E0B84E0885F9913@BPXM01GP.gisp.nec.co.jp>
> Hello, Horiguchi-san.
>
> Thank you for your comment.
>
> > I got some warning on compilation on unused variables and wrong
> > arguemtn type.
>
> OK, I'll fix it.
>
> > I failed to have a query that this patch works on. Could you let me
> > have some specific example for this patch?
>
> Please find attached.
> And also make sure that setting of work_mem is '64kB' (not 64MB).
>
> If there is the large work_mem enough to create hash table for
> relation after appending, its cost may be better than pushed-down
> plan's cost, then planner will not choose pushed-down plan this patch makes.
> So, to make this patch working fine, work_mem size must be smaller
> than the hash table size for relation after appending.
>
> > This patch needs more comments. Please put comment about not only
> > what it does but also the reason and other things for it.
>
> OK, I'll put more comments in the code.
> But it will take a long time, maybe...

Sure.

> > -- about namings
> >
> > Names for functions and variables are needed to be more appropriate,
> > in other words, needed to be what properly informs what they are.
> > The followings are the examples of such names.
>
> Thank you for your suggestion.
>
> I also think these names are not good much.
> I'll try to make names better , but it maybe take a long time...
> Of course, I will use your suggestion as reference.

Thanks.

> > "added_restrictlist"'s widely distributed as many function arguemnts
> > and JoinPathExtraData makes me feel dizzy..
>
> "added_restrictinfo" will be deleted from almost functions other than
> try_join_pushdown() in next (v2) patch because the place of filtering
> using this info will be changed from Join node to Scan node and not
> have to place it into other than try_join_pushdown().

I'm looking forward to see it.

> > In make_restrictinfos_from_check_constr, the function returns
> > modified constraint predicates correspond to vars under hashjoinable
> > join clauses. I don't think expression_tree_mutator is suitable to
> > do that since it could allow unwanted result when constraint
> > predicates or join clauses are not simple OpExpr's.
>
> Do you have any example of this situation?
> I am trying to find unwanted results you mentioned, but I don't have
> any results in this timing. I have a hunch that it will allow unwanted
> results because I have thought only about very simple situation for
> this function.

As mentioned above.

> > Otherwise could you give me clear explanation on what it does?
>
> This function transfers CHECK() constraint to filter expression by
> following procedures.
> (1) Get outer table's CHECK() constraint by using get_relation_constraints().
> (2) Walk through expression tree got in (1) by using expression_tree_mutator()
>     with check_constraint_mutator() and change only outer's Var node to
>     inner's one according to join clause.
>
> For example, when CHECK() constraint of table A is "num % 4 = 0" and
> join clause between table A and B is "A.num = B.data", then we can get
> "B.data % 4 = 0" for filtering purpose.
>
> This also accepts more complex join clause like "A.num = B.data * 2",
> then we can get "(B.data * 2) % 4 = 0".
>
> In procedure (2), to decide whether to use each join clause for
> changing Var node or not, I implement check_constraint_mutator() to
> judge whether join clause is hash-joinable or not.

Thanks for the explanation. I think that the function has been made considering only rather plain calses. We should put
morethought to making the logic clearer so that we can define the desired/possible capability and limitations clearly. 

> Actually, I want to judge whether OpExpr as top expression tree of
> join clause means "=" or not, but I can't find how to do it.
>
> If you know how to do it, please let me know.

I don't see for now, too :p

But we at least should put more consideration on the mechanism to obtain the expressions.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Victor Wagner
Date:
Subject: Re: Patch: Implement failover on libpq connect level.
Next
From: Etsuro Fujita
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual