Thread: [HACKERS] How does postgres store the join predicate for a relation in a given query

[HACKERS] How does postgres store the join predicate for a relation in a given query

From

Gourav Kumar

Date:

10 October 2017, 16:59:24

Hi all,

When you fire a query in postgresql, it will first parse the query and create the data structures for storing various aspects of the query and executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).

I want to know how does postgresql stores the join predicates of a query. Like which data structure is used to store the join predicates.

How can we find the join predicates applied on a relation from relid, Oid or RangeTblEntry ?

I want to construct a join graph for a given query, for which I need the join predicates between two relations.

Thanks,
Gourav Kumar

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Nico Williams

Date:

10 October 2017, 20:51:34

On Tue, Oct 10, 2017 at 07:29:24PM +0530, Gourav Kumar wrote:
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
> 
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
> 
> How can we find the join predicates applied on a relation from relid, Oid
> or RangeTblEntry ?
> 
> I want to construct a join graph for a given query, for which I need the
> join predicates between two relations.

In the usingClause or quals fields of a JoinExpr.  See
src/backend/parser/gram.y, search for join_qual.

Of course, WHERE clauses have to be inspected as well, which go into the
whereClause of of a SelectStmt; search for where_clause in
src/backend/parser/gram.y.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Ashutosh Bapat

Date:

11 October 2017, 09:30:23

On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar <gourav1905@gmail.com> wrote:
> Hi all,
>
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
>
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
>
> How can we find the join predicates applied on a relation from relid, Oid or
> RangeTblEntry ?
>

Every relation has a RelOptInfo associated with it. Predicates
applicable to it are stored in this RelOptInfo as a list. For base
relations (simple tables) it's in baserestrictinfo. The join
predicates applicable are in joininfo. You can get RelOptInfo of a
given simple table using find_base_rel().

HTH.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Gourav Kumar

Date:

12 October 2017, 19:32:45

Hi Ashutosh,

I have the RelOptInfo data structure for the relations which are to be joined but when I check their joininfo, it is empty.

Does baserestrictinfo contains base predicates ?

Thanks

Gourav.

On 11 October 2017 at 12:00, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar <gourav1905@gmail.com> wrote:
> Hi all,
>
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
>
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
>
> How can we find the join predicates applied on a relation from relid, Oid or
> RangeTblEntry ?
>

Every relation has a RelOptInfo associated with it. Predicates
applicable to it are stored in this RelOptInfo as a list. For base
relations (simple tables) it's in baserestrictinfo. The join
predicates applicable are in joininfo. You can get RelOptInfo of a
given simple table using find_base_rel().

HTH.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] How does postgres store the join predicate for a relation in a given query

From

Tom Lane

Date:

12 October 2017, 19:38:36

Gourav Kumar <gourav1905@gmail.com> writes:
> I have the RelOptInfo data structure for the relations which are to be
> joined but when I check their joininfo, it is empty.

You aren't telling us anything much about the case you're studying,
but if the join clauses have the form of equality comparisons, they
likely got converted into EquivalenceClass data structures instead.
These days the joininfo lists only contain "unstructured" join
conditions.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Gourav Kumar

Date:

12 October 2017, 20:17:14

My objective is to construct join graph from a given query.

A join graph, has a node for each relation involved in a join, and an edge between two relations if they share a join predicate among them.

To do this I first tried to use the make_join_rel() function

- There I checked if they root->join->cur->level is 2, just write the relation names to a file.

- But this strategy failed, because if there is somewhere a Cartesian product among two relations, then they can't have an edge in the join graph.

- So, along with writing the relation name, I need to know if they share a join predicate among them or not.

On 12 October 2017 at 22:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gourav Kumar <gourav1905@gmail.com> writes:
> I have the RelOptInfo data structure for the relations which are to be
> joined but when I check their joininfo, it is empty.

You aren't telling us anything much about the case you're studying,
but if the join clauses have the form of equality comparisons, they
likely got converted into EquivalenceClass data structures instead.
These days the joininfo lists only contain "unstructured" join
conditions.

regards, tom lane

Thanks,
Gourav Kumar

Computer Science and Automation

Indian Institute of Science

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Gourav Kumar

Date:

12 October 2017, 20:18:05

What is meant by "unstructured Join"?

Thanks,

Gourav

On 12 October 2017 at 22:47, Gourav Kumar <gourav1905@gmail.com> wrote:

My objective is to construct join graph from a given query.
A join graph, has a node for each relation involved in a join, and an edge between two relations if they share a join predicate among them.

To do this I first tried to use the make_join_rel() function
- There I checked if they root->join->cur->level is 2, just write the relation names to a file.
- But this strategy failed, because if there is somewhere a Cartesian product among two relations, then they can't have an edge in the join graph.
- So, along with writing the relation name, I need to know if they share a join predicate among them or not.

On 12 October 2017 at 22:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> I have the RelOptInfo data structure for the relations which are to be
> joined but when I check their joininfo, it is empty.

You aren't telling us anything much about the case you're studying,
but if the join clauses have the form of equality comparisons, they
likely got converted into EquivalenceClass data structures instead.
These days the joininfo lists only contain "unstructured" join
conditions.

regards, tom lane

--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science

Thanks,
Gourav Kumar

Computer Science and Automation

Indian Institute of Science

Re: [HACKERS] How does postgres store the join predicate for a relation in a given query

From

Tom Lane

Date:

12 October 2017, 20:44:25

Gourav Kumar <gourav1905@gmail.com> writes:
> My objective is to construct join graph from a given query.
> A join graph, has a node for each relation involved in a join, and an edge
> between two relations if they share a join predicate among them.

Hm, well, you could adapt the logic in have_relevant_joinclause() and
have_relevant_eclass_joinclause().  Or maybe you could just use them
as-is ... depends on what you have in mind to do with join clauses
that mention 3 or more relations.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Gourav Kumar

Date:

12 October 2017, 21:59:51

A Join clause/predicate will only mention 2 relations. It can't have 3 or more relations.

On 12 October 2017 at 23:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gourav Kumar <gourav1905@gmail.com> writes:
> My objective is to construct join graph from a given query.
> A join graph, has a node for each relation involved in a join, and an edge
> between two relations if they share a join predicate among them.

Hm, well, you could adapt the logic in have_relevant_joinclause() and
have_relevant_eclass_joinclause(). Or maybe you could just use them
as-is ... depends on what you have in mind to do with join clauses
that mention 3 or more relations.

regards, tom lane

Thanks,
Gourav Kumar

Computer Science and Automation

Indian Institute of Science

Re: [HACKERS] How does postgres store the join predicate for a relation in a given query

From

Tom Lane

Date:

12 October 2017, 22:22:16

Gourav Kumar <gourav1905@gmail.com> writes:
> A Join clause/predicate will only mention 2 relations. It can't have 3 or
> more relations.

Really?  What of, say,
select ... from a,b,c where (a.x + b.y) = c.z;
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Gourav Kumar

Date:

12 October 2017, 22:30:30

Well for this given query it is possible. I haven't come across any such query yet.

Possibly because I am more concerned about the TPCDS and TPCH benchmarks, where it's less likely to occur.

On 13 October 2017 at 00:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gourav Kumar <gourav1905@gmail.com> writes:
> A Join clause/predicate will only mention 2 relations. It can't have 3 or
> more relations.

Really? What of, say,

select ... from a,b,c where (a.x + b.y) = c.z;

regards, tom lane

Thanks,
Gourav Kumar

Computer Science and Automation

Indian Institute of Science

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Gourav Kumar

Date:

14 October 2017, 00:45:02

Why does have_relevant_joinclause() and have_relevant_eclass_joinclause() return true for all possible joins for the query given below.

Even when they have no join predicate between them.

e.g. join between ss1 & ws3, ss2 & ws3 etc.

The query is :

TPC-DS query 50

-- query 50 in stream 0 using template query31.tpl

with ss as

(select ca_county,d_qoy, d_year,sum(ss_ext_sales_price) as store_sales

from store_sales,date_dim,customer_address

where ss_sold_date_sk = d_date_sk

and ss_addr_sk=ca_address_sk

group by ca_county,d_qoy, d_year),

ws as

(select ca_county,d_qoy, d_year,sum(ws_ext_sales_price) as web_sales

from web_sales,date_dim,customer_address

where ws_sold_date_sk = d_date_sk

and ws_bill_addr_sk=ca_address_sk

group by ca_county,d_qoy, d_year)

select /* tt */

ss1.ca_county

,ss1.d_year

,ws2.web_sales/ws1.web_sales web_q1_q2_increase

,ss2.store_sales/ss1.store_sales store_q1_q2_increase

,ws3.web_sales/ws2.web_sales web_q2_q3_increase

,ss3.store_sales/ss2.store_sales store_q2_q3_increase

from

ss ss1

,ss ss2

,ss ss3

,ws ws1

,ws ws2

,ws ws3

where

ss1.d_qoy = 1

and ss1.d_year = 2000

and ss1.ca_county = ss2.ca_county

and ss2.d_qoy = 2

and ss2.d_year = 2000

and ss2.ca_county = ss3.ca_county

and ss3.d_qoy = 3

and ss3.d_year = 2000

and ss1.ca_county = ws1.ca_county

and ws1.d_qoy = 1

and ws1.d_year = 2000

and ws1.ca_county = ws2.ca_county

and ws2.d_qoy = 2

and ws2.d_year = 2000

and ws1.ca_county = ws3.ca_county

and ws3.d_qoy = 3

and ws3.d_year =2000

and case when ws1.web_sales > 0 then ws2.web_sales/ws1.web_sales else null end

> case when ss1.store_sales > 0 then ss2.store_sales/ss1.store_sales else null end

and case when ws2.web_sales > 0 then ws3.web_sales/ws2.web_sales else null end

> case when ss2.store_sales > 0 then ss3.store_sales/ss2.store_sales else null end

order by web_q2_q3_increase;

-- end

On 13 October 2017 at 01:00, Gourav Kumar <gourav1905@gmail.com> wrote:

Well for this given query it is possible. I haven't come across any such query yet.

Possibly because I am more concerned about the TPCDS and TPCH benchmarks, where it's less likely to occur.

On 13 October 2017 at 00:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> A Join clause/predicate will only mention 2 relations. It can't have 3 or
> more relations.

Really? What of, say,

select ... from a,b,c where (a.x + b.y) = c.z;

regards, tom lane

--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science

Thanks,
Gourav Kumar

Computer Science and Automation

Indian Institute of Science

Re: [HACKERS] How does postgres store the join predicate for arelation in a given query

From

Ashutosh Bapat

Date:

16 October 2017, 08:13:12

On Sat, Oct 14, 2017 at 3:15 AM, Gourav Kumar <gourav1905@gmail.com> wrote:
> Why does have_relevant_joinclause() and have_relevant_eclass_joinclause()
> return true for all possible joins for the query given below.
> Even when they have no join predicate between them.
> e.g. join between ss1 & ws3, ss2 & ws3 etc.
>

The prologues of those functions and comments within those explain that.

/** have_relevant_joinclause*      Detect whether there is a joinclause that involves*      the two given relations.**
Note:the joinclause does not have to be evaluable with only these two* relations.  This is intentional.  For example
consider*     SELECT * FROM a, b, c WHERE a.x = (b.y + c.z)* If a is much larger than the other tables, it may be
worthwhileto* cross-join b and c and then use an inner indexscan on a.x.  Therefore* we should consider this joinclause
asreason to join b to c, even though* it can't be applied at that join step.*/

/** have_relevant_eclass_joinclause*      Detect whether there is an EquivalenceClass that could produce*      a
joinclauseinvolving the two given relations.** This is essentially a very cut-down version of*
generate_join_implied_equalities(). Note it's OK to occasionally say "yes"* incorrectly.  Hence we don't bother with
detailslike whether the lack of a* cross-type operator might prevent the clause from actually being generated.*/

May be you want to see whether those comments are applicable in your
case and also see how the callers handle the return values.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers