Thread: [HACKERS] How does postgres store the join predicate for a relation in a given query

Hi all,

When you fire a query in postgresql, it will first parse the query and create the data structures for storing various aspects of the query and executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).

I want to know how does postgresql stores the join predicates of a query. Like which data structure is used to store the join predicates.

How can we find the join predicates applied on a relation from relid, Oid or RangeTblEntry ?

I want to construct a join graph for a given query, for which I need the join predicates between two relations.


--
Thanks,
Gourav Kumar

On Tue, Oct 10, 2017 at 07:29:24PM +0530, Gourav Kumar wrote:
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
> 
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
> 
> How can we find the join predicates applied on a relation from relid, Oid
> or RangeTblEntry ?
> 
> I want to construct a join graph for a given query, for which I need the
> join predicates between two relations.

In the usingClause or quals fields of a JoinExpr.  See
src/backend/parser/gram.y, search for join_qual.

Of course, WHERE clauses have to be inspected as well, which go into the
whereClause of of a SelectStmt; search for where_clause in
src/backend/parser/gram.y.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar <gourav1905@gmail.com> wrote:
> Hi all,
>
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
>
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
>
> How can we find the join predicates applied on a relation from relid, Oid or
> RangeTblEntry ?
>

Every relation has a RelOptInfo associated with it. Predicates
applicable to it are stored in this RelOptInfo as a list. For base
relations (simple tables) it's in baserestrictinfo. The join
predicates applicable are in joininfo. You can get RelOptInfo of a
given simple table using find_base_rel().

HTH.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Hi Ashutosh,

I have the RelOptInfo data structure for the relations which are to be joined but when I check their joininfo, it is empty.
Does baserestrictinfo contains base predicates ?

Thanks 
Gourav.

On 11 October 2017 at 12:00, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar <gourav1905@gmail.com> wrote:
> Hi all,
>
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
>
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
>
> How can we find the join predicates applied on a relation from relid, Oid or
> RangeTblEntry ?
>

Every relation has a RelOptInfo associated with it. Predicates
applicable to it are stored in this RelOptInfo as a list. For base
relations (simple tables) it's in baserestrictinfo. The join
predicates applicable are in joininfo. You can get RelOptInfo of a
given simple table using find_base_rel().

HTH.


--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Gourav Kumar <gourav1905@gmail.com> writes:
> I have the RelOptInfo data structure for the relations which are to be
> joined but when I check their joininfo, it is empty.

You aren't telling us anything much about the case you're studying,
but if the join clauses have the form of equality comparisons, they
likely got converted into EquivalenceClass data structures instead.
These days the joininfo lists only contain "unstructured" join
conditions.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

My objective is to construct join graph from a given query.
A join graph, has a node for each relation involved in a join, and an edge between two relations if they share a join predicate among them.

To do this I first tried to use the make_join_rel() function
     - There I checked if they root->join->cur->level is 2, just write the relation names to a file.
     - But this strategy failed, because if there is somewhere a Cartesian product among two relations, then they can't have an edge in the join graph.
     - So, along with writing the relation name, I need to know if they share a join predicate among them or not.
 

On 12 October 2017 at 22:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> I have the RelOptInfo data structure for the relations which are to be
> joined but when I check their joininfo, it is empty.

You aren't telling us anything much about the case you're studying,
but if the join clauses have the form of equality comparisons, they
likely got converted into EquivalenceClass data structures instead.
These days the joininfo lists only contain "unstructured" join
conditions.

                        regards, tom lane



--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science
What is meant by "unstructured Join"?

Thanks,
Gourav

On 12 October 2017 at 22:47, Gourav Kumar <gourav1905@gmail.com> wrote:
My objective is to construct join graph from a given query.
A join graph, has a node for each relation involved in a join, and an edge between two relations if they share a join predicate among them.

To do this I first tried to use the make_join_rel() function
     - There I checked if they root->join->cur->level is 2, just write the relation names to a file.
     - But this strategy failed, because if there is somewhere a Cartesian product among two relations, then they can't have an edge in the join graph.
     - So, along with writing the relation name, I need to know if they share a join predicate among them or not.
 

On 12 October 2017 at 22:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> I have the RelOptInfo data structure for the relations which are to be
> joined but when I check their joininfo, it is empty.

You aren't telling us anything much about the case you're studying,
but if the join clauses have the form of equality comparisons, they
likely got converted into EquivalenceClass data structures instead.
These days the joininfo lists only contain "unstructured" join
conditions.

                        regards, tom lane



--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science



--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science
Gourav Kumar <gourav1905@gmail.com> writes:
> My objective is to construct join graph from a given query.
> A join graph, has a node for each relation involved in a join, and an edge
> between two relations if they share a join predicate among them.

Hm, well, you could adapt the logic in have_relevant_joinclause() and
have_relevant_eclass_joinclause().  Or maybe you could just use them
as-is ... depends on what you have in mind to do with join clauses
that mention 3 or more relations.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

A Join clause/predicate will only mention 2 relations. It can't have 3 or more relations.

On 12 October 2017 at 23:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> My objective is to construct join graph from a given query.
> A join graph, has a node for each relation involved in a join, and an edge
> between two relations if they share a join predicate among them.

Hm, well, you could adapt the logic in have_relevant_joinclause() and
have_relevant_eclass_joinclause().  Or maybe you could just use them
as-is ... depends on what you have in mind to do with join clauses
that mention 3 or more relations.

                        regards, tom lane



--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science
Gourav Kumar <gourav1905@gmail.com> writes:
> A Join clause/predicate will only mention 2 relations. It can't have 3 or
> more relations.

Really?  What of, say,
select ... from a,b,c where (a.x + b.y) = c.z;
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Well for this given query it is possible. I haven't come across any such query yet.

Possibly because I am more concerned about the TPCDS and TPCH benchmarks, where it's less likely to occur.

On 13 October 2017 at 00:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> A Join clause/predicate will only mention 2 relations. It can't have 3 or
> more relations.

Really?  What of, say,

        select ... from a,b,c where (a.x + b.y) = c.z;

                        regards, tom lane



--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science
Why does have_relevant_joinclause() and have_relevant_eclass_joinclause() return true for all possible joins for the query given below.
Even when they have no join predicate between them.
e.g. join between ss1 & ws3, ss2 & ws3 etc.

The query is :
TPC-DS query 50

 -- query 50 in stream 0 using template query31.tpl
with ss as
 (select ca_county,d_qoy, d_year,sum(ss_ext_sales_price) as store_sales
 from store_sales,date_dim,customer_address
 where ss_sold_date_sk = d_date_sk
  and ss_addr_sk=ca_address_sk
 group by ca_county,d_qoy, d_year),
 ws as
 (select ca_county,d_qoy, d_year,sum(ws_ext_sales_price) as web_sales
 from web_sales,date_dim,customer_address
 where ws_sold_date_sk = d_date_sk
  and ws_bill_addr_sk=ca_address_sk
 group by ca_county,d_qoy, d_year)
 select /* tt */
        ss1.ca_county
       ,ss1.d_year
       ,ws2.web_sales/ws1.web_sales web_q1_q2_increase
       ,ss2.store_sales/ss1.store_sales store_q1_q2_increase
       ,ws3.web_sales/ws2.web_sales web_q2_q3_increase
       ,ss3.store_sales/ss2.store_sales store_q2_q3_increase
 from
        ss ss1
       ,ss ss2
       ,ss ss3
       ,ws ws1
       ,ws ws2
       ,ws ws3
 where
    ss1.d_qoy = 1
    and ss1.d_year = 2000
    and ss1.ca_county = ss2.ca_county
    and ss2.d_qoy = 2
    and ss2.d_year = 2000
 and ss2.ca_county = ss3.ca_county
    and ss3.d_qoy = 3
    and ss3.d_year = 2000
    and ss1.ca_county = ws1.ca_county
    and ws1.d_qoy = 1
    and ws1.d_year = 2000
    and ws1.ca_county = ws2.ca_county
    and ws2.d_qoy = 2
    and ws2.d_year = 2000
    and ws1.ca_county = ws3.ca_county
    and ws3.d_qoy = 3
    and ws3.d_year =2000
    and case when ws1.web_sales > 0 then ws2.web_sales/ws1.web_sales else null end 
       > case when ss1.store_sales > 0 then ss2.store_sales/ss1.store_sales else null end
    and case when ws2.web_sales > 0 then ws3.web_sales/ws2.web_sales else null end
       > case when ss2.store_sales > 0 then ss3.store_sales/ss2.store_sales else null end
 order by web_q2_q3_increase;

-- end 


On 13 October 2017 at 01:00, Gourav Kumar <gourav1905@gmail.com> wrote:
Well for this given query it is possible. I haven't come across any such query yet.

Possibly because I am more concerned about the TPCDS and TPCH benchmarks, where it's less likely to occur.

On 13 October 2017 at 00:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> A Join clause/predicate will only mention 2 relations. It can't have 3 or
> more relations.

Really?  What of, say,

        select ... from a,b,c where (a.x + b.y) = c.z;

                        regards, tom lane



--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science



--
Thanks,
Gourav Kumar
Computer Science and Automation
Indian Institute of Science
On Sat, Oct 14, 2017 at 3:15 AM, Gourav Kumar <gourav1905@gmail.com> wrote:
> Why does have_relevant_joinclause() and have_relevant_eclass_joinclause()
> return true for all possible joins for the query given below.
> Even when they have no join predicate between them.
> e.g. join between ss1 & ws3, ss2 & ws3 etc.
>

The prologues of those functions and comments within those explain that.

/** have_relevant_joinclause*      Detect whether there is a joinclause that involves*      the two given relations.**
Note:the joinclause does not have to be evaluable with only these two* relations.  This is intentional.  For example
consider*     SELECT * FROM a, b, c WHERE a.x = (b.y + c.z)* If a is much larger than the other tables, it may be
worthwhileto* cross-join b and c and then use an inner indexscan on a.x.  Therefore* we should consider this joinclause
asreason to join b to c, even though* it can't be applied at that join step.*/
 

/** have_relevant_eclass_joinclause*      Detect whether there is an EquivalenceClass that could produce*      a
joinclauseinvolving the two given relations.** This is essentially a very cut-down version of*
generate_join_implied_equalities(). Note it's OK to occasionally say "yes"* incorrectly.  Hence we don't bother with
detailslike whether the lack of a* cross-type operator might prevent the clause from actually being generated.*/
 
May be you want to see whether those comments are applicable in your
case and also see how the callers handle the return values.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers