Thread: [HACKERS] How does postgres store the join predicate for a relation in a given query
[HACKERS] How does postgres store the join predicate for a relation in a given query
From
Gourav Kumar
Date:
Hi all,
When you fire a query in postgresql, it will first parse the query and create the data structures for storing various aspects of the query and executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
I want to know how does postgresql stores the join predicates of a query. Like which data structure is used to store the join predicates.
How can we find the join predicates applied on a relation from relid, Oid or RangeTblEntry ?
I want to construct a join graph for a given query, for which I need the join predicates between two relations.
Thanks,
Gourav Kumar
Gourav Kumar
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Nico Williams
Date:
On Tue, Oct 10, 2017 at 07:29:24PM +0530, Gourav Kumar wrote: > When you fire a query in postgresql, it will first parse the query and > create the data structures for storing various aspects of the query and > executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.). > > I want to know how does postgresql stores the join predicates of a query. > Like which data structure is used to store the join predicates. > > How can we find the join predicates applied on a relation from relid, Oid > or RangeTblEntry ? > > I want to construct a join graph for a given query, for which I need the > join predicates between two relations. In the usingClause or quals fields of a JoinExpr. See src/backend/parser/gram.y, search for join_qual. Of course, WHERE clauses have to be inspected as well, which go into the whereClause of of a SelectStmt; search for where_clause in src/backend/parser/gram.y. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Ashutosh Bapat
Date:
On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar <gourav1905@gmail.com> wrote: > Hi all, > > When you fire a query in postgresql, it will first parse the query and > create the data structures for storing various aspects of the query and > executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.). > > I want to know how does postgresql stores the join predicates of a query. > Like which data structure is used to store the join predicates. > > How can we find the join predicates applied on a relation from relid, Oid or > RangeTblEntry ? > Every relation has a RelOptInfo associated with it. Predicates applicable to it are stored in this RelOptInfo as a list. For base relations (simple tables) it's in baserestrictinfo. The join predicates applicable are in joininfo. You can get RelOptInfo of a given simple table using find_base_rel(). HTH. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Gourav Kumar
Date:
Hi Ashutosh,
I have the RelOptInfo data structure for the relations which are to be joined but when I check their joininfo, it is empty.
Does baserestrictinfo contains base predicates ?
Thanks
Gourav.
On 11 October 2017 at 12:00, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar <gourav1905@gmail.com> wrote:
> Hi all,
>
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RangeOptInfo etc.).
>
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
>
> How can we find the join predicates applied on a relation from relid, Oid or
> RangeTblEntry ?
>
Every relation has a RelOptInfo associated with it. Predicates
applicable to it are stored in this RelOptInfo as a list. For base
relations (simple tables) it's in baserestrictinfo. The join
predicates applicable are in joininfo. You can get RelOptInfo of a
given simple table using find_base_rel().
HTH.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] How does postgres store the join predicate for a relation in a given query
From
Tom Lane
Date:
Gourav Kumar <gourav1905@gmail.com> writes: > I have the RelOptInfo data structure for the relations which are to be > joined but when I check their joininfo, it is empty. You aren't telling us anything much about the case you're studying, but if the join clauses have the form of equality comparisons, they likely got converted into EquivalenceClass data structures instead. These days the joininfo lists only contain "unstructured" join conditions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Gourav Kumar
Date:
My objective is to construct join graph from a given query.
A join graph, has a node for each relation involved in a join, and an edge between two relations if they share a join predicate among them.
To do this I first tried to use the make_join_rel() function
- There I checked if they root->join->cur->level is 2, just write the relation names to a file.
- But this strategy failed, because if there is somewhere a Cartesian product among two relations, then they can't have an edge in the join graph.
- So, along with writing the relation name, I need to know if they share a join predicate among them or not.
On 12 October 2017 at 22:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> I have the RelOptInfo data structure for the relations which are to be
> joined but when I check their joininfo, it is empty.
You aren't telling us anything much about the case you're studying,
but if the join clauses have the form of equality comparisons, they
likely got converted into EquivalenceClass data structures instead.
These days the joininfo lists only contain "unstructured" join
conditions.
regards, tom lane
Thanks,
Gourav Kumar
Computer Science and AutomationGourav Kumar
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Gourav Kumar
Date:
What is meant by "unstructured Join"?
Thanks,
Gourav
On 12 October 2017 at 22:47, Gourav Kumar <gourav1905@gmail.com> wrote:
My objective is to construct join graph from a given query.A join graph, has a node for each relation involved in a join, and an edge between two relations if they share a join predicate among them.To do this I first tried to use the make_join_rel() function- There I checked if they root->join->cur->level is 2, just write the relation names to a file.- But this strategy failed, because if there is somewhere a Cartesian product among two relations, then they can't have an edge in the join graph.- So, along with writing the relation name, I need to know if they share a join predicate among them or not.--On 12 October 2017 at 22:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:Gourav Kumar <gourav1905@gmail.com> writes:
> I have the RelOptInfo data structure for the relations which are to be
> joined but when I check their joininfo, it is empty.
You aren't telling us anything much about the case you're studying,
but if the join clauses have the form of equality comparisons, they
likely got converted into EquivalenceClass data structures instead.
These days the joininfo lists only contain "unstructured" join
conditions.
regards, tom laneIndian Institute of ScienceThanks,Computer Science and Automation
Gourav Kumar
Thanks,
Gourav Kumar
Computer Science and AutomationGourav Kumar
Re: [HACKERS] How does postgres store the join predicate for a relation in a given query
From
Tom Lane
Date:
Gourav Kumar <gourav1905@gmail.com> writes: > My objective is to construct join graph from a given query. > A join graph, has a node for each relation involved in a join, and an edge > between two relations if they share a join predicate among them. Hm, well, you could adapt the logic in have_relevant_joinclause() and have_relevant_eclass_joinclause(). Or maybe you could just use them as-is ... depends on what you have in mind to do with join clauses that mention 3 or more relations. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Gourav Kumar
Date:
A Join clause/predicate will only mention 2 relations. It can't have 3 or more relations.
On 12 October 2017 at 23:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> My objective is to construct join graph from a given query.
> A join graph, has a node for each relation involved in a join, and an edge
> between two relations if they share a join predicate among them.
Hm, well, you could adapt the logic in have_relevant_joinclause() and
have_relevant_eclass_joinclause(). Or maybe you could just use them
as-is ... depends on what you have in mind to do with join clauses
that mention 3 or more relations.
regards, tom lane
Thanks,
Gourav Kumar
Computer Science and AutomationGourav Kumar
Re: [HACKERS] How does postgres store the join predicate for a relation in a given query
From
Tom Lane
Date:
Gourav Kumar <gourav1905@gmail.com> writes: > A Join clause/predicate will only mention 2 relations. It can't have 3 or > more relations. Really? What of, say, select ... from a,b,c where (a.x + b.y) = c.z; regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Gourav Kumar
Date:
Well for this given query it is possible. I haven't come across any such query yet.
Possibly because I am more concerned about the TPCDS and TPCH benchmarks, where it's less likely to occur.
On 13 October 2017 at 00:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gourav Kumar <gourav1905@gmail.com> writes:
> A Join clause/predicate will only mention 2 relations. It can't have 3 or
> more relations.
Really? What of, say,
select ... from a,b,c where (a.x + b.y) = c.z;
regards, tom lane
Thanks,
Gourav Kumar
Computer Science and AutomationGourav Kumar
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Gourav Kumar
Date:
Why does have_relevant_joinclause() and have_relevant_eclass_ joinclause() return true for all possible joins for the query given below.
Even when they have no join predicate between them.
e.g. join between ss1 & ws3, ss2 & ws3 etc.
The query is :
TPC-DS query 50
-- query 50 in stream 0 using template query31.tpl
with ss as
(select ca_county,d_qoy, d_year,sum(ss_ext_sales_price) as store_sales
from store_sales,date_dim,customer_address
where ss_sold_date_sk = d_date_sk
and ss_addr_sk=ca_address_sk
group by ca_county,d_qoy, d_year),
ws as
(select ca_county,d_qoy, d_year,sum(ws_ext_sales_price) as web_sales
from web_sales,date_dim,customer_address
where ws_sold_date_sk = d_date_sk
and ws_bill_addr_sk=ca_address_sk
group by ca_county,d_qoy, d_year)
select /* tt */
ss1.ca_county
,ss1.d_year
,ws2.web_sales/ws1.web_sales web_q1_q2_increase
,ss2.store_sales/ss1.store_sales store_q1_q2_increase
,ws3.web_sales/ws2.web_sales web_q2_q3_increase
,ss3.store_sales/ss2.store_sales store_q2_q3_increase
from
ss ss1
,ss ss2
,ss ss3
,ws ws1
,ws ws2
,ws ws3
where
ss1.d_qoy = 1
and ss1.d_year = 2000
and ss1.ca_county = ss2.ca_county
and ss2.d_qoy = 2
and ss2.d_year = 2000
and ss2.ca_county = ss3.ca_county
and ss3.d_qoy = 3
and ss3.d_year = 2000
and ss1.ca_county = ws1.ca_county
and ws1.d_qoy = 1
and ws1.d_year = 2000
and ws1.ca_county = ws2.ca_county
and ws2.d_qoy = 2
and ws2.d_year = 2000
and ws1.ca_county = ws3.ca_county
and ws3.d_qoy = 3
and ws3.d_year =2000
and case when ws1.web_sales > 0 then ws2.web_sales/ws1.web_sales else null end
> case when ss1.store_sales > 0 then ss2.store_sales/ss1.store_sales else null end
and case when ws2.web_sales > 0 then ws3.web_sales/ws2.web_sales else null end
> case when ss2.store_sales > 0 then ss3.store_sales/ss2.store_sales else null end
order by web_q2_q3_increase;
-- end
On 13 October 2017 at 01:00, Gourav Kumar <gourav1905@gmail.com> wrote:
Well for this given query it is possible. I haven't come across any such query yet.Possibly because I am more concerned about the TPCDS and TPCH benchmarks, where it's less likely to occur.--On 13 October 2017 at 00:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:Gourav Kumar <gourav1905@gmail.com> writes:
> A Join clause/predicate will only mention 2 relations. It can't have 3 or
> more relations.
Really? What of, say,
select ... from a,b,c where (a.x + b.y) = c.z;
regards, tom laneIndian Institute of ScienceThanks,Computer Science and Automation
Gourav Kumar
Thanks,
Gourav Kumar
Computer Science and AutomationGourav Kumar
Re: [HACKERS] How does postgres store the join predicate for arelation in a given query
From
Ashutosh Bapat
Date:
On Sat, Oct 14, 2017 at 3:15 AM, Gourav Kumar <gourav1905@gmail.com> wrote: > Why does have_relevant_joinclause() and have_relevant_eclass_joinclause() > return true for all possible joins for the query given below. > Even when they have no join predicate between them. > e.g. join between ss1 & ws3, ss2 & ws3 etc. > The prologues of those functions and comments within those explain that. /** have_relevant_joinclause* Detect whether there is a joinclause that involves* the two given relations.** Note:the joinclause does not have to be evaluable with only these two* relations. This is intentional. For example consider* SELECT * FROM a, b, c WHERE a.x = (b.y + c.z)* If a is much larger than the other tables, it may be worthwhileto* cross-join b and c and then use an inner indexscan on a.x. Therefore* we should consider this joinclause asreason to join b to c, even though* it can't be applied at that join step.*/ /** have_relevant_eclass_joinclause* Detect whether there is an EquivalenceClass that could produce* a joinclauseinvolving the two given relations.** This is essentially a very cut-down version of* generate_join_implied_equalities(). Note it's OK to occasionally say "yes"* incorrectly. Hence we don't bother with detailslike whether the lack of a* cross-type operator might prevent the clause from actually being generated.*/ May be you want to see whether those comments are applicable in your case and also see how the callers handle the return values. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers