Thread: problem with join plan
Hi,
I have a PostgreSQL 14 on Linux with a 15 TB database, 20 cores, 150 GB RAM, all NVMe SSD. Currently one user :-)
A join between 2 big tables, then another join with a smaller third one, takes less than 1 minute and returns 15 M rows. Fine.
If I add a third join, against a 30k-row table, with a simple equality as the join clause, it works almost as fast: EXPLAIN ANALYZE indicates 45 sec.
Same if I do the very same with another equality clause: EXPLAIN ANALYZE indicates 140 sec.
But if I do the same with clause 1 OR clause 2, I have to kill the request after an hour, seeing the filesystem showing more than 140 MB of increased usage.
Looking at the explain plans with one clause alone or with the 2 ORed, there are changes in the plan (of course).
With the fastest clause the estimated cost is 3,700,000, and with the slightly slower one 3,900,000.
With both ORed, the estimated cost is 16,000,000. That sounds a bit strange to me, as ORing the join clauses should roughly add the times, not multiply them.
So, before providing the 3 explain plans (I must anonymize everything, which is somewhat tedious), I would like to know if there is some obvious thing I am missing.
All tables have been vacuum analyzed.
thanks
On Wed, 21 Jun 2023 at 07:42, Marc Millas <marc.millas@mokadb.com> wrote:
> But if I do the same with clause one OR clause 2, I have to kill the request after an hour, seeing the filesystem showing more than 140 MB of increased usage.
> So, before providing the 3 explain plans (I must anonymize everything, so somewhat boring) I would like to know if there is some obvious thing I am missing.
> all tables have been vacuum analyzed.

I believe you can anonymise the explain with https://explain.depesz.com/

It's pretty hard to say until we see the query, but having an OR in the join condition makes it impossible to Hash or Merge join, so perhaps it's slow due to Nested Loop join.

You could consider rewriting the query to use a UNION or a UNION ALL, separating out each branch of the OR into a UNION of its own. That would allow Hash and Merge join to work again. However, that's all speculation until you provide more details.

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

David
On Tue, Jun 20, 2023 at 10:14 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 21 Jun 2023 at 07:42, Marc Millas <marc.millas@mokadb.com> wrote:
> But if I do the same with clause one OR clause 2, I have to kill the request after an hour, seeing the filesystem showing more than 140 Mb of increased usage.
> So, before providing the 3 explain plans (I must anonymize everything, so somewhat boring) I would like to know if there is some obvious thing I am missing.
> all tables have been vacuum analyzed.
I believe you can anonymise the explain with https://explain.depesz.com/
link to the anonymized plan of the req with one clause : https://explain.depesz.com/s/TWp4
It's pretty hard to say until we see the query, but having an OR in
the join condition makes it impossible to Hash or Merge join, so
perhaps it's slow due to Nested Loop join.
You could consider rewriting the query to use a UNION or a UNION ALL
separating out each branch of the OR into a UNION of its own. That
would allow Hash and Merge join to work again. However, that's all
speculation until you provide more details.
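As a rough sketch of that rewrite (these table and column names are invented for illustration, not taken from your query):

```sql
-- Original shape: the OR in the join condition forces a Nested Loop.
--   SELECT b.id, s.code
--   FROM   big_table b
--   JOIN   small_table s ON b.x = s.x OR b.y = s.y;

-- Rewritten so each OR branch becomes its own join, which can Hash/Merge:
SELECT b.id, s.code
FROM   big_table b
JOIN   small_table s ON b.x = s.x
UNION
SELECT b.id, s.code
FROM   big_table b
JOIN   small_table s ON b.y = s.y;
```

UNION deduplicates, which matches the OR semantics when a row pair satisfies both branches; UNION ALL is cheaper but can emit such pairs twice.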
https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
David
On Wed, 21 Jun 2023 at 08:34, Marc Millas <marc.millas@mokadb.com> wrote:
> link to the anonymized plan of the req with one clause : https://explain.depesz.com/s/TWp4

It's quite difficult to know what the problem is you want to fix here. Your initial post indicated it was the query with the OR condition that was causing you the problems, but the plan you've posted has no OR condition?!

You're more likely to get help here if you take time to properly explain the situation and post the information that's actually relevant to the problem you're having, or state the problem more clearly, as there's a mismatch somewhere.

It might also be worth having a look at https://wiki.postgresql.org/wiki/Slow_Query_Questions . EXPLAIN is not going to tell us what part of the query is slow. I'll let the wiki page guide you into what to do instead.

David
On Tue, Jun 20, 2023 at 11:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 21 Jun 2023 at 08:34, Marc Millas <marc.millas@mokadb.com> wrote:
>
> On Tue, Jun 20, 2023 at 10:14 PM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> On Wed, 21 Jun 2023 at 07:42, Marc Millas <marc.millas@mokadb.com> wrote:
>> > But if I do the same with clause one OR clause 2, I have to kill the request after an hour, seeing the filesystem showing more than 140 Mb of increased usage.
>>
>>
> link to the anonymized plan of the req with one clause : https://explain.depesz.com/s/TWp4
link to the plan with the second clause alone: https://explain.depesz.com/s/byW5
link to the plan with both clauses ORed (the one not finishing) https://explain.depesz.com/s/jHO2
It's quite difficult to know what the problem is you want to fix here.
Your initial post indicated it was the query with the OR condition
that was causing you the problems, but the plan you've posted has no
OR condition?!
You're more likely to get help here if you take time to properly
explain the situation and post the information that's actually
relevant to the problem you're having, or state the problem more
clearly, as there's a mismatch somewhere.
It might also be worth having a look at
https://wiki.postgresql.org/wiki/Slow_Query_Questions . EXPLAIN is not
going to tell us what part of the query is slow. I'll let the wiki
page guide you into what to do instead.
I know that page. Obviously, as I have to kill the request, I cannot provide an EXPLAIN ANALYZE...
David
On Wed, 21 Jun 2023 at 10:26, Marc Millas <marc.millas@mokadb.com> wrote:
> link to the plan with both clauses ORed (the one not finishing) https://explain.depesz.com/s/jHO2

I'd go with the UNION or UNION ALL idea I mentioned earlier.

David
On 6/21/23 00:26, Marc Millas wrote:
> link to the anonymized plan of the req with one clause : https://explain.depesz.com/s/TWp4
> link to the plan with the second clause alone: https://explain.depesz.com/s/byW5
> link to the plan with both clauses ORed (the one not finishing) https://explain.depesz.com/s/jHO2
> I know that page. obviously, as I have to kill the request, I cannot provide a explain analyze...

It's a bit weird the "victor" table is joined seemingly without any join conditions, leading to a cross join (which massively inflates the cost for joins above it). Maybe the anonymized plan mangles it somehow.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> It's a bit weird the "victor" table is joined seemingly without any join
> conditions, leading to a cross join (which massively inflates the cost
> for joins above it). Maybe the anonymized plan mangles it somehow.
The query does:

select blabla from table1 join table2 on (a list of 9 fields ANDed, corresponding to the index of both table1 and table2)
join table3 on table1.a = table3.a and table1.b = table3.b
join table4 on (two clauses like table2.d = table4.e and one clause substr(table2.f) = table4.g, all ORed)

table1 and table2 are big (15M and 60M rows), table3 and table4 are small (30k rows).
Basically, if I rewrite the query beginning with the join between table2 and table4, then joining table1 and then table3, postgres generates the same plan, which doesn't end.
If instead of the 3 clauses of the last join I keep one equality clause, the explain plan looks the same, but executes in 45 seconds.
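In schematic SQL (placeholder names, since everything is anonymized; the substr length n is also a placeholder), the shape is roughly:

```sql
SELECT t1.blabla
FROM   table1 t1
JOIN   table2 t2 ON  t1.k1 = t2.k1  -- 9 ANDed fields matching
                 AND t1.k2 = t2.k2  -- the index of both tables
                 -- ... through t1.k9 = t2.k9
JOIN   table3 t3 ON  t1.a = t3.a AND t1.b = t3.b
JOIN   table4 t4 ON  t2.d  = t4.e
                 OR  t2.d2 = t4.e2
                 OR  substr(t2.f, 1, n) = t4.g;  -- the 3 ORed clauses
```

The ORed condition on the last join is what rules out a Hash or Merge join there.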
Marc MILLAS
On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> It's a bit weird the "victor" table is joined seemingly without any join
> conditions, leading to a cross join (which massively inflates the cost
> for joins above it). Maybe the anonymized plan mangles it somehow.
So I tried to simplify my problem.
I created a table with the result of the first 3 joins.
That table has 15M rows; all tables have been vacuum analyzed.
Now if I do an EXPLAIN ANALYZE of a simple join between that table and my original table4,
using a simple = clause, I get a result in about one second, and the planner's row guesses seem in line with the observed values.
If I use substr(table1.a) = table2.b, the EXPLAIN ANALYZE gets a result in 21 seconds and the planner estimates a 65M-row result set while the observed is 330k rows.
So here it's 20 times slower, and the discrepancy between the planner's row guess and reality is a ratio of 200.
Now, if I try an EXPLAIN ANALYZE with a join on a=b or substr(c)=d or e=f,
then... I kill the query after a quarter of an hour without any answer.
If I try to just EXPLAIN the query, the planner's row guess becomes more than 2 billion...
The extremely simple query and plan are here, without automatic obfuscation.
Marc MILLAS
On Wed, Jun 21, 2023 at 12:10 PM Marc Millas <marc.millas@mokadb.com> wrote:
> So I tried to simplify my problem. I created a table with the result of the first 3 joins. That table has 15M rows; all tables have been vacuum analyzed.
> If I use substr(table1.a) = table2.b, the EXPLAIN ANALYZE gets a result in 21 seconds and the planner estimates a 65M-row result set while the observed is 330k rows.
> Now, if I try an EXPLAIN ANALYZE with a join on a=b or substr(c)=d or e=f, then... I kill the query after a quarter of an hour without any answer.
> If I try to just EXPLAIN the query, the planner's row guess becomes more than 2 billion...
First, I am not sure why you cannot send us the explain analyze. But moving on...
substr() is a function that mutilates a value such that the index becomes useless...
If you are looking for the LEFT() of the value, then an INDEX can be used.
I have COLLATION "C" and when I query:
WHERE fld like fld_b||'%'
The optimizer constructs a query that uses the index on "fld"...
But when I try:
WHERE fld like CONCAT_WS('', fld_b, '%')
It doesn't use the index version. (because the function call is too complicated to see through)
When using functions in where clauses, indexes either have to be made on those functions, or often times the index cannot be used.
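For instance (hypothetical names; ttt stands in for the real table), an index built on the same expression the clause uses can be matched by the planner:

```sql
-- The index expression must match the query expression exactly
-- (same function, same arguments) for the planner to use it.
CREATE INDEX ttt_ref_substr_idx ON ttt (substr(ref_col, 1, 5));
ANALYZE ttt;  -- expression indexes also make ANALYZE collect statistics
              -- on the expression, which improves the row estimates

-- A join/filter clause written as
--   substr(ttt.ref_col, 1, 5) = other.code
-- can now use ttt_ref_substr_idx.
```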
BTW, I noted the COLLATION. That turned out to be important, because my first DB test did NOT use that collation, and the result of the LIKE was the non-indexed version...
I hope you find something useful in here.
Also, WHERE fld <> 72... (unless you have a heavily skewed set of statistics, I read that as: SCAN everything, and check later,
because this should filter very few rows), whereas fld = 72 will be blazingly fast.
Kirk
On Tue, Jun 27, 2023 at 8:12 PM Kirk Wolak <wolakk@gmail.com> wrote:
> First, I am not sure why you cannot send us the explain analyze. But moving on...
Kirk, the EXPLAIN ANALYZE, with the SQL query, is directly accessible via the explain.depesz links.
> substr() is a function that mutilates a value such that the index becomes useless...
> If you are looking for the LEFT() of the value, then an INDEX can be used.
> I have COLLATION "C" and when I query:
> WHERE fld like fld_b||'%'
There are NO indexes on those columns. One of the reasons is that the simplest index on one column is 380 GB on disk.
So to put indexes on each criterion, I would have to add around 1 TB of disk just for ttt...
The full scan is not a problem; it's fast. The problem is the nested loop, which compares each of the 15M rows of ttt to each of the 30K rows of inc_pha_r.
That's an operation done 450,000,000,000 times, so if each comparison takes 1 microsecond, the nested loop takes 125 hours.
And I am not sure that each comparison is done in 1 microsecond...
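The arithmetic above can be checked directly (pure arithmetic; 1 microsecond per comparison is just my assumption):

```sql
-- 15M outer rows x 30K inner rows, at an assumed 1 microsecond per comparison:
SELECT 15000000::bigint * 30000                  AS comparisons,   -- 450000000000
       15000000::bigint * 30000 / 1000000        AS seconds,       -- 450000
       15000000::bigint * 30000 / 1000000 / 3600 AS hours;         -- 125
```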
> The optimizer constructs a query that uses the index on "fld"...
> But when I try:
> WHERE fld like CONCAT_WS('', fld_b, '%')
> It doesn't use the index version. (because the function call is too complicated to see through)
> When using functions in where clauses, indexes either have to be made on those functions, or often times the index cannot be used.
> BTW, I noted the COLLATION. That turned out to be important, because my first DB test did NOT use that collation, and the result of the LIKE was the non-indexed version...
> I hope you find something useful in here.
Thanks for trying.
> Also, WHERE fld <> 72... (unless you have a heavily skewed set of statistics, I read that as: SCAN everything, and check later, because this should filter very few rows), whereas fld = 72 will be blazingly fast.
> Kirk