Re: PostgreSQL seems to create inefficient plans in simple conditional joins - Mailing list pgsql-performance

From Hedayat Vatankhah
Subject Re: PostgreSQL seems to create inefficient plans in simple conditional joins
Date
Msg-id 56AD303D.8090800@gmail.com
Whole thread Raw
In response to Re: PostgreSQL seems to create inefficient plans in simple conditional joins  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: PostgreSQL seems to create inefficient plans in simple conditional joins  (Hedayat Vatankhah <hedayat.fwd@gmail.com>)
List pgsql-performance
Hi,

/*David Rowley*/ wrote on Sun, 31 Jan 2016 04:57:04 +1300:
> On 31 January 2016 at 01:30, Hedayat Vatankhah <hedayat.fwd@gmail.com> wrote:
>> Personally, I expect both queries below to perform exactly the same:
>>
>> SELECT
>>      t1.id, *
>> FROM
>>      t1
>> INNER JOIN
>>      t2 ON t1.id = t2.id
>>      where t1.id > -9223372036513411363;
>>
>> And:
>>
>> SELECT
>>      t1.id, *
>> FROM
>>      t1
>> INNER JOIN
>>      t2 ON t1.id = t2.id
>>      where t1.id > -9223372036513411363 and t2.id > -9223372036513411363;
>>
>> Unfortunately, they do not. PostgreSQL creates different plans for these
>> queries, which results in very poor performance for the first one compared
>> to the second (What I'm testing against is a DB with around 350 million
>> rows in t1, and slightly less in t2).
>>
>> EXPLAIN output:
>> First query: http://explain.depesz.com/s/uauk
>> Second query: link: http://explain.depesz.com/s/uQd
> Yes, unfortunately you've done about the only thing that you can do,
> and that's just include both conditions in the query. Is there some
> special reason why you can't just write the t2.id > ... condition in
> the query too? or is the query generated dynamically by some software
> that you have no control over?
I can, but it would make my application code much more complex. I was
hoping to be able to hide the complexity of DB data model in DB itself
using views, triggers etc. If I want to add such conditions, the query
generator in my application code would be more complex, and certainly
the internal structure of DB will be visible to it.

I'm working to re-design a DB which can grow large and slow, as I guess
that we can find a more optimal design before trying optimizations like
using materialized views and other common optimizations. I've found two
completely different approaches for such problems: de-normalizing data,
highly normalizing data (6NF) like Anchor Modeling approach. I decided
to experiment with something similar to the latter one (not that
extreme!) specially since our current design was not that normalized,
and it performs poorly. I'm investigating why it should perform so bad
with my queries, and this problem was one of the reasons. In such a
design, views are used to present the JOIN of many tables as a single
table, so that using the model is easy and transparent. But usually a
single table doesn't have 10 ID columns (which can change as the model
changes) for which you should repeat any conditions to get acceptable
results!
While it can be done, it is so annoying: the application should know how
many tables are joined together, and repeat the condition for all such
columns. And the problem become worse when you are going to create a
relation between two different IDs of different data, e.g. relating
customer info (composed of joining 5 tables) with info about items (s)he
bought (composed of joining 3 tables).

Anyway, it seems that this is what I should implement in my application
code. I just hope that adding explicit conditions for each joined table
will not turn off any other optimizations!

Such an optimization seemed so natural to me that I didn't believe that
PostgreSQL doesn't understand that a condition on ID applies to all id
columns in a JOINed query, that I simplified my query step by step until
I reached the minimum problematic query which is very similar to the one
I posted here. It was at this point that I finally realized that maybe
PostgreSQL really doesn't understand it, and I was ... shocked!


> I'd personally quite like to see improvements in this area, and even
> wrote a patch [1] which fixes this problem too. The problem I had when
> proposing the fix for this was that I was unable to report details
> about how many people are hit by this planner limitation. The patch I
> proposed caused a very small impact on planning time for many queries,
> and was thought by many not to apply in enough cases for it to be
> worth slowing down queries which cannot possibly benefit. Of course I
> agree with this, I've no interest in slowing down planning on queries,
> but at the same time understand the annoying poor optimisation in this
> area.
>
> Although please remember the patch I proposed was merely a first draft
> proposal. Not for production use.
>
> [1]
http://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com#CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com
>
That's great, I might consider experimenting with this too.

Regards,
Hedayat




pgsql-performance by date:

Previous
From: Vitalii Tymchyshyn
Date:
Subject: Re: PostgreSQL seems to create inefficient plans in simple conditional joins
Next
From: Mathieu De Zutter
Date:
Subject: View containing a recursive function