Re: Performance regression from 8.3.7 to 9.0.3 - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Performance regression from 8.3.7 to 9.0.3
Date
Msg-id AANLkTimCRMGyOU3Oq6RnonGY5Zib-gqDSSmp+E-XmH+6@mail.gmail.com
Whole thread Raw
In response to Re: Performance regression from 8.3.7 to 9.0.3  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-performance
On Mon, Mar 14, 2011 at 12:46 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Mar 14, 2011 at 2:34 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> changes to:
>>
>> SELECT member_statistics.member_id
>>         FROM member_statistics
>>         WHERE EXISTS
>>         (
>>           SELECT mat1.tag_id
>>           FROM member_all_tags_v AS mat1
>>           WHERE mat1.member_id = member_statistics.member_id
>>             AND mat1.tag_id
>>             IN (640, 641, 3637, 3638, 637, 638, 639) AND
>> mat1.polarity >= 90
>>             AND mat1.member_id  IN ( <<400 ids>> )
>>         )
>
> It isn't easy to get the ORM to spit that kind of queries, but I could
> try them by hand.
>
>> also, always try to compare vs straight join version:
>>
>>
>> SELECT member_statistics.member_id
>>         FROM member_statistics
>>         JOIN VALUES ( <<400 ids>> ) q(member_id) using (member_id)
>>         JOIN
>>         (
>>           SELECT mat1.member_id
>>           FROM member_all_tags_v AS mat1
>>           WHERE mat1.tag_id  IN (640, 641, 3637, 3638, 637, 638, 639)
>>             AND mat1.polarity >= 90) p
>>            USING(member_id)
>>          ) p using(member_id);
>>
>> merlin
>
> The straight join like that was used long ago, but it replicates rows
> unacceptably: for each row in the subquery, one copy of member_id is
> output, which create an unacceptable overhead in the application and
> network side. It could be perhaps fixed with distinct, but then
> there's sorting overhead.

ah -- right. my mistake. well, you could always work around with
'distinct', although the exists version should be better (semi vs full
join).  what options *do* you have in terms of coaxing the ORM to
produce particular sql? :-).  This is likely 100% work-aroundable via
tweaking the SQL.  I don't have the expertise to suggest a solution
with your exact sql, if there is one.

merlin

pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Performance regression from 8.3.7 to 9.0.3
Next
From: Tom Lane
Date:
Subject: Re: Performance regression from 8.3.7 to 9.0.3