Thread: Performance of full outer join in 8.3

Performance of full outer join in 8.3

From
Christian Schröder
Date:
Hi list,
we have just migrated one of our databases from 8.2.12 to 8.3.7. We now
experience a strange problem: A query that was really fast on the 8.2
server is now much slower on the 8.3 server (1 ms vs. 60 sec). I had a
look at the query plan and it is completely different. Both servers run
on the same machine. The configuration (planner constants etc.) is
identical. The database has been vacuum analyzed after the migration. So
why the difference?

This is the query:
select isin from ts_frontend.attachment_isins full OUTER JOIN
ts_frontend.rec_isins using (attachment,isin)  WHERE attachment=2698120
GROUP BY isin limit 1000;

Here is the explain analyze in 8.2:


QUERY
PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=826.44..826.61 rows=17 width=32) (actual time=0.163..0.172
rows=2 loops=1)
   ->  HashAggregate  (cost=826.44..826.61 rows=17 width=32) (actual
time=0.159..0.162 rows=2 loops=1)
         ->  Merge Full Join  (cost=799.62..826.40 rows=17 width=32)
(actual time=0.122..0.144 rows=2 loops=1)
               Merge Cond: (("outer"."?column3?" = "inner"."?column3?")
AND (attachment_isins.attachment = rec_isins.attachment))
               Filter: (COALESCE(attachment_isins.attachment,
rec_isins.attachment) = 2698120)
               ->  Sort  (cost=13.39..13.74 rows=138 width=20) (actual
time=0.065..0.067 rows=1 loops=1)
                     Sort Key: (attachment_isins.isin)::bpchar,
attachment_isins.attachment
                     ->  Index Scan using
attachment_isins_attachment_idx on attachment_isins  (cost=0.00..8.49
rows=138 width=20) (actual time=0.042..0.047 rows=1 loops=1)
                           Index Cond: (attachment = 2698120)
               ->  Sort  (cost=786.23..794.80 rows=3429 width=20)
(actual time=0.045..0.049 rows=2 loops=1)
                     Sort Key: (rec_isins.isin)::bpchar,
rec_isins.attachment
                     ->  Index Scan using idx_rec_isins_attachment on
rec_isins  (cost=0.00..584.89 rows=3429 width=20) (actual
time=0.019..0.024 rows=2 loops=1)
                           Index Cond: (attachment = 2698120)
 Total runtime: 0.302 ms
(14 rows)

And this is the 8.3 plan:

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=345890.35..345900.35 rows=1000 width=26) (actual
time=53926.706..53927.071 rows=2 loops=1)
   ->  HashAggregate  (cost=345890.35..346296.11 rows=40576 width=26)
(actual time=53926.702..53927.061 rows=2 loops=1)
         ->  Merge Full Join  (cost=71575.91..345788.91 rows=40576
width=26) (actual time=10694.727..53926.559 rows=2 loops=1)
               Merge Cond: (((rec_isins.isin)::bpchar =
(attachment_isins.isin)::bpchar) AND (rec_isins.attachment =
attachment_isins.attachment))
               Filter: (COALESCE(attachment_isins.attachment,
rec_isins.attachment) = 2698120)
               ->  Index Scan using rec_isin_pkey on rec_isins
(cost=0.00..229562.97 rows=8115133 width=17) (actual
time=0.141..18043.605 rows=8036226 loops=1)
               ->  Materialize  (cost=71575.91..78318.19 rows=539383
width=17) (actual time=10181.074..14471.215 rows=539101 loops=1)
                     ->  Sort  (cost=71575.91..72924.36 rows=539383
width=17) (actual time=10181.064..13019.906 rows=539101 loops=1)
                           Sort Key: attachment_isins.isin,
attachment_isins.attachment
                           Sort Method:  external merge  Disk: 18936kB
                           ->  Seq Scan on attachment_isins
(cost=0.00..13111.83 rows=539383 width=17) (actual time=0.036..912.963
rows=539101 loops=1)
 Total runtime: 53937.213 ms
(12 rows)

These are the table definitions:
           Table "ts_frontend.attachment_isins"
    Column    |              Type              | Modifiers
--------------+--------------------------------+-----------
 attachment   | integer                        | not null
 isin         | isin                           | not null
 editor       | name                           |
 last_changed | timestamp(0) without time zone |
Indexes:
    "attachment_isins_pkey" PRIMARY KEY, btree (attachment, isin)
    "attachment_isins_attachment_idx" btree (attachment)
    "attachment_isins_attachment_isin" btree (attachment, isin)
    "attachment_isins_isin_idx" btree (isin)
Foreign-key constraints:
    "attachment_isins_attachment_fkey" FOREIGN KEY (attachment)
REFERENCES ts_frontend.attachments(id) ON UPDATE CASCADE ON DELETE CASCADE

  Table "ts_frontend.rec_isins"
   Column   |  Type   | Modifiers
------------+---------+-----------
 attachment | integer | not null
 isin       | isin    | not null
Indexes:
    "rec_isin_pkey" PRIMARY KEY, btree (isin, attachment)
    "idx_rec_isins_attachment" btree (attachment)
Foreign-key constraints:
    "rec_isins_attachment_fkey" FOREIGN KEY (attachment) REFERENCES
ts_frontend.attachments(id) ON UPDATE CASCADE ON DELETE CASCADE

Thanks for any ideas!

Regards
    Christian

P.S.: I think the full outer join is not what the developer really
wanted to do. Instead, he should have done a union (which is pretty
fast, by the way). However, I still want to understand why the query
plan of his query changed between both database releases.

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer


Re: Performance of full outer join in 8.3

From
Grzegorz Jaśkiewicz
Date:
set work_mem=24000; before running the query.

postgres is doing merge and sort on disc, that's always slow.

is there an index on column isin ?

Re: Performance of full outer join in 8.3

From
Christian Schröder
Date:
Grzegorz Jaśkiewicz wrote:
> set work_mem=24000; before running the query.
>
> postgres is doing merge and sort on disc, that's always slow.
>
Ok, but why is the plan different in 8.2? As you can see the same query
is really fast in 8.2, but slow in 8.3.
> is there an index on column isin ?
>
There is a separate index on the isin column of the attachment_isins
table (attachment_isins_isin_idx). The other table (rec_isins) has the
combination of attachment and isin as primary key which creates an
implicit index. Can this index be used for the single column isin? And
again: Why doesn't this matter in 8.2??

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer


Re: Performance of full outer join in 8.3

From
Grzegorz Jaśkiewicz
Date:
2009/4/15 Christian Schröder <cs@deriva.de>:
> Grzegorz Jaśkiewicz wrote:
>>
>> set work_mem=24000; before running the query.
>>
>> postgres is doing merge and sort on disc, that's always slow.
>>
>
> Ok, but why is the plan different in 8.2? As you can see the same query is
> really fast in 8.2, but slow in 8.3.

Did that set help ?

I think Tom will know more about it, but probably (and I am guessing
here, to be honest) - Materialize plan wasn't either available, or
didn't appear too be a planners favourite.
on 8.2 the two loops instead were were much faster.

Can you try increasing stat target to 100, vacuum analyze and see if
different plan is choosen ?

Again, I don't know at that point why is it so - just trying to
suggests things that I would try .

>> is there an index on column isin ?
>>
>
> There is a separate index on the isin column of the attachment_isins table
> (attachment_isins_isin_idx). The other table (rec_isins) has the combination
> of attachment and isin as primary key which creates an implicit index. Can
> this index be used for the single column isin? And again: Why doesn't this
> matter in 8.2??

well, it is a different major release, and differences between
8.2->8.3 are vast.


--
GJ

Re: Performance of full outer join in 8.3

From
Simon Riggs
Date:
On Wed, 2009-04-15 at 14:04 +0200, Christian Schröder wrote:
> Grzegorz Jaśkiewicz wrote:
> > set work_mem=24000; before running the query.
> >
> > postgres is doing merge and sort on disc, that's always slow.
> >
> Ok, but why is the plan different in 8.2? As you can see the same query
> is really fast in 8.2, but slow in 8.3.

The cost of the query seems accurate, so the absence of
attachment_isins_attachment_idx on the 8.3 plan looks to be the reason.
There's no way it would choose to scan 8115133 rows on the pkey if the
other index was available and usable.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Performance of full outer join in 8.3

From
Grzegorz Jaśkiewicz
Date:
On Wed, Apr 15, 2009 at 1:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The cost of the query seems accurate, so the absence of
> attachment_isins_attachment_idx on the 8.3 plan looks to be the reason.
> There's no way it would choose to scan 8115133 rows on the pkey if the
> other index was available and usable.

hance my question, if there's index on it in 8.3 version of db.



--
GJ

Re: Performance of full outer join in 8.3

From
Tom Lane
Date:
=?ISO-8859-1?Q?Christian_Schr=F6der?= <cs@deriva.de> writes:
> This is the query:
> select isin from ts_frontend.attachment_isins full OUTER JOIN
> ts_frontend.rec_isins using (attachment,isin)  WHERE attachment=2698120
> GROUP BY isin limit 1000;

Hmm.  It seems 8.3 is failing to push the attachment=2698120 condition
down to the input relations.  Not sure why.  All that code got massively
rewritten in 8.3, but I thought it still understood about pushing
equalities through a full join ...

            regards, tom lane

Re: Performance of full outer join in 8.3

From
Christian Schröder
Date:
Grzegorz Jaśkiewicz wrote:
> On Wed, Apr 15, 2009 at 1:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> The cost of the query seems accurate, so the absence of
>> attachment_isins_attachment_idx on the 8.3 plan looks to be the reason.
>> There's no way it would choose to scan 8115133 rows on the pkey if the
>> other index was available and usable.
>>
>
> hance my question, if there's index on it in 8.3 version of db.
>
I added an index on this column, but it didn't change the query plan.
Stupid question: Do I have to analyze again or perform a reindex after
adding the index?

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer


Re: Performance of full outer join in 8.3

From
Sam Mason
Date:
On Thu, Apr 16, 2009 at 01:31:45PM +0200, Christian Schröder wrote:
> Stupid question: Do I have to analyze again or perform a reindex after
> adding the index?

No, it's a regression in PG's handling of outer joins---it used to
realise that this was a possible optimisation, but now it doesn't.

Tom Lane started discussion on -hackers about this issue:

  http://archives.postgresql.org/pgsql-hackers/2009-04/msg00849.php

it looks as though performance in 8.3 is going to be bad until this
behaviour is changed.  A possible fix is to rewrite your query to work
around the problem:

  SELECT isin
  FROM (SELECT * FROM ts_frontend.attachment_isins WHERE attachment = 2698120) a
    FULL OUTER JOIN (SELECT * FROM ts_frontend.rec_isins WHERE attachment = 2698120) USING (isin)
  GROUP BY isin
  LIMIT 1000;

It looks as though what you're trying to do could also be expressed as:

  SELECT isin FROM ts_frontend.rec_isins WHERE attachment = 2698120
  UNION
  SELECT isin FROM ts_frontend.attachment_isins WHERE attachment = 2698120;

not sure if it's part of something larger so this may not be a useful
transform.

--
  Sam  http://samason.me.uk/

Re: Performance of full outer join in 8.3

From
Tom Lane
Date:
I wrote:
> =?ISO-8859-1?Q?Christian_Schr=F6der?= <cs@deriva.de> writes:
>> This is the query:
>> select isin from ts_frontend.attachment_isins full OUTER JOIN
>> ts_frontend.rec_isins using (attachment,isin)  WHERE attachment=2698120
>> GROUP BY isin limit 1000;

> Hmm.  It seems 8.3 is failing to push the attachment=2698120 condition
> down to the input relations.  Not sure why.  All that code got massively
> rewritten in 8.3, but I thought it still understood about pushing
> equalities through a full join ...

I've applied a patch for this.  It will be in 8.3.8, or if you're
in a hurry you can grab it from our CVS server or here:

http://archives.postgresql.org/message-id/20090416204228.579317540E2@cvs.postgresql.org

            regards, tom lane

Re: Performance of full outer join in 8.3

From
Grzegorz Jaśkiewicz
Date:
On Thu, Apr 16, 2009 at 9:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
> I've applied a patch for this.  It will be in 8.3.8, or if you're
> in a hurry you can grab it from our CVS server or here:
>
> http://archives.postgresql.org/message-id/20090416204228.579317540E2@cvs.postgresql.org

just out of curiosity - when was it introduced, ie - which version was
the first affected ? We're still on 8.3.5 here.

--
GJ

Re: Performance of full outer join in 8.3

From
Grzegorz Jaśkiewicz
Date:
On Fri, Apr 17, 2009 at 9:22 AM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:

> just out of curiosity - when was it introduced, ie - which version was
> the first affected ? We're still on 8.3.5 here.
(I had no idea release-notes have date), it got in by 8.3.4 (changed
right after 8.3.3 was released).


>
> --
> GJ
>



--
GJ

Re: Performance of full outer join in 8.3

From
Christian Schröder
Date:
Tom Lane wrote:
> I've applied a patch for this.  It will be in 8.3.8, or if you're
> in a hurry you can grab it from our CVS server or here:
>
Thanks a lot for your effort and the quick response!

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer