Thread: Weird 8.2.4 performance

Weird 8.2.4 performance

From
Kurt Overberg
Date:
Gang,

I'm running a mid-size production 8.0 environment.  I'd really like
to upgrade to 8.2, so I've been doing some testing to make sure my
app works well with 8.2, and I ran across this weirdness.  I set up
and configured 8.2 in the standard way, MacOSX Tiger, current
patches, download src, configure, make, make install, initdb, start
the db, create a few users, dump out my 8.0 DB (its about 13 GB raw
text), load it into 8.2.4, vacuum analyze.

This is a simple query the shows some weird behavior.  I have two
tables, task and taskinstance.  A taskinstance is tied to a campaign
through the task table (taskinstance points at task which points at
campaign).  Very simple.  To select all the taskinstances associated
with a certain campaign, I use this query:

select id from taskinstance where taskid in (select id from task
where campaignid = 75);

Now, I know this could (and should) be rewritten to not use the WHERE
x IN () style, but this is actually a sub-query to a larger query-
The bigger query was acting slow, and I've narrowed it down to this
snippet.  Task has a total of ~2000 rows, in which 11 of them belong
to campaign 75.  TaskInstance has around 650,000 rows.

This query runs great on production under 8.0 (27ms), but under 8.2.4
(on my mac) I'm seeing times in excess of 50,000ms.  Note that on
8.2.4, if I run the query again, it gets successively faster
(50,000ms->6000ms->27ms).  Is this normal?  If I change the
campaignid from 75 to another number, it jumps back to 50,000ms,
which leads me to believe that postgresql is somehow caching the
results of the query and not figuring out a better way to run the query.

Indexes:
Taskinstance has "taskid_taskinstance_key" btree (taskid)
Task has "Task_campaignId_key" btree (campaignid)

Explain Outputs:

-- 8.2


explain analyze select id from taskinstance where taskid in (select
id from task where campaignid = 75);

QUERY PLAN
------------------------------------------------------------------------
---------------------------------------------------------------------
  Nested Loop  (cost=37.65..15068.50 rows=2301 width=4) (actual
time=99.986..50905.512 rows=881 loops=1)
    ->  HashAggregate  (cost=16.94..17.01 rows=7 width=4) (actual
time=0.213..0.236 rows=9 loops=1)
          ->  Index Scan using "Task_campaignId_key" on task
(cost=0.00..16.93 rows=7 width=4) (actual time=0.091..0.197 rows=9
loops=1)
                Index Cond: (campaignid = 76)
    ->  Bitmap Heap Scan on taskinstance  (cost=20.71..2143.26
rows=556 width=8) (actual time=421.423..5655.745 rows=98 loops=9)
          Recheck Cond: (taskinstance.taskid = task.id)
          ->  Bitmap Index Scan on taskid_taskinstance_key
(cost=0.00..20.57 rows=556 width=0) (actual time=54.709..54.709
rows=196 loops=9)
                Index Cond: (taskinstance.taskid = task.id)
  Total runtime: 50907.264 ms
(9 rows)



-- 8.0

  explain analyze select id from taskinstance where taskid in (select
id from task where campaignid = 75);

QUERY PLAN
------------------------------------------------------------------------
-------------------------------------------------------------------
  Nested Loop  (cost=13.70..17288.28 rows=2640 width=4) (actual
time=0.188..21.496 rows=1599 loops=1)
    ->  HashAggregate  (cost=13.70..13.70 rows=8 width=4) (actual
time=0.153..0.217 rows=11 loops=1)
          ->  Index Scan using "Task_campaignId_key" on task
(cost=0.00..13.68 rows=8 width=4) (actual time=0.026..0.082 rows=11
loops=1)
                Index Cond: (campaignid = 75)
    ->  Index Scan using taskid_taskinstance_key on taskinstance
(cost=0.00..2152.28 rows=563 width=8) (actual time=0.012..0.832
rows=145 loops=11)
          Index Cond: (taskinstance.taskid = "outer".id)
  Total runtime: 27.406 ms
(7 rows)

The weird thing is that on 8.2, I don't see any sequential scans
taking place, it seems to be properly using the indexes.

If anyone has any ideas, I'd appreciate your thoughts.  This one has
got me boggled.  If I can provide any more information that would
helpful, please let me know.

Thanks for any light you could shed on my situation!

/kurt


Re: Weird 8.2.4 performance

From
Mark Kirkwood
Date:
Kurt Overberg wrote:

> Explain Outputs:
>
> -- 8.2
>
>

>    ->  Bitmap Heap Scan on taskinstance  (cost=20.71..2143.26 rows=556
> width=8) (actual time=421.423..5655.745 rows=98 loops=9)
>          Recheck Cond: (taskinstance.taskid = task.id)
>          ->  Bitmap Index Scan on taskid_taskinstance_key
> (cost=0.00..20.57 rows=556 width=0) (actual time=54.709..54.709 rows=196
> loops=9)

> -- 8.0
>

>    ->  Index Scan using taskid_taskinstance_key on taskinstance
> (cost=0.00..2152.28 rows=563 width=8) (actual time=0.012..0.832 rows=145
> loops=11)


8.2 is deciding to use a bitmap index scan on taskid_taskinstance_key,
which seems to be slower (!) than a plain old index scan that 8.0 is
using. A dirty work around is to disable bitmap scans via:

SET enable_bitmapscan=off

but it is probably worthwhile to try to find out *why* the bitmap scan
is 1) slow and 2) chosen at all given 1).

One thought that comes to mind - is work_mem smaller on your 8.2 system
than the 8.0 one? (or in fact is it very small on both?). Also it might
be interesting to see your non-default postgresql.conf settings for both
systems.

Cheers

Mark

Re: Weird 8.2.4 performance

From
Michael Glaesemann
Date:
On Jun 6, 2007, at 18:27 , Kurt Overberg wrote:

> select id from taskinstance where taskid in (select id from task
> where campaignid = 75);
>
> Now, I know this could (and should) be rewritten to not use the
> WHERE x IN () style, but this is actually a sub-query to a larger
> query.

Granted, it won't explain why this particular query is slower in 8.2,
but it shouldn't be to hard to drop in something like

SELECT id
FROM taskinstance
NATURAL JOIN (
     SELECT id AS taskid, campaignid
     FROM tasks) t
WHERE campaignid = 75

AIUI, the planner can sometimes rewrite IN as a join, but I don't
know whether or not that's is happening in this case. I'm guessing
not as I see nested loops in the plans. (I'm a novice at reading
plans, so take this with at least a teaspoon of salt. :) )

> if I run the query again, it gets successively faster (50,000ms-
> >6000ms->27ms).  Is this normal?  If I change the campaignid from
> 75 to another number, it jumps back to 50,000ms, which leads me to
> believe that postgresql is somehow caching the results of the query
> and not figuring out a better way to run the query.

As the query is repeated, the associated rows are probably already in
memory, leading to the speedups you're seeing.

> -- 8.2

>          Recheck Cond: (taskinstance.taskid = task.id)
>          ->  Bitmap Index Scan on taskid_taskinstance_key
> (cost=0.00..20.57 rows=556 width=0) (actual time=54.709..54.709
> rows=196 loops=9)
>                Index Cond: (taskinstance.taskid = task.id)


> -- 8.0

>    ->  Index Scan using taskid_taskinstance_key on taskinstance
> (cost=0.00..2152.28 rows=563 width=8) (actual time=0.012..0.832
> rows=145 loops=11)
>          Index Cond: (taskinstance.taskid = "outer".id)

I see that the row estimates in both of the query plans are off a
little. Perhaps increasing the statistics would help? Also, you can
see that 8.2 is using bitmap scans, which aren't available in 8.0.
Perhaps try setting enable_bitmapscan off and running the query again
to see if there's a performance difference.

> The weird thing is that on 8.2, I don't see any sequential scans
> taking place, it seems to be properly using the indexes.

As an aside, whether the planner decides to use a sequential scan or
an index has more to do with the particular query: indexes are not a
guaranteed performance win.

Hope this helps a bit.

Michael Glaesemann
grzm seespotcode net



Re: Weird 8.2.4 performance

From
Richard Huxton
Date:
Mark Kirkwood wrote:
>
> 8.2 is deciding to use a bitmap index scan on taskid_taskinstance_key,
> which seems to be slower (!) than a plain old index scan that 8.0 is
> using. A dirty work around is to disable bitmap scans via:

I'm having difficulty figuring out why it's doing this at all. There's
only one index involved, and it's over the primary-key to boot!

An EXPLAIN ANALYSE with enable_bitmapscan off should say why PG thinks
the costs are cheaper than they actually are.

PS - well worded question Kurt. All the relevant information neatly laid
out, explain analyse on both platforms - you should be charging to let
people help ;-)

--
   Richard Huxton
   Archonet Ltd

Re: Weird 8.2.4 performance

From
"Steinar H. Gunderson"
Date:
On Wed, Jun 06, 2007 at 07:27:27PM -0400, Kurt Overberg wrote:
> This query runs great on production under 8.0 (27ms), but under 8.2.4
> (on my mac) I'm seeing times in excess of 50,000ms.  Note that on
> 8.2.4, if I run the query again, it gets successively faster
> (50,000ms->6000ms->27ms).  Is this normal?

Your production server probably has all the data in your cache, and your Mac
has not. Furthermore, they seem to be running on different data sets, judging
from your EXPLAIN ANALYZE.

How big did you say these tables were?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Weird 8.2.4 performance

From
"Steinar H. Gunderson"
Date:
On Thu, Jun 07, 2007 at 11:35:27AM +0200, Steinar H. Gunderson wrote:
> How big did you say these tables were?

Sorry, you already said that -- 650k rows for one of them. If that table
doesn't fit in the cache on your Mac, you pretty much lose. From the EXPLAIN
output, it looks like it fits very nicely in cache on your server. Thus, I
don't think the difference is between 8.0 and 8.2, but rather your production
server and your test machine.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Weird 8.2.4 performance

From
Mark Kirkwood
Date:
Steinar H. Gunderson wrote:
> On Thu, Jun 07, 2007 at 11:35:27AM +0200, Steinar H. Gunderson wrote:
>
> If that table
> doesn't fit in the cache on your Mac, you pretty much lose. From the EXPLAIN
> output, it looks like it fits very nicely in cache on your server. Thus, I
> don't think the difference is between 8.0 and 8.2, but rather your production
> server and your test machine.
>

That's a good point, however its not immediately obvious that the
production server is *not* running MacOSX Tiger (or has any more
memory)... Kurt can you post the relevant specs for the the 8.0 and 8.2
boxes?

Cheers

Mark


Re: Weird 8.2.4 performance

From
Kurt Overberg
Date:
Thank you everyone for the replies.  I'll try to answer everyone's
questions in one post.

* Regarding production/mac memory and cache usage.  This query HAS
been running on 8.0 on my Mac, I just got that particular query
explain from our production system because I had to nuke my local 8.0
database before installing  8.2.4 due to disk space limitations.  The
query that this sample query is part of run in under 5 seconds when I
was running 8.0 locally on my mac, and it did a bunch of agregations
based on task instance.

* work_mem is set to 1 megabyte (the default) on both 8.0 and 8.2.4.

* setting enable_bitmapscan = false on 8.2.4

0605=# explain analyze  select id from taskinstance where taskid in
(select id from task where campaignid = 76);

                                          QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------
Nested Loop  (cost=16.94..15484.61 rows=2309 width=4) (actual
time=44.751..8498.689 rows=1117 loops=1)
    ->  HashAggregate  (cost=16.94..17.01 rows=7 width=4) (actual
time=0.144..0.194 rows=10 loops=1)
          ->  Index Scan using "Task_campaignId_key" on task
(cost=0.00..16.93 rows=7 width=4) (actual time=0.069..0.116 rows=10
loops=1)
                Index Cond: (campaignid = 51)
    ->  Index Scan using taskid_taskinstance_key on taskinstance
(cost=0.00..2202.73 rows=554 width=8) (actual time=20.305..849.640
rows=112 loops=10)
          Index Cond: (taskinstance.taskid = task.id)
Total runtime: 8499.599 ms

...FWIW, this query returns about 900 rows.  TaskInstance is a fairly
large table in width (20 columns, about 15 are varchar, 3 timestamps
and a few ints)
and height (650,000) rows.  I can't really run the same query
multiple times due to caching, so I change up "campaignid".  Is there
a way to flush that cache?  Turning off bitmap scans definitely seems
to help things, but I'm concerned that when/if I flip my production
machine, I'm going to run into who-knows-what.  I don't really have a
set of SQL acceptance tests to test jumping from rev to rev (I know I
should- BAD DEVELOPER, BAD!).

* Configuration

- My production environment is running RedHat 2.6.9.ELsmp on a server
with 16GB of memory

-  My old 8.0 database on my mac only had this modified from default:

shared_buffers =  100
work_mem = 1024

- 8.2.4 database seemed to go through some sort of auto-config when I
installed it, settings I think are different are as follows:

shared_buffers = 128MB                  # min 128kB or
max_connections*16kB
work_mem = 100MB                        # when I ran the original
query, this was set to 1MB, increased on Mark Kirkwood's advice,
seemed to help a bit but not really

8.2.4 Database size- 25 GB (from du -sh on the directory 'base')

* Richard Huxton

Thanks for the kind words- I'm glad I was able to 'ask a good
question'.  I'm very new to this mailing list, but I'm on many Java/
Struts/Perl mailing lists and have seen enough poorly worded/spelled/
asked questions to last a lifetime.  My situation is: I'm the senior
(read: first) developer at a small but growing startup.  Everything I
know about PostgreSQL I've learned over the past 4 years in which our
tiny little DB grew from one database with 100 users to over a 4 node
Slony setup 300,000 users.  Somehow, I'm not sure why, but I find
myself in the awkward position of being the 'go-to guy' for all
database related stuff at my company.  What I don't know could fill
volumes, but I've been able to keep the durn database running for
over 4 years (which is mostly a testament to how awesome PostgreSQL
is)- so when I hit something that makes no sense, I KNOW that if I
have any hope of getting one of ye postgresql gods to help me with an
obscure, non-sensical problem such as this one, I'd better include as
much context as possible.  :-)   FWIW- we're looking to hire a
PostgreSQL hired gun to help me with this and many other things.
Ideally, that person would be in Boston, MA, USA and be able to come
into the office, but we'd consider remote people too.  If you're
interested, drop me a line.

Thanks again for the replies, gang.  Have there been many reported
performance related problems regarding people upgrading from 8.0->8.2?

Is there a primer somewhere on how to read EXPLAIN output?

Thanks again for helping me with this...

/kurt



On Jun 7, 2007, at 5:23 AM, Richard Huxton wrote:

> Mark Kirkwood wrote:
>> 8.2 is deciding to use a bitmap index scan on
>> taskid_taskinstance_key, which seems to be slower (!) than a plain
>> old index scan that 8.0 is using. A dirty work around is to
>> disable bitmap scans via:
>
> I'm having difficulty figuring out why it's doing this at all.
> There's only one index involved, and it's over the primary-key to
> boot!
>
> An EXPLAIN ANALYSE with enable_bitmapscan off should say why PG
> thinks the costs are cheaper than they actually are.
>
> PS - well worded question Kurt. All the relevant information neatly
> laid out, explain analyse on both platforms - you should be
> charging to let people help ;-)
>
> --
>   Richard Huxton
>   Archonet Ltd
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that
> your
>       message can get through to the mailing list cleanly


Re: Weird 8.2.4 performance

From
"Steinar H. Gunderson"
Date:
On Thu, Jun 07, 2007 at 07:18:22AM -0400, Kurt Overberg wrote:
> - My production environment is running RedHat 2.6.9.ELsmp on a server
> with 16GB of memory

Seriously, this (the RAM amount) _is_ all the difference. (You don't say how
much RAM is in your Mac, but something tells me it's not 16GB.) If you install
8.2.4 on your server, there's no reason why the query you pasted shouldn't be
at least as fast as on 8.0.

> Is there a primer somewhere on how to read EXPLAIN output?

Yes, the documentation contains one.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Weird 8.2.4 performance

From
Tom Lane
Date:
Kurt Overberg <kurt@hotdogrecords.com> writes:
> ... Turning off bitmap scans definitely seems
> to help things,

I really seriously doubt that.  On queries like this, where each inner
scan is fetching a couple hundred rows, the small extra overhead of a
bitmap scan should easily pay for itself.  I think you're looking
entirely at caching effects that allow a re-read of the same data to
go faster.

You might try running the same query plan several times in a row and
noting the lowest time, then repeat for the other query plan.  This will
get you comparable fully-cached times, which I bet will be very close
to the same.

            regards, tom lane

Re: {Spam} Re: Weird 8.2.4 performance

From
Dimitri Fontaine
Date:
Le jeudi 07 juin 2007, Kurt Overberg a écrit :
> Is there a primer somewhere on how to read EXPLAIN output?

Those Robert Treat slides are a great reading:
  http://www.postgresql.org/communityfiles/13.sxi

Regards,
--
dim

Re: Weird 8.2.4 performance

From
Mark Kirkwood
Date:
Kurt Overberg wrote:
> work_mem = 100MB                        # when I ran the original query,
> this was set to 1MB, increased on Mark Kirkwood's advice, seemed to help
> a bit but not really
>

For future reference, be careful with this parameter, as *every*
connection will use this much memory for each sort or hash (i.e it's not
shared and can be allocated several times by each connection!)...yeah, I
know I suggested increasing it to see what effect it would have :-).

And I'd agree with Steiner and others, looks like caching effects are
the cause of the timing difference between production and the mac!

Cheers

Mark




Re: Weird 8.2.4 performance

From
"Joshua D. Drake"
Date:
Mark Kirkwood wrote:
> Kurt Overberg wrote:
>> work_mem = 100MB                        # when I ran the original
>> query, this was set to 1MB, increased on Mark Kirkwood's advice,
>> seemed to help a bit but not really
>>
>
> For future reference, be careful with this parameter, as *every*
> connection will use this much memory for each sort or hash (i.e it's not
> shared and can be allocated several times by each connection!)...yeah, I
> know I suggested increasing it to see what effect it would have :-).

This is however a parameter that can be set on the fly for the specific
query.

Joshua D. Drake


>
> And I'd agree with Steiner and others, looks like caching effects are
> the cause of the timing difference between production and the mac!
>
> Cheers
>
> Mark
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org
>


--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/