Thread: Performance Tuning Question

From: Brian Hirt
I'm looking for some suggestions on database tuning.  I've looked on
the PostgreSQL site and Google and found very little in-depth
information on tuning.

I have a database (7.2.2) that gets a lot of read access and very few
updates/inserts.  The server (rh73 dual xeon 2ghz) running this database
has 2gb of memory and is only running postgres.  The database itself is
about 1.5gb (via du -sk), with a lot of that 1.5gb not being part of the
active dataset.  We find that there is almost no IO on this machine.
The small amount of I/O is because of the infrequent writes and the
aggressive disk caching of the linux kernel.

It seems the planner tries to avoid I/O so much that the default tuning
parameters work against us a bit.  I've tried a few changes here and
there, but without much luck since I don't really know what to change
the values to.

One of the things I see over and over again is the planner picking a seq
scan over an index scan.  And practically always, when I force an index
scan and use EXPLAIN ANALYZE, the index scan turns out to be faster.
I've heard the explanation that at some point it's cheaper to do a
scan instead of using the index.  I think that assumption might be based
on IO estimates.

I can just give one example here that's indicative of what I'm seeing
over and over.  The two explain outputs are below, and both are
executing without any I/O.  The table has 12904 rows, the plan estimates
959 rows (about 7.4% of table) and actually only 639 (~ 5%) are
fetched.  The table scan consistently takes 50 times longer to execute.
I see this over and over and over.  I know a few hundred msec here and
there seems small, but this machine is performing at least a few million
queries a day -- it adds up.

Any advice or pointers would be greatly appreciated.

--thanks

basement=# explain analyze select count(*) from game_cover where
cover_scan_of_id = 6;
NOTICE:  QUERY PLAN:

Aggregate  (cost=1970.70..1970.70 rows=1 width=0) (actual
time=121.07..121.07 rows=1 loops=1)
  ->  Seq Scan on game_cover  (cost=0.00..1968.30 rows=959 width=0)
(actual time=0.13..120.56 rows=639 loops=1)
Total runtime: 121.14 msec

and

basement=# set enable_seqscan TO false;
SET VARIABLE
basement=# explain analyze select count(*) from game_cover where
cover_scan_of_id = 6;
NOTICE:  QUERY PLAN:

Aggregate  (cost=2490.66..2490.66 rows=1 width=0) (actual
time=2.45..2.45 rows=1 loops=1)
  ->  Index Scan using game_cover_scan_of on game_cover
(cost=0.00..2488.26 rows=959 width=0) (actual time=0.12..2.03 rows=639
loops=1)
Total runtime: 2.54 msec

Re: Performance Tuning Question

From: Martijn van Oosterhout
On Sun, Sep 08, 2002 at 11:04:31PM -0600, Brian Hirt wrote:
> It seems the planner tries to avoid I/O so much that the default tuning
> parameters work against us a bit.  I've tried a few changes here and
> there, but without much luck since I don't really know what to change
> the values to.

Why is this a bad thing? The less IO the better, right?

> One of the things I see over and over again is the planner picking a seq
> scan over an index scan.  And practically always, when I force an index
> scan and use EXPLAIN ANALYZE, the index scan turns out to be faster.
> I've heard the explanation that at some point it's cheaper to do a
> scan instead of using the index.  I think that assumption might be based
> on IO estimates.

There are settings somewhere that control how much cache the planner
assumes is available.  I believe SHOW ALL will show all tunable parameters.
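For example, a quick way to inspect them from psql (a sketch; the exact
parameter set varies by version, and effective_cache_size is measured in
8KB disk pages, not bytes):

```sql
-- List every runtime parameter and its current value
SHOW ALL;

-- Inspect just the planner's cache-size assumption
SHOW effective_cache_size;
```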

> I can just give one example here that's indicative of what I'm seeing
> over and over.  The two explain outputs are below, and both are
> executing without any I/O.  The table has 12904 rows, the plan estimates
> 959 rows (about 7.4% of table) and actually only 639 (~ 5%) are
> fetched.  The table scan consistently takes 50 times longer to execute.
> I see this over and over and over.  I know a few hundred msec here and
> there seems small, but this machine is performing at least a few million
> queries a day -- it adds up.

Is there any clustering going on?  Also, I'm assuming you have run VACUUM
ANALYZE over all the relevant tables.  If possible, could you post the result
of:

select * from pg_stats where tablename = 'game_cover';

Hope this helps.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

Re: Performance Tuning Question

From: Brian Hirt
On Mon, 2002-09-09 at 03:21, Martijn van Oosterhout wrote:
> On Sun, Sep 08, 2002 at 11:04:31PM -0600, Brian Hirt wrote:
> > It seems the planner tries to avoid I/O so much that the default tuning
> > parameters work against us a bit.  I've tried a few changes here and
> > there, but without much luck since I don't really know what to change
> > the values to.
>
> Why is this a bad thing? The less IO the better, right?

If there was actually IO going on, then yes I agree.

I don't know the internals of postgres, but I can easily think of a
hypothetical situation where more IO would be better (in my
environment).  Say for instance, the planner picks a slower method for
performing a query (table scan) because it expects to do less IO than
using an Index which has to read the index and then also pages from the
table.  Now, if you took disk caching out of the picture, the avoided IO
would more than make up for the slower query plan.  However, since all
of my data is cached in memory, the scan is actually much slower since
there actually is no I/O.  This was the whole angle of my question.  I
seem to recall from the hackers list that there are parameters to help
the planner estimate how expensive I/O is, but I don't really know how
to set them.  This is what I'm interested in.  I want the planner to put
less emphasis on IO, since my database remains mostly in memory and IO
is usually not very expensive.

>
> > One of the things I see over and over again is the planner picking a seq
> > scan over an index scan.  And practically always, when I force an index
> > scan and use EXPLAIN ANALYZE, the index scan turns out to be faster.
> > I've heard the explanation that at some point it's cheaper to do a
> > scan instead of using the index.  I think that assumption might be based
> > on IO estimates.
>
> There are settings somewhere that control how much cache the planner
> assumes is available.  I believe SHOW ALL will show all tunable parameters.
>

I know this, but I cannot find any good reference on how to tune them
or what to change them to.
http://www.postgresql.org/docs/momjian/hw_performance basically skips
over all this and gives a high-level talk mainly about UNIX memory
management.
http://www.postgresql.org/idocs/index.php?performance-tips.html also
doesn't talk about this.
http://www.postgresql.org/idocs/index.php?runtime-config.html actually
lists the options that can be used for tuning, but that's about it.
I don't really see any other documents about performance tuning.  If you
know of some other tuning docs, I would love to see them.

FYI: I did set effective_cache_size to 1.5gb, but the estimated cost for
the query using the index remained exactly the same.

> > I can just give one example here that's indicative of what I'm seeing
> > over and over.  The two explain outputs are below, and both are
> > executing without any I/O.  The table has 12904 rows, the plan estimates
> > 959 rows (about 7.4% of table) and actually only 639 (~ 5%) are
> > fetched.  The table scan consistently takes 50 times longer to execute.
> > I see this over and over and over.  I know a few hundred msec here and
> > there seems small, but this machine is performing at least a few million
> > queries a day -- it adds up.
>
> Is there any clustering going on? Also, I'm assuming you have run VACUUM
> ANALYZE over all the relevant tables.  If possible, could you post the result
> of:

Yup, VACUUM ANALYZE gets run on the database frequently, and for the
test case I quoted, I made sure to run it before the queries.

>
> select * from pg_stats where tablename = 'game_cover';
>
> Hope this helps.



Re: Performance Tuning Question

From: Andrew Sullivan
On Mon, Sep 09, 2002 at 09:41:30AM -0600, Brian Hirt wrote:
>
> I know this, but I cannot find any good reference on how to tune them
> or what to change them to.

[. . .]

> FYI: I did set effective_cache_size to 1.5gb, but the estimated cost for
> the query using the index remained exactly the same.

You probably want to twiddle CPU_INDEX_TUPLE_COST (floating point)
and RANDOM_PAGE_COST (floating point).  But note what the docs say
about these items:

Note: Unfortunately, there is no well-defined method of determining
ideal values for the family of "COST" variables that were just
described. You are encouraged to experiment and share your findings.

If you have any luck, it'd be nice to know what you discover.
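A sketch of one way to experiment, per session, so nothing persists if a
setting makes things worse (the values shown are illustrative starting
points, not recommendations):

```sql
-- Lower the penalty for non-sequential page fetches (default is 4.0)
SET random_page_cost = 2.0;
-- Optionally make index tuples cheaper to process as well
SET cpu_index_tuple_cost = 0.0005;

-- Re-run the problem query and compare plans and timings
EXPLAIN ANALYZE SELECT count(*) FROM game_cover WHERE cover_scan_of_id = 6;
```

Once a value consistently produces the better plan, it can be made
permanent in postgresql.conf.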

A

--
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Performance Tuning Question

From: Tom Lane
Brian Hirt <bhirt@mobygames.com> writes:
> On Mon, 2002-09-09 at 03:21, Martijn van Oosterhout wrote:
>> On Sun, Sep 08, 2002 at 11:04:31PM -0600, Brian Hirt wrote:
> It seems the planner tries to avoid I/O so much that the default tuning
> parameters work against us a bit.  I've tried a few changes here and
> there, but without much luck since I don't really know what to change
> the values to.
>>
>> Why is this a bad thing? The less IO the better, right?

> If there was actually IO going on, then yes I agree.

Yeah.  For a situation where the database is effectively 100% cached
in memory by the kernel, it'd be reasonable to set random_page_cost
to 1 (or maybe a little more depending on how much you believe the
"100% cached" approximation).  The actual cost of pulling a cached page
from kernel buffer to PG buffer should be the same whether the access
is sequential or not.  It's only when you have to physically go to the
disk that there's likely to be a penalty for random access.

Anyway, RANDOM_PAGE_COST is the thing to be tweaking.
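For the fully-cached case described above, that might look like this (a
sketch; 1 means a random page fetch is costed the same as a sequential
one):

```sql
-- Tell the planner random fetches cost no more than sequential ones
SET random_page_cost = 1;

-- The index scan should now win on estimated cost as well as actual time
EXPLAIN ANALYZE SELECT count(*) FROM game_cover WHERE cover_scan_of_id = 6;
```

To apply it server-wide rather than per session, set random_page_cost in
postgresql.conf and reload.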

            regards, tom lane