Re: PostgreSQL vs. MySQL - Mailing list pgsql-performance

From Stephan Szabo
Subject Re: PostgreSQL vs. MySQL
Date
Msg-id 20030704094307.B21273-100000@megazone23.bigpanda.com
In response to Re: PostgreSQL vs. MySQL  ("Brian Tarbox" <btarbox@theworld.com>)
Responses Re: PostgreSQL vs. MySQL
List pgsql-performance
On Fri, 4 Jul 2003, Brian Tarbox wrote:

> > I don't think Brian has any interest in being helped.
> > I suspect he'd made up his mind already.
>
>
> With all due respect Tom, I don't think I'm the one demonstrating a closed
> mind.
> Rather than trying to figure out what's going on in my head, how about
> figuring out what's going on in my database?  :-)

Well, in the case of getting a sequential scan on something like

  select * from foo where col=10;

where col is a primary key, the first thing I can think of to check is
whether

  select * from foo where col='10';

gives a different plan.
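
As a minimal sketch (foo and col are just stand-ins for the real table
and primary key column, and the comments describe what can happen on
pre-7.4 planners when col is something like int2 or int8):

  EXPLAIN SELECT * FROM foo WHERE col = 10;    -- integer literal, may seq scan
  EXPLAIN SELECT * FROM foo WHERE col = '10';  -- quoted literal, can index scan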

In general, for cases where you can't post queries or schema, we're kind of
stuck and not really able to give intelligent advice, since the problems are
often schema/query specific.  So the general questions/comments are things
like the following (which you've probably heard, but I think they should get
put into this thread if only to move the thread towards usefulness):

What are the relative costs/plans if you set enable_seqscan to false before
explain analyzing the query?  If there are places where you think it should
be able to do an index scan and it still doesn't, make sure that there
aren't cross-datatype issues (especially with int constants).
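
For example, in a psql session (the query itself is just a placeholder):

  SET enable_seqscan = false;
  EXPLAIN ANALYZE SELECT ...;   -- your slow query here
  SET enable_seqscan = true;    -- turn it back on afterwards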

Also, using explain analyze, look at where the time is actually being taken;
it's often not where the cost estimates would lead you to expect.

Do the row estimates match reality in the explain analyze output?  If not,
does analyzing help?  If not, does raising the statistics target (to say 50,
100, 1000) with alter table and then analyzing help?
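
Roughly like this (table and column names are made up, and 100 is just
one of the values worth trying):

  ALTER TABLE foo ALTER COLUMN col SET STATISTICS 100;
  ANALYZE foo;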

Does vacuuming help?  What about vacuum full?  If the latter does and the
former doesn't, you may need to look at raising the fsm settings.
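
Something along these lines (foo is again a stand-in, and the fsm
numbers are purely illustrative -- size them to your actual data):

  VACUUM ANALYZE foo;
  VACUUM FULL foo;

and in postgresql.conf:

  max_fsm_relations = 1000
  max_fsm_pages = 100000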

If shared_buffers is less than 1000, does setting it to something between
1000 and 8000 improve performance?
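
In postgresql.conf that's just (shared_buffers is in 8K pages, so 4000
is roughly 32MB; the exact number is something to experiment with):

  shared_buffers = 4000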

How much memory does the machine have that's being used for caching?  If
it's a lot, try raising effective_cache_size to see if that helps the
choice of plan by giving a more reasonable guess as to cache hit rates.
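
For example, if the OS is typically caching around 512MB of table data,
then since effective_cache_size is also measured in 8K pages, something
like this might be a reasonable starting guess:

  effective_cache_size = 65536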

Are there any sorts in the query?  If so, how large would you expect the
result set that's being sorted to be, and can you afford to make sort_mem
cover that (either permanently by changing the conf file or before the
query with a set command)?
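
The per-session form would be something like (sort_mem is in kilobytes,
and 32768 is only an illustrative value -- use what you can spare):

  SET sort_mem = 32768;   -- 32MB for sorts in this session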

Is it possible to avoid some sorts in the plan with a multi-column index?
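
As a hypothetical example, if a query sorts on columns a and b of foo:

  CREATE INDEX foo_a_b_idx ON foo (a, b);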

For 7.3 and earlier, does the query use IN or =ANY?  If so, it might help
to try converting it to an EXISTS form.
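
A sketch of that rewrite, with made-up tables:

  -- slow on 7.3 and earlier:
  SELECT * FROM orders
   WHERE customer_id IN (SELECT id FROM customers WHERE region = 'EU');

  -- usually much better there:
  SELECT * FROM orders o
   WHERE EXISTS (SELECT 1 FROM customers c
                  WHERE c.id = o.customer_id AND c.region = 'EU');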

Does the query use any min/max aggregates?  If so, it might help to look
for a workaround; this is one case that is truly slow.
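
The usual workaround (assuming an index on the column, foo again being
a made-up table) is to rewrite

  SELECT max(col) FROM foo;

as

  SELECT col FROM foo ORDER BY col DESC LIMIT 1;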



PostgreSQL really does require more than minimal optimization at the start;
effective_cache_size, shared_buffers, sort_mem and the fsm settings really
need to be set at a level appropriate for the machine and queries you have.
Without the queries we can't be too specific.  The big speed losses I can
think of are the datatype mismatch confusion, followed quickly by row
estimates that don't match reality (generally requiring a greater statistics
target on the column) and issues with correlation (I'm not really sure
there's a good solution for that currently, maybe someone will know -- I've
not really run into it on anything I've looked at).

