Re: PostgreSQL Query Speed Issues - Mailing list pgsql-novice

From Kevin Grittner
Subject Re: PostgreSQL Query Speed Issues
Date
Msg-id 1361638407.67335.YahooMailNeo@web162902.mail.bf1.yahoo.com
In response to Re: PostgreSQL Query Speed Issues  (Joseph Pravato <joseph.pravato@nomagic.com>)
Responses Re: PostgreSQL Query Speed Issues
List pgsql-novice
Joseph Pravato <joseph.pravato@nomagic.com> wrote:
> From: "Kevin Grittner" <kgrittn@ymail.com>

> We actually only have 11.37GB on our server, so we've used 3072MB
> for shared_buffers & 8400MB for effective_cache_size.

Sounds reasonable.  Keep in mind that the actual "sweet spot" for a
lot of this configuration depends not only on the hardware, but
also the database and workload.  Once you have a performance
baseline with actual workload using these settings, trying
incremental changes and monitoring the results will help zero in on
ideal settings for your situation.  Many settings can take effect
with just a reload, and don't disrupt ongoing workload, although
shared_buffers is one which requires a restart.  You can check that
in the docs or the context column in the pg_settings view.
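For example, for the settings we've been discussing, something like
this shows which change mechanism each one needs ('postmaster' in
the context column means a restart; the others take effect on a
reload or plain SET):

SELECT name, context
  FROM pg_settings
  WHERE name IN ('shared_buffers', 'effective_cache_size',
                 'seq_page_cost', 'random_page_cost');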

>> In addition, with 12GB and apparently not more than 2 million
>> rows per table, you seem very likely to have the active portion
>> of your database fully cached.  So these settings are likely to
>> help:
>>
>> seq_page_cost = 0.1
>> random_page_cost = 0.1
>> cpu_tuple_cost = 0.03

> Wow, the random_page_cost setting made a huge difference. The
> query we mentioned yesterday that takes 30 minutes is down to
> about 30 seconds. Nice speed improvement, reasonable speed, but
> still could use improvements. However, I'm a bit worried about
> changing it so drastically. Can there be any future issues from
> this change? We have 4 or 5 tables that will continue to grow
> fairly rapidly (1 million row increase every 2-3 years).

PostgreSQL uses a cost-based optimizer (rather than rules-based)
and it goes through the OS and its filesystems, so caching effects
need to be considered in setting the cost factors in order to get
the best plan for any query.  The default configuration is intended
to allow PostgreSQL to run on a low-end laptop, so that people
don't have the unfortunate experience of installing it and not
being able to get it to even start.  Tuning is required for serious
production work.  The original configuration assumed fairly heavy
disk access when reading data, so if that's not the case for your
environment, you need to make adjustments like the above to more
accurately model the costs of each possible plan.  If the cache hit
rate falls dramatically, you will want to either add RAM to keep
data cached, or adjust the cost factors so that it will try to
minimize slow disk access.
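If you want to keep experimenting, you can try candidate cost
factors in a single session before committing anything to
postgresql.conf.  As a sketch, using your contact table as a
stand-in for the real slow query:

SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
SET cpu_tuple_cost = 0.03;
EXPLAIN ANALYZE SELECT * FROM contact;  -- compare plan and timing
RESET ALL;  -- back to the server-wide settings

That lets you compare plans side by side without affecting other
sessions.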

>> If you haven't already done so, run VACUUM ANALYZE at the
>> database level.  If most of your data was loaded at about the
>> same time, run VACUUM FREEZE ANALYZE instead, to prevent a
>> read-and-rewrite pass of your entire database at peak OLTP load.

> The vacuuming didn't seem to change any performances, and it
> didn't take very long to run. Autovacuum may have already taken
> care of most potential issues from this.

That's good, but if most of the data was loaded at about the same
time, you may want to run VACUUM FREEZE ANALYZE during an off-peak
period.  Without that, at some point autovacuum will kick off a run
which is more aggressive than usual, and may affect performance
enough to notice.  A manual run lets you pick when to do this more
aggressive maintenance.
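Something like this, run from psql while connected to the affected
database during a quiet window, is all it takes (VERBOSE is
optional, but lets you watch the progress):

VACUUM FREEZE VERBOSE ANALYZE;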

>> Your plan shows an index scan with no index conditions to pass
>> every row in a table, which is much slower than a seqscan.  Did
>> you turn off enable_seqscan [...]?

> No, enable_seqscan is still turned on.

That plan choice strikes me as very odd, and not likely to be
optimal.  The only other things that I can think of which might
cause this plan choice would be if seq_page_cost is higher than
random_page_cost, or if the table has a lot of dead space in it.
Could you show EXPLAIN ANALYZE output for the current settings,
along with the output of running this?:

SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');
SELECT oid, relname, relpages, reltuples FROM pg_class
  WHERE relname = 'contact';
SELECT * FROM pg_stat_user_tables WHERE relname = 'contact';
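(For the EXPLAIN ANALYZE, just prefix the query that is
misbehaving, along the lines of:

EXPLAIN ANALYZE SELECT * FROM contact WHERE ...;

with your actual query in place of the placeholder.)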

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

