Re: The need for clustered indexes to boost TPC-V performance - Mailing list pgsql-performance

From Craig Ringer
Subject Re: The need for clustered indexes to boost TPC-V performance
Date
Msg-id 4FF63553.4080300@ringerc.id.au
Whole thread Raw
In response to The need for clustered indexes to boost TPC-V performance  (Reza Taheri <rtaheri@vmware.com>)
Responses Re: The need for clustered indexes to boost TPC-V performance
List pgsql-performance
First: Please do try 9.2 beta if you're upgrading from 8.4. It'll be out as a final release soon enough, and index only scans may make a big difference for the problem you're currently having.

Looking at your configuration I have a few comments, but it's worth noting that I don't work with hardware at that scale, and I'm more used to tuning I/O bottlenecked systems with onboard storage rather than CPU-bottlenecked ones on big SANs. Hopefully now that you've posted your configuration and setup there might be interest from others.

If you're able to post an EXPLAIN ANALYZE or two for a query you feel is slow that certainly won't hurt. Using http://explain.depesz.com/ saves you the hassle of dealing with word-wrapping when posting them, btw.

As for your config:

I notice that your autovacuum settings are at their defaults. With heavy UPDATE / DELETE load this'll tend to lead to table and index bloat, so the DB has to scan more useless data to get what it needs. It also means table stats won't be maintained as well, potentially leading to poor planner decisions. The following fairly scary query can help identify bloat, as the database server doesn't currently have anything much built in to help you spot such issues:

   http://wiki.postgresql.org/wiki/Show_database_bloat

It might be helpful to set effective_cache_size and effective_io_concurrency so Pg has more idea of the scale of your hardware. The defaults are very conservative - it's supposed to be easy for people to use for simple things without melting their systems, and it's expected that anyone doing bigger work will tune the database.

http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html

It looks like you've already tweaked many of the critical points for big installs - your checkpoint_segments, wal_buffers, shared_buffers, etc. I lack the big hardware experience to know if they're appropriate, but they're not the extremely conservative defaults, which is a start.

Your random_page_cost and seq_page_cost are probably dead wrong for a SAN with RAM and SSD cache in front of fast disks. Their defaults are for local uncached spinning HDD media where seeks are expensive. The typical advice on such hardware is to set them to something more like seq_page_cost = 0.1  random_page_cost  = 0.15 - ie cheaper relative to the cpu cost, and with random I/O only a little more expensive than sequential I/O. What's right for your situation varies a bit based on DB size vs hardware size, etc; Greg discusses this more in his book.

What isolation level do your transactions use? This is significant because of the move to true serializable isolation with predicate locking in 9.0; it made serializable transactions a bit slower in some circumstances in exchange for much stronger correctness guarantees. The READ COMMITTED default was unchanged.



It also looks like you might not have seen the second part of my earlier reply:


while PostgreSQL doesn't support covering indexes or clustered indexes at this point, 9.2 has added support for index-only scans, which are a half-way point of sorts. See:

  http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
  http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
  http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this PostgreSQL 9.2 enhancement. It should make a big difference, and if it doesn't it's important to know why.

(CC'd Robert Haas)



As for the "invest" side - that's really a matter for EnterpriseDB, Command Prompt, Red Hat, and the other backers who're employing people to work on the DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a good explanation of the current state and progress toward clustered indexes.

Some links that may be useful to you are:

  http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. Surprisingly, covering/clustered indexes aren't on there or at least aren't easily found. It's certainly a much-desired feature despite its apparent absence from the TODO.

  http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
  http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer




pgsql-performance by date:

Previous
From: Andy Colson
Date:
Subject: Re: The need for clustered indexes to boost TPC-V performance
Next
From: Mark Kirkwood
Date:
Subject: SSDs again, LSI Warpdrive 2 anyone?