Re: The need for clustered indexes to boost TPC-V performance - Mailing list pgsql-performance

From Craig Ringer
Subject Re: The need for clustered indexes to boost TPC-V performance
Date
Msg-id 4FF3D80F.9040408@ringerc.id.au
Whole thread Raw
In response to The need for clustered indexes to boost TPC-V performance  (Reza Taheri <rtaheri@vmware.com>)
Responses Re: The need for clustered indexes to boost TPC-V performance  (Thomas Kellerer <spam_eater@gmx.net>)
Re: The need for clustered indexes to boost TPC-V performance  (Daniel Farina <daniel@heroku.com>)
List pgsql-performance
On 07/04/2012 07:13 AM, Reza Taheri wrote:

Following the earlier email introducing the TPC-V benchmark, and that we are developing an industry standard benchmarking kit for TPC-V using PostgreSQL, here is a specific performance issue we have run into.


Which version of PostgreSQL are you using?

How has it been tuned beyond the defaults - autovacuum settings, shared_buffers, effective_cache_size, WAL settings, etc?

How much RAM is on the blade? What OS and version are on the blade?

Comparing the table sizes, we are close to 2X larger (more on this in a later note). But the index size is what stands out. Our overall index usage (again, after accounting for different numbers of rows) is 4.8X times larger. 35% of our I/Os are to the index space. I am guessing that the 4.8X ballooning has something to do with this, and that in itself explains a lot about our high I/O rate, as well as higher CPU/tran cycles compared to MS SQL (we are  2.5-3 times slower).

This is making me wonder about bloat issues and whether proper vacuuming is being done. If the visibility map and free space map aren't maintained by proper vaccum operation everything gets messy, fast.

Well, MS SQL used a “clustered index” for CT, i.e., the data is held in the leaf pages of the index B-Tree. The data and index are in one data structure. Once you lookup the index, you also have the data at zero additional cost.

[snip]

Is the PGSQL community willing to invest in a feature that a) has been requested by many others already; and b) can make a huge difference in a benchmark that can lend substantial credibility to PGSQL performance?


while PostgreSQL doesn't support covering indexes or clustered indexes at this point, 9.2 has added support for index-only scans, which are a half-way point of sorts. See:

  http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
  http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
  http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this PostgreSQL 9.2 enhancement. It should make a big difference, and if it doesn't it's important to know why.

(CC'd Robert Haas)

I'm not sure what the best option for getting a 9.2 beta build for Windows is.


As for the "invest" side - that's really a matter for EnterpriseDB, Command Prompt, Red Hat, and the other backers who're employing people to work on the DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a good explanation of the current state and progress toward clustered indexes.

Some links that may be useful to you are:

  http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. Surprisingly, covering/clustered indexes aren't on there or at least aren't easily found. It's certainly a much-desired feature despite its apparent absence from the TODO.

  http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
  http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer

pgsql-performance by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Introducing the TPC-V benchmark, and its relationship to PostgreSQL
Next
From: Thomas Kellerer
Date:
Subject: Re: The need for clustered indexes to boost TPC-V performance