More tablescanning fun - Mailing list pgsql-performance

From Jim C. Nasby
Subject More tablescanning fun
Date
Msg-id 20030424183817.A66185@flake.decibel.org
Whole thread Raw
Responses Re: More tablescanning fun  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On this table

 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer | not null
 work_units | bigint  | not null
Indexes: email_contrib_pkey primary key btree (project_id, id, date)

with this breakdown of data

 project_id |  count
------------+----------
          5 | 56427141
          8 |  1058843
         24 |   361595
         25 |  4092575
        205 | 58512516

Any kind of operation on an entire project wants to tablescan, even
though it's going to take way longer.

explain analyze select sum(work_units) from email_contrib where
project_id=8;

Index scan  126, 56, 55 seconds
Seq. scan   1517, 850, 897 seconds

It seems like the metrics used for the cost of index scanning v. table
scanning on large tables need to be revisited. It might be such a huge
difference in this case because the table is essentially clustered on
the primary key. I can test this by doing an aggregate for, say, a
specific team_id, which would be pretty well spread across the entire
table, but that'll have to wait a bit.

Anyone have any thoughts on this? Also, is there a TODO to impliment
real clustered indexes? Doing stuff by project_id on this table in
sybase was very efficient, because there was a real clustered index on
the PK. By clustered index, I mean an index where the leaf nodes of the
B-tree were the actual table rows. This means the only overhead in going
through the index is scanning the branches, which in this case would be
pretty light-weight.

Is this something that I should be using some PGSQL-specific feature
for, like inheritance?

I've been really happy so far with PGSQL (comming from Sybase and DB2),
but it seems there's still some pretty big performance issues that want
to be addressed (or I should say performance issues that hurt really big
when you hit them :) ).
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


pgsql-performance by date:

Previous
From: Frederic Jolliton
Date:
Subject: Re: Important speed difference between a query and a
Next
From: Tom Lane
Date:
Subject: Re: More tablescanning fun