Home > mailing lists

Table Partitions / Partial Indexes - Mailing list pgsql-performance

From	Mike C
Subject	Table Partitions / Partial Indexes
Date	December 11, 2005 22:07:59
Msg-id	bd0eabd0512111807q6b54115q2477c7c2a37f1c1a@mail.gmail.com Whole thread Raw
Responses	Re: Table Partitions / Partial Indexes Re: Table Partitions / Partial Indexes
List	pgsql-performance

Tree view

Hello,

I've got a table with ~60 Million rows and am having performance problems querying it. Disks are setup as 4x10K SCSI 76GB, RAID 1+0. The table is being inserted into multiple times every second of the day, with no updates and every 2nd day we delete 1/60th of the data (as it becomes old). Vacuum analyze is scheduled to run 3 times a day.

Query:

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC from PC_TRAFFIC where FK_DEVICE = 996 and TRAFFIC_DATE >= '2005-10-14 00:00:00' and TRAFFIC_DATE <= '2005-11-13 23:59:59' group by SOURCE_MAC order by 1 desc

Table:

CREATE TABLE PC_TRAFFIC (
     PK_PC_TRAFFIC  INTEGER NOT NULL,
     TRAFFIC_DATE   TIMESTAMP NOT NULL,
     SOURCE_MAC     CHAR(20) NOT NULL,
     DEST_IP        CHAR(15),
     DEST_PORT      INTEGER,
     TOTAL_TO       DOUBLE PRECISION,
     TOTAL_FROM     DOUBLE PRECISION,
     FK_DEVICE      SMALLINT,
     PROTOCOL_TYPE  SMALLINT
);

CREATE INDEX pc_traffic_pkidx ON pc_traffic (pk_pc_traffic);
CREATE INDEX pc_traffic_idx3 ON pc_traffic (fk_device, traffic_date);

Plan:
Sort  (cost=76650.58..76650.58 rows=2 width=40)
   Sort Key: sum(total_from)
   ->  HashAggregate  (cost=76650.54..76650.57 rows=2 width=40)
         ->  Bitmap Heap Scan on pc_traffic  (cost=534.64..76327.03 rows=43134 width=40)
               Recheck Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
               ->  Bitmap Index Scan on pc_traffic_idx3  (cost=0.00..534.64 rows=43134 width=0)
                     Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(7 rows)

CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:

Sort  (cost=39886.65..39886.66 rows=2 width=40)
   Sort Key: sum(total_from)
   ->  HashAggregate  (cost=39886.61..39886.64 rows=2 width=40)
         ->  Index Scan using pc_traffic_idx3 on pc_traffic  (cost=0.00..39551.26 rows=44714 width=40)
               Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(5 rows)

However the clustering is only effective on the first shot. Because of the constant usage of the table we can't perform a vacuum full nor any exclusive lock function.

Would table partitioning/partial indexes help much? Partitioning on date range doesn't make much sense for this setup, where a typical 1-month query spans both tables (as the billing month for the customer might start midway through a calendar month).

Noting that the index scan was quicker than the bitmap, I'm trying to make the indexes smaller/more likely to index scan. I have tried partitioning against fk_device, with 10 child tables. I'm using fk_device % 10 = 1, fk_device % 10 = 2, fk_device % 10 = 3, etc... as the check constraint.

CREATE TABLE pc_traffic_0 (CHECK(FK_DEVICE % 10 = 0)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_1 (CHECK(FK_DEVICE % 10 = 1)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_2 (CHECK(FK_DEVICE % 10 = 2)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_3 (CHECK(FK_DEVICE % 10 = 3)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_4 (CHECK(FK_DEVICE % 10 = 4)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_5 (CHECK(FK_DEVICE % 10 = 5)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_6 (CHECK(FK_DEVICE % 10 = 6)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_7 (CHECK(FK_DEVICE % 10 = 7)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_8 (CHECK(FK_DEVICE % 10 = 8)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_9 (CHECK(FK_DEVICE % 10 = 9)) INHERITS (pc_traffic);

... indexes now look like:
CREATE INDEX pc_traffic_6_idx3 ON pc_traffic_6 (fk_device, traffic_date);

To take advantage of the query my SQL now has to include the mod operation (so the query planner picks up the correct child tables):

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC from PC_TRAFFIC where FK_DEVICE = 996 and FK_DEVICE % 10 = 6 and TRAFFIC_DATE >= '2005-10-14 00:00:00' and TRAFFIC_DATE <= '2005-11-13 23:59:59' group by SOURCE_MAC order by 1 desc

Sorry I would show the plan but I'm rebuilding the dev database atm. It was faster though and did pick up the correct child table. It was also a bitmap scan on the index IIRC.

Would I be better off creating many partial indexes instead of multiple tables AND multiple indexes?
Am I using a horrid method for partitioning the data? (% 10)
Should there be that big of an improvement for multiple tables given that all the data is still stored on the same filesystem?
Any advice on table splitting much appreciated.

Cheers,

Mike C.

pgsql-performance by date:

From: "Luke Lonergan"
Date: 11 December 2005, 17:15:42
Subject: Re: Should Oracle outperform PostgreSQL on a complex

From: Tom Lane
Date: 11 December 2005, 22:39:11
Subject: Re: Table Partitions / Partial Indexes

Table Partitions / Partial Indexes - Mailing list pgsql-performance

Previous

Next