Thread: Table Partitions / Partial Indexes

Table Partitions / Partial Indexes

From
Mike C
Date:
Hello,

I've got a table with ~60 Million rows and am having performance problems querying it. Disks are setup as 4x10K SCSI 76GB, RAID 1+0. The table is being inserted into multiple times every second of the day, with no updates and every 2nd day we delete 1/60th of the data (as it becomes old). Vacuum analyze is scheduled to run 3 times a day.

Query:

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC from PC_TRAFFIC where FK_DEVICE = 996 and TRAFFIC_DATE >= '2005-10-14 00:00:00' and TRAFFIC_DATE <= '2005-11-13 23:59:59' group by SOURCE_MAC order by 1 desc

Table:

CREATE TABLE PC_TRAFFIC (
     PK_PC_TRAFFIC  INTEGER NOT NULL,
     TRAFFIC_DATE   TIMESTAMP NOT NULL,
     SOURCE_MAC     CHAR(20) NOT NULL,
     DEST_IP        CHAR(15),
     DEST_PORT      INTEGER,
     TOTAL_TO       DOUBLE PRECISION,
     TOTAL_FROM     DOUBLE PRECISION,
     FK_DEVICE      SMALLINT,
     PROTOCOL_TYPE  SMALLINT
);

CREATE INDEX pc_traffic_pkidx ON pc_traffic (pk_pc_traffic);
CREATE INDEX pc_traffic_idx3 ON pc_traffic (fk_device, traffic_date);

Plan:
Sort  (cost=76650.58..76650.58 rows=2 width=40)
   Sort Key: sum(total_from)
   ->  HashAggregate  (cost=76650.54..76650.57 rows=2 width=40)
         ->  Bitmap Heap Scan on pc_traffic  (cost=534.64..76327.03 rows=43134 width=40)
               Recheck Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
               ->  Bitmap Index Scan on pc_traffic_idx3  (cost=0.00..534.64 rows=43134 width=0)
                     Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(7 rows)

CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:

Sort  (cost=39886.65..39886.66 rows=2 width=40)
   Sort Key: sum(total_from)
   ->  HashAggregate  (cost=39886.61..39886.64 rows=2 width=40)
         ->  Index Scan using pc_traffic_idx3 on pc_traffic  (cost=0.00..39551.26 rows=44714 width=40)
               Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(5 rows)

However the clustering is only effective on the first shot. Because of the constant usage of the table we can't perform a vacuum full nor any exclusive lock function.

Would table partitioning/partial indexes help much? Partitioning on date range doesn't make much sense for this setup, where a typical 1-month query spans both tables (as the billing month for the customer might start midway through a calendar month).

Noting that the index scan was quicker than the bitmap, I'm trying to make the indexes smaller/more likely to index scan. I have tried partitioning against fk_device, with 10 child tables. I'm using fk_device % 10 = 1, fk_device % 10 = 2, fk_device % 10 = 3, etc... as the check constraint.

CREATE TABLE pc_traffic_0 (CHECK(FK_DEVICE % 10 = 0)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_1 (CHECK(FK_DEVICE % 10 = 1)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_2 (CHECK(FK_DEVICE % 10 = 2)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_3 (CHECK(FK_DEVICE % 10 = 3)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_4 (CHECK(FK_DEVICE % 10 = 4)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_5 (CHECK(FK_DEVICE % 10 = 5)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_6 (CHECK(FK_DEVICE % 10 = 6)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_7 (CHECK(FK_DEVICE % 10 = 7)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_8 (CHECK(FK_DEVICE % 10 = 8)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_9 (CHECK(FK_DEVICE % 10 = 9)) INHERITS (pc_traffic);

... indexes now look like:
CREATE INDEX pc_traffic_6_idx3 ON pc_traffic_6 (fk_device, traffic_date);

To take advantage of the query my SQL now has to include the mod operation (so the query planner picks up the correct child tables):

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC from PC_TRAFFIC where FK_DEVICE = 996 and FK_DEVICE % 10 = 6 and TRAFFIC_DATE >= '2005-10-14 00:00:00' and TRAFFIC_DATE <= '2005-11-13 23:59:59' group by SOURCE_MAC order by 1 desc

Sorry I would show the plan but I'm rebuilding the dev database atm. It was faster though and did pick up the correct child table. It was also a bitmap scan on the index IIRC.

Would I be better off creating many partial indexes instead of multiple tables AND multiple indexes?
Am I using a horrid method for partitioning the data? (% 10)
Should there be that big of an improvement for multiple tables given that all the data is still stored on the same filesystem?
Any advice on table splitting much appreciated.

Cheers,

Mike C.

Re: Table Partitions / Partial Indexes

From
Tom Lane
Date:
Mike C <smith.not.western@gmail.com> writes:
> CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:

How can you tell?  Neither of these are EXPLAIN ANALYZE output.

            regards, tom lane

Re: Table Partitions / Partial Indexes

From
Mike C
Date:
On 12/12/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mike C <smith.not.western@gmail.com> writes:
> CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:

How can you tell?  Neither of these are EXPLAIN ANALYZE output.

                        regards, tom lane


Sorry that's a result of my bad record keeping. I've been keeping records of the explain but not the analyze. IIRC the times dropped from ~25 seconds down to ~8 seconds (using analyze).

Regards,

Mike


Re: Table Partitions / Partial Indexes

From
Simon Riggs
Date:
On Mon, 2005-12-12 at 15:07 +1300, Mike C wrote:

> Partitioning on date range doesn't make much sense for this setup,
> where a typical 1-month query spans both tables (as the billing month
> for the customer might start midway through a calendar month).

Maybe not for queries, but if you use a date range then you never need
to run a DELETE and never need to VACUUM.

You could split the data into two-day chunks.

> Am I using a horrid method for partitioning the data? (% 10)

No, but what benefit do you think it provides. I'm not sure I see...

> Should there be that big of an improvement for multiple tables given
> that all the data is still stored on the same filesystem?

You could store partitions in separate tablespaces/filesystems.

Best Regards, Simon Riggs


Re: Table Partitions / Partial Indexes

From
Mike C
Date:
On 12/14/05, Simon Riggs <simon@2ndquadrant.com> wrote:
Maybe not for queries, but if you use a date range then you never need
to run a DELETE and never need to VACUUM.

You could split the data into two-day chunks.

That's an interesting idea, thanks.

> Am I using a horrid method for partitioning the data? (% 10)

No, but what benefit do you think it provides. I'm not sure I see...

I was trying to get both the indexes to be smaller without loosing selectivity, and make any table scans/index scans faster from having to read less data.

> Should there be that big of an improvement for multiple tables given
> that all the data is still stored on the same filesystem?

You could store partitions in separate tablespaces/filesystems.

Ideally that's what I would do, but to make the most of that I would have to have a dedicated RAID setup for each partition right? (Which is a bit pricey for the budget).

Cheers,

Mike