Re: Partial indexes instead of partitions - Mailing list pgsql-general
From | Leonardo F |
---|---|
Subject | Re: Partial indexes instead of partitions |
Date | |
Msg-id | 167510.21073.qm@web29020.mail.ird.yahoo.com Whole thread Raw |
In response to | Re: Partial indexes instead of partitions (Sergey Konoplev <gray.ru@gmail.com>) |
Responses |
Re: Partial indexes instead of partitions
|
List | pgsql-general |
> Well the situation is still ambiguous > so: > Is it possible to provide this table and indexes definitions? > And it > would be great it you describe the queries you are going to do > on this table > or just provide the SQL. Sure! Basically what I'm trying to do is to partition the index in the table where the data is going to be inserted into smaller indexes, but without using partitions: I would use partial indexes. "Historic" data will have just the big index... say that I want to store 1G rows: 100M per day, 10 days. I would have 10 tables, 9 of them with 2 big indexes (the indexes on the 2 columns that are going to be used in queries together with the timestamp) and the latest one with 24*2 smaller indexes (so that insertion will still be fast) to be dropped overnight after the 2 big indexes have been created... then a new table is created (for the new day's data) with the small indexes and the oldest table dropped (as I said, I won't store more than 10 days). This is "pseudo SQL": CREATE TABLE master ( ts timestamp, key1 bigint, <-- populated with almost-random values key2 bigint, <-- populated with almost-random values data1 varchar(20), [...] ); CREATE TABLE master_01 ( CHECK ( ts >= DATE '2006-03-01' AND ts < DATE '2006-03-02' ) ) INHERITS (master); CREATE INDEX master_01_ix1 ON master_01 (key1); CREATE INDEX master_01_ix2 ON master_01 (key2) CREATE TABLE master_02 ( CHECK ( ts >= DATE '2006-03-02' AND ts < DATE '2006-03-03' ) ) INHERITS (master); CREATE INDEX master_02_ix1 ON master_02 (key1); CREATE INDEX master_02_ix2 ON master_02 (key2) [10 tables like the above...] With this config insertion on the "today's" table will be slow at the end of the day, because updating the 2 indexes will be very slow (they will be getting very large). So I thought I could make, on "today's table", instead of the 2 indexes on the whole table, something like: CREATE INDEX master_10_1_ix1 ON master_10 (key1) WHERE (ts >= DATETIME '2006-03-10 00:00' and ts < DATETIME '2006-03-10 01:00') (same thing for second indexed column) CREATE INDEX master_10_2_ix1 ON master_10 (key1) WHERE (ts >= DATETIME '2006-03-10 01:00' and ts < DATETIME '2006-03-10 02:00') (same thing for second indexed column) [other 22 indexes definition like the above, one per hour...] That is, the table where data will be inserted (ts will always be ascending, so I will always insert data in the latest table) will have multiple small indexes. Then, at night, the small indexes would be dropped after one big index has been created (since no more rows will be inserted in that table, I don't care if the index is big). So, a query like: select * from master where key1=938479 and ts between now() and "now()-10 minutes" would use the proper index on the "today's" table; a query like: select * from master where key1=938479 and ts between "3 days ago" and "2 days ago" would use the indexes in table "today minus 2 days" and "today minus 3 days"
pgsql-general by date: