Thread: Partial indexes instead of partitions
Hi all,

I have a very big table (2000 inserts per second; I have to store 20 days of data). The table has 2 indexes, on columns with almost-random values. Since keeping those two indexes up to date can't be done (updating 2 indexes with random values 2000 times per second on such a huge table is impossible), I thought that partitioning was the way to go. Now I think I have 2 options:

a) create 480 partitions, 1 for each hour of the day, with 2 indexes on each partition
b) create 20 partitions, and create 24*2 partial indexes on the current partition; then the next day (overnight) create 2 global indexes for the table and drop the 24*2 partial indexes...

I thought about option b) because I don't like the fact that the planner takes "ages" to plan a query when there are 480 partitions; with option b) I would have:

19 partitions with 2 indexes each
1 partition (the "current day" one) with 24*2 partial indexes

Does it make sense? Has anyone ever used partial indexes instead of partitions?
On 11 June 2010 13:00, Leonardo F <m_lists@yahoo.it> wrote:
> a) create 480 partitions, 1 for each hour of the day, with 2 indexes on
> each partition
> b) create 20 partitions, and create 24*2 partial indexes on the current
> partition; then the next day (overnight) create 2 global indexes for the
> table and drop the 24*2 partial indexes...
>
> I thought about option b) because I don't like the fact that the planner
> takes "ages" to plan a query when there are 480 partitions; with option
> b) I would have:
>
> 19 partitions with 2 indexes each
> 1 partition (the "current day" one) with 24*2 partial indexes

Could you please explain the reason for so many partitions?

In case b) you will face a huge overhead related to the necessity of checking all the data in the table every time a new index is created (it doesn't matter that it is partial).

--
Sergey Konoplev
Blog: http://gray-hemp.blogspot.com /
Linkedin: http://ru.linkedin.com/in/grayhemp /
JID/GTalk: gray.ru@gmail.com / Skype: gray-hemp / ICQ: 29353802
> Could you please explain the reason for so many partitions?

Because otherwise there would be tons of rows in each partition, and randomly "updating" the indexes over that many rows 2000 times per second isn't doable (the indexes get so big that it would be like writing a multi-GB file randomly).

> In case b) you will face a huge overhead related to the necessity of
> checking all the data in the table every time a new index is created

I would create the table with all the indexes already in place; but only the index covering the "current timestamp of the inserted row" would be updated; the others wouldn't be touched.
On 11 June 2010 16:29, Leonardo F <m_lists@yahoo.it> wrote:
>> Could you please explain the reason for so many partitions?
>
> Because otherwise there would be tons of rows in each partition, and
> randomly "updating" the indexes over that many rows 2000 times per
> second isn't doable (the indexes get so big that it would be like
> writing a multi-GB file randomly)
>
>> In case b) you will face a huge overhead related to the necessity of
>> checking all the data in the table every time a new index is created
>
> I would create the table with all the indexes already in place; but only
> the index covering the "current timestamp of the inserted row" would be
> updated; the others wouldn't be touched.

Well, the situation is still ambiguous, so: is it possible to provide the table and index definitions? And it would be great if you could describe the queries you are going to run on this table, or just provide the SQL.
> Well, the situation is still ambiguous, so: is it possible to provide
> the table and index definitions? And it would be great if you could
> describe the queries you are going to run on this table, or just
> provide the SQL.

Sure! Basically what I'm trying to do is to partition the indexes on the table where the data is being inserted into smaller indexes, but without using partitions: I would use partial indexes. "Historic" data will have just the big indexes...

Say that I want to store 1G rows: 100M per day, 10 days. I would have 10 tables, 9 of them with 2 big indexes (the indexes on the 2 columns that are going to be used in queries together with the timestamp), and the latest one with 24*2 smaller indexes (so that insertion will still be fast), to be dropped overnight after the 2 big indexes have been created... Then a new table is created (for the new day's data) with the small indexes, and the oldest table is dropped (as I said, I won't store more than 10 days).

This is "pseudo SQL":

CREATE TABLE master (
    ts    timestamp,
    key1  bigint,       -- populated with almost-random values
    key2  bigint,       -- populated with almost-random values
    data1 varchar(20),
    [...]
);

CREATE TABLE master_01 (
    CHECK ( ts >= DATE '2006-03-01' AND ts < DATE '2006-03-02' )
) INHERITS (master);
CREATE INDEX master_01_ix1 ON master_01 (key1);
CREATE INDEX master_01_ix2 ON master_01 (key2);

CREATE TABLE master_02 (
    CHECK ( ts >= DATE '2006-03-02' AND ts < DATE '2006-03-03' )
) INHERITS (master);
CREATE INDEX master_02_ix1 ON master_02 (key1);
CREATE INDEX master_02_ix2 ON master_02 (key2);

[10 tables like the above...]

With this config, insertion into "today's" table will be slow at the end of the day, because updating the 2 indexes will be very slow (they will have become very large).
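Since this scheme uses inheritance partitioning, inserts into master have to be routed to the current day's child table somehow; a minimal sketch of a routing trigger (the function and trigger names here are made up for illustration, and master_10 stands for the current day's partition) could look like:

    -- Hypothetical routing trigger for the inheritance scheme above.
    CREATE OR REPLACE FUNCTION master_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        IF NEW.ts >= DATE '2006-03-10' AND NEW.ts < DATE '2006-03-11' THEN
            INSERT INTO master_10 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'ts out of range: %', NEW.ts;
        END IF;
        RETURN NULL;  -- the row has already been stored in the child table
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER insert_master_trigger
        BEFORE INSERT ON master
        FOR EACH ROW EXECUTE PROCEDURE master_insert_trigger();

At 2000 inserts per second a per-row trigger adds measurable overhead, so it may be cheaper for the application to insert directly into the current child table and skip the trigger entirely.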
So I thought I could create, on "today's table", instead of the 2 indexes on the whole table, something like:

CREATE INDEX master_10_1_ix1 ON master_10 (key1)
    WHERE (ts >= TIMESTAMP '2006-03-10 00:00' AND ts < TIMESTAMP '2006-03-10 01:00');
(same thing for the second indexed column)

CREATE INDEX master_10_2_ix1 ON master_10 (key1)
    WHERE (ts >= TIMESTAMP '2006-03-10 01:00' AND ts < TIMESTAMP '2006-03-10 02:00');
(same thing for the second indexed column)

[the other 22 index definitions like the above, one per hour...]

That is, the table where data is being inserted (ts will always be ascending, so I will always insert data into the latest table) will have multiple small indexes. Then, at night, the small indexes would be dropped after one big index has been created (since no more rows will be inserted into that table, I don't care if the index is big).

So, a query like:

select * from master where key1=938479
    and ts between "now() - 10 minutes" and now()

would use the proper index on "today's" table; a query like:

select * from master where key1=938479
    and ts between "3 days ago" and "2 days ago"

would use the indexes in the tables "today minus 2 days" and "today minus 3 days".
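The overnight swap described above could be sketched roughly like this (a sketch only; the hourly index names are assumed to follow a master_10_<hour>_ix<n> pattern, and note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

    -- Overnight maintenance on yesterday's partition (master_10):
    -- build the two full-table indexes without blocking readers...
    CREATE INDEX CONCURRENTLY master_10_ix1 ON master_10 (key1);
    CREATE INDEX CONCURRENTLY master_10_ix2 ON master_10 (key2);

    -- ...then drop the 24*2 hourly partial indexes.
    DROP INDEX master_10_1_ix1;
    DROP INDEX master_10_1_ix2;
    -- [...and so on for the remaining hours...]
    DROP INDEX master_10_24_ix1;
    DROP INDEX master_10_24_ix2;

Since no new rows are being written to yesterday's table, the concurrent builds only have to compete with the current day's insert load for I/O.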
On 11 June 2010 17:15, Leonardo F <m_lists@yahoo.it> wrote:
> Basically what I'm trying to do is to partition the indexes on the table
> where the data is being inserted into smaller indexes, but
> without using partitions: I would use partial indexes.
> "Historic" data will have just the big indexes...

Well, you can estimate whether it's worth bothering with index partitioning. For "selects" you should compare log_M(N), where

N - number of records
M - (base) number of records per b-tree node (in one 8k page)

for a whole-table partition versus an index partition, but I do not think the difference would be great. For "inserts" I do not see a reason why index partitioning would be better, because AFAIK the b-tree would behave exactly the same in both cases.

> That is, the table where data is being inserted (ts will always be
> ascending, so I will always insert data into the latest table)
> will have multiple small indexes.
> Then, at night, the small indexes would be dropped after one big
> index has been created (since no more rows will be inserted into that
> table, I don't care if the index is big).
>
> So, a query like:
> select * from master where key1=938479
>     and ts between "now() - 10 minutes" and now()

You should explicitly state the index conditions and the partition conditions here, otherwise they will not be used:

SELECT * FROM master
WHERE
    -- For the table partition
    ts >= '2006-03-10' AND ts < '2006-03-11' AND
    -- For the index partition
    ts >= '2006-03-10 01:00' AND ts < '2006-03-10 02:00' AND
    -- Target conditions
    key1 = 938479 AND
    ts BETWEEN now() - interval '10 minutes' AND now();

Furthermore, I would suggest using this index:

CREATE INDEX master_10_2_ix1 ON master_10 (key1, ts)
    WHERE ts >= '2006-03-10 01:00' AND ts < '2006-03-10 02:00';

if you want the "Target conditions" to work in an optimal way.

> a query like:
> select * from master where key1=938479
>     and ts between "3 days ago" and "2 days ago"

You cannot use BETWEEN here, because it is equal to "ts >= ... AND ts <= ...", not "ts >= ... AND ts < ..." as specified in the table definition. See above.
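The log_M(N) comparison can be checked with a quick calculation. Assuming roughly 400 index entries per 8k page, and comparing a single index over the full 1G rows against a 100M-row per-day index (the numbers from earlier in the thread):

    -- Rough b-tree depth estimate: log_M(N) with M ~ 400 entries per page.
    -- PostgreSQL's log(b, x) takes the base as its first argument.
    SELECT log(400, 1e9::numeric) AS depth_whole_table,    -- ~3.5 levels
           log(400, 1e8::numeric) AS depth_day_partition;  -- ~3.1 levels

The tree is only about half a level shallower, so the select-side benefit of index partitioning really is small; the case for partitioning here rests on insert-side cache locality, not tree depth.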
> For "inserts" I do not see a reason why index partitioning would be
> better, because AFAIK the b-tree would behave exactly the same in both
> cases.

No: when the index gets very big, inserting random values gets very slow.

But still, my approach doesn't work, because I thought PostgreSQL was able to "merge" different partial indexes (using BitmapOr/BitmapAnd) when the WHERE condition matches multiple partial index predicates... but that's (I guess) not that easy to do (basically the partial index predicates would have to be checked like the constraints in the inheritance mechanism, and Or/And bitmapped as happens in the "regular" partitioning method).

That is, having a table with:

CREATE TABLE test (
    a timestamp without time zone,
    b integer
);

CREATE INDEX test1idx ON test (b)
    WHERE a >= '2008-03-10 14:00:00' AND a < '2008-03-10 16:00:00';
CREATE INDEX test2idx ON test (b)
    WHERE a >= '2008-03-10 16:00:00' AND a < '2008-03-10 18:00:00';

the select:

select * from test
    where a > '2008-03-10 15:00:00' and a < '2008-03-10 17:00:00'
    and b = 100;

should use a BitmapOr between index scans on test1idx and test2idx... But I think that's a complicated thing for the planner... even though it doesn't sound that different from what the planner does for partition pruning...
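One manual workaround (a sketch, not something the planner does for you) is to split the straddling time range yourself, so that each branch's WHERE clause falls entirely inside one partial index predicate:

    -- Each branch's range is implied by exactly one partial index
    -- predicate, so the planner can use test1idx and test2idx respectively.
    SELECT * FROM test
    WHERE b = 100
      AND a > '2008-03-10 15:00:00' AND a < '2008-03-10 16:00:00'
    UNION ALL
    SELECT * FROM test
    WHERE b = 100
      AND a >= '2008-03-10 16:00:00' AND a < '2008-03-10 17:00:00';

Because the two ranges don't overlap, the UNION ALL returns the same rows as the original straddling query, just without deduplication overhead.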
On 14 June 2010 13:24, Leonardo F <m_lists@yahoo.it> wrote:
>> For "inserts" I do not see a reason why index partitioning would be
>> better, because AFAIK the b-tree would behave exactly the same in both
>> cases.
>
> No: when the index gets very big, inserting random values gets
> very slow.

Hm, interesting. Could you please provide some references/links?
On Mon, Jun 14, 2010 at 5:24 AM, Leonardo F <m_lists@yahoo.it> wrote:
>> For "inserts" I do not see a reason why index partitioning would be
>> better, because AFAIK the b-tree would behave exactly the same in both
>> cases.
>
> No: when the index gets very big, inserting random values gets
> very slow.

Do you have any empirical evidence for this being a real problem, or are you simply guessing? I have tables with 500m+ rows, on commodity hardware (4 SATA disks in RAID 10), and inserts to the indexes on those tables remain quite acceptable from a performance standpoint.

--
- David T. Wilson
david.t.wilson@gmail.com
On Mon, Jun 14, 2010 at 7:27 AM, David Wilson <david.t.wilson@gmail.com> wrote:
> On Mon, Jun 14, 2010 at 5:24 AM, Leonardo F <m_lists@yahoo.it> wrote:
>>> For "inserts" I do not see a reason why index partitioning would be
>>> better, because AFAIK the b-tree would behave exactly the same in
>>> both cases.
>>
>> No: when the index gets very big, inserting random values gets
>> very slow.
>
> Do you have any empirical evidence for this being a real problem, or are
> you simply guessing? I have tables with 500m+ rows, on commodity
> hardware (4 SATA disks in RAID 10), and inserts to the indexes on those
> tables remain quite acceptable from a performance standpoint.

Can you define acceptable? IIRC the OP is looking for 20,000+ inserts / sec.

--
Peter Hunsberger
On Mon, Jun 14, 2010 at 8:38 AM, Peter Hunsberger <peter.hunsberger@gmail.com> wrote:
> Can you define acceptable? IIRC the OP is looking for 20,000+ inserts /
> sec.

He's actually only looking for 2k inserts/sec. With a battery-backed controller I can sustain that, yes. That's also on commodity hardware (the whole system was under $2k a year and a half ago).
On Mon, Jun 14, 2010 at 08:27:49AM -0400, David Wilson wrote:
> On Mon, Jun 14, 2010 at 5:24 AM, Leonardo F <m_lists@yahoo.it> wrote:
>>> For "inserts" I do not see a reason why index partitioning would be
>>> better, because AFAIK the b-tree would behave exactly the same in
>>> both cases.
>>
>> No: when the index gets very big, inserting random values gets
>> very slow.
>
> Do you have any empirical evidence for this being a real problem, or are
> you simply guessing?

Just guessing here as well, but when you're inserting uniformly distributed "random" values, it should slow down quite a lot. You may happen to be lucky in your distribution and keep the upper nodes of the tree in cache, but the more uniform the distribution, the less this is going to happen. The larger the index and the more uniform the distribution, the more time is going to be spent pulling blocks off the disk. AFAIU the OP is trying to give the cache a chance of doing some useful work by partitioning by time, so it's forced to go to disk less.

Slightly more usefully for the OP: have you considered a couple of "levels" in your hierarchy? Maybe bi-hourly (~15 million records?) within the current day, moving them over into a "day" table at night (or whenever is better). That would also be a good time to cluster the data, if that would help.

--
Sam  http://samason.me.uk/
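The two-level idea could be sketched roughly like this (the table names are hypothetical; note that CLUSTER rewrites the table under an exclusive lock, so it only fits if the nightly window allows it):

    -- Nightly roll-up: fold yesterday's bi-hourly tables into one day table.
    CREATE TABLE master_day_2006_03_10 (LIKE master INCLUDING DEFAULTS);

    INSERT INTO master_day_2006_03_10
    SELECT * FROM master_bihourly_00
    UNION ALL
    SELECT * FROM master_bihourly_02;
    -- [...remaining bi-hourly tables...]

    CREATE INDEX master_day_2006_03_10_ix1 ON master_day_2006_03_10 (key1, ts);
    CREATE INDEX master_day_2006_03_10_ix2 ON master_day_2006_03_10 (key2, ts);

    -- Optionally rewrite the table in key1 order for better locality.
    CLUSTER master_day_2006_03_10 USING master_day_2006_03_10_ix1;

    DROP TABLE master_bihourly_00;
    DROP TABLE master_bihourly_02;
    -- [...and so on...]

The trade-off versus the partial-index approach is exactly the one Leonardo raises next: this re-copies a full day of data every night.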
> AFAIU the OP is trying to give the cache a chance of doing some useful
> work by partitioning by time, so it's forced to go to disk less.

Exactly.

> have you considered a couple of "levels" in your hierarchy? Maybe
> bi-hourly (~15 million records?) within the current day, moving them
> over into a "day" table at night

I was going for the partitioned-index approach because it would avoid re-copying the data into another table. My idea was:

1) create partial indexes on today's table
2) at night, create a whole index (not partial) on yesterday's table
3) drop the partial indexes on yesterday's table

But this doesn't work, because partial indexes aren't "appended" the way partitioned tables are... That is, if I have one index covering half the table and another covering the other half, and I query data over the "intersection", I'll always get a plain table scan, where I would expect the planner to do an append of the results of 2 index scans...

Would this be something that could be added to the TODO list? It doesn't look that different from what table partitioning/pruning does...

Thank you everybody for your replies anyway!