Thread: Building multiple indexes on one table.
Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For smalltables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't wantto read that table 6 times. Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that are alreadythere and I don't know if that reads the table once or multiple times. If I could create indexes inactive and thenrun reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. best regards, chris -- chris ruprecht database grunt and bit pusher extraordinaíre
On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht <chris@cdrbill.com> wrote: > Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For smalltables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't wantto read that table 6 times. > Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that arealready there and I don't know if that reads the table once or multiple times. If I could create indexes inactive andthen run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. Just build them with separate but concurrent connections, and the scans will be synchronized so it will be only one. Btw, reindex rebuilds one index at a time, so what I do is issue separate reindex for each index in parallel, to avoid the repeated scans as well. Just make sure you've got the I/O and CPU capacity for it (you'll be writing many indexes at once, so there is a lot of I/O).
>Von: pgsql-performance-owner@postgresql.org [pgsql-performance-owner@postgresql.org]" im Auftrag von "ClaudioFreire [klaussfreire@gmail.com] >Gesendet: Freitag, 18. Juli 2014 01:21 >An: Chris Ruprecht >Cc: pgsql-performance@postgresql.org >Betreff: Re: [PERFORM] Building multiple indexes on one table. > >On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht <chris@cdrbill.com> wrote: >> Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For smalltables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't wantto read that table 6 times. >> Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that arealready there and I don't know if that reads the table once or multiple times. If I could create indexes inactive andthen run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. > >Just build them with separate but concurrent connections, and the >scans will be synchronized so it will be only one. > >Btw, reindex rebuilds one index at a time, so what I do is issue >separate reindex for each index in parallel, to avoid the repeated >scans as well. > >Just make sure you've got the I/O and CPU capacity for it (you'll be >writing many indexes at once, so there is a lot of I/O). Index creation on large tables are mostly CPU bound as long as no swap occurs. I/O may be an issue when all your indexes are similar; e.g. all on single int4 columns. in other cases the writes will not all take place concurrently. To reduce I/O due to swap, you can consider increasing maintenance_work_mem on the connextions/sessionns that build the indexes. regards, Marc Mamin
On Wed, Jul 23, 2014 at 4:40 PM, Marc Mamin <M.Mamin@intershop.de> wrote: >>On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht <chris@cdrbill.com> wrote: >>> Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? Forsmall tables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't wantto read that table 6 times. >>> Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that arealready there and I don't know if that reads the table once or multiple times. If I could create indexes inactive andthen run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. >> >>Just build them with separate but concurrent connections, and the >>scans will be synchronized so it will be only one. >> >>Btw, reindex rebuilds one index at a time, so what I do is issue >>separate reindex for each index in parallel, to avoid the repeated >>scans as well. >> >>Just make sure you've got the I/O and CPU capacity for it (you'll be >>writing many indexes at once, so there is a lot of I/O). > > Index creation on large tables are mostly CPU bound as long as no swap occurs. > I/O may be an issue when all your indexes are similar; e.g. all on single int4 columns. > in other cases the writes will not all take place concurrently. > To reduce I/O due to swap, you can consider increasing maintenance_work_mem on the connextions/sessionns > that build the indexes. Usually there will always be swap, unless you've got toy indexes. But swap I/O is all sequential I/O, with a good readahead setting there should be no problem. It's the final writing step that can be a bottleneck if you have a lame I/O system and try to push 5 or 6 indexes at once.
Your question: Is there any way that I can build multiple indexes on one table without having to scan the table multiple times?
My answer: I don't think so. Since each index has a different indexing rule, it will analyze the same table in a different way. I've built indexes on a 100GB table recently and it didn't take me too much time (Amazon EC2 with 8 CPU cores / 70 GB RAM). I don't remember how much time it took, but that's a good sign right ;-) ? Painful jobs are always remembered... (ok, the hardware helped a lot).
So, my advice is: get yourself a good maintenance window and just build indexes, remember that they will help a lot of people querying this table.
2014-07-23 16:49 GMT-03:00 Claudio Freire <klaussfreire@gmail.com>:
On Wed, Jul 23, 2014 at 4:40 PM, Marc Mamin <M.Mamin@intershop.de> wrote:Usually there will always be swap, unless you've got toy indexes.
>>On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht <chris@cdrbill.com> wrote:
>>> Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For small tables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't want to read that table 6 times.
>>> Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that are already there and I don't know if that reads the table once or multiple times. If I could create indexes inactive and then run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either.
>>
>>Just build them with separate but concurrent connections, and the
>>scans will be synchronized so it will be only one.
>>
>>Btw, reindex rebuilds one index at a time, so what I do is issue
>>separate reindex for each index in parallel, to avoid the repeated
>>scans as well.
>>
>>Just make sure you've got the I/O and CPU capacity for it (you'll be
>>writing many indexes at once, so there is a lot of I/O).
>
> Index creation on large tables are mostly CPU bound as long as no swap occurs.
> I/O may be an issue when all your indexes are similar; e.g. all on single int4 columns.
> in other cases the writes will not all take place concurrently.
> To reduce I/O due to swap, you can consider increasing maintenance_work_mem on the connextions/sessionns
> that build the indexes.
But swap I/O is all sequential I/O, with a good readahead setting
there should be no problem.
It's the final writing step that can be a bottleneck if you have a
lame I/O system and try to push 5 or 6 indexes at once.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance