Thread: Building multiple indexes on one table.

Building multiple indexes on one table.

From
Chris Ruprecht
Date:
Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For
smalltables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't
wantto read that table 6 times. 
Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that are
alreadythere and I don't know if that reads the table once or multiple times. If I could create indexes inactive and
thenrun reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. 

best regards,
chris
--
chris ruprecht
database grunt and bit pusher extraordinaíre



Re: Building multiple indexes on one table.

From
Claudio Freire
Date:
On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht <chris@cdrbill.com> wrote:
> Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For
smalltables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't
wantto read that table 6 times. 
> Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that
arealready there and I don't know if that reads the table once or multiple times. If I could create indexes inactive
andthen run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. 

Just build them with separate but concurrent connections, and the
scans will be synchronized so it will be only one.

Btw, reindex rebuilds one index at a time, so what I do is issue
separate reindex for each index in parallel, to avoid the repeated
scans as well.

Just make sure you've got the I/O and CPU capacity for it (you'll be
writing many indexes at once, so there is a lot of I/O).


Re: Building multiple indexes on one table.

From
Marc Mamin
Date:
>Von: pgsql-performance-owner@postgresql.org [pgsql-performance-owner@postgresql.org]" im Auftrag von
"ClaudioFreire [klaussfreire@gmail.com] 
>Gesendet: Freitag, 18. Juli 2014 01:21
>An: Chris Ruprecht
>Cc: pgsql-performance@postgresql.org
>Betreff: Re: [PERFORM] Building multiple indexes on one table.
>
>On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht <chris@cdrbill.com> wrote:
>> Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For
smalltables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't
wantto read that table 6 times. 
>> Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that
arealready there and I don't know if that reads the table once or multiple times. If I could create indexes inactive
andthen run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. 
>
>Just build them with separate but concurrent connections, and the
>scans will be synchronized so it will be only one.
>
>Btw, reindex rebuilds one index at a time, so what I do is issue
>separate reindex for each index in parallel, to avoid the repeated
>scans as well.
>
>Just make sure you've got the I/O and CPU capacity for it (you'll be
>writing many indexes at once, so there is a lot of I/O).

Index creation on large tables are mostly CPU bound as long as no swap occurs.
I/O may be an issue when all your indexes are similar; e.g. all on single int4 columns.
in other cases the writes will not all take place concurrently.
To reduce I/O due to swap, you can consider increasing maintenance_work_mem on the connextions/sessionns
that build the indexes.

regards,

Marc Mamin


Re: Building multiple indexes on one table.

From
Claudio Freire
Date:
On Wed, Jul 23, 2014 at 4:40 PM, Marc Mamin <M.Mamin@intershop.de> wrote:
>>On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht <chris@cdrbill.com> wrote:
>>> Is there any way that I can build multiple indexes on one table without having to scan the table multiple times?
Forsmall tables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't
wantto read that table 6 times. 
>>> Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that
arealready there and I don't know if that reads the table once or multiple times. If I could create indexes inactive
andthen run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either. 
>>
>>Just build them with separate but concurrent connections, and the
>>scans will be synchronized so it will be only one.
>>
>>Btw, reindex rebuilds one index at a time, so what I do is issue
>>separate reindex for each index in parallel, to avoid the repeated
>>scans as well.
>>
>>Just make sure you've got the I/O and CPU capacity for it (you'll be
>>writing many indexes at once, so there is a lot of I/O).
>
> Index creation on large tables are mostly CPU bound as long as no swap occurs.
> I/O may be an issue when all your indexes are similar; e.g. all on single int4 columns.
> in other cases the writes will not all take place concurrently.
> To reduce I/O due to swap, you can consider increasing maintenance_work_mem on the connextions/sessionns
> that build the indexes.

Usually there will always be swap, unless you've got toy indexes.

But swap I/O is all sequential I/O, with a good readahead setting
there should be no problem.

It's the final writing step that can be a bottleneck if you have a
lame I/O system and try to push 5 or 6 indexes at once.


Re: Building multiple indexes on one table.

From
Felipe Santos
Date:
Your question: Is there any way that I can build multiple indexes on one table without having to scan the table multiple times?

My answer: I don't think so. Since each index has a different indexing rule, it will analyze the same table in a different way. I've built indexes on a 100GB table recently and it didn't take me too much time (Amazon EC2 with 8 CPU cores / 70 GB RAM). I don't remember how much time it took, but that's a good sign right  ;-)  ? Painful jobs are always remembered... (ok, the hardware helped a lot).

So, my advice is: get yourself a good maintenance window and just build indexes, remember that they will help a lot of people querying this table.


2014-07-23 16:49 GMT-03:00 Claudio Freire <klaussfreire@gmail.com>:
On Wed, Jul 23, 2014 at 4:40 PM, Marc Mamin <M.Mamin@intershop.de> wrote:
>>On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht <chris@cdrbill.com> wrote:
>>> Is there any way that I can build multiple indexes on one table without having to scan the table multiple times? For small tables, that's probably not an issue, but if I have a 500 GB table that I need to create 6 indexes on, I don't want to read that table 6 times.
>>> Nothing I could find in the manual other than reindex, but that's not helping, since it only rebuilds indexes that are already there and I don't know if that reads the table once or multiple times. If I could create indexes inactive and then run reindex, which then reads the table once, I would have a solution. But that doesn't seem to exist either.
>>
>>Just build them with separate but concurrent connections, and the
>>scans will be synchronized so it will be only one.
>>
>>Btw, reindex rebuilds one index at a time, so what I do is issue
>>separate reindex for each index in parallel, to avoid the repeated
>>scans as well.
>>
>>Just make sure you've got the I/O and CPU capacity for it (you'll be
>>writing many indexes at once, so there is a lot of I/O).
>
> Index creation on large tables are mostly CPU bound as long as no swap occurs.
> I/O may be an issue when all your indexes are similar; e.g. all on single int4 columns.
> in other cases the writes will not all take place concurrently.
> To reduce I/O due to swap, you can consider increasing maintenance_work_mem on the connextions/sessionns
> that build the indexes.

Usually there will always be swap, unless you've got toy indexes.

But swap I/O is all sequential I/O, with a good readahead setting
there should be no problem.

It's the final writing step that can be a bottleneck if you have a
lame I/O system and try to push 5 or 6 indexes at once.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance