Thread: Index creation running now for 14 hours

Index creation running now for 14 hours

From
Tory M Blue
Date:
I'm running 9.3.4 with slon 2.2.3, I did a drop add last night at 9pm, it started this particular tables index creation at 10:16pm and it's still running. 1 single core is at 100% (32 core box) and there is almost zero I/O activity.

CentOS 6.6


 16398 | clsdb | 25765 |       10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 |                 |       45712 | 2015-08-25 21:12:01.6

19819-07 | 2015-08-25 21:22:08.68766-07  | 2015-08-25 22:16:03.10099-07  | 2015-08-25 22:16:03.100992-07 | f       | active | select "_cls".fini

shTableAfterCopy(143); analyze "torque"."impressions"; 

I was wondering if there were underlying tools to see how it's progressing, or if there is anything I can do to bump the performance mid creation? Nothing I can do really without stopping postgres or slon, but that would start me back at square one.


Thanks

Tory



iostat: sdb is the db directory 


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.55    0.00    0.23    0.00    0.00   96.22


Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.00         0.00        12.00          0         24

sdb               0.00         0.00         0.00          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.57    0.00    0.06    0.00    0.00   96.37


Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb              21.50         0.00     15484.00          0      30968


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.72    0.00    0.06    0.00    0.00   96.22


Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               2.00         0.00        20.00          0         40

sdb               0.00         0.00         0.00          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           4.06    0.00    0.05    0.02    0.00   95.87


Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               4.00         0.00        64.00          0        128

sdb               3.50         0.00       108.00          0        216


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.36    0.00    0.03    0.00    0.00   96.61


Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               0.00         0.00         0.00          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.41    0.00    0.06    0.00    0.00   96.53


Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               0.00         0.00         0.00          0          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.45    0.00    0.27    0.00    0.00   96.28


Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               1.00         0.00        24.00          0         48


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.50    0.00    0.30    0.00    0.00   96.20


Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.50         0.00       344.00          0        688

sdb               0.00         0.00         0.00          0          0


Re: Index creation running now for 14 hours

From
Igor Neyman
Date:

 

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance <pgsql-performance@postgresql.org>
Subject: [PERFORM] Index creation running now for 14 hours

 

I'm running 9.3.4 with slon 2.2.3, I did a drop add last night at 9pm, it started this particular tables index creation at 10:16pm and it's still running. 1 single core is at 100% (32 core box) and there is almost zero I/O activity.

 

CentOS 6.6

 

 

 16398 | clsdb | 25765 |       10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 |                 |       45712 | 2015-08-25 21:12:01.6

19819-07 | 2015-08-25 21:22:08.68766-07  | 2015-08-25 22:16:03.10099-07  | 2015-08-25 22:16:03.100992-07 | f       | active | select "_cls".fini

shTableAfterCopy(143); analyze "torque"."impressions"; 

I was wondering if there were underlying tools to see how it's progressing, or if there is anything I can do to bump the performance mid creation? Nothing I can do really without stopping postgres or slon, but that would start me back at square one.

 

Thanks

Tory

 

 

iostat: sdb is the db directory 

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.55    0.00    0.23    0.00    0.00   96.22

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.00         0.00        12.00          0         24

sdb               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.57    0.00    0.06    0.00    0.00   96.37

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb              21.50         0.00     15484.00          0      30968

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.72    0.00    0.06    0.00    0.00   96.22

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               2.00         0.00        20.00          0         40

sdb               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           4.06    0.00    0.05    0.02    0.00   95.87

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               4.00         0.00        64.00          0        128

sdb               3.50         0.00       108.00          0        216

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.36    0.00    0.03    0.00    0.00   96.61

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.41    0.00    0.06    0.00    0.00   96.53

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.45    0.00    0.27    0.00    0.00   96.28

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               1.00         0.00        24.00          0         48

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           3.50    0.00    0.30    0.00    0.00   96.20

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.50         0.00       344.00          0        688

sdb               0.00         0.00         0.00          0          0

 

Check pg_locks in regards to the table in question.

 

Regards,

Igor Neyman

Re: Index creation running now for 14 hours

From
Tory M Blue
Date:


On Wed, Aug 26, 2015 at 12:18 PM, Igor Neyman <ineyman@perceptron.com> wrote:

 

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance <pgsql-performance@postgresql.org>
Subject: [PERFORM] Index creation running now for 14 hours

 

I'm running 9.3.4 with slon 2.2.3, I did a drop add last night at 9pm, it started this particular tables index creation at 10:16pm and it's still running. 1 single core is at 100% (32 core box) and there is almost zero I/O activity.

 

CentOS 6.6

 

 

 16398 | clsdb | 25765 |       10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 |                 |       45712 | 2015-08-25 21:12:01.6

19819-07 | 2015-08-25 21:22:08.68766-07  | 2015-08-25 22:16:03.10099-07  | 2015-08-25 22:16:03.100992-07 | f       | active | select "_cls".fini

shTableAfterCopy(143); analyze "torque"."impressions"; 

I was wondering if there were underlying tools to see how it's progressing, or if there is anything I can do to bump the performance mid creation? Nothing I can do really without stopping postgres or slon, but that would start me back at square one.

 

Thanks

Tory

 

 

i

 

Check pg_locks in regards to the table in question.

 

Regards,

Igor Neyman


thanks Igor I did, but not clear what that is telling me, there are 249 rows in there, nothing has a table name , they are all for the PID in the "analyze torque.impressions line that I listed above pid 25765.

Here is one for an exclusive lock, but what should I be looking for? There are no other processes on this box other than slon and this index creation.


 transactionid |          |            |      |       |            |      93588453 |         |       |          | 4/25823460         | 25765 | ExclusiveL

ock       | t       | f


Thanks
Tory 

Re: Index creation running now for 14 hours

From
Igor Neyman
Date:

 

 

From: Tory M Blue [mailto:tmblue@gmail.com]
Sent: Wednesday, August 26, 2015 3:26 PM
To: Igor Neyman <ineyman@perceptron.com>
Cc: pgsql-performance <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Index creation running now for 14 hours

 

 

 

On Wed, Aug 26, 2015 at 12:18 PM, Igor Neyman <ineyman@perceptron.com> wrote:

 

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance <pgsql-performance@postgresql.org>
Subject: [PERFORM] Index creation running now for 14 hours

 

I'm running 9.3.4 with slon 2.2.3, I did a drop add last night at 9pm, it started this particular tables index creation at 10:16pm and it's still running. 1 single core is at 100% (32 core box) and there is almost zero I/O activity.

 

CentOS 6.6

 

 

 16398 | clsdb | 25765 |       10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 |                 |       45712 | 2015-08-25 21:12:01.6

19819-07 | 2015-08-25 21:22:08.68766-07  | 2015-08-25 22:16:03.10099-07  | 2015-08-25 22:16:03.100992-07 | f       | active | select "_cls".fini

shTableAfterCopy(143); analyze "torque"."impressions"; 

I was wondering if there were underlying tools to see how it's progressing, or if there is anything I can do to bump the performance mid creation? Nothing I can do really without stopping postgres or slon, but that would start me back at square one.

 

Thanks

Tory

 

 

i

 

Check pg_locks in regards to the table in question.

 

Regards,

Igor Neyman

 

thanks Igor I did, but not clear what that is telling me, there are 249 rows in there, nothing has a table name , they are all for the PID in the "analyze torque.impressions line that I listed above pid 25765.

 

Here is one for an exclusive lock, but what should I be looking for? There are no other processes on this box other than slon and this index creation.

 

 

 transactionid |          |            |      |       |            |      93588453 |         |       |          | 4/25823460         | 25765 | ExclusiveL

ock       | t       | f

 

Thanks

Tory 

 

There are objects OIDs in pg_lock, not names.

Find the OID of the table that you create your index for, and search pg_locks for the records referencing your table.

It cannot be that all records in pg_locks are for pid running “analyze”, there should be records with pid running your “create index”.

What’s the size of the table you are indexing?

Also, take a look at pg_stat_activity for long running transactions/queries.

 

Igor Neyman

 

Re: Index creation running now for 14 hours

From
Tory M Blue
Date:


On Wed, Aug 26, 2015 at 12:36 PM, Igor Neyman <ineyman@perceptron.com> wrote:

 

 

From: Tory M Blue [mailto:tmblue@gmail.com]
Sent: Wednesday, August 26, 2015 3:26 PM
To: Igor Neyman <ineyman@perceptron.com>
Cc: pgsql-performance <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Index creation running now for 14 hours

 

 

 

On Wed, Aug 26, 2015 at 12:18 PM, Igor Neyman <ineyman@perceptron.com> wrote:

 

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance <pgsql-performance@postgresql.org>
Subject: [PERFORM] Index creation running now for 14 hours

 

I'm running 9.3.4 with slon 2.2.3, I did a drop add last night at 9pm, it started this particular tables index creation at 10:16pm and it's still running. 1 single core is at 100% (32 core box) and there is almost zero I/O activity.

 

CentOS 6.6

 

 

 16398 | clsdb | 25765 |       10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 |                 |       45712 | 2015-08-25 21:12:01.6

19819-07 | 2015-08-25 21:22:08.68766-07  | 2015-08-25 22:16:03.10099-07  | 2015-08-25 22:16:03.100992-07 | f       | active | select "_cls".fini

shTableAfterCopy(143); analyze "torque"."impressions"; 

I was wondering if there were underlying tools to see how it's progressing, or if there is anything I can do to bump the performance mid creation? Nothing I can do really without stopping postgres or slon, but that would start me back at square one.

 

Thanks

Tory

 

 

i

 

Check pg_locks in regards to the table in question.

 

Regards,

Igor Neyman

 

thanks Igor I did, but not clear what that is telling me, there are 249 rows in there, nothing has a table name , they are all for the PID in the "analyze torque.impressions line that I listed above pid 25765.

 

Here is one for an exclusive lock, but what should I be looking for? There are no other processes on this box other than slon and this index creation.

 

 

 transactionid |          |            |      |       |            |      93588453 |         |       |          | 4/25823460         | 25765 | ExclusiveL

ock       | t       | f

 

Thanks

Tory 

 

There are objects OIDs in pg_lock, not names.

Find the OID of the table that you create your index for, and search pg_locks for the records referencing your table.

It cannot be that all records in pg_locks are for pid running “analyze”, there should be records with pid running your “create index”.

What’s the size of the table you are indexing?

Also, take a look at pg_stat_activity for long running transactions/queries.

 

Igor Neyman

 


the table is 90GB without indexes,  285GB with indexes and bloat,  The row count is not actually completing.. 125Million rows over 13 months, this table is probably close to 600million rows.

Yes I have long running queries, my job started last night at 9pm, it appears 3 of the 6 indexes on this table are completed, but I'm about to blow out my disk space, so I won't be able to give it much longer to run.... Bah!

16398 | clsdb | 25765 |       10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 |                 |       45712 | 2015-08-25 21:12:01.619819-07 | 2015-08-25 21:22:08.68766-07  | 2015-08-25 22:16:03.10099-07  | 2015-08-25 22:16:03.100992-07 | f       | active | select "_cls".finishTableAfterCopy(143); analyze "torque"."impressions"; 

 16398 | lsdb | 25777 |       10 | postgres | slon.local_cleanup        | 10.13.200.232 |                 |       45718 | 2015-08-25 21:12:01.624032-07 | 2015-08-25 21:23:09.103395-07 | 2015-08-25 21:23:09.103395-07 | 2015-08-25 21:23:09.103397-07 | t       | active | begin;lock table "_cls".sl_config_lock;select "_cls".cleanupEvent('10 minutes'::interval);commit; 

there is nothing else in the pg_stat table other than a bunch of slony connections, these are the only 2 items that have been running since the index started last night at 10:16pm

2015-08-25 22:16:03 PDT CONFIG remoteWorkerThread_1: 67254824703 bytes copied for table "torque"."impressions"

The above is when it had finished copying the table and started on the index..

Well as I said I'm running out of storage as the index is creating some serious data on the filesystem, I'll have to kill it, try to massage the data a bit and increase the maintenance_work mem to use some of my 256GB of ram to try to get through this. Right now the 100% cpu process which is this index is only using 3.5GB and has been for the last 15 hours


Tory

Re: Index creation running now for 14 hours

From
Qingqing Zhou
Date:
On Wed, Aug 26, 2015 at 1:26 PM, Tory M Blue <tmblue@gmail.com> wrote:
>
> Right now the 100% cpu process which is this index is only using 3.5GB
> and has been for the last 15 hours
>

If 100% cpu, you can do 'sudo perf top' to see what the CPU is busy about.

Regards,
Qingqing


Re: Index creation running now for 14 hours

From
Tory M Blue
Date:


On Wed, Aug 26, 2015 at 2:45 PM, Qingqing Zhou <zhouqq.postgres@gmail.com> wrote:
On Wed, Aug 26, 2015 at 1:26 PM, Tory M Blue <tmblue@gmail.com> wrote:
>
> Right now the 100% cpu process which is this index is only using 3.5GB
> and has been for the last 15 hours
>

If 100% cpu, you can do 'sudo perf top' to see what the CPU is busy about.

Regards,
Qingqing


I appreciate the attempted help, but I know what it's doing, it's creating indexes for the last 14+ hours. I've killed it now,  as it was about to run my machine out of disk space, stopped it at 97% full, could not go any longer.

I will now clean up the table a bit but will still have 500million rows with 6 indexes on it. I will create the indexes after the data is laid down vs during,  so it doesn't block my other table replications. I will then fire off my index creations in parallel  for my other tables so I can actually use the hardware the DB is sitting on.

But I guess the answer is, no real way to tell what the box is doing when it's creating an index. Yes there was a lock, no I could not find a way to see how it's progressing so there was no way for me to gauge when it would be done.

Thanks
Tory

Tory

Re: Index creation running now for 14 hours

From
Tomas Vondra
Date:
Hi,

On 08/26/2015 11:53 PM, Tory M Blue wrote:
>
>
> On Wed, Aug 26, 2015 at 2:45 PM, Qingqing Zhou
> <zhouqq.postgres@gmail.com <mailto:zhouqq.postgres@gmail.com>> wrote:
>
>     On Wed, Aug 26, 2015 at 1:26 PM, Tory M Blue <tmblue@gmail.com
>     <mailto:tmblue@gmail.com>> wrote:
>      >
>      > Right now the 100% cpu process which is this index is only using
>     3.5GB
>      > and has been for the last 15 hours
>      >
>
>     If 100% cpu, you can do 'sudo perf top' to see what the CPU is busy
>     about.
>
>     Regards,
>     Qingqing
>
>
>
> I appreciate the attempted help, but I know what it's doing, it's
> creating indexes for the last 14+ hours.

Sure, but what exactly was it doing? 'perf top' might give us a hint
which function is consuming most of the time, for example.

 > I've killed it now,  as it was
> about to run my machine out of disk space, stopped it at 97% full, could
> not go any longer.

Which suggests it's using a lot of temp files.

Indexes are built by reading all the necessary data from the table (just
the columns), sorted and then an index is built using the sorted data
(because it can be done very efficiently - much faster than when simply
inserting the tuples into the btree index).

The fact that you ran out of disk space probably means that you don't
have enough space for the sort (it clearly does not fit into
maintenance_work_mem), and there's no way around that - you need enough
disk space.

> I will now clean up the table a bit but will still have 500million rows
> with 6 indexes on it. I will create the indexes after the data is laid
> down vs during,  so it doesn't block my other table replications. I will
> then fire off my index creations in parallel  for my other tables so I
> can actually use the hardware the DB is sitting on.

That's a very bad idea, because each of the index builds will require
disk space for the sort, and you're even more likely to run out of disk
space.

>
> But I guess the answer is, no real way to tell what the box is doing
> when it's creating an index. Yes there was a lock, no I could not find a
> way to see how it's progressing so there was no way for me to gauge when
> it would be done.

Had it been waiting on a lock, it wouldn't consume 100% of CPU.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Index creation running now for 14 hours

From
Tomas Vondra
Date:

On 08/26/2015 10:26 PM, Tory M Blue wrote:

>
> the table is 90GB without indexes,  285GB with indexes and bloat,  The
> row count is not actually completing.. 125Million rows over 13 months,
> this table is probably close to 600million rows.

You don't need to do SELECT COUNT(*) if you only need an approximate
number. You can look at pg_class.reltuples:

    SELECT reltuples FROM pg_class WHERE relname = 'impressions';

That should be a sufficiently accurate estimate.

> The above is when it had finished copying the table and started on the
> index..
>
> Well as I said I'm running out of storage as the index is creating some
> serious data on the filesystem, I'll have to kill it, try to massage the
> data a bit and increase the maintenance_work mem to use some of my 256GB
> of ram to try to get through this. Right now the 100% cpu process which
> is this index is only using 3.5GB and has been for the last 15 hours

Please post details on the configuration (shared_buffer, work_mem,
maintenance_work_mem and such).

BTW while the the CREATE INDEX is reporting 3.5GB, it most likely wrote
a lot of data into on-disk chunks when sorting the data. So it's
actually using the memory through page cache (i.e. don't increase
maintenance_work_mem too much, you don't want to force the data to disk
needlessly).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Index creation running now for 14 hours

From
Peter Geoghegan
Date:
On Wed, Aug 26, 2015 at 3:36 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>> But I guess the answer is, no real way to tell what the box is doing
>> when it's creating an index. Yes there was a lock, no I could not find a
>> way to see how it's progressing so there was no way for me to gauge when
>> it would be done.
>
>
> Had it been waiting on a lock, it wouldn't consume 100% of CPU.

When things are going out to disk anyway, you're often better off with
a lower maintenance_work_mem (or work_mem). It's actually kind of
bogus than run size is dictated by these settings. Reducing it will
tend to make tuplesort's maintenance of the heap invariant
inexpensive, while not really making the merge phase more painful. I
would try 128MB of maintenance_work_mem. That could be significantly
faster. Check how the I/O load on the system compares with a higher
maintenance_work_mem setting. Often, this will make the sort less CPU
bound, which is good here.

I am currently working on making this a lot better in Postgres 9.6.
Also, note that text and numeric sorts will be much faster in 9.5.

Of course, as Tomas says, if you don't have the disk space to do the
sort, you're not going to be able to complete it. That much is very
clear.

If you're really worried about these costs, I suggest enabling
trace_sort locally, and monitoring the progress of this sort in the
logs.

--
Regards,
Peter Geoghegan