Thread: Index creation running now for 14 hours
I'm running 9.3.4 with slon 2.2.3. I did a drop/add last night at 9pm; this particular table's index creation started at 10:16pm and it's still running. A single core is at 100% (32-core box) and there is almost zero I/O activity.

CentOS 6.6

16398 | clsdb | 25765 | 10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 | | 45712 | 2015-08-25 21:12:01.619819-07 | 2015-08-25 21:22:08.68766-07 | 2015-08-25 22:16:03.10099-07 | 2015-08-25 22:16:03.100992-07 | f | active | select "_cls".finishTableAfterCopy(143); analyze "torque"."impressions";
I was wondering if there are underlying tools to see how it's progressing, or if there is anything I can do to bump the performance mid-creation? There's really nothing I can do without stopping postgres or slon, but that would put me back at square one.
Thanks
Tory
iostat: sdb is the db directory
avg-cpu: %user %nice %system %iowait %steal %idle
3.55 0.00 0.23 0.00 0.00 96.22
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.00 0.00 12.00 0 24
sdb 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
3.57 0.00 0.06 0.00 0.00 96.37
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 21.50 0.00 15484.00 0 30968
avg-cpu: %user %nice %system %iowait %steal %idle
3.72 0.00 0.06 0.00 0.00 96.22
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.00 0.00 20.00 0 40
sdb 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
4.06 0.00 0.05 0.02 0.00 95.87
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 4.00 0.00 64.00 0 128
sdb 3.50 0.00 108.00 0 216
avg-cpu: %user %nice %system %iowait %steal %idle
3.36 0.00 0.03 0.00 0.00 96.61
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
3.41 0.00 0.06 0.00 0.00 96.53
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
3.45 0.00 0.27 0.00 0.00 96.28
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 1.00 0.00 24.00 0 48
avg-cpu: %user %nice %system %iowait %steal %idle
3.50 0.00 0.30 0.00 0.00 96.20
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.50 0.00 344.00 0 688
sdb 0.00 0.00 0.00 0 0
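For reference, 9.3 has no built-in progress view for index builds; one rough proxy (an assumption-laden sketch, not an exact progress meter) is the cumulative temp-file statistics for the database, which grow as a large sort spills to disk:

```sql
-- Rough proxy only; there is no CREATE INDEX progress view in 9.3.
-- Cumulative temp-file stats for the database ('clsdb' taken from the
-- thread). Note these counters update as temp files are cleaned up,
-- so they can lag a single long-running sort.
SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
WHERE datname = 'clsdb';
```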
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Tory M Blue
Sent: Wednesday, August 26, 2015 3:14 PM
To: pgsql-performance <pgsql-performance@postgresql.org>
Subject: [PERFORM] Index creation running now for 14 hours
Check pg_locks in regards to the table in question.
Regards,
Igor Neyman
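A minimal sketch of that check, assuming 9.3-era catalogs and the schema/table names from the thread:

```sql
-- Locks held or awaited on the table being indexed; the regclass cast
-- resolves the name to the OID that pg_locks actually stores.
SELECT l.pid, l.locktype, l.mode, l.granted
FROM pg_locks l
WHERE l.relation = 'torque.impressions'::regclass;
```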
From: Tory M Blue [mailto:tmblue@gmail.com]
Sent: Wednesday, August 26, 2015 3:26 PM
To: Igor Neyman <ineyman@perceptron.com>
Cc: pgsql-performance <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Index creation running now for 14 hours
On Wed, Aug 26, 2015 at 12:18 PM, Igor Neyman <ineyman@perceptron.com> wrote:
Check pg_locks in regards to the table in question.
Regards,
Igor Neyman
Thanks Igor, I did, but it's not clear what that's telling me. There are 249 rows in there and nothing has a table name; they are all for the PID in the analyze "torque"."impressions" line I listed above, pid 25765.

Here is one for an exclusive lock, but what should I be looking for? There are no other processes on this box other than slon and this index creation.
transactionid | | | | | | 93588453 | | | | 4/25823460 | 25765 | ExclusiveLock | t | f
Thanks
Tory
pg_locks contains object OIDs, not names.
Find the OID of the table you are creating your index on, and search pg_locks for the records referencing that table.
It can't be that all records in pg_locks are for the pid running the "analyze"; there should be records for the pid running your "create index".
What’s the size of the table you are indexing?
Also, take a look at pg_stat_activity for long running transactions/queries.
Igor Neyman
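The suggestions above can be sketched as queries (assuming 9.3 catalogs; schema/table names are the ones from the thread):

```sql
-- 1. OID of the table being indexed
SELECT 'torque.impressions'::regclass::oid AS table_oid;

-- 2. Locks referencing that table (pg_locks stores the OID, not the name)
SELECT pid, mode, granted
FROM pg_locks
WHERE relation = 'torque.impressions'::regclass;

-- 3. Size of the table, with and without indexes
SELECT pg_size_pretty(pg_table_size('torque.impressions'))          AS table_size,
       pg_size_pretty(pg_total_relation_size('torque.impressions')) AS with_indexes;

-- 4. Long-running transactions/queries
SELECT pid, now() - xact_start AS xact_age, state, query
FROM pg_stat_activity
ORDER BY xact_start NULLS LAST;
```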
16398 | clsdb | 25765 | 10 | postgres | slon.remoteWorkerThread_1 | 10.13.200.232 | | 45712 | 2015-08-25 21:12:01.619819-07 | 2015-08-25 21:22:08.68766-07 | 2015-08-25 22:16:03.10099-07 | 2015-08-25 22:16:03.100992-07 | f | active | select "_cls".finishTableAfterCopy(143); analyze "torque"."impressions";
16398 | lsdb | 25777 | 10 | postgres | slon.local_cleanup | 10.13.200.232 | | 45718 | 2015-08-25 21:12:01.624032-07 | 2015-08-25 21:23:09.103395-07 | 2015-08-25 21:23:09.103395-07 | 2015-08-25 21:23:09.103397-07 | t | active | begin;lock table "_cls".sl_config_lock;select "_cls".cleanupEvent('10 minutes'::interval);commit;
2015-08-25 22:16:03 PDT CONFIG remoteWorkerThread_1: 67254824703 bytes copied for table "torque"."impressions"
The above is when it had finished copying the table and started on the index..
Well, as I said, I'm running out of storage because the index build is writing some serious data to the filesystem. I'll have to kill it, try to massage the data a bit, and increase maintenance_work_mem to use some of my 256GB of RAM to get through this. Right now the 100% CPU process, which is this index build, is only using 3.5GB and has been for the last 15 hours.
Tory
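As a side note, maintenance_work_mem can be changed without a restart; a sketch of both options (the 8GB figure is illustrative only, not a recommendation, and the role name is an assumption):

```sql
-- For builds run by another client (e.g. the slon connection), a
-- role-level default applies to its new sessions:
ALTER ROLE postgres SET maintenance_work_mem = '8GB';

-- For a session you control directly, before the CREATE INDEX:
SET maintenance_work_mem = '8GB';
```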
On Wed, Aug 26, 2015 at 1:26 PM, Tory M Blue <tmblue@gmail.com> wrote:
>
> Right now the 100% cpu process which is this index is only using 3.5GB
> and has been for the last 15 hours
>
If 100% cpu, you can do 'sudo perf top' to see what the CPU is busy about.
Regards,
Qingqing
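A hypothetical invocation of that suggestion — narrowing the profile to the one busy backend (pid 25765 from the pg_stat_activity output above) rather than the whole box:

```shell
# Requires root and the perf tooling for the running kernel;
# -p restricts sampling to the given process.
sudo perf top -p 25765
```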
Hi,

On 08/26/2015 11:53 PM, Tory M Blue wrote:
> I appreciate the attempted help, but I know what it's doing, it's
> creating indexes for the last 14+ hours.

Sure, but what exactly was it doing? 'perf top' might give us a hint which function is consuming most of the time, for example.

> I've killed it now, as it was about to run my machine out of disk
> space, stopped it at 97% full, could not go any longer.

Which suggests it's using a lot of temp files. Indexes are built by reading all the necessary data from the table (just the columns), sorting it, and then building the index from the sorted data (because that can be done very efficiently - much faster than simply inserting the tuples into the btree index).

The fact that you ran out of disk space probably means you don't have enough space for the sort (it clearly does not fit into maintenance_work_mem), and there's no way around that - you need enough disk space.

> I will now clean up the table a bit but will still have 500 million
> rows with 6 indexes on it. I will create the indexes after the data is
> laid down vs during, so it doesn't block my other table replications.
> I will then fire off my index creations in parallel for my other
> tables so I can actually use the hardware the DB is sitting on.

That's a very bad idea, because each of the index builds will require disk space for the sort, and you're even more likely to run out of disk space.

> But I guess the answer is, no real way to tell what the box is doing
> when it's creating an index. Yes there was a lock, no I could not find
> a way to see how it's progressing so there was no way for me to gauge
> when it would be done.

Had it been waiting on a lock, it wouldn't consume 100% of CPU.

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 08/26/2015 10:26 PM, Tory M Blue wrote:
> the table is 90GB without indexes, 285GB with indexes and bloat. The
> row count is not actually completing.. 125 million rows over 13 months,
> this table is probably close to 600 million rows.

You don't need to do SELECT COUNT(*) if you only need an approximate number. You can look at pg_class.reltuples:

    SELECT reltuples FROM pg_class WHERE relname = 'impressions';

That should be a sufficiently accurate estimate.

> The above is when it had finished copying the table and started on the
> index..
>
> Well as I said I'm running out of storage as the index is creating
> some serious data on the filesystem, I'll have to kill it, try to
> massage the data a bit and increase the maintenance_work_mem to use
> some of my 256GB of ram to try to get through this. Right now the
> 100% cpu process which is this index is only using 3.5GB and has been
> for the last 15 hours

Please post details on the configuration (shared_buffers, work_mem, maintenance_work_mem and such).

BTW while the CREATE INDEX is reporting 3.5GB, it most likely wrote a lot of data into on-disk chunks when sorting the data. So it's actually using the memory through the page cache (i.e. don't increase maintenance_work_mem too much, you don't want to force the data to disk needlessly).

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Aug 26, 2015 at 3:36 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> But I guess the answer is, no real way to tell what the box is doing
>> when it's creating an index. Yes there was a lock, no I could not
>> find a way to see how it's progressing so there was no way for me to
>> gauge when it would be done.
>
> Had it been waiting on a lock, it wouldn't consume 100% of CPU.

When things are going out to disk anyway, you're often better off with a lower maintenance_work_mem (or work_mem). It's actually kind of bogus that run size is dictated by these settings. Reducing it will tend to make tuplesort's maintenance of the heap invariant inexpensive, while not really making the merge phase more painful. I would try 128MB of maintenance_work_mem. That could be significantly faster. Check how the I/O load on the system compares with a higher maintenance_work_mem setting. Often, this will make the sort less CPU bound, which is good here.

I am currently working on making this a lot better in Postgres 9.6. Also, note that text and numeric sorts will be much faster in 9.5.

Of course, as Tomas says, if you don't have the disk space to do the sort, you're not going to be able to complete it. That much is very clear.

If you're really worried about these costs, I suggest enabling trace_sort locally, and monitoring the progress of this sort in the logs.

--
Regards,
Peter Geoghegan
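A sketch of enabling that, in the session that will run the index build (trace_sort is a developer option; message volume and format vary by version):

```sql
-- Tuplesort progress/debug messages then appear in the server log.
SET trace_sort = on;
-- Optionally surface LOG-level messages in this session as well:
SET client_min_messages = log;
-- ...then run the CREATE INDEX in this session.
```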