Weirdness with the stats collector process - Mailing list pgsql-general

From Matthew Musgrove
Subject Weirdness with the stats collector process
Date
Msg-id D6E71BEFAD7BEB4FBCD8AE74FADB1265011BB40FC749@win-8-eml-ex1.eml.local
Whole thread Raw
Responses Re: Weirdness with the stats collector process  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Weirdness with the stats collector process  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
One of our instances has been behaving -- oddly. Most queries are blazing fast. It appears to just be some of the stat views that are slow.

Queries against the following views are quick: pg_stat_activity, pg_stat_xact_all_tables, pg_stat_xact_sys_tables, pg_stat_xact_user_tables, pg_statio_sys_sequences (no system sequences), pg_stat_xact_user_functions, and pg_stat_replication.

When I query one of the other statistics views, the queries take roughly 10 seconds when they used to take milliseconds. Here's an explain of one such query: https://explain.depesz.com/s/IVEu

Logs show 'using stale statistics instead of current ones because stats collector is not responding' when I query the stats views.

CPU usage is very low. Disk I/O looks okay to me.

-bash-4.1$ iostat
Linux 2.6.32-573.12.1.el6.x86_64 (hostname)         07/25/2016      _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.80    0.00    0.15    0.01    0.00   97.04

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.45         4.85         7.45   33142948   50914838
sdb              31.48      1403.57       713.55 9597023154 4878965216
dm-0              0.14         0.58         0.56    3949720    3847592
dm-1              1.00         4.26         6.83   29155482   46727688

Most of the files in $PGDATA/pg_stat_tmp haven't been updated by PostgreSQL since July 7th at 14:16. The one exception is pgss_query_texts.stat. I shouldn't have but I touched $PGDATA/pg_stat_tmp/db* on July 12th at 15:14 (doing so did trigger an autovacuum though).

I set track_activities and track_counts to off and reloaded config (pg_ctl reload). The log indicated that it noticed the change. I created a directory /dev/shm/pg_stat_tmp, set track_activities and track_counts on, set stats_temp_directory to '/dev/shm/pg_stat_tmp' and reloaded config. Again, the log indicated that it noticed the change but nothing happened with stats. The new directory is empty.

I notified the statistics collector to quit (kill -QUIT). It did and the postmaster restarted it. The new stats directory is still empty. The only updates to the old stats directory is the pgss_query_texts.stat file.

I tried getting stack traces and the like with gdb and strace but it wasn't really helpful.

Most of what I have tried was the result of help by Jim Nasby on postgresteam.slack.com. He and Lukas Fittl suggested that I post about my problem. There's one thing that we all agree on... It should not be so hard to figure out what is going on with the statistics collector.

I tried turning stats off and on again today. Here's what the two pg_stat_tmp directories look like today:
 so now the directory looks like:

-bash-4.1$ ls -alh 9.4/data/pg_stat_tmp/
total 4.3M
drwx------  2 postgres postgres 4.0K Jul  7 14:16 .
drwx------ 19 postgres postgres 4.0K Jul 25 11:20 ..
-rw-------  1 postgres postgres 3.4K Jul 12 15:14 db_0.stat
-rw-------  1 postgres postgres  26K Jul 12 15:14 db_13003.stat
-rw-------  1 postgres postgres  13K Jul 12 15:14 db_1.stat
-rw-------  1 postgres postgres 140K Jul 12 15:14 db_2473933.stat
-rw-------  1 postgres postgres 203K Jul 12 15:14 db_2476698.stat
-rw-------  1 postgres postgres  27K Jul 12 15:14 db_2595334.stat
-rw-------  1 postgres postgres  28K Jul 12 15:14 db_2631762.stat
-rw-------  1 postgres postgres 1.5K Jul  7 14:16 global.stat
-rw-------  1 postgres postgres 3.9M Jul 25 13:39 pgss_query_texts.stat
-bash-4.1$ ls -alh /dev/shm/pg_stat_tmp/
total 0
drwxr-xr-x 2 postgres postgres 40 Jul 12 17:19 .
drwxrwxrwt 3 root     root     80 Jul 11 17:31 ..

Does anyone have any suggestions on how to:
 - see what the statistics collector is doing?
 - tell the postmaster to give the statistics collector the up to date config?
 - get stats working without restarting the instance? (I am not yet sure when that can happen but it would have to be scheduled [in advance] to occur during an after hours maintenance window.

Thanks in advance,
Matt

--
Matthew Musgrove
Senior Software Engineer
Assurant Mortgage Solutions
817.788.4482
mmusgrove@emortgagelogic.com
matthew.musgrove@assurant.com

pgsql-general by date:

Previous
From: Peter Devoy
Date:
Subject: Re: Return results of join with polymorphically-defined table in pl/pgsql
Next
From: Tom Lane
Date:
Subject: Re: Weirdness with the stats collector process