Re: pgstat wait timeout encountered - Mailing list pgsql-admin

From Guillaume Lelarge
Subject Re: pgstat wait timeout encountered
Date
Msg-id CAECtzeUbhXbU9MmtEQkZmr6i8QzLiw6_QJ-Q=hB_BxX8OydoAQ@mail.gmail.com
Whole thread Raw
In response to Re: pgstat wait timeout encountered  (Mark Steben <mark.steben@drivedominion.com>)
List pgsql-admin
2015-03-10 13:27 GMT+01:00 Mark Steben <mark.steben@drivedominion.com>:
explain analyze doesn't reveal much:

explain analyze SELECT * FROM pg_stat_database WHERE datname='prime_production';
                                                QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------
 Seq Scan on pg_database d  (cost=0.00..1.11 rows=1 width=68) (actual time=11.201..11.203 rows=1 loops=1)
   Filter: (datname = 'prime_production'::name)
   Rows Removed by Filter: 4
 Total runtime: 11.320 ms
(4 rows)

explain on another 'offending' query also isn't very helpful

explain analyze select * from pg_stat_bgwriter
;
                                      QUERY PLAN                                     
--------------------------------------------------------------------------------------
 Result  (cost=0.00..0.04 rows=1 width=0) (actual time=10.771..10.771 rows=1 loops=1)
 Total runtime: 10.794 ms


I've performed a pg_stat_reset this morning.  That hasn't been done in awhile. 
We'll see if that makes a difference


That usually doesn't make a difference. "pgstat wait timeout" is the usual message when your disks are way too slow.


On Mon, Mar 9, 2015 at 4:47 PM, John Scalia <jayknowsunix@gmail.com> wrote:
Have you tried doing an "explain" on your query. Returning that many rows sounds like potential corruption.


On 3/9/2015 4:39 PM, Mark Steben wrote:
only 5 rows on this table although huge amount of returned tuples on one db:

email_web_service=# select * from pg_stat_database;
-[ RECORD 1 ]--+------------------------------
datid          | 1
datname        | template1
numbackends    | 0
xact_commit    | 192026
xact_rollback  | 0
blks_read      | 610450
blks_hit       | 7723329
tup_returned   | 33702423
tup_fetched    | 3880133
tup_inserted   | 7620
tup_updated    | 291846
tup_deleted    | 7620
conflicts      | 0
temp_files     | 0
temp_bytes     | 0
deadlocks      | 0
blk_read_time  | 0
blk_write_time | 0
stats_reset    | 2013-12-17 11:30:58.890718-05
-[ RECORD 2 ]--+------------------------------
datid          | 12865
datname        | template0
numbackends    | 0
xact_commit    | 48676
xact_rollback  | 0
blks_read      | 16904
blks_hit       | 1321868
tup_returned   | 15783033
tup_fetched    | 274467
tup_inserted   | 0
tup_updated    | 0
tup_deleted    | 0
conflicts      | 0
temp_files     | 0
temp_bytes     | 0
deadlocks      | 0
blk_read_time  | 0
blk_write_time | 0
stats_reset    | 2014-03-11 06:56:00.139609-04
-[ RECORD 3 ]--+------------------------------
datid          | 12870
datname        | postgres
numbackends    | 0
xact_commit    | 809894
xact_rollback  | 30
blks_read      | 959513
blks_hit       | 59209984
tup_returned   | 144866984
tup_fetched    | 37242299
tup_inserted   | 7726
tup_updated    | 292637
tup_deleted    | 7620
conflicts      | 0
temp_files     | 0
temp_bytes     | 0
deadlocks      | 0
blk_read_time  | 0
blk_write_time | 0
stats_reset    | 2013-10-17 20:13:08.614236-04
-[ RECORD 4 ]--+------------------------------
datid          | 1998756
datname        | prime_production
numbackends    | 198
xact_commit    | 1904931675
xact_rollback  | 8354053
blks_read      | 4085207044
blks_hit       | 1384556112952
tup_returned   | 22953800158363
tup_fetched    | 2347909884098
tup_inserted   | 2977265442
tup_updated    | 252247999

tup_deleted    | 35834382
conflicts      | 0
temp_files     | 25119
temp_bytes     | 1456041951904
deadlocks      | 43
blk_read_time  | 0
blk_write_time | 0
stats_reset    | 2015-01-27 11:54:14.737594-05
-[ RECORD 5 ]--+------------------------------
datid          | 3297782
datname        | email_web_service
numbackends    | 3
xact_commit    | 46
xact_rollback  | 0
blks_read      | 4
blks_hit       | 2764
tup_returned   | 566
tup_fetched    | 336
tup_inserted   | 8
tup_updated    | 24
tup_deleted    | 8
conflicts      | 0
temp_files     | 0
temp_bytes     | 0
deadlocks      | 0
blk_read_time  | 0
blk_write_time | 0
stats_reset    | 2015-03-09 16:36:12.727615-04


On Mon, Mar 9, 2015 at 4:21 PM, John Scalia <jayknowsunix@gmail.com> wrote:
Not to sound stupid, but two possible issues:

1) this is a really big table and
2) you have no index on the field "datname"

Possible?
Jay


On 3/9/2015 3:50 PM, Mark Steben wrote:
Good afternoon,

We have been sporadically receiving a warning:  pgstat wait timeout
in our production environment.  This has been always immediately followed by
a successful execution of the query.

EX:
 
Mar  6 00:31:29 ardbc01 postgres[42903]: [3-1] mavmail@10.93.156.42:prime_production:0WARNING:  pgstat wait timeout
Mar  6 00:31:29 ardbc01 postgres[42903]: [3-2] mavmail@10.93.156.42:prime_production:0STATEMENT:  SELECT * FROM pg_stat_database WHERE datname='prime_production';
Mar  6 00:31:29 ardbc01 postgres[42903]: [4-1] mavmail@10.93.156.42:prime_production:0LOG:  duration: 10137.143 ms  statement: SELECT * FROM pg_stat_database WHERE datname='prime_production';


and

mavmail@10.93.156.42:prime_production:0WARNING:  pgstat wait timeout
Mar  6 00:31:46 ardbc01 postgres[42903]: [6-2] mavmail@10.93.156.42:prime_production:0STATEMENT:  SELECT SUM(idx_blks_hit) AS hits, SUM(idx_blks_read) AS reads FROM pg_statio_user_indexes
Mar  6 00:31:46 ardbc01 postgres[42903]: [7-1] mavmail@10.93.156.42:prime_production:0LOG:  duration: 10144.601 ms  statement: SELECT SUM(idx_blks_hit) AS hits, SUM(idx_blks_read) AS reads FROM pg_statio_user_indexes


This is not confined to queries of pg stat infrastructure tables. In another database we can receive the same warning on a vacuum:

mavmail@10.93.156.45:mavmail WARNING:  pgstat wait timeout
Mar  6 01:16:44 ardbc01 postgres[66365]: [12-2] mavmail@10.93.156.45:mavmail STATEMENT:  VACUUM VERBOSE ANALYZE update_queues;
Mar  6 01:16:44 ardbc01 postgres[66365]: [13-1] mavmail@10.93.156.45:mavmail INFO:  vacuuming "public.update_queues"
Mar  6 01:16:44 ardbc01 postgres[66365]: [13-2] mavmail@10.93.156.45:mavmail STATEMENT:  VACUUM VERBOSE ANALYZE update_queues;
Mar  6 01:16:44 ardbc01 postgres[66365]: [14-1] mavmail@10.93.156.45:mavmail INFO:  index "uq_qid" now contains 0 row versions in 7 pages
Mar  6 01:16:44 ardbc01 postgres[66365]: [14-2] mavmail@10.93.156.45:mavmail DETAIL:  0 index row versions were removed.
Mar  6 01:16:44 ardbc01 postgres[66365]: [14-3] #0114 index pages have been deleted, 4 are currently reusable.
Mar  6 01:16:44 ardbc01 postgres[66365]: [14-4] #011CPU 0.00s/0.00u sec elapsed 0.00 sec.
Mar  6 01:16:44 ardbc01 postgres[66365]: [14-5] mavmail@10.93.156.45:mavmail STATEMENT:  VACUUM VERBOSE ANALYZE update_queues;
Mar  6 01:16:44 ardbc01 postgres[66365]: [15-1] mavmail@10.93.156.45:mavmail INFO:  "update_queues": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
Mar  6 01:16:44 ardbc01 postgres[66365]: [15-2] mavmail@10.93.156.45:mavmail DETAIL:  0 dead row versions cannot be removed yet.
Mar  6 01:16:44 ardbc01 postgres[66365]: [15-3] #011There were 0 unused item pointers.
Mar  6 01:16:44 ardbc01 postgres[66365]: [15-4] #0110 pages are entirely empty.
Mar  6 01:16:44 ardbc01 postgres[66365]: [15-5] #011CPU 0.00s/0.00u sec elapsed 0.00 sec.
Mar  6 01:16:44 ardbc01 postgres[66365]: [15-6] mavmail@10.93.156.45:mavmail STATEMENT:  VACUUM VERBOSE ANALYZE update_queues;

I did check older posts on pgadmin - one insight involved the possible inability of autovacuum to grab the latest stats as the stats collector might have gotten stuck in a loop and could not update the stats file.  That was from an older PSQL version and we don't employ autovacuum in these databases.

Any insights appreciated:
  postgres version: 9.2.5

  platform:
  Linux ardbc01 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Thank you,


Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com









--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com









--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com








--

pgsql-admin by date:

Previous
From: John Scalia
Date:
Subject: Re: pgstat wait timeout encountered
Next
From: "Rossi, Maria"
Date:
Subject: Relocated Postgres