Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system |
Date | |
Msg-id | 5123F1D3.30303@fuzzy.cz Whole thread Raw |
In response to | Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: PATCH: Split stats file per database WAS: autovacuum
stress-testing our system
|
List | pgsql-hackers |
On 19.2.2013 11:27, Tom Lane wrote: > Tomas Vondra <tv@fuzzy.cz> writes: >> Dne 19.02.2013 05:46, Alvaro Herrera napsal: >>> Mastodon failed: >>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01 >>> >>> probably worth investigating a bit; we might have broken something. > >> Hmmm, interesting. A single Windows machine, while the other Windows >> machines seem to work fine (although some of them were not built for a >> few weeks). > > Could be random chance --- we've seen the same failure before, eg > > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00 > > regards, tom lane I'm looking at that test, and I'm not really sure about a few details. First, this function seems pretty useless to me: ======================================================================= create function wait_for_stats() returns void as $$ declare start_time timestamptz := clock_timestamp(); updated bool;begin -- we don't want to wait forever; loop will exit after 30 seconds for i in 1 .. 300 loop -- check to see if indexscan has been sensed SELECT (st.idx_scan >= pr.idx_scan + 1) INTO updated FROM pg_stat_user_tablesAS st, pg_class AS cl, prevstats AS pr WHERE st.relname='tenk2' AND cl.relname='tenk2'; exit when updated; -- wait a little perform pg_sleep(0.1); -- reset stats snapshot so we can test again perform pg_stat_clear_snapshot(); end loop; -- report time waited in postmaster log (where it won't change test output) raise log 'wait_for_stats delayed % seconds', extract(epoch from clock_timestamp() - start_time); end $$ languageplpgsql; ======================================================================= AFAIK the stats remain the same within a transaction, and as a function runs within a transaction, it will either get new data on the first iteration, or it will run all 300 of them. I've checked several buildfarm members and I'm yet to see a single duration between 12ms and 30sec. So IMHO we can replace the function call with pg_sleep(30) and we'll get about the same effect. But this obviously does not answer the question why it failed, although on both occasions there's this log message: [50b1b7fa.0568:14] LOG: wait_for_stats delayed 34.75 seconds which essentialy means the stats were not updated before the call to wait_for_stats(). Anyway, there are these two failing queries: ======================================================================= -- check effects SELECT st.seq_scan >= pr.seq_scan + 1, st.seq_tup_read >= pr.seq_tup_read + cl.reltuples, st.idx_scan >= pr.idx_scan+ 1, st.idx_tup_fetch >= pr.idx_tup_fetch + 1 FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats ASprWHERE st.relname='tenk2' AND cl.relname='tenk2';?column? | ?column? | ?column? | ?column? ----------+----------+----------+----------t | t | t | t (1 row) SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages, st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks+ 1 FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS prWHERE st.relname='tenk2' AND cl.relname='tenk2';?column?| ?column? ----------+----------t | t (1 row) ======================================================================= The first one returns just falses, the second one retuns either (t,f) or (f,f) - for the two failures posted by Alvaro and TL earlier today. I'm really wondering how that could happen. The only thing that I can think of is some strange timing issue, causing lost requests to write the stats or maybe some of the stats updates. Hmmm, IIRC the stats are sent over UDP - couldn't that be related? Tomas
pgsql-hackers by date: