Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date
Msg-id 20221125224611.efs6mtrijedkjhzs@awork3.anarazel.de
Whole thread Raw
In response to Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
Hi,

On 2022-11-22 23:43:29 -0600, Justin Pryzby wrote:
> I think there may be a problem/deficiency with hint bits:
> 
> |postgres=# DROP TABLE u2; CREATE TABLE u2 AS SELECT generate_series(1,999999)a; SELECT pg_stat_reset_shared('io');
explain(analyze,buffers) SELECT * FROM u2;
 
> |...
> | Seq Scan on u2  (cost=0.00..15708.75 rows=1128375 width=4) (actual time=0.111..458.239 rows=999999 loops=1)
> |   Buffers: shared hit=2048 read=2377 dirtied=2377 written=2345
> 
> |postgres=# SELECT COUNT(1), relname, COUNT(1) FILTER(WHERE isdirty) FROM pg_buffercache b LEFT JOIN pg_class c ON
pg_relation_filenode(c.oid)=b.relfilenodeGROUP BY 2 ORDER BY 1 DESC LIMIT 11;
 
> | count |             relname             | count
> |-------+---------------------------------+-------
> | 13619 |                                 |     0
> |  2080 | u2                              |  2080
> |   104 | pg_attribute                    |     4
> |    71 | pg_statistic                    |     1
> |    51 | pg_class                        |     1
> 
> It says that SELECT caused 2377 buffers to be dirtied, of which 2080 are
> associated with the new table in pg_buffercache.

Note that there's 2048 dirty buffers for u2 in shared_buffers before the
SELECT, despite the relation being 4425 blocks long, due to the CTAS using
BAS_BULKWRITE.


> |postgres=# SELECT * FROM pg_stat_io WHERE backend_type!~'autovac|archiver|logger|standalone|startup|^wal|background
worker'or true ORDER BY 2;
 
> |    backend_type     | io_context  |   io_object   | read | written | extended | op_bytes | evicted | reused |
files_synced|          stats_reset
 
> |...
> | client backend      | bulkread    | relation      | 2377 |    2345 |          |     8192 |       0 |   2345 |
      | 2022-11-22 22:32:33.044552-06
 
> 
> I think it's a known behavior that hint bits do not use the strategy
> ring buffer.  For BAS_BULKREAD, ring_size = 256kB (32, 8kB pages), but
> there's 2080 dirty pages in the buffercache (~16MB).

I don't think there's any "circumvention" of the ringbuffer here. There's 2048
buffers for u2 in s_b before, all dirty, there's 2080 after, also all
dirty. So the ringbuffer restricted the increase in shared buffers used for u2
to 2080-2048=32 additional buffers.

The reason hint bits don't prevent pages from being written out here is that a
BAS_BULKREAD strategy doesn't cause all buffer writes to be rejected, it just
causes buffer writes to be rejected when the page LSN would require a WAL
flush. And that's not typically the case when you just set a hint bit, unless
you use wal_log_hint_bits = true.

If I turn on wal_log_hints=true and add a CHECKPOINT after the CTAS I see 0
reuses (and 4425 dirty buffers), which is what I'd expect.


> But the IO view says that 2345 of the pages were "reused", which seems
> misleading to me.  Maybe that just follows from the behavior and the view is
> fine.  If the view is fine, maybe this case should still be specifically
> mentioned in the docs.

I think that's just confusing due to the reset. 2048 + 2345 = 4393, but we
only have 2080 buffers for u2 in s_b.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Zheng Li
Date:
Subject: Re: Support logical replication of DDLs
Next
From: Peter Geoghegan
Date:
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation