Re: Tracking last scan time - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Tracking last scan time
Date
Msg-id 20220831162108.qyarvx7edqnobiye@awork3.anarazel.de
Whole thread Raw
In response to Tracking last scan time  (Dave Page <dpage@pgadmin.org>)
Responses Re: Tracking last scan time
List pgsql-hackers
Hi,

On 2022-08-23 10:55:09 +0100, Dave Page wrote:
> Often it is beneficial to review one's schema with a view to removing
> indexes (and sometimes tables) that are no longer required. It's very
> difficult to understand when that is the case by looking at the number of
> scans of a relation as, for example, an index may be used infrequently but
> may be critical in those times when it is used.
> 
> The attached patch against HEAD adds optional tracking of the last scan
> time for relations. It updates pg_stat_*_tables with new last_seq_scan and
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> help with this.
> 
> Due to the use of gettimeofday(), those values are only maintained if a new
> GUC, track_scans, is set to on. By default, it is off.
> 
> I did run a 12 hour test to see what the performance impact is. pgbench was
> run with scale factor 10000 and 75 users across 4 identical bare metal
> machines running Rocky 8 in parallel which showed roughly a -2% average
> performance penalty against HEAD with track_scans enabled. Machines were
> PowerEdge R7525's with 128GB RAM, dual 16C/32T AMD 7302 CPUs, with the data
> directory on 6 x 800GB 12Gb/s SSD SAS drives in RAID 0. Kernel time source
> is tsc.
> 
>                HEAD               track_scans      Penalty (%)
> box1       19582.49735    19341.8881      -1.22869541
> box2       19936.55513    19928.07479    -0.04253664659
> box3       19631.78895    18649.64379    -5.002830696
> box4       19810.86767    19420.67192    -1.969604525
> Average 19740.42728    19335.06965    -2.05343896

Based on the size of those numbers this was a r/w pgbench. If it has this
noticable an impact for r/w, with a pretty low number of scans/sec, how's the
overhead for r/o (which can have 2 orders of magnitude more scans/sec)? It
must be quite bad.

I don't think we should accept this feature with this overhead - but I also
think we can do better, by accepting a bit less accuracy. For this to be
useful we don't need a perfectly accurate timestamp. The statement start time
is probably not accurate enough, but we could just have bgwriter or such
update one in shared memory every time we wake up? Or perhaps we could go to
an even lower granularity, by putting in the current LSN or such?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Tracking last scan time
Next
From: Robert Haas
Date:
Subject: Re: SQL/JSON features for v15