Re: Tracking last scan time - Mailing list pgsql-hackers

From Dave Page
Subject Re: Tracking last scan time
Date
Msg-id CA+OCxoxPSQukV7kuRuJ3nyrXX44x+bdd3n2HaT8LzCSzFyV3JQ@mail.gmail.com
Whole thread Raw
In response to Re: Tracking last scan time  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
Hi

On Tue, 23 Aug 2022 at 13:07, Greg Stark <stark@mit.edu> wrote:
On Tue, 23 Aug 2022 at 11:00, Dave Page <dpage@pgadmin.org> wrote:
>
> Often it is beneficial to review one's schema with a view to removing indexes (and sometimes tables) that are no longer required. It's very difficult to understand when that is the case by looking at the number of scans of a relation as, for example, an index may be used infrequently but may be critical in those times when it is used.

I think this is easy to answer in a prometheus/datadog/etc world since
you can consult the history of the count to see when it was last
incremented. (Or do effectively that continously).

Yes. But not every PostgreSQL instance is monitored in that way.
 

I guess that just reinforces the idea that it should be optional.
Perhaps there's room for some sort of general feature for controlling
various time series aggregates like max() and min() sum() or, uhm,
timeoflastchange() on whatever stats you want. That would let us
remove a bunch of stuff from pg_stat_statements and let users turn on
just the ones they want. And also let users enable things like time of
last rollback or conflict etc. But that's just something to think
about down the road.

It's certainly an interesting idea.
 

> The attached patch against HEAD adds optional tracking of the last scan time for relations. It updates pg_stat_*_tables with new last_seq_scan and last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to help with this.
>
> Due to the use of gettimeofday(), those values are only maintained if a new GUC, track_scans, is set to on. By default, it is off.

Bikeshedding warning -- "track_scans" could equally apply to almost
any stats about scans. I think the really relevant thing here is the
times, not the scans. I think the GUC should be "track_scan_times". Or
could that still be confused with scan durations? Maybe
"track_scan_timestamps"?

The latter seems reasonable.
 

You could maybe make the gettimeofday cheaper by doing it less often.
Like, skipping the increment if the old timestamp is newer than 1s
before the transaction start time (I think that's available free if
some other guc is enabled but I don't recall). Or isn't this cb
normally happening after transaction end? So xactStopTimestamp might
be available already?

Something like:

 if (pgstat_track_scan_timestamps && lstats->t_counts.t_numscans &&
        tabentry->lastscan + USECS_PER_SEC < GetCurrentTransactionStopTimestamp())
    tabentry->lastscan = GetCurrentTimestamp();
 
?

--

pgsql-hackers by date:

Previous
From: Natarajan R
Date:
Subject: Re: Logical replication support for generic wal record
Next
From: Tom Lane
Date:
Subject: Re: Strip -mmacosx-version-min options from plperl build