Thread: Postgres 9.1.4 - high stats collector IO usage
Hi,
I am running postgres 9.1.4 on Ubuntu 12.04 and the stats collector is generating very high IO usage even when nothing appears to be happening on the system.
I have roughly 150 different databases, each of which lives in 1 of roughly 30 tablespaces. The databases are small (the dump of most are under 100M, and all but 3 are under 1G, nothing larger than 2G).
Previously iotop reported the disk write speed at ~6MB / second. I went and reset the stats for every database, which shrunk the stats file and brought the IO down to 1MB / second. I still think this is too high for an idle database, and I've now noticed it is growing.
ls -l /var/lib/postgresql/9.1/main/pg_stat_tmp/pgstat.stat
-rw------- 1 postgres postgres 3515080 Jul 28 11:58 /var/lib/postgresql/9.1/main/pg_stat_tmp/pgstat.stat
<reset of stats>
ls -l /var/lib/postgresql/9.1/main/pg_stat_tmp/pgstat.stat
-rw------- 1 postgres postgres 514761 Jul 28 12:11 /var/lib/postgresql/9.1/main/pg_stat_tmp/pgstat.stat
<watch the file grow>
ls -l /var/lib/postgresql/9.1/main/pg_stat_tmp/pgstat.stat
-rw------- 1 postgres postgres 776711 Jul 28 12:25 /var/lib/postgresql/9.1/main/pg_stat_tmp/pgstat.stat
In the 15 minutes since the reset, IO has nearly doubled to 1.6+ MB / second.
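For reference, the per-database reset described above can be done with something along these lines (a sketch, run as the postgres user; pg_stat_reset() only clears the counters of the database it runs in, so it has to be repeated for each one):

# as the postgres user (assumption: local peer authentication)
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE datallowconn"); do
    psql -d "$db" -c "SELECT pg_stat_reset();"
done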
FWIW, I just migrated all these databases over to this new server by restoring from pg_dump. I was previously experiencing this on 8.3, which was why I upgraded to 9.1, and I also have another server with similar problems on 9.1.
Any help would be sincerely appreciated.
David Barton dave@oneit.com.au
Hello

I had the same problem with large numbers of tables - you can move pg_stat_tmp to a tmpfs filesystem; that was the solution for us.

Regards

Pavel

2012/7/28 David Barton <dave@oneit.com.au>:
> Hi,
>
> I am running postgres 9.1.4 on Ubuntu 12.04 and the stats collector is
> generating very high IO usage even when nothing appears to be happening on
> the system.
> <snip>
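For reference, a minimal sketch of the kind of move Pavel describes, assuming the stock Ubuntu data directory and a stopped server (the tmpfs size is a guess; see the follow-up below about stats_temp_directory before copying this):

service postgresql stop
# hide the on-disk pg_stat_tmp behind a small tmpfs
mount -t tmpfs -o size=64m,mode=0700 tmpfs /var/lib/postgresql/9.1/main/pg_stat_tmp
chown postgres:postgres /var/lib/postgresql/9.1/main/pg_stat_tmp
service postgresql start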
David Barton <dave@oneit.com.au> writes:
> I am running postgres 9.1.4 on Ubuntu 12.04 and the stats collector is
> generating very high IO usage even when nothing appears to be happening
> on the system.
> I have roughly 150 different databases, each of which lives in 1 of
> roughly 30 tablespaces. The databases are small (the dump of most are
> under 100M, and all but 3 are under 1G, nothing larger than 2G).

That's a lot of databases. I think your problem probably stems from autovacuum madly trying to cover all of them. Backing off (increasing) autovacuum_naptime to slow its cycle might help.

			regards, tom lane
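For reference, Tom's suggestion amounts to something like this in postgresql.conf, followed by a reload (15min is only an example value; the default is 1min):

# postgresql.conf
autovacuum_naptime = 15min

# the setting takes effect on reload (Ubuntu init script shown as an assumption)
service postgresql reload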
Thanks so much, Tom. That did the job.

I increased it to every 15 minutes and it has dropped substantially, even though the pgstat.stat file is over 1 MB again. It is unfortunate that the IO utilisation of this seems to be O(n^2), as that is a big impediment to shared hosting.

Is there any type of bounty system where people can contribute to developing features?

Regards, David

On 29/07/12 00:13, Tom Lane wrote:
> David Barton <dave@oneit.com.au> writes:
>> I am running postgres 9.1.4 on Ubuntu 12.04 and the stats collector is
>> generating very high IO usage even when nothing appears to be happening
>> on the system.
>> <snip>
> That's a lot of databases. I think your problem probably stems from
> autovacuum madly trying to cover all of them. Backing off (increasing)
> autovacuum_naptime to slow its cycle might help.
>
> 			regards, tom lane
Thanks, Pavel.
I was a bit worried about trying this because of a comment elsewhere that the file was supposed to be permanent. Tom's solution of increasing the vacuum delay has solved it for now.
Regards, David
On 28/07/12 15:07, Pavel Stehule wrote:
> Hello
>
> I had the same problem with large numbers of tables - you can move
> pg_stat_tmp to a tmpfs filesystem; that was the solution for us.
>
> Regards
>
> Pavel
>
> 2012/7/28 David Barton <dave@oneit.com.au>:
>> Hi,
>> <snip>
--
David Barton - Managing Director
1iT Pty Ltd "The Power of One"
Tel: (08) 9382 2296
Direct: (08) 9200 4269
Mob: 0404 863 671
Fax: (08) 6210 1354
Web: www.1it.com.au
First Floor
41 Oxford Close
West Leederville, 6007
On 29/07/12 05:08, David Barton wrote:
> Thanks so much, Tom.
>
> That did the job. I increased it to every 15 minutes and it has
> dropped substantially even though the pgstat.stat file is over 1 MB
> again.

I'd be a little concerned that having autovacuum tuned down to run so infrequently might result in massive table bloat - which could be even worse than the original stats file problem. Keep an eye out for any table(s) that used to be small but become unexpectedly larger over the next few days, and if you see any then decrease the naptime again.

Regards

Mark
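One way to keep that eye out, as a sketch (the dead-tuple counts come from the same stats machinery, so treat them as approximate):

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;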
That's not a good way of doing it, since you lose persistent storage.

Instead, you should set the stats_temp_directory parameter to a filesystem somewhere else that is tmpfs. Then PostgreSQL will automatically move the file to and from the main data directory on startup and shutdown, so you get both the performance of tmpfs and the persistent statistics.

//Magnus

On Sat, Jul 28, 2012 at 9:07 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> I had the same problem with large numbers of tables - you can move
> pg_stat_tmp to a tmpfs filesystem; that was the solution for us.
>
> Regards
>
> Pavel
>
> 2012/7/28 David Barton <dave@oneit.com.au>:
>> Hi,
>> <snip>

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
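A sketch of what Magnus describes, assuming a dedicated tmpfs mount (the path and size here are made up; any tmpfs location writable by the postgres user will do):

# create and mount a small tmpfs at a hypothetical location
mkdir -p /var/lib/pgsql_stats_tmp
mount -t tmpfs -o size=64m,mode=0700 tmpfs /var/lib/pgsql_stats_tmp
chown postgres:postgres /var/lib/pgsql_stats_tmp

# postgresql.conf - stats_temp_directory can be changed with a reload
stats_temp_directory = '/var/lib/pgsql_stats_tmp'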
2012/8/6 Magnus Hagander <magnus@hagander.net>:
> That's not a good way of doing it, since you lose persistent storage.
>
> Instead, you should set the stats_temp_directory parameter to a filesystem
> somewhere else that is tmpfs. Then PostgreSQL will automatically move
> the file to and from the main data directory on startup and shutdown,
> so you get both the performance of tmpfs and the persistent
> statistics.

we had to do it because the reads/writes of the stat file created really high IO - and it was a problem on Amazon :( - we probably would not have had this issue elsewhere

Regards

Pavel
On Mon, Aug 6, 2012 at 4:16 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> we had to do it because the reads/writes of the stat file created really
> high IO - and it was a problem on Amazon :( - we probably would not have
> had this issue elsewhere

Uh. You realize that if you set stats_temp_directory, it only ever writes to the persistent storage once, when you do "pg_ctl stop" (or equivalent). Are you saying the shutdown took too long?

I've had to change that param many times on Amazon, but I've never had a problem with the shutdown writes. (And I've seen it often enough on dedicated hardware as well..)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
2012/8/6 Magnus Hagander <magnus@hagander.net>:
> Uh. You realize that if you set stats_temp_directory, it only ever writes to
> the persistent storage once, when you do "pg_ctl stop" (or
> equivalent). Are you saying the shutdown took too long?
>
> I've had to change that param many times on Amazon, but I've never had
> a problem with the shutdown writes. (And I've seen it often enough on
> dedicated hardware as well..)

aha, this is my mistake - we already use the stats_temp_directory GUC. My advice was to use tmpfs for the stat file, and I meant doing it via stats_temp_directory, but I didn't say so.

Regards

Pavel
On Fri, Jul 27, 2012 at 9:33 PM, David Barton <dave@oneit.com.au> wrote:
> Hi,
>
> I am running postgres 9.1.4 on Ubuntu 12.04 and the stats collector is
> generating very high IO usage even when nothing appears to be happening on
> the system.
>
> I have roughly 150 different databases, each of which lives in 1 of
> roughly 30 tablespaces. The databases are small (the dump of most are
> under 100M, and all but 3 are under 1G, nothing larger than 2G).

It isn't the size of the data that matters, but the number of objects. It sounds like your databases have about 150 statistics-containing objects each, in order to come up with a 3.5MB stats file.

What do you gain by using databases rather than schemas to do the segregation?

> Previously iotop reported the disk write speed at ~6MB / second.

So that corresponds to about 2 physical write-outs of the stats file per second. Are you using ext4? It has the peculiar (to me) property that when a file is renamed out of existence, it writes out all of that file's obsolete dirty buffers, rather than just dropping them as uninteresting to anyone. That generates about 10 times the physical IO of the ext3 file system, and of course infinitely more physical IO than a tmpfs.

> FWIW, I just migrated all these databases over to this new server by
> restoring from pg_dump. I was previously experiencing this on 8.3, which was
> why I upgraded to 9.1, and I also have another server with similar problems
> on 9.1.
>
> Any help would be sincerely appreciated.

I think the first line of defense would be using /dev/shm to hold the stats file. I don't see any downside to that. You are reading and writing that file so ferociously anyway that it is always going to be taking up RAM, no matter where you put it. Indeed, under ext4 you might even have several copies of it all locked into RAM as they wait to reach the disk before being dropped.

Increasing the naptime, as you have already done, will also decrease the physical IO, but that has the trade-off of risking bloat. (But since you are running 150 databases on one machine, I doubt any of them are active enough for the risk of bloat to be all that great.) However, using /dev/shm should eliminate the IO entirely with no trade-off at all.

But with /dev/shm the CPU usage of repeatedly formatting, writing, reading, and parsing the stats file will still be considerable, while increasing the naptime will reduce that as well.

As far as coding changes to overcome the fundamental problem:

A relatively easy change would be to make any given autovacuum worker on start-up tolerate a stats file that is out of date by up to, say, naptime/5. That would greatly reduce the amount of writing the stats collector needs to do (assuming that few tables actually need vacuuming during any given cycle), but wouldn't change the amount of reading a worker needs to do, because it still needs to read the file each time as it doesn't inherit the stats from anyone. I don't think it would be a problem that a table which becomes eligible for vacuuming in the last 20% of a cycle would have to wait for one more round. Especially as this change might motivate one to reduce the naptime, since doing so will be cheaper.

But it seems like maybe the stats collector could use a ground-up recoding. Maybe it could use a shared relation to store the stats within the database cluster itself, so that edits could be done in place per database rather than re-writing the entire cluster's stats? But I certainly am not volunteering to take on that task.

A compromise might be to have one stats file per database. That way any given backend only needs to read in the database file it cares about, and the stats collector only needs to write out the one database asked of it. This change could be mostly localized to just pgstat.c, I think.

Cheers,

Jeff
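For reference, a rough way to see how many statistics-bearing objects a single database contributes to the file (a sketch, run per database; counting tables, TOAST tables and indexes, which is roughly what the per-table counters cover):

SELECT count(*) FROM pg_class WHERE relkind IN ('r', 't', 'i');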
Hi Jeff,

Thanks for the detailed reply.

On 13/08/12 05:23, Jeff Janes wrote:
> On Fri, Jul 27, 2012 at 9:33 PM, David Barton <dave@oneit.com.au> wrote:
>> Hi,
>> <snip>
>> I have roughly 150 different databases, each of which lives in 1 of
>> roughly 30 tablespaces. The databases are small (the dump of most are
>> under 100M, and all but 3 are under 1G, nothing larger than 2G).
> It isn't the size of the data that matters, but the number of objects.
> It sounds like your databases have about 150 statistics-containing
> objects each, in order to come up with a 3.5MB stats file.
>
> What do you gain by using databases rather than schemas to do the segregation?

I had never imagined that there was such a profound difference between using schemas and using databases. I imagine that I could convert from using databases to using schemas.

>> Previously iotop reported the disk write speed at ~6MB / second.
> So that corresponds to about 2 physical write-outs of the stats file
> per second. Are you using ext4? It has the peculiar (to me) property
> that when a file is renamed out of existence, it writes out all of
> that file's obsolete dirty buffers, rather than just dropping them as
> uninteresting to anyone. That generates about 10 times the physical
> IO of the ext3 file system, and of course infinitely more physical
> IO than a tmpfs.

It was previously on ext3 and moved to ext4. That didn't seem to make a difference; I'm guessing that the higher IO on the new server was just that it was capable of doing it.

>> FWIW, I just migrated all these databases over to this new server by
>> restoring from pg_dump. <snip>
>>
>> Any help would be sincerely appreciated.
> I think the first line of defense would be using /dev/shm to hold the
> stats file. I don't see any downside to that. <snip>
>
> Increasing the naptime, as you have already done, will also decrease
> the physical IO, but that has the trade-off of risking bloat. <snip>
>
> But with /dev/shm the CPU usage of repeatedly formatting, writing,
> reading, and parsing the stats file will still be considerable, while
> increasing the naptime will reduce that as well.

The CPU overhead seems pretty minimal, and a slight reduction in naptime should be more than enough.

> As far as coding changes to overcome the fundamental problem:
>
> A relatively easy change would be to make any given autovacuum worker
> on start-up tolerate a stats file that is out of date by up to, say,
> naptime/5. That would greatly reduce the amount of writing the stats
> collector needs to do (assuming that few tables actually need
> vacuuming during any given cycle), but wouldn't change the amount of
> reading a worker needs to do, because it still needs to read the file
> each time as it doesn't inherit the stats from anyone. I don't think
> it would be a problem that a table which becomes eligible for
> vacuuming in the last 20% of a cycle would have to wait for one more
> round. Especially as this change might motivate one to reduce the
> naptime, since doing so will be cheaper.

If the stats are mirrored in memory, then that makes sense. Of course, if that's the case then couldn't we just alter the stats to flush at maximum once per N seconds / minutes? If the stats are not mirrored in memory, doesn't that imply that most of the databases will never flush updated stats to disk and so the file will become stale?

> But it seems like maybe the stats collector could use a ground-up
> recoding. Maybe it could use a shared relation to store the stats
> within the database cluster itself, so that edits could be done in
> place per database rather than re-writing the entire cluster's stats?
> But I certainly am not volunteering to take on that task.
>
> A compromise might be to have one stats file per database. That way
> any given backend only needs to read in the database file it cares
> about, and the stats collector only needs to write out the one
> database asked of it. This change could be mostly localized to just
> pgstat.c, I think.

That approach was what I had thought of. I am not a C programmer by any stretch of the imagination, which is why I asked if there was a place to find this kind of thing. It seems likely there are a few features that people would be willing to put money towards.

Regards, David
On Sun, Aug 12, 2012 at 7:17 PM, David Barton <dave@oneit.com.au> wrote:
>> A relatively easy change would be to make any given autovacuum worker
>> on start-up tolerate a stats file that is out of date by up to, say,
>> naptime/5. <snip>
>
> If the stats are mirrored in memory, then that makes sense. Of course, if
> that's the case then couldn't we just alter the stats to flush at maximum
> once per N seconds / minutes?

The actual flushing is not under our control, but under the kernel's control. But in essence that is what I am suggesting. If the vacuum workers ask the stats collector for fresh stats less often, the stats collector will write them to the kernel less often. If we are writing them to the kernel less often, the kernel will flush them less often. The kernel could choose to flush them less often anyway, but for some reason with ext4 it doesn't.

> If the stats are not mirrored in memory,
> doesn't that imply that most of the databases will never flush updated stats
> to disk and so the file will become stale?

We don't do anything specific to cause them to be mirrored; it is just that that is the way the kernel deals with frequently accessed file-system data. If the kernel decided to evict the data from memory, it would have to make sure it reached disk first. It is the kernel's job to present a consistent image of all the file-system data, regardless of whether it is actually in memory, or on disk, or both. If the data on disk is stale, the kernel guarantees requests to read it will be served from memory rather than from disk.

Cheers,

Jeff