Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system - Mailing list pgsql-hackers

From: Jeff Janes
Subject: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Date:
Msg-id: CAMkU=1xzNhxtCQ98u7unPu3YmD2Ku7edBvujg3ACvOP4+tSuEA@mail.gmail.com
In response to: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 5.2.2013 19:23, Jeff Janes wrote:
>
>> If I shut down the server and blow away the stats with "rm
>> data/pg_stat/*", it recovers gracefully when I start it back up.  If I
>> do "rm -r data/pg_stat" then it has problems the next time I shut it
>> down, but I have no right to do that in the first place.  If I initdb
>> a database without this patch, then shut it down and restart with
>> binaries that include this patch, I need to manually make the
>> pg_stat directory.  Does that mean it needs a catalog bump in order to
>> force an initdb?
>
> Ummmm, what do you mean by "catalog bump"?

There is a catalog version number in src/include/catalog/catversion.h
which, when changed, forces one to redo initdb.
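
For anyone who hasn't looked at it, the relevant line is just a define like
the one below (the number shown here is only illustrative; the real one
changes whenever someone bumps it):

    $ grep '^#define CATALOG_VERSION_NO' src/include/catalog/catversion.h
    #define CATALOG_VERSION_NO  201302061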

Formally I guess it is only for system catalog changes, but I thought
it was used for any on-disk format changes during development cycles.  I
like it the way it is, as I can use the same data directory for both
versions of the binary (patched and unpatched), and just manually
create or remove the pg_stat directory when changing modes.
That is ideal for testing this patch, though probably not ideal for being
committed into the tree along with all the other ongoing devel work.
But I think that is something the committer has to worry about.
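
Concretely, the switching goes something like this (a sketch; the paths
are just from my scratch setup):

    # switch the same data directory over to the patched binaries
    pg_ctl -D data stop
    mkdir data/pg_stat              # the patched binaries expect this directory
    pg_ctl -D data -l logfile start

    # and back to the unpatched binaries
    pg_ctl -D data stop
    rm -r data/pg_stat
    pg_ctl -D data -l logfile start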


>> I have a question about its completeness.  When I first start up the
>> cluster and have not yet touched it, there is very little stats
>> collector activity, either with or without this patch.  When I kick
>> the cluster sufficiently (I've been using vacuumdb -a to do that) then
>> there is a lot of stats collector activity.  Even once the vacuumdb
>> has long finished, this high level of activity continues even though
>> the database is otherwise completely idle, and this seems to happen
>> forever.  This patch makes that high level of activity much more
>> efficient, but it does not reduce the activity.  I don't understand
>> why an idle database cannot get back into the state right after
>> start-up.
>
> What do you mean by "stats collector activity"?  Is it reading/writing a
> lot of data, or is it just using a lot of CPU?

Basically, the launching of new autovac workers and the work that that
entails.  Your patch reduces the size of data that needs to be
written, read, and parsed for every launch, but not the number of
times that that happens.

> Isn't that just natural and expected behavior, because the database
> needs to actually perform ANALYZE to collect the data? Although
> the tables are empty, it costs some CPU / IO and there are a lot of them
> (1000 dbs, each with 200 tables).

It isn't touching the tables at all, just the stats files.

I was wrong about the cluster starting up quiet.  It only does that if,
while the cluster was shut down, you removed the statistics files, which
I was doing as I was switching back and forth between patched and
unpatched.

When the cluster starts up, any databases that don't have statistics in
the stat file(s) will not get an autovacuum worker process spawned.
They only start getting spawned once someone asks for statistics for
that database.  But once that happens, that database gets a worker
spawned for it every naptime (or, at least, as close to that as the
server can keep up with) for eternity, even if that database is never
used again.  The only way to stop this is the unsupported way of
blowing away the permanent stats files.
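
(The only knob I know of for this is the one Tomas mentions below, i.e.
stretching the launch interval; as a sketch, with an arbitrary value:

    echo "autovacuum_naptime = '5min'" >> data/postgresql.conf
    pg_ctl -D data reload

That only spreads the work out, of course; it doesn't make the idle
databases go quiet again.)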



>
> I don't think there's a way around this. You may increase the autovacuum
> naptime, but that's about all.
>
>> I do not think that the patch needs to solve this problem in order to
>> be accepted, but if it can be addressed while the author and reviewers
>> are paying attention to this part of the system, that would be ideal.
>> And if not, then we should at least remember that there is future work
>> that could be done here.
>
> If I understand that correctly, you see the same behaviour even without
> the patch, right? In that case I'd vote not to make the patch more
> complex, and try to improve that separately (if it's even possible).

OK.  I just thought that while digging through the code, you might
have a good idea for fixing this part as well.  If so, it would be a
shame for that idea to be lost when you move on to other things.


>
>
>> I created 1000 databases each with 200 single column tables (x serial
>> primary key).
>>
>> After vacuumdb -a, I let it idle for a long time to see what steady
>> state was reached.
>>
>> without the patch:
>> vacuumdb -a   real    11m2.624s
>> idle steady state:  48.17% user, 39.24% system, 11.78% iowait, 0.81% idle.
>>
>> with the patch:
>> vacuumdb -a    real    6m41.306s
>> idle steady state:  7.86% user, 5.00% system  0.09% iowait  87% idle,
>
> Nice. Other interesting numbers would be device utilization, average
> I/O speed

I didn't gather that data, as I never figured out how to interpret
those numbers and so don't have much faith in them.  (But I am pretty
impressed with the numbers I do understand.)

> and required space (which should be ~2x the pgstat.stat size
> without the patch).

I didn't study this in depth, but the patch seems to do what it should
(that is, take less space, not more).  If I fill the device up so
that there is less than 3x the size of the stats file available for
use (i.e. space for the file itself and for one temp copy of it, but
not for a complete second temp copy), I occasionally get out-of-space
warnings with unpatched, but I never get those warnings with patched.
Indeed, with the patch I never get warnings even with only 1.04 times
the aggregate size of the stats files available for use.  (That is,
space for all the files, plus just 1/25 that amount to spare.
Obviously this limit is specific to having 1000 databases of equal
size.)
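
For reference, the 1000 decoy databases were set up along these lines (a
rough sketch rather than the exact commands I ran; the names are
arbitrary):

    for i in $(seq 1 1000); do
      createdb "decoy$i"
      for j in $(seq 1 200); do
        echo "CREATE TABLE t$j (x serial PRIMARY KEY);"
      done | psql -q -d "decoy$i"
    done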


>
>> I also ran pgbench on a scale that fits in memory with fsync=off, on a
>> single CPU machine.  With the same above-mentioned 1000 databases as
>> unused decoys to bloat the stats file.
>>
>> pgbench_tellers and branches undergo enough turn over that they should
>> get vacuumed every minute (nap time).
>>
>> Without the patch, they only get vacuumed every 40 minutes or so as
>> the autovac workers are so distracted by reading the bloated stats
>> file, and the TPS is ~680.
>>
>> With the patch, they get vacuumed every 1 to 2 minutes and TPS is ~940
>
> Great, I haven't really aimed to improve pgbench results, but it seems
> natural that the decreased CPU utilization can go somewhere else. Not bad.

My goal there was to prove to myself that the correct tables were
getting vacuumed.  The TPS measurements were just a by-product of
that, but since I had them I figured I'd post them.

> Have you moved the stats somewhere to tmpfs, or have you used the
> default location (on disk)?

All the specific work I reported was with them on disk, except the
part about running out of space, which was done on /dev/shm.  But even
with the data theoretically going to disk, the kernel caches it well
enough that I wouldn't expect things to change very much.
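
If someone does want to try the tmpfs variant, I believe it is just a
matter of pointing stats_temp_directory at a tmpfs mount, something like
this (the path is only an example):

    mkdir /dev/shm/pg_stats_tmp
    echo "stats_temp_directory = '/dev/shm/pg_stats_tmp'" >> data/postgresql.conf
    pg_ctl -D data reload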

Two more questions I've come up with:

If I invoke pg_stat_reset() from a database, the corresponding file
does not get removed from the pg_stat_tmp directory.  And when the
database is shut down, a file for the reset database does get created
in pg_stat.  Is this OK?
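
To reproduce what I'm describing (the per-database file names depend on
the patch, so I'm just listing the directories rather than naming
specific files):

    psql -d decoy1 -c "SELECT pg_stat_reset();"
    ls data/pg_stat_tmp/      # the file for that database is still present
    pg_ctl -D data stop
    ls data/pg_stat/          # a file for the reset database shows up anyway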

Cheers,

Jeff


