Re: autovacuum stress-testing our system - Mailing list pgsql-hackers

From Robert Haas
Subject Re: autovacuum stress-testing our system
Date
Msg-id CA+TgmoZr3xUQq_OM2V1nmEN6NScVBFkpYJrpFFmbyOPJKvZdzQ@mail.gmail.com
Whole thread Raw
In response to Re: autovacuum stress-testing our system  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: autovacuum stress-testing our system
List pgsql-hackers
On Sun, Nov 18, 2012 at 5:49 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> The two main changes are these:
>
> (1) The stats file is split into a common "db" file, containing all the
>     DB Entries, and per-database files with tables/functions. The common
>     file is still called "pgstat.stat", the per-db files have the
>     database OID appended, so for example "pgstat.stat.12345" etc.
>
>     This was a trivial hack pgstat_read_statsfile/pgstat_write_statsfile
>     functions, introducing two new functions:
>
>        pgstat_read_db_statsfile
>        pgstat_write_db_statsfile
>
>     that do the trick of reading/writing stat file for one database.
>
> (2) The pgstat_read_statsfile has an additional parameter "onlydbs" that
>     says that you don't need table/func stats - just the list of db
>     entries. This is used for autovacuum launcher, which does not need
>     to read the table/stats (if I'm reading the code in autovacuum.c
>     correctly - it seems to be working as expected).

I'm not an expert on the stats system, but this seems like a promising
approach to me.

> (a) It does not solve the "many-schema" scenario at all - that'll need
>     a completely new approach I guess :-(

We don't need to solve every problem in the first patch.  I've got no
problem kicking this one down the road.

> (b) It does not solve the writing part at all - the current code uses a
>     single timestamp (last_statwrite) to decide if a new file needs to
>     be written.
>
>     That clearly is not enough for multiple files - there should be one
>     timestamp for each database/file. I'm thinking about how to solve
>     this and how to integrate it with pgstat_send_inquiry etc.

Presumably you need a last_statwrite for each file, in a hash table or
something, and requests need to specify which file is needed.

>     And yet another one I'm thinking about is using a fixed-length
>     array of timestamps (e.g. 256), indexed by mod(dboid,256). That
>     would mean stats for all databases with the same mod(oid,256) would
>     be written at the same time. Seems like an over-engineering though.

That seems like an unnecessary kludge.

> (c) I'm a bit worried about the number of files - right now there's one
>     for each database and I'm thinking about splitting them by type
>     (one for tables, one for functions) which might make it even faster
>     for some apps with a lot of stored procedures etc.
>
>     But is the large number of files actually a problem? After all,
>     we're using one file per relation fork in the "base" directory, so
>     this seems like a minor issue.

I don't see why one file per database would be a problem.  After all,
we already have on directory per database inside base/.  If the user
has so many databases that dirent lookups in a directory of that size
are a problem, they're already hosed, and this will probably still
work out to a net win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [PATCH] binary heap implementation
Next
From: Andres Freund
Date:
Subject: Re: [PATCH] binary heap implementation