Re: autovacuum stress-testing our system - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: autovacuum stress-testing our system
Date
Msg-id 50638642.2080606@fuzzy.cz
Whole thread Raw
In response to Re: autovacuum stress-testing our system  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 26.9.2012 18:29, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Excerpts from Euler Taveira's message of miĂŠ sep 26 11:53:27 -0300 2012:
>>> On 26-09-2012 09:43, Tomas Vondra wrote:
>>>> 5) splitting the single stat file into multiple pieces - e.g. per database,
>>>> written separately, so that the autovacuum workers don't need to read all
>>>> the data even for databases that don't need to be vacuumed. This might be
>>>> combined with (4).
>
>>> IMHO that's the definitive solution. It would be one file per database plus a
>>> global one. That way, the check would only read the global.stat and process
>>> those database that were modified. Also, an in-memory map could store that
>>> information to speed up the checks.
>
>> +1
>
> That would help for the case of hundreds of databases, but how much
> does it help for lots of tables in a single database?

Well, it wouldn't, but it wouldn't make it worse either. Or at least
that's how I understand it.

> I'm a bit suspicious of the idea that we should encourage people to use
> hundreds of databases per installation anyway: the duplicated system
> catalogs are going to be mighty expensive, both in disk space and in
> their cache footprint in shared buffers.  There was some speculation
> at the last PGCon about how we might avoid the duplication, but I think
> we're years away from any such thing actually happening.

You don't need to encourage us to do that ;-) We know it's not perfect
and considering a good alternative - e.g. several databases (~10) with
schemas inside, replacing the current database-only approach. This way
we'd get multiple stat files (thus gaining the benefits) with less
overhead (shared catalogs).

And yes, using tens of thousands of tables (serving as "caches") for a
reporting solution is "interesting" (as in the old Chinese curse) too.

> What seems to me like it could help more is fixing things so that the
> autovac launcher needn't even launch a child process for databases that
> haven't had any updates lately.  I'm not sure how to do that, but it
> probably involves getting the stats collector to produce some kind of
> summary file.

Yes, I've proposed something like this in my original mail - setting a
"dirty" flag on objects (a database in this case) whenever a table in it
gets eligible for vacuum/analyze.

Tomas



pgsql-hackers by date:

Previous
From: Gavin Flower
Date:
Subject: Re: system_information.triggers & truncate triggers
Next
From: Tomas Vondra
Date:
Subject: Re: autovacuum stress-testing our system