On Wed, Sep 26, 2012 at 9:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Excerpts from Euler Taveira's message of Wed Sep 26 11:53:27 -0300 2012:
>>> On 26-09-2012 09:43, Tomas Vondra wrote:
>>>> 5) splitting the single stat file into multiple pieces - e.g. per database,
>>>> written separately, so that the autovacuum workers don't need to read all
>>>> the data even for databases that don't need to be vacuumed. This might be
>>>> combined with (4).
>
>>> IMHO that's the definitive solution. It would be one file per database plus a
>>> global one. That way, the check would only read the global.stat and process
>>> those databases that were modified. Also, an in-memory map could store that
>>> information to speed up the checks.
>
>> +1
>
> That would help for the case of hundreds of databases, but how much
> does it help for lots of tables in a single database?

It doesn't help that case, but that case doesn't need much help. If
you have N statistics-kept objects in total spread over M databases,
of which T objects need vacuuming per naptime, the stats file traffic
is proportional to N*(M+T). If T is low, then there is generally
no problem if M is also low. Or at least, the problem is much smaller
than when M is high for a fixed value of N.
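
To put rough numbers on that, here is a back-of-envelope sketch. The
single-file function is just N*(M+T) from above. The per-database
function is my own assumption, not something measured or taken from
the proposal: it supposes objects are spread evenly, so each of the
same M+T reads touches only one file of N/M entries, and it ignores
the small global file. The function names are made up for illustration.

```python
# Back-of-envelope model of stats file read traffic per naptime.
# n: total statistics-kept objects, m: databases,
# t: objects needing vacuum per naptime.

def traffic_single_file(n, m, t):
    """Single stats file: every read pulls in all n entries.
    Modeled as one read per database (m) plus one per vacuumed object (t)."""
    return n * (m + t)


def traffic_per_database_files(n, m, t):
    """ASSUMPTION (hypothetical model): objects are spread evenly over the
    databases, so each of the same m + t reads touches one per-database
    file holding n / m entries. The global file is ignored."""
    return (n / m) * (m + t)


if __name__ == "__main__":
    n, t = 100_000, 10
    for m in (1, 10, 100, 1000):
        single = traffic_single_file(n, m, t)
        split = traffic_per_database_files(n, m, t)
        print(f"M={m:5d}  single={single:>14,}  split={split:>14,.0f}")
```

Under these assumptions the two models coincide at M=1, which is the
lots-of-tables-in-one-database case, and diverge by a factor of M as
the number of databases grows.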
> I'm a bit suspicious of the idea that we should encourage people to use
> hundreds of databases per installation anyway:

I agree with that, but we could still do a better job of tolerating
it without encouraging it. If someone volunteers to write the code
to do this, what trade-offs would there be?

Cheers,
Jeff