Re: Megabytes of stats saved after every connection - Mailing list pgsql-general

From: Phil Endecott
Subject: Re: Megabytes of stats saved after every connection
Msg-id: 42E93289.7020207@chezphil.org
In response to: Re: Megabytes of stats saved after every connection (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general

Hello again,

Just to give a bit of background, in case it is useful: this is my
family tree website, treefic.com.  I have a schema for each user, each
schema containing about a dozen tables.  In most cases the tables are
small, i.e. tens of rows, but the users I care about are the ones with
tens of thousands of people in their trees.  The schemas are independent
of each other.  Example web page: http://treefic.com/treefic/royal92
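
For concreteness, the per-user layout is roughly as follows.  This is a
simplified sketch, not my actual DDL; the schema and table names are
illustrative only:

    $ psql -d treefic <<'SQL'
    CREATE SCHEMA royal92;                        -- one schema per user
    CREATE TABLE royal92.people (
        id    serial PRIMARY KEY,                 -- usually tens of rows,
        name  text NOT NULL                       -- tens of thousands for
    );                                            -- the largest trees
    CREATE TABLE royal92.parents (
        child  integer REFERENCES royal92.people,
        parent integer REFERENCES royal92.people
    );
    SQL

...and about a dozen such tables per schema, with no cross-schema
references.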

>>>Jan Wieck <JanWieck@Yahoo.com> writes:
>>>>PostgreSQL itself doesn't work too well with tens of thousands of
>>>>tables.

I've specifically asked about this here before.  This is obviously
important for my application, so I invite all readers to share any
thoughts they might have about possible problems with large numbers of
tables.  I also create and drop large numbers of temporary tables - can
anyone think of any additional problems with that?
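
For what it's worth, the temporary tables follow a simple
create-fill-query-drop pattern inside one transaction, along these lines
(whether dropped explicitly or via ON COMMIT DROP; the names here are
illustrative only):

    $ psql -d treefic <<'SQL'
    BEGIN;
    -- scratch table built per page view, discarded at commit
    CREATE TEMPORARY TABLE ancestors (person integer) ON COMMIT DROP;
    INSERT INTO ancestors VALUES (1);
    INSERT INTO ancestors VALUES (2);
    SELECT count(*) FROM ancestors;
    COMMIT;   -- the temporary table vanishes here
    SQL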

Issues I have discussed here before include tab-completion in psql
(unimportant) and autovacuum's O(n^2) performance (important).

>>Okay, I should be more specific. The problem with tens of thousands of
>>tables does not exist just because of them being there. It will emerge
>>if all those tables are actually used because it will mean that you'd
>>need all the pg_class and pg_attribute rows cached and also your vfd
>>cache will constantly rotate.

If many trees are being viewed simultaneously, another part of the
system will be the bottleneck.  Within any given five-minute period,
say, only hundreds of tables will be in use.
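
(For anyone curious about the scale, the total table count can be had
with a plain catalog query along these lines; nothing here is specific
to my application:)

    $ psql -d treefic -c "
        SELECT count(*)
          FROM pg_class c, pg_namespace n
         WHERE n.oid = c.relnamespace
           AND c.relkind = 'r'
           AND n.nspname NOT IN ('pg_catalog', 'information_schema');"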

>>Then again, the stats file is only written. There is nothing that
>>actually forces the blocks out. On a busy system, one individual stats
>>file will be created, written to, renamed, live for 500ms and be thrown
>>away by the next stat file's rename operation. I would assume that with a
>>decent filesystem and appropriate OS buffers, none of the data blocks of
>>most stat files even hit the disk. I must be missing something.

> This is possibly true --- Phil, do you see actual disk I/O happening
> from the stats writes, or is it just kernel calls?

During my tests the system was idle; I would run "psql -c 'select 1;'"
and see the blocks in vmstat's "bo" column a couple of seconds later.
As I understand it, that indicates actual I/O, and the delay suggests
that it is being flushed by the kernel.  When the system is busy it is
harder to see what is going on, and it is possible that at least some of
this activity was not reaching the disk.  Typically I would see a lot
more write bandwidth than read bandwidth (by a factor of 5 or so)
according to vmstat; any advice about how to identify which files or
processes are involved would be appreciated.  I had previously imagined
that it could be temporary tables.  This is Linux 2.4.26 with an ext3
filesystem.
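
One thing I have not yet tried, which might settle the question, is
attaching strace to the stats collector to see exactly what it writes
and renames.  Something along these lines should work (the PID below is
made up; the grep pattern matches the collector's ps title):

    $ ps auxw | grep 'postgres: stats'        # find the stats collector
    $ strace -e trace=write,rename -p 12345   # 12345 = collector's PID
    $ vmstat 1                                # watch "bo" alongside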

Having disabled stats earlier, I find my stats file is still quite
small; presumably it will gradually grow back.  In the meantime I cannot
run any experiments.

Thanks as ever for your prompt responses.

Regards,

--Phil.

