Re: pg_stat_*_columns? - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: pg_stat_*_columns?
Msg-id CABUevEwV=_zc8Zfrn9UQR13VDPtVNSHaA3sGNmvJR7MhE1MqYA@mail.gmail.com
In response to Re: pg_stat_*_columns?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_stat_*_columns?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Jun 20, 2015 at 11:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jun 20, 2015 at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> But if the structure
>>> got too big to map (on a 32-bit system), then you'd be sort of hosed,
>>> because there's no way to attach just part of it.  That might not be
>>> worth worrying about, but it depends on how big it's likely to get - a
>>> 32-bit system is very likely to choke on a 1GB mapping, and maybe even
>>> on a much smaller one.
>>
>> Yeah, I'm quite worried about assuming that we can map a data structure
>> that might be of very significant size into shared memory on 32-bit
>> machines.  The address space just isn't there.
>
> Considering the advantages of avoiding message queues, I think we
> should think a little bit harder about whether we can't find some way
> to skin this cat.  As I think about this a little more, I'm not sure
> there's really a problem with one stats DSM per database.  Sure, the
> system might have 100,000 databases in some crazy pathological case,
> but the maximum number of those that can be in use is bounded by
> max_connections, which means the maximum number of stats file DSMs we
> could ever need at one time is also bounded by max_connections.  There
> are a few corner cases to think about, like if the user writes a
> client that connects to all 100,000 databases in very quick
> succession, we've got to jettison the old DSMs fast enough to make
> room for the new DSMs before we run out of slots, but that doesn't
> seem like a particularly tough nut to crack.  If the stats collector
> ensures that it never attaches to more than MaxBackends stats DSMs at
> a time, and each backend ensures that it never attaches to more than
> one stats DSM at a time, then 2 * MaxBackends stats DSMs is always
> enough.  And that's just a matter of bumping
> PG_DYNSHMEM_SLOTS_PER_BACKEND from 2 to 4.
>
> In more realistic cases, it will probably be normal for many or all
> backends to be connected to the same database, and the number of stats
> DSMs required will be far smaller.



What about a combination along these lines: the stats collector keeps the statistics in local memory as before, but when a backend needs a snapshot of its data, it uses a shared memory queue to request it. In that case the stats collector allocates a new DSM, copies the data into that DSM, and hands the DSM over to the backend. At that point the stats collector can forget about it, and it's up to the backend to get rid of it when it's done with it.

That means the address-space issue should be no worse than it is today, because each backend will still only see "its own data". And we only need to copy the data for databases that are actually in use.
