Re: pg_stat_*_columns? - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: pg_stat_*_columns?
Date
Msg-id CABUevEwsv2ZaUhEo3R748CLfaGW0KS93-Wcioxz5avXW8xvTuw@mail.gmail.com
In response to Re: pg_stat_*_columns?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Jun 23, 2015 at 3:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Jun 21, 2015 at 11:43 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Sat, Jun 20, 2015 at 11:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Jun 20, 2015 at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> But if the structure
>> >> got too big to map (on a 32-bit system), then you'd be sort of hosed,
>> >> because there's no way to attach just part of it.  That might not be
>> >> worth worrying about, but it depends on how big it's likely to get - a
>> >> 32-bit system is very likely to choke on a 1GB mapping, and maybe even
>> >> on a much smaller one.
>> >
>> > Yeah, I'm quite worried about assuming that we can map a data structure
>> > that might be of very significant size into shared memory on 32-bit
>> > machines.  The address space just isn't there.
>>
>> Considering the advantages of avoiding message queues, I think we
>> should think a little bit harder about whether we can't find some way
>> to skin this cat.  As I think about this a little more, I'm not sure
>> there's really a problem with one stats DSM per database.  Sure, the
>> system might have 100,000 databases in some crazy pathological case,
>> but the maximum number of those that can be in use is bounded by
>> max_connections, which means the maximum number of stats file DSMs we
>> could ever need at one time is also bounded by max_connections.  There
>> are a few corner cases to think about, like if the user writes a
>> client that connects to all 100,000 databases in very quick
>> succession, we've got to jettison the old DSMs fast enough to make
>> room for the new DSMs before we run out of slots, but that doesn't
>> seem like a particularly tough nut to crack.  If the stats collector
>> ensures that it never attaches to more than MaxBackends stats DSMs at
>> a time, and each backend ensures that it never attaches to more than
>> one stats DSM at a time, then 2 * MaxBackends stats DSMs is always
>> enough.  And that's just a matter of bumping
>> PG_DYNSHMEM_SLOTS_PER_BACKEND from 2 to 4.
>>
>> In more realistic cases, it will probably be normal for many or all
>> backends to be connected to the same database, and the number of stats
>> DSMs required will be far smaller.
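(The "jettison old DSMs to make room" bookkeeping Robert describes can be modelled as a small LRU registry. This is a hypothetical standalone sketch, not PostgreSQL code; the slot count, names, and eviction policy are invented for illustration.)

```c
#include <assert.h>

/* Hypothetical sketch: a tiny fixed-size registry of per-database stats
 * DSM "slots", keyed by database OID.  When every slot is busy, the
 * least-recently-used entry is jettisoned to make room, modelling the
 * idea that at most ~2 * MaxBackends snapshots need to be live at once. */

#define NSLOTS 4                 /* stands in for 2 * MaxBackends */

typedef struct
{
    unsigned    dboid;           /* 0 = free slot */
    unsigned    last_used;       /* logical clock, for LRU eviction */
} StatsSlot;

static StatsSlot slots[NSLOTS];
static unsigned clock_tick;

/* Attach to (or create) the stats slot for a database, evicting the
 * least-recently-used entry if no free slot remains.  Returns the index. */
int
stats_slot_attach(unsigned dboid)
{
    int         free_idx = -1;
    int         lru_idx = 0;

    clock_tick++;
    for (int i = 0; i < NSLOTS; i++)
    {
        if (slots[i].dboid == dboid)
        {
            /* already mapped: just refresh and reuse it */
            slots[i].last_used = clock_tick;
            return i;
        }
        if (slots[i].dboid == 0 && free_idx < 0)
            free_idx = i;
        if (slots[i].last_used < slots[lru_idx].last_used)
            lru_idx = i;
    }
    if (free_idx < 0)
        free_idx = lru_idx;      /* jettison the oldest DSM */
    slots[free_idx].dboid = dboid;
    slots[free_idx].last_used = clock_tick;
    return free_idx;
}
```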
>
> What about a combination along the lines of the following: the stats
> collector keeps the statistics in local memory as before. But when a backend
> needs to get a snapshot of its data, it uses a shared memory queue to request
> it. What the stats collector does in this case is allocate a new DSM, copy
> the data into that DSM, and hand the DSM over to the backend. At that point
> the stats collector can forget about it, and it's up to the backend to get
> rid of it when it's done with it.
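(The ownership handoff in that proposal can be sketched in miniature. This is not PostgreSQL code: plain `malloc`/`free` stand in for `dsm_create`/`dsm_detach`, the struct and function names are invented, and the shared memory queue is elided.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: the collector keeps stats in private memory; on
 * request it copies the current state into a freshly allocated buffer
 * (standing in for a DSM segment) and hands ownership to the backend,
 * forgetting about the segment immediately.  The backend releases it
 * when done with its snapshot. */

typedef struct
{
    long        xact_commit;
    long        tup_inserted;
} DbStats;

static DbStats collector_stats;  /* the collector's private copy */

/* "Collector side": build a snapshot segment and hand it over. */
DbStats *
stats_request_snapshot(void)
{
    DbStats    *seg = malloc(sizeof(DbStats));  /* stands in for dsm_create() */

    memcpy(seg, &collector_stats, sizeof(DbStats));
    return seg;                  /* collector forgets about it here */
}

/* "Backend side": release the snapshot once done. */
void
stats_release_snapshot(DbStats *seg)
{
    free(seg);                   /* stands in for dsm_detach() */
}
```

Because the segment is a point-in-time copy, later updates by the collector don't perturb a backend's snapshot, which is the property the proposal relies on.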

Well, there seems to be little point in having the stats collector
forget about a DSM that it could equally well have shared with the
next guy who wants a stats snapshot for the same database.  That case
is surely *plenty* common enough to be worth optimizing for.


Right, we only need to drop it once we have received a stats message for it, meaning something has actually changed. And possibly combine that with a minimum time as well, as we have now, if we want to limit the potential churn.
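(That drop policy boils down to a two-condition check. A hypothetical sketch, with invented names and an invented constant playing the role PGSTAT_STAT_INTERVAL plays today:)

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the policy above: rebuild a cached snapshot
 * segment only if (a) a stats message has arrived since it was built,
 * i.e. something actually changed, and (b) the snapshot is at least
 * MIN_SNAPSHOT_AGE old, to limit churn.  Times are in milliseconds. */

#define MIN_SNAPSHOT_AGE 500     /* invented; analogous to PGSTAT_STAT_INTERVAL */

bool
snapshot_needs_rebuild(long built_at, long last_change, long now)
{
    if (last_change <= built_at)
        return false;            /* nothing changed: keep sharing it */
    return (now - built_at) >= MIN_SNAPSHOT_AGE;    /* limit churn */
}
```

Until both conditions hold, the collector can keep handing the same segment to every backend that asks for that database.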

--
