Re: pg_stat_*_columns? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_stat_*_columns?
Date
Msg-id 45012.1434814375@sss.pgh.pa.us
Whole thread Raw
In response to Re: pg_stat_*_columns?  (Magnus Hagander <magnus@hagander.net>)
Responses Re: pg_stat_*_columns?  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Magnus Hagander <magnus@hagander.net> writes:
> On Sat, Jun 20, 2015 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I dunno that tweaking the format would accomplish much.  Where I'd love
>> to get to is to not have to write the data to disk at all (except at
>> shutdown).  But that seems to require an adjustable-size shared memory
>> block, and I'm not sure how to do that.  One idea, if the DSM stuff
>> could be used, is to allow the stats collector to allocate multiple
>> DSM blocks as needed --- but how well would that work on 32-bit
>> machines?  I'd be worried about running out of address space.

> I've considered both that and to perhaps use a shared memory message queue
> to communicate. Basically, have a backend send a request when it needs a
> snapshot of the stats data and get a copy back through that method instead
> of disk. It would be much easier if we didn't actually take a snapshot of
> the data per transaction, but we really don't want to give that up (if we
> didn't care about that, we could just have a protocol asking for individual
> values).

Yeah, that might work quite nicely, and it would not require nearly as
much surgery on the existing code as mapping the stuff into
constrained-size shmem blocks would do.  The point about needing a data
snapshot is a good one as well; I'm not sure how we'd preserve that
behavior if backends are accessing the collector's data structures
directly through shmem.

I wonder if we should think about replacing the IP-socket-based data
transmission protocol with a shared memory queue, as well.

> We'd need a way to actually transfer the whole hashtables over, without
> rebuilding them on the other end I think. Just the cost of looping over it
> to dump and then rehashing everything on the other end seems quite wasteful
> and unnecessary.

Meh.  All of a sudden you've made it complicated and invasive again,
to get rid of a bottleneck that's not been shown to be a problem.
Let's do the simple thing first, else maybe nothing will happen at all.
        regards, tom lane


