On Fri, Aug 6, 2010 at 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Aug 5, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> One thing that I find rather distressing about this is the 25% bloat
>>> in sizeof(SharedInvalidationMessage). Couldn't we avoid that? Is it
>>> really necessary to *ever* send an SI message for a backend-local rel?
>
>> It can be dropped or unlinked by another backend, so, yes.
>
> Really? Surely that should be illegal during normal operation. We
> might be doing such during crash recovery, but we don't need to
> broadcast sinval messages then.

autovacuum.c does it when we start to worry about XID wraparound, but
you can also do it from any normal backend. Just "DROP TABLE
pg_temp_2.foo" or whatever and away you go.

> It might be sufficient to consider that there are "local" and "global"
> smgr inval messages, where the former never get out of the generating
> backend, so a bool is enough in the message struct.

It would be nice to be able to do it that way, but I don't believe
it's the case, per the above.

>> had was that if we could count on the backend ID to fit into an int16
>> we could fit it in to what's currently padding space. That would
>> require rather dramatically lowering the maximum number of backends
>> (currently INT_MAX/4), but it's a little hard to imagine that we can
>> really support more than 32,767 simultaneous backends anyway.
>
> Yeah, that occurred to me too. A further thought is that the id field
> could probably be reduced to 1 byte, leaving 3 for backendid, which
> would certainly be plenty. However representing that in a portable
> struct declaration would be a bit painful I fear.

Well, presumably we'd just represent it as a 1-byte field followed by
a 2-byte field, and do a bit of math (see the sketch at the end of
this message). But I don't really see the
point. The whole architecture of a shared invalidation queue is
fundamentally non-scalable because it's a broadcast medium. If we
wanted to efficiently support even thousands of backends (let alone
tens or hundreds of thousands) I assume we would need to rearchitect
this completely with more fine-grained queues, and have backends
subscribe to the queues pertaining to the objects they want to access
before touching them. Or maybe something else entirely. But I don't
think broadcasting to 30,000 backends is going to work for the same
reason that plugging 30,000 machines into an Ethernet *hub* doesn't
work.
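
For concreteness, a rough sketch of what that 1-byte-plus-2-byte
packing could look like -- the field names, the RelFileNode stand-in,
and the pack/unpack helpers here are illustrative, not actual
PostgreSQL declarations:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;
typedef int32_t BackendId;

typedef struct RelFileNode      /* stand-in for the real struct */
{
    Oid         spcNode;        /* tablespace */
    Oid         dbNode;         /* database */
    Oid         relNode;        /* relation */
} RelFileNode;

typedef struct
{
    int8_t      id;             /* message type --- must be first */
    int8_t      backend_hi;     /* high bits of backend ID, if temprel */
    uint16_t    backend_lo;     /* low bits of backend ID, if temprel */
    RelFileNode rnode;          /* physical relation identifier */
} SharedInvalSmgrMsg;           /* 16 bytes with 4-byte alignment */

/* split the backend ID across the two small fields ... */
static void
pack_backend(SharedInvalSmgrMsg *msg, BackendId backend)
{
    msg->backend_hi = (int8_t) (backend >> 16);
    msg->backend_lo = (uint16_t) (backend & 0xffff);
}

/* ... and do "a bit of math" to put it back together */
static BackendId
unpack_backend(const SharedInvalSmgrMsg *msg)
{
    return ((BackendId) msg->backend_hi << 16) | msg->backend_lo;
}

int
main(void)
{
    SharedInvalSmgrMsg msg = {0};

    pack_backend(&msg, 42);
    printf("sizeof(msg) = %zu, backend = %d\n",
           sizeof(msg), unpack_backend(&msg));
    return 0;
}

With one byte for the message type and three for the backend ID, the
message stays at 16 bytes and the backend limit works out to 2^23 - 1
rather than the 32,767 an int16 would allow.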

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company