Re: Keeping temporary tables in shared buffers - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Keeping temporary tables in shared buffers
Date
Msg-id 20180525065048.5vflwv26d7unfwfo@alap3.anarazel.de
In response to Re: Keeping temporary tables in shared buffers  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Keeping temporary tables in shared buffers  (Ashwin Agrawal <aagrawal@pivotal.io>)
List pgsql-hackers
On 2018-05-25 09:40:10 +0300, Heikki Linnakangas wrote:
> On 25/05/18 09:25, Asim Praveen wrote:
> > My parochial vision of the overhead is restricted to 4 * NBuffers of
> > additional shared memory, as 4 bytes are being added to BufferTag.  May I
> > please get some enlightenment?
> 
> Any extra fields in BufferTag make computing the hash more expensive. It's a
> very hot code path, so any cycles spent are significant.

Indeed, very much so.

But I'm not sure we need anything in the tags themselves. We don't
denote buffers for unlogged tables in the tag itself either. As Tom
observed, the OIDs for temp tables are either unique or can easily be
made unique. And the temporariness can be recorded as a bit in the
buffer header rather than in the tag itself. I don't see why a hash
lookup would need to know that.
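A minimal C sketch of the point above, assuming relfilenodes are unique across temp and regular relations. The tag carries only the tablespace/database/relfilenode triplet plus fork and block, and temporariness is a flag in the buffer header that is read only after the lookup has matched. All names (SketchBufferTag, SKETCH_BM_TEMP, sketch_lookup, ...) are hypothetical. FNV-1a stands in for the real buffer-mapping hash, and a linear scan stands in for the shared hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shape of a RelFileNode-based tag: three 4-byte OIDs plus fork and
 * block number. Note there is no temp or backend-id field. */
typedef struct SketchBufferTag
{
    uint32_t spcNode;   /* tablespace OID */
    uint32_t dbNode;    /* database OID */
    uint32_t relNode;   /* relfilenode, assumed unique across temp and regular */
    int32_t  forkNum;   /* which fork of the relation */
    uint32_t blockNum;  /* block within the fork */
} SketchBufferTag;

/* Hypothetical flag bits stored in the buffer header, not in the tag. */
#define SKETCH_BM_PERMANENT (1U << 0)
#define SKETCH_BM_TEMP      (1U << 1)

typedef struct SketchBufferDesc
{
    SketchBufferTag tag;    /* the lookup key */
    uint32_t        flags;  /* per-buffer state, including "temporary" */
} SketchBufferDesc;

/* FNV-1a over the 20 tag bytes (no padding: every member is 4 bytes).
 * A stand-in for the real buffer-mapping hash function. */
uint32_t
sketch_tag_hash(const SketchBufferTag *tag)
{
    const unsigned char *p = (const unsigned char *) tag;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < sizeof(*tag); i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Linear-scan stand-in for the shared hash table: matching uses the tag
 * alone; the caller reads the temp bit from the header afterwards. */
SketchBufferDesc *
sketch_lookup(SketchBufferDesc *bufs, size_t n, const SketchBufferTag *tag)
{
    uint32_t h = sketch_tag_hash(tag);

    for (size_t i = 0; i < n; i++)
        if (sketch_tag_hash(&bufs[i].tag) == h &&
            memcmp(&bufs[i].tag, tag, sizeof(*tag)) == 0)
            return &bufs[i];
    return NULL;
}

int
sketch_buffer_is_temp(const SketchBufferDesc *buf)
{
    assert(buf != NULL);
    return (buf->flags & SKETCH_BM_TEMP) != 0;
}
```

Because both the hash and the equality check cover only the tag, keeping the temp bit in the header adds nothing to the hot lookup path.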


> In relation to Andres' patches to rewrite the buffer manager with a radix
> tree, there was actually some discussion of trying to make BufferTag
> *smaller*.

FWIW, in the latest version that doesn't matter much anymore. Instead
of one big tree it's a hashtable of trees (although it arguably should
be a tree of trees instead). The hashtable maps each relation to a
radix tree, and that radix tree is indexed just by the block offset.
The root of the tree is then cached inside the smgr, avoiding the need
to repeatedly look it up.
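A rough sketch of the "hashtable of trees" layout described above, under assumptions that do not come from the patch itself: the hashtable is keyed by relfilenode alone, each per-relation radix tree has four 256-way levels covering a 32-bit block number, leaves store a buffer id, and SmgrSketch is a hypothetical stand-in for the smgr handle that caches the root. Locking, node deletion and real out-of-memory handling are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RT_LEVELS   4   /* 4 levels of 8 bits = 32-bit block numbers */
#define MAP_BUCKETS 64  /* hypothetical hashtable size */

/* Inner levels use .child; the last level uses .value = buf_id + 1,
 * so a zeroed slot means "block not cached". */
typedef union RtSlot
{
    struct RtNode *child;
    intptr_t       value;
} RtSlot;
typedef struct RtNode { RtSlot slot[256]; } RtNode;   /* 8 bits per level */

typedef struct RelTreeEntry
{
    uint32_t             relNode;  /* hashtable key */
    RtNode              *root;     /* this relation's radix tree */
    struct RelTreeEntry *next;     /* bucket chain */
} RelTreeEntry;
typedef struct RelTreeMap { RelTreeEntry *bucket[MAP_BUCKETS]; } RelTreeMap;

/* Hypothetical smgr handle that caches the root after the first probe. */
typedef struct SmgrSketch
{
    uint32_t relNode;
    RtNode  *cached_root;
} SmgrSketch;

static void *
xcalloc(size_t size)
{
    void *p = calloc(1, size);
    if (p == NULL)
        abort();
    return p;
}

/* The only hashtable access: find (or create) this relation's tree. */
static RtNode *
map_get_root(RelTreeMap *map, uint32_t relNode)
{
    RelTreeEntry **head = &map->bucket[relNode % MAP_BUCKETS];
    for (RelTreeEntry *e = *head; e != NULL; e = e->next)
        if (e->relNode == relNode)
            return e->root;
    RelTreeEntry *e = xcalloc(sizeof(*e));
    e->relNode = relNode;
    e->root = xcalloc(sizeof(RtNode));
    e->next = *head;
    *head = e;
    return e->root;
}

static RtNode *
smgr_root(RelTreeMap *map, SmgrSketch *reln)
{
    if (reln->cached_root == NULL)
        reln->cached_root = map_get_root(map, reln->relNode);
    return reln->cached_root;
}

/* Descend by block number alone. Returns the leaf slot, or NULL if the
 * path is missing and create is false. */
static RtSlot *
rt_slot(RtNode *node, uint32_t blockNum, int create)
{
    for (int level = RT_LEVELS - 1; level > 0; level--)
    {
        RtSlot *s = &node->slot[(blockNum >> (8 * level)) & 0xFF];
        if (s->child == NULL)
        {
            if (!create)
                return NULL;
            s->child = xcalloc(sizeof(RtNode));
        }
        node = s->child;
    }
    return &node->slot[blockNum & 0xFF];
}

void
sketch_insert(RelTreeMap *map, SmgrSketch *reln, uint32_t blockNum, int buf_id)
{
    assert(buf_id >= 0);
    rt_slot(smgr_root(map, reln), blockNum, 1)->value = (intptr_t) buf_id + 1;
}

/* Returns the cached buffer id for blockNum, or -1 if none. */
int
sketch_lookup_block(RelTreeMap *map, SmgrSketch *reln, uint32_t blockNum)
{
    RtSlot *s = rt_slot(smgr_root(map, reln), blockNum, 0);
    return (s == NULL || s->value == 0) ? -1 : (int) (s->value - 1);
}
```

After the first access through a given handle, lookups never touch the hashtable again; they only shift and mask the block number.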


> For example, we could rearrange things so that
> pg_class.relfilenode is 64 bits wide. Then you could assume that it never
> wraps around, and is unique across all relations in the cluster. Then you
> could replace the 12-byte relfilenode+dbid+spcid triplet, with just the
> 8-byte relfilenode. Doing something like that might be the solution here,
> too.
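To make the size argument in the quoted proposal concrete, a sketch comparing the two key layouts. The struct names are hypothetical, and the fork number is modeled as a 4-byte integer, which is what an enum usually occupies on common ABIs.

```c
#include <assert.h>
#include <stdint.h>

/* Current shape: tablespace + database + relfilenode, each a 4-byte OID. */
typedef struct TripletTag
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
    int32_t  forkNum;
    uint32_t blockNum;
} TripletTag;

/* Hypothetical shape if relfilenode were a never-wrapping, cluster-wide
 * unique 64-bit number: the tablespace and database OIDs drop out. */
typedef struct WideRelTag
{
    uint64_t relNumber;
    int32_t  forkNum;
    uint32_t blockNum;
} WideRelTag;

/* Compile-time checks (C11): 12 bytes of identity become 8, so the whole
 * tag shrinks from 20 to 16 bytes, with no padding in either layout. */
static_assert(sizeof(TripletTag) == 20, "three OIDs + fork + block");
static_assert(sizeof(WideRelTag) == 16, "one 64-bit number + fork + block");
```

The 4 bytes saved are exactly the difference between the 12-byte triplet and the 8-byte relfilenode.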

OTOH it's quite useful to have the buffer tag be something that can (or
rather could) be efficiently searched in a hierarchical fashion. While
not nearly as crucial performance-wise as dropping an individual
relation, it would be nice not to have to scan all of shared_buffers
(s_b) to drop a database.
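A sketch of why a hierarchical key helps when dropping a database, assuming (hypothetically) that buffer tags are kept in an ordered structure sorted by database, then relation, fork and block. A sorted array stands in for that structure here: all of one database's buffers form a contiguous range, found with two binary searches instead of a full scan. With only a cluster-wide 64-bit relfilenode in the key, there would be no database prefix to search on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag, ordered from most to least significant field. */
typedef struct HierTag
{
    uint32_t dbNode;
    uint32_t relNode;
    int32_t  forkNum;
    uint32_t blockNum;
} HierTag;

/* First index in sorted tags[0..n) whose dbNode >= db. */
static size_t
lower_bound_db(const HierTag *tags, size_t n, uint32_t db)
{
    size_t lo = 0, hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (tags[mid].dbNode < db)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Sets [*first, *last) to the buffers of database db: two O(log n)
 * searches rather than visiting every entry. */
void
buffers_for_database(const HierTag *tags, size_t n, uint32_t db,
                     size_t *first, size_t *last)
{
    *first = lower_bound_db(tags, n, db);
    /* db + 1 would wrap for the largest OID, so clamp to the end. */
    *last = (db == UINT32_MAX) ? n : lower_bound_db(tags, n, db + 1);
    assert(*first <= *last);
}
```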


> > Temp tables have a unique filename on disk: t<backendID>_<relfilenode>.  The
> > logic to assign OIDs and relfilenodes, however, doesn't differ.  Given a
> > RelFileNode, it is not possible to tell if it's a temp table or not.
> > RelFileNodeBackend allows for that distinction but it's not used by buffer
> > manager.
> 
> Could you store the backendid in BufferDesc, outside of BufferTag? Is it
> possible for a normal table and a temporary table to have the same
> relfilenode+dbid+spcid triplet?

When I started working on the radix tree stuff, I had a prototype patch
that created a shared 'relfilenode' table, to address the buffer tag
size issue you mention above. That guaranteed that relfilenodes are
unique. That'd work here as well, and would allow us to get rid of a
good chunk of the ugliness we have around allocating relfilenodes right
now (like not unlinking files, etc.).
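A minimal single-process sketch of a shared relfilenode table, assuming a bitmap over a toy-sized number space. All names and the size are hypothetical; a real version would live in shared memory, be much larger, need locking, and survive restarts. Every relation, temp or not, draws its number from the same table, so relfilenode+dbid+spcid triplets cannot collide, and a number only becomes reusable once it is explicitly released rather than through the file-based tricks mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy-sized number space; a real table would be far larger and shared. */
#define SKETCH_RELNUM_SPACE 1024u

typedef struct RelNumberTable
{
    uint8_t  in_use[SKETCH_RELNUM_SPACE / 8];  /* one bit per number */
    uint32_t next;                             /* where the next search starts */
} RelNumberTable;

static bool
relnum_is_used(const RelNumberTable *t, uint32_t n)
{
    return (t->in_use[n / 8] >> (n % 8)) & 1;
}

/* Returns a number no live relation (temp or regular) holds, or
 * UINT32_MAX if the space is exhausted. */
uint32_t
relnum_allocate(RelNumberTable *t)
{
    for (uint32_t tries = 0; tries < SKETCH_RELNUM_SPACE; tries++)
    {
        uint32_t n = t->next;

        t->next = (t->next + 1) % SKETCH_RELNUM_SPACE;
        if (!relnum_is_used(t, n))
        {
            t->in_use[n / 8] |= (uint8_t) (1u << (n % 8));
            return n;
        }
    }
    return UINT32_MAX;
}

/* Hypothetically called once the relation's files are really gone. */
void
relnum_release(RelNumberTable *t, uint32_t n)
{
    assert(n < SKETCH_RELNUM_SPACE && relnum_is_used(t, n));
    t->in_use[n / 8] &= (uint8_t) ~(1u << (n % 8));
}
```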

But more generally, why would it be that problematic to just get rid of
the backendid? I don't really see any technical necessity for it.

Greetings,

Andres Freund

