Re: Keeping temporary tables in shared buffers - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Keeping temporary tables in shared buffers |
Date | |
Msg-id | 20180525065048.5vflwv26d7unfwfo@alap3.anarazel.de |
In response to | Re: Keeping temporary tables in shared buffers (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses | Re: Keeping temporary tables in shared buffers (Ashwin Agrawal <aagrawal@pivotal.io>) |
List | pgsql-hackers |
On 2018-05-25 09:40:10 +0300, Heikki Linnakangas wrote:

> On 25/05/18 09:25, Asim Praveen wrote:
> > My parochial vision of the overhead is restricted to 4 * NBuffers of
> > additional shared memory, as 4 bytes are being added to BufferTag. May I
> > please get some enlightenment?
>
> Any extra fields in BufferTag make computing the hash more expensive. It's a
> very hot code path, so any cycles spent are significant.

Indeed, very much so.

But I'm not sure we need anything in the tags themselves. We don't
denote buffers for unlogged tables in the tag itself either. As Tom
observed, the oids for temp tables are either unique or can be made
unique easily enough. And the temporariness can be declared in a bit in
the buffer header, rather than the tag itself. I don't see why a hash
lookup would need to know that.

> In relation to Andres' patches to rewrite the buffer manager with a radix
> tree, there was actually some discussion of trying to make BufferTag
> *smaller*.

FWIW, in the latest version that doesn't matter that much
anymore. Instead of one big tree it's a hashtable of trees (although it
potentially should rather be a tree of trees). The hashtable maps to a
radix tree, and that radix tree is just indexed by the offset. The root
of the tree is then cached inside the smgr, avoiding the need to
repeatedly look it up.

> For example, we could rearrange things so that
> pg_class.relfilenode is 64 bits wide. Then you could assume that it never
> wraps around, and is unique across all relations in the cluster. Then you
> could replace the 12-byte relfilenode+dbid+spcid triplet, with just the
> 8-byte relfilenode. Doing something like that might be the solution here,
> too.

OTOH it's quite useful to have the buffertag be something that can (or
rather could) be efficiently searched for in a hierarchical
fashion. While, by far, not as crucial performance-wise as dropping an
individual relation, it would be nice not to have to scan all of s_b to
drop a database.
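As a rough illustration of the "hashtable of trees" layout described above, here is a minimal C sketch. It is not taken from the actual patch. Every name in it (`RelKey`, `RelIndex`, `RelTable`, `RelHandle`, and the functions) is hypothetical, a growable array stands in for the per-relation radix tree, and `RelHandle` stands in for an smgr relation handle that caches the tree root after the first lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not PostgreSQL's API. */
#define REL_TABLE_SIZE 64          /* toy open-addressing table, power of two */
#define INVALID_BUFFER (-1)

typedef struct RelKey { uint32_t spc, db, rel; } RelKey;

/* Stand-in for a per-relation radix tree: an array indexed by block number. */
typedef struct RelIndex {
    RelKey   key;
    int      in_use;
    uint32_t nblocks;
    int     *buf_ids;
} RelIndex;

typedef struct RelTable { RelIndex slots[REL_TABLE_SIZE]; } RelTable;

/* Stand-in for an smgr relation handle that caches the tree root. */
typedef struct RelHandle { RelKey key; RelIndex *cached; } RelHandle;

static unsigned rel_lookups = 0;   /* counts hash-table lookups, for illustration */

static uint32_t rel_hash(const RelKey *k)
{
    uint32_t h = k->spc * 2654435761u;
    h ^= k->db * 2246822519u;
    h ^= k->rel * 3266489917u;
    return h;
}

/* Level 1: hash table from relation key to that relation's index. */
static RelIndex *rel_lookup(RelTable *t, const RelKey *k, int create)
{
    uint32_t start = rel_hash(k) & (REL_TABLE_SIZE - 1);
    rel_lookups++;
    for (uint32_t i = 0; i < REL_TABLE_SIZE; i++) {
        RelIndex *ri = &t->slots[(start + i) & (REL_TABLE_SIZE - 1)];
        if (ri->in_use) {
            if (ri->key.spc == k->spc && ri->key.db == k->db &&
                ri->key.rel == k->rel)
                return ri;
            continue;              /* occupied by another relation: keep probing */
        }
        if (!create)
            return NULL;           /* empty slot reached: relation not present */
        ri->in_use = 1;
        ri->key = *k;
        ri->nblocks = 0;
        ri->buf_ids = NULL;
        return ri;
    }
    return NULL;                   /* table full */
}

/* Consult the hash table only on first use; afterwards reuse the cached root. */
static RelIndex *handle_index(RelTable *t, RelHandle *h, int create)
{
    if (h->cached == NULL)
        h->cached = rel_lookup(t, &h->key, create);
    return h->cached;
}

/* Level 2: per-relation structure indexed purely by block number. */
static void index_set(RelIndex *ri, uint32_t block, int buf_id)
{
    assert(ri != NULL && block < UINT32_MAX);
    if (block >= ri->nblocks) {
        int *p = realloc(ri->buf_ids, ((size_t) block + 1) * sizeof(int));
        if (p == NULL)
            abort();
        for (uint32_t b = ri->nblocks; b <= block; b++)
            p[b] = INVALID_BUFFER; /* newly covered blocks start unmapped */
        ri->buf_ids = p;
        ri->nblocks = block + 1;
    }
    ri->buf_ids[block] = buf_id;
}

static int index_get(const RelIndex *ri, uint32_t block)
{
    return (ri != NULL && block < ri->nblocks) ? ri->buf_ids[block]
                                               : INVALID_BUFFER;
}
```

With this shape, the relation key is hashed once per handle, and every later block lookup through the same handle touches only the per-relation structure. Dropping one relation would also mean discarding a single `RelIndex` rather than scanning every buffer.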
> > Temp tables have unique filename on disk: t_<backendID>_<relfilenode>. The
> > logic to assign OIDs and relfilenodes, however, doesn't differ. Given a
> > RelFileNode, it is not possible to tell if it's a temp table or not.
> > RelFileNodeBackend allows for that distinction but it's not used by buffer
> > manager.
>
> Could you store the backendid in BufferDesc, outside of BufferTag? Is it
> possible for a normal table and a temporary table to have the same
> relfilenode+dbid+spcid triplet?

When starting to work on the radix tree stuff I had, to address the size
of buffer tag issue you mention above, a prototype patch that created a
shared 'relfilenode' table. That guaranteed that relfilenodes are
unique. That'd work here as well, and would allow getting rid of a good
chunk of the ugliness we have around allocating relfilenodes right now
(like not unlinking files etc).

But more generally, I don't see why it'd be that problematic to just get
rid of the backendid? I don't really see any technical necessity to have
it.

Greetings,

Andres Freund
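To make the "bit in the buffer header rather than the tag" idea concrete, here is a minimal C sketch. The types and names (`SketchTag`, `SketchBufferDesc`, `SKETCH_BM_TEMP`) are hypothetical simplifications, not PostgreSQL's real definitions, and FNV-1a is used only as a stand-in hash. The sketch assumes, as discussed in the thread, that relfilenodes are unique across temporary and regular relations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified lookup key: five 4-byte fields, no padding. */
typedef struct SketchTag {
    uint32_t spc, db, rel, fork, block;
} SketchTag;

#define SKETCH_BM_TEMP (1u << 0)   /* hypothetical "temporary relation" flag bit */

/* Hypothetical buffer header: the tag is the hash key, flags live beside it. */
typedef struct SketchBufferDesc {
    SketchTag tag;
    uint32_t  flags;
} SketchBufferDesc;

/* Hash over the tag bytes only (FNV-1a as a stand-in hash function). */
static uint32_t sketch_tag_hash(const SketchTag *tag)
{
    const unsigned char *p = (const unsigned char *) tag;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(*tag); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Temp-ness is recorded in the header; the tag and its hash are untouched. */
static void sketch_mark_temp(SketchBufferDesc *d)
{
    assert(d != NULL);
    d->flags |= SKETCH_BM_TEMP;
}

static int sketch_is_temp(const SketchBufferDesc *d)
{
    return (d->flags & SKETCH_BM_TEMP) != 0;
}
```

Because the flag lives outside the tag, the number of bytes hashed on the hot lookup path stays the same, and code that needs to know whether a buffer belongs to a temporary relation checks the header after the lookup has found it.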