Global temporary tables - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Global temporary tables
Date
Msg-id 73954ab7-44d3-b37b-81a3-69bdcbb446f7@postgrespro.ru
Whole thread Raw
Responses Re: Global temporary tables  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Current Postgres implementation of temporary table causes number of 
problems:

1. Catalog bloating: if client creates and deletes too many temporary 
tables, then autovacuum get stuck on catalog.
2. Parallel queries: right now usage of temporary tables in query 
disables parallel plan.
3. It is not possible to use temporary tables at replica. Hot standby 
configuration is frequently used to run OLAP queries on replica
and results of such queries are used to be saved in temporary tables. 
Right now it is not possible (except "hackers" solution with storing 
results in file_fdw).
4. Temporary tables can not be used in prepared transactions.
5. Inefficient memory usage and possible memory overflow: each backend 
maintains its own local buffers for work with temporary tables.
Default size of temporary buffers is 8Mb. It seems to be too small for 
modern servers having hundreds of gigabytes of RAM, causing extra 
copying of data
between OS cache and local buffers. But if there are thousands of 
backends, each executing queries with temporary tables, then  total 
amount of
memory used for temporary buffers can exceed several tens of gigabytes.
6. Connection pooler can not reschedule session which has created 
temporary tables to some other backend
because it's data is stored in local buffers.

There were several attempts to address this problems.
For example Alexandr Alekseev has implemented patch which allows to 
create fast temporary tables without accessing system catalog:
https://www.postgresql.org/message-id/flat/20160301182500.2c81c3dc%40fujitsu
Unfortunately this patch was too invasive and rejected by community.

There was also attempt to allow under some condition use temporary 
tables in 2PC transactions:
https://www.postgresql.org/message-id/flat/m2d0pllvqy.fsf%40dimitris-macbook-pro.home

https://www.postgresql.org/message-id/flat/3a4b3c88-4fa5-1edb-a878-1ed76fa1c82b%40postgrespro.ru#d8a8342d07317d12e3405b903d3b15e4
Them were also rejected.

I try to make yet another attempt to address this problems, first of all 
1), 2), 5) and 6)
To solve this problems I propose notion of "global temporary" tables, 
similar with ones in Oracle.
Definition of this table (metadata) is shared by all backends but data 
is private to the backend. After session termination data is obviously lost.

Suggested syntax for creation of global temporary tables:

     create global temp table
or
     create session table

Once been created it can be used by all backends.
Global temporary tables are accessed though shared buffers (to solve 
problem 2).
Cleanup of temporary tables data (release of shared buffer and deletion 
of relation files) is performed on backend termination.
In case of abnormal server termination, files of global temporary tables 
are cleaned-up in the same way as of local temporary tables.

Certainly there are cases were global temporary tables can not be used, 
i.e. when application is dynamically constructed name and columns of 
temporary table.
Also access to local buffers is more efficient than access to shared 
buffers because it doesn't require any synchronization.
But please notice that it is always possible to create old (local) 
temporary tables which preserves current behavior.

The problem with replica is still not solved. But shared metadata is 
step in this direction.
I am thinking about reimplementation of temporary tables using new table 
access method API.
The drawback of such approach is that it will be necessary to 
reimplement large bulk of heapam code.
But this approach allows to eliminate visibility check for temporary 
table tuples and decrease size of tuple header.
I still not sure if implementing special table access method for 
temporary tables is good idea.

Patch for global temporary tables is attached to this mail.
The known limitation is that now it supports only B-Tree indexes.
Any feedback is welcome.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

pgsql-hackers by date:

Previous
From: Rafia Sabih
Date:
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Next
From: Tom Lane
Date:
Subject: Re: How to retain lesser paths at add_path()?