Re: Global temporary tables - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Global temporary tables
Date
Msg-id 614c1e9f-c0e6-332c-ba2b-85a7e1efb956@postgrespro.ru
Whole thread Raw
In response to Re: Global temporary tables  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Global temporary tables  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers


On 10.08.2019 5:12, Craig Ringer wrote:
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:


Ok, here it is: global_private_temp-1.patch

Fantastic.

I'll put that high on my queue.

I'd love to see something like this get in.

Doubly so if it brings us closer to being able to use temp tables on physical read replicas, though I know there are plenty of other barriers there (not least of which being temp tables using persistent txns not vtxids)

Does it have a CF entry?
 

https://commitfest.postgresql.org/24/2233/

Also I have attached updated version of the global temp tables with shared buffers - global_shared_temp-1.patch

Nice to see that split out. In addition to giving the first patch more hope of being committed this time around, it'll help with readability and testability too.

To be clear, I have long wanted to see PostgreSQL have the "session" state abstraction you have implemented. I think it's really important for high client count OLTP workloads, working with the endless collection of ORMs out there, etc. So I'm all in favour of it in principle so long as it can be made to work reliably with limited performance impact on existing workloads and without making life lots harder when adding new core functionality, for extension authors etc. The same goes for built-in pooling. I think PostgreSQL has needed some sort of separation of "connection", "backend", "session" and "executor" for a long time and I'm glad to see you working on it.

With that said: How do you intend to address the likelihood that this will cause performance regressions for existing workloads that use temp tables *without* relying on your session state and connection pooler? Consider workloads that use temp tables for mid-long txns where txn pooling is unimportant, where they also do plenty of read and write activity on persistent tables. Classic OLAP/DW stuff. e.g.:

* four clients, four backends, four connections, session-level connections that stay busy with minimal client sleeps
* All sessions run the same bench code
* transactions all read plenty of data from a medium to large persistent table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate data for its medium-length transactions
* transactions also write to some persistent relations, say to record their summarised results  

How does it perform with and without your patch? I'm concerned that:

* the extra buffer locking and various IPC may degrade performance of temp tables
* the temp table data in shared_buffers may put pressure on shared_buffers space, cached pages for persistent tables all sessions are sharing;
* the temp table data in shared_buffers may put pressure on shared_buffers space for dirty buffers, forcing writes of persistent tables out earlier therefore reducing write-combining opportunities;

I agree that access to local buffers is cheaper than to shared buffers because there is no lock overhead.
And the fact that access to local tables can not affect cached data of persistent tables is also important.
But most of Postgres tables are still normal (persistent) tables access through shared buffers.
And huge amount of efforts were made to make this access as efficient as possible (use clock algorithm which doesn't require global lock,
atomic operations,...). Also using the same replacement discipline for all tables at some workloads may be also preferable.
So it is not so obvious to me that in the described scenario local buffer cache for temporary table really will provide significant advantages.
It will be interesting to perform some benchmarking - I am going to do it.

What I have observed right now is that in type scenario: dumping results of huge query to temporary table with subsequent traverse of this table
old (local) temporary tables provide better performance (may be because of small size of local buffer cache and different eviction policy).
But subsequent accesses to global shared table are faster (because it completely fits in large shared buffer cache).

There is one more problem with global temporary tables for which I do not know good solution now: collecting statistic.
As far as each backend has its own data, generally them may need different query plans.
Right now if you perform "analyze table" in one backend, then it will affect plans in all backends.
It can be considered not as bug, but as feature if we assume that distribution if data in all backens is similar.
But if this assumption is not true, then it can be a problem.
 



pgsql-hackers by date:

Previous
From: Roby
Date:
Subject: Feature Request: insert/on conflict update status
Next
From: Pavel Stehule
Date:
Subject: Re: Global temporary tables