Thread: Global temporary tables
The current Postgres implementation of temporary tables causes a number of problems:

1. Catalog bloating: if a client creates and deletes many temporary tables, autovacuum gets stuck on the catalog.

2. Parallel queries: right now the use of temporary tables in a query disables parallel plans.

3. It is not possible to use temporary tables on a replica. A hot standby configuration is frequently used to run OLAP queries on a replica, and the results of such queries would naturally be saved in temporary tables. Right now this is not possible (except for hackish solutions such as storing the results via file_fdw).

4. Temporary tables cannot be used in prepared transactions.

5. Inefficient memory usage and possible memory overflow: each backend maintains its own local buffers for working with temporary tables. The default size of temporary buffers is 8MB. That seems too small for modern servers with hundreds of gigabytes of RAM, causing extra copying of data between the OS cache and local buffers. But if there are thousands of backends, each executing queries with temporary tables, the total amount of memory used for temporary buffers can exceed several tens of gigabytes.

6. A connection pooler cannot reschedule a session which has created temporary tables to some other backend, because its data is stored in local buffers.

There were several attempts to address these problems. For example, Alexandr Alekseev implemented a patch which allows creating fast temporary tables without accessing the system catalog: https://www.postgresql.org/message-id/flat/20160301182500.2c81c3dc%40fujitsu Unfortunately this patch was too invasive and was rejected by the community. There were also attempts to allow, under some conditions, the use of temporary tables in 2PC transactions:
https://www.postgresql.org/message-id/flat/m2d0pllvqy.fsf%40dimitris-macbook-pro.home
https://www.postgresql.org/message-id/flat/3a4b3c88-4fa5-1edb-a878-1ed76fa1c82b%40postgrespro.ru#d8a8342d07317d12e3405b903d3b15e4
They were also rejected.

This is yet another attempt to address these problems, first of all 1), 2), 5) and 6). To solve them I propose the notion of "global temporary" tables, similar to the ones in Oracle. The definition of such a table (metadata) is shared by all backends, but its data is private to the backend. After session termination the data is obviously lost.

Suggested syntax for creating global temporary tables:

create global temp table
or
create session table

Once created, such a table can be used by all backends. Global temporary tables are accessed through shared buffers (to solve problem 2). Cleanup of temporary table data (release of shared buffers and deletion of relation files) is performed on backend termination. In case of abnormal server termination, the files of global temporary tables are cleaned up in the same way as those of local temporary tables.

Certainly there are cases where global temporary tables cannot be used, e.g. when the application dynamically constructs the name and columns of the temporary table. Also, access to local buffers is more efficient than access to shared buffers because it doesn't require any synchronization. But please notice that it is still possible to create old (local) temporary tables, which preserves the current behavior.

The problem with replicas is still not solved, but shared metadata is a step in this direction. I am also thinking about reimplementing temporary tables using the new table access method API. The drawback of such an approach is that it would be necessary to reimplement a large bulk of the heapam code.
But this approach would allow eliminating the visibility check for temporary table tuples and decreasing the size of the tuple header. I am still not sure whether implementing a special table access method for temporary tables is a good idea. A patch for global temporary tables is attached to this mail. A known limitation is that it currently supports only B-Tree indexes. Any feedback is welcome.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, 31 Jul 2019 at 23:05, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
The current Postgres implementation of temporary tables causes a number of problems:
1. Catalog bloating: if a client creates and deletes many temporary tables, autovacuum gets stuck on the catalog.
This also upsets logical decoding a little - AFAICS it still has to treat transactions that use temporary tables as catalog-modifying transactions, tracking them in its historic catalog snapshots and doing extra cache flushes etc when decoding them.
This will become even more important as we work to support eager/optimistic output plugin processing of in-progress transactions. We'd have to switch snapshots more, and that can get quite expensive so using temp tables could really hurt performance. Or we'd have to serialize on catalog-changing transactions, in which case using temp tables would negate the benefits of optimistic streaming of in-progress transactions.
3. It is not possible to use temporary tables on a replica.
For physical replicas, yes.
A hot standby configuration is frequently used to run OLAP queries on a replica, and the results of such queries would naturally be saved in temporary tables. Right now this is not possible (except for hackish solutions such as storing the results via file_fdw).
Right. Because we cannot modify pg_class, pg_attribute etc, even though we could reasonably enough write to local-only relfilenodes on a replica if we didn't have to change WAL-logged catalog tables.
I've seen some hacks suggested around this where we have an unlogged fork of each of the needed catalog tables, allowing replicas to write temp table info to them. We'd scan both the logged and unlogged forks when doing relcache management etc. But there are plenty of ugly issues with this. We'd have to reserve oid ranges for them which is ugly; to make it BC friendly those reservations would probably have to take the form of some kind of placeholder entry in the real pg_class. And it gets ickier from there. It hardly seems worth it when we should probably just implement global temp tables instead.
5. Inefficient memory usage and possible memory overflow: each backend maintains its own local buffers for working with temporary tables.
Is there any reason that would change with global temp tables? We'd still be creating a backend-local relfilenode for each backend that actually writes to the temp table, and I don't see how it'd be useful or practical to keep those in shared_buffers.
Using local buffers has big advantages too. It saves shared_buffers space for data where there's actually some possibility of getting cache hits, or for where we can benefit from lazy/async writeback and write combining. I wouldn't want to keep temp data there if I had the option.
If you're concerned about the memory use of backend local temp buffers, or about how we account for and limit those, that's worth looking into. But I don't think it'd be something that should be affected by global-temp vs backend-local-temp tables.
The default size of temporary buffers is 8MB. That seems too small for modern servers with hundreds of gigabytes of RAM, causing extra copying of data between the OS cache and local buffers. But if there are thousands of backends, each executing queries with temporary tables, the total amount of memory used for temporary buffers can exceed several tens of gigabytes.
Right. But what solution do you propose for this? Putting that in shared_buffers will do nothing except deprive shared_buffers of space that can be used for other more useful things. A server-wide temp buffer would add IPC and locking overheads and AFAICS little benefit. One of the big appeals of temp tables is that we don't need any of that.
If you want to improve server-wide temp buffer memory accounting and management that makes sense. I can see it being useful to have things like a server-wide DSM/DSA pool of temp buffers that backends borrow from and return to based on memory pressure on a LRU-ish basis, maybe. But I can also see how that'd be complex and hard to get right. It'd also be prone to priority inversion problems where an idle/inactive backend must be woken up to release memory or release locks, depriving an actively executing backend of runtime. And it'd be as likely to create inefficiencies with copying and eviction as solve them since backends could easily land up taking turns kicking each other out of memory and re-reading their own data.
I don't think this is something that should be tackled as part of work on global temp tables personally.
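For concreteness, the knob in question and the arithmetic behind the quoted concern - the numbers here are purely illustrative:

SET temp_buffers = '64MB';  -- per-backend setting; can only be raised before
                            -- the first use of temporary tables in a session
-- worst case with 1000 busy backends: 1000 * 64MB ~= 64GB of
-- non-shared buffer space, outside shared_buffers accounting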
6. A connection pooler cannot reschedule a session which has created temporary tables to some other backend, because its data is stored in local buffers.
Yeah, if you're using transaction-associative pooling. That's just part of a more general problem though, there are piles of related issues with temp tables, session GUCs, session advisory locks and more.
I don't see how global temp tables will do you the slightest bit of good here as the data in them will still be backend-local. If it isn't then you should just be using unlogged tables.
The definition of such a table (metadata) is shared by all backends, but its data is private to the backend. After session termination the data is obviously lost.
+1 that's what a global temp table should be, and it's IIRC pretty much how the SQL standard specifies temp tables.
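For reference, the Oracle-ish / SQL-standard shape of the feature looks roughly like this (table name made up here; defaults vary between implementations):

CREATE GLOBAL TEMPORARY TABLE session_scratch (
    id   integer,
    data text
) ON COMMIT PRESERVE ROWS;
-- the definition is a permanent catalog object;
-- each session sees only its own rows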
I suspect I'm overlooking some complexities here, because to me it seems like we could implement these fairly simply. A new relkind would identify it as a global temp table and the relfilenode would be 0. Same for indexes on temp tables. We'd extend the relfilenode mapper to support a backend-local non-persistent relfilenode map that's used to track temp table and index relfilenodes. If no relfilenode is defined for the table, the mapper would allocate one. We already happily create missing relfilenodes on write so we don't even have to pre-create the actual file. We'd register the relfilenode as a tempfile and use existing tempfile cleanup mechanisms, and we'd use the temp tablespace to store it.
I must be missing something important because it doesn't seem hard.
Global temporary tables are accessed through shared buffers (to solve problem 2).
I'm far from convinced of the wisdom or necessity of that, but I haven't spent as much time digging into this problem as you have.
The drawback of such an approach is that it would be necessary to reimplement a large bulk of the heapam code.
But this approach would allow eliminating the visibility check for temporary table tuples and decreasing the size of the tuple header.
That sounds potentially cool, but perhaps a "next step" thing? Allow the creation of global temp tables to specify reloptions, and you can add it as a reloption later. You can't actually eliminate visibility checks anyway because they're still MVCC heaps. Savepoints can create invisible tuples even if you're using temp tables that are cleared on commit, and of course so can DELETEs or UPDATEs. So I'm not sure how much use it'd really be in practice.
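A concrete case of the savepoint point, using a plain temp table (the same applies to the proposed global ones):

BEGIN;
CREATE TEMP TABLE t (x int) ON COMMIT DELETE ROWS;
INSERT INTO t VALUES (1);
SAVEPOINT sp;
UPDATE t SET x = 2;   -- creates a second tuple version; the old one must be hidden
ROLLBACK TO sp;       -- now the new version must be invisible instead
COMMIT;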
On 01.08.2019 6:10, Craig Ringer wrote:
3. It is not possible to use temporary tables on a replica.
For physical replicas, yes.
Yes - logical replicas (for example our PgPro-EE multimaster, based on logical replication) definitely do not suffer from this problem.
But in the multimaster case we have another problem related to temporary tables: we have to use 2PC for each transaction, and using temporary tables in prepared transactions is currently prohibited.
This was the motivation for the patch proposed by Stas Kelvich which allows using temporary tables in prepared transactions under some conditions.
5. Inefficient memory usage and possible memory overflow: each backend maintains its own local buffers for working with temporary tables.
Is there any reason that would change with global temp tables? We'd still be creating a backend-local relfilenode for each backend that actually writes to the temp table, and I don't see how it'd be useful or practical to keep those in shared_buffers.
Yes, my implementation of global temp tables uses shared buffers.
It was not strictly necessary as long as the data is local: it is possible to have shared metadata while private data is accessed through local buffers.
But I have done it for three reasons:
1. Make it possible to use parallel plans with temp tables.
2. Eliminate the memory overflow problem.
3. Make it possible to reschedule a session to another backend (connection pooler).
Using local buffers has big advantages too. It saves shared_buffers space for data where there's actually some possibility of getting cache hits, or for where we can benefit from lazy/async writeback and write combining. I wouldn't want to keep temp data there if I had the option.
Definitely local buffers have some advantages:
- they do not require synchronization
- they avoid flushing other data from shared buffers
But global temp tables do not exclude the use of the original (local) temp tables.
So you will have a choice: either use local temp tables, which can be easily created on demand and are accessed through local buffers,
or create global temp tables, which eliminate catalog bloating, allow parallel queries, and whose data is controlled by the same cache replacement discipline as normal tables...
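So, under this proposal, both forms would coexist, using the syntax suggested in the first mail (table names made up):

-- classic local temp table: private catalog entry, local buffers
CREATE TEMP TABLE scratch (x int);
-- proposed global temp table: shared definition, private data, shared buffers
CREATE GLOBAL TEMP TABLE report_stage (x int);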
The default size of temporary buffers is 8MB. That seems too small for modern servers with hundreds of gigabytes of RAM, causing extra copying of data between the OS cache and local buffers. But if there are thousands of backends, each executing queries with temporary tables, the total amount of memory used for temporary buffers can exceed several tens of gigabytes.
Right. But what solution do you propose for this? Putting that in shared_buffers will do nothing except deprive shared_buffers of space that can be used for other more useful things. A server-wide temp buffer would add IPC and locking overheads and AFAICS little benefit. One of the big appeals of temp tables is that we don't need any of that.
I do not think that parallel execution and efficient connection pooling are "little benefit".
If you want to improve server-wide temp buffer memory accounting and management that makes sense. I can see it being useful to have things like a server-wide DSM/DSA pool of temp buffers that backends borrow from and return to based on memory pressure on a LRU-ish basis, maybe. But I can also see how that'd be complex and hard to get right. It'd also be prone to priority inversion problems where an idle/inactive backend must be woken up to release memory or release locks, depriving an actively executing backend of runtime. And it'd be as likely to create inefficiencies with copying and eviction as solve them since backends could easily land up taking turns kicking each other out of memory and re-reading their own data.
I don't think this is something that should be tackled as part of work on global temp tables personally.
My assumptions are the following: temporary tables are mostly used in OLAP queries. And OLAP workload means that there are few concurrent queries which are working with large datasets.
So size of produced temporary tables can be quite big. For OLAP it seems to be very important to be able to use parallel query execution and use the same cache eviction rule both for persistent and temp tables
(otherwise you either cause swapping, either extra copying of data between OS and Postgres caches).
6. A connection pooler cannot reschedule a session which has created temporary tables to some other backend, because its data is stored in local buffers.
Yeah, if you're using transaction-associative pooling. That's just part of a more general problem though, there are piles of related issues with temp tables, session GUCs, session advisory locks and more.
I don't see how global temp tables will do you the slightest bit of good here as the data in them will still be backend-local. If it isn't then you should just be using unlogged tables.
You cannot use the same unlogged table to save intermediate query results in two parallel sessions.
As I already wrote, I tried to kill two birds with one stone: eliminate catalog bloating and allow access to temp tables from multiple backends (to be able to perform parallel queries and connection pooling).

The definition of such a table (metadata) is shared by all backends, but its data is private to the backend. After session termination the data is obviously lost.
+1 that's what a global temp table should be, and it's IIRC pretty much how the SQL standard specifies temp tables.
[...]
I must be missing something important because it doesn't seem hard.
This is why I had to use shared buffers for global temp tables.
Maybe it was not such a good idea. But learning the opinions of other people was one of my primary intentions in publishing this patch.
In PG-Pro some of my colleagues think that the most critical problem is the inability to use temporary tables on a replica.
Others think that it is not a problem at all if you are using logical replication.
From my point of view the most critical problem is the inability to use parallel plans with temporary tables.
But it looks like you don't think so.
I see three different activities related to temporary tables:
1. Shared metadata
2. Shared buffers
3. Alternative concurrency control & reducing tuple header size (specialized table access method for temporary tables)
In my proposal I combined 1 and 2, leaving 3 for a next step.
I will be interested to hear other suggestions.
One more thing - 1 and 2 are really independent: you can share metadata without sharing buffers.
But introducing yet another kind of temporary table seems to be real overkill:
- local temp tables (private namespace and local buffers)
- tables with shared metadata but local buffers
- tables with shared metadata and buffers
The drawback of such an approach is that it would be necessary to reimplement a large bulk of the heapam code.
But this approach would allow eliminating the visibility check for temporary table tuples and decreasing the size of the tuple header.
That sounds potentially cool, but perhaps a "next step" thing? Allow the creation of global temp tables to specify reloptions, and you can add it as a reloption later. You can't actually eliminate visibility checks anyway because they're still MVCC heaps.
Sorry?
I mean eliminating MVCC overhead (visibility checks) for temp tables only.
I am not sure that we can really fully eliminate it if we support the use of temp tables in prepared transactions and autonomous transactions (yet another awful feature we have in PgPro-EE).
Also it looks like we need some analogue of CID to be able to correctly execute queries like "insert into T (select from T ...)" where T is a global temp table.
I haven't thought much about it yet, but I am really considering a new table access method API for reducing per-tuple storage overhead for temporary and append-only tables.
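A minimal example of the self-referencing case that needs CID-like tracking (proposed syntax, made-up table):

CREATE GLOBAL TEMP TABLE t (x int);
INSERT INTO t VALUES (1), (2);
-- the SELECT must not see the rows added by this same INSERT,
-- which is what the command id (CID) normally guarantees:
INSERT INTO t SELECT x + 10 FROM t;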
Savepoints can create invisible tuples even if you're using temp tables that are cleared on commit, and of course so can DELETEs or UPDATEs. So I'm not sure how much use it'd really be in practice.
Yehh, subtransactions can also be a problem for eliminating xmin/xmax for temp tables. Thanks for noticing it.
I noticed that I had not patched some extensions - a fixed and rebased version of the patch is attached.
Also you can find this version in our github repository: https://github.com/postgrespro/postgresql.builtin_pool.git
branch global_temp.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
New version of the patch with several fixes is attached. Many thanks to Roman Zharkov for testing. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.
FWIW I still don't understand your argument with regards to using shared_buffers for temp tables having connection pooling benefits. Are you assuming the presence of some other extension in your extended version of PostgreSQL? In community PostgreSQL a temp table's contents in one backend will not be visible in another backend. So if your connection pooler in transaction pooling mode runs txn 1 on backend 42 and it populates temp table X, then the pooler runs the same app session's txn 2 on backend 45, the contents of temp table X are not visible anymore.
Can you explain? Because AFAICS so long as temp table contents are backend-private there's absolutely no point ever using shared buffers for their contents.
Perhaps you mean that in a connection pooling case, each backend may land up filling up temp buffers with contents from *multiple different temp tables*? If so, sure, I get that, but the answer there seems to be to improve eviction and memory accounting, not make backends waste precious shared_buffers space on non-shareable data.
Anyhow, I strongly suggest you simplify the feature to add the basic global temp table feature so the need to change pg_class, pg_attribute etc to use temp tables is removed, but separate changes to temp table memory handling etc into a follow-up patch. That'll make it smaller and easier to review and merge too. The two changes are IMO logically quite separate anyway.
Come to think of it, I think connection poolers might benefit from an extension to the DISCARD command, say "DISCARD TEMP_BUFFERS", which evicts temp table buffers from memory *without* dropping the temp tables. If they're currently in-memory tuplestores they'd be written out and evicted. That way a connection pooler could "clean" the backend, at the cost of some probably pretty cheap buffered writes to the system buffer cache. The kernel might not even bother to write out the buffercache and it won't be forced to do so by fsync, checkpoints, etc, nor will the writes go via WAL so such evictions could be pretty cheap - and if not under lots of memory pressure the backend could often read the temp table back in from system buffer cache without disk I/O.
That's my suggestion for how to solve your pooler problem, assuming I've understood it correctly.
Along these lines I suggest adding the following to DISCARD at some point, obviously not as part of your patch:
* DISCARD TEMP_BUFFERS
* DISCARD SHARED_BUFFERS
* DISCARD TEMP_FILES
* DISCARD CATALOG_CACHE
* DISCARD HOLD_CURSORS
* DISCARD ADVISORY_LOCKS
where obviously DISCARD SHARED_BUFFERS would be superuser-only and evict only clean buffers.
(Also, if we extend DISCARD, let's also allow it to be written as DISCARD (LIST, OF, THINGS, TO, DISCARD) so that we can make the syntax extensible for plugins in future).
Thoughts?
Would DISCARD TEMP_BUFFERS meet your needs?
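For illustration, a pooler could then clean a backend between clients with something like this (proposed commands only - none of this exists yet):

-- drop cached temp data without dropping the temp table definitions:
DISCARD TEMP_BUFFERS;
-- or, with the extensible list syntax suggested above:
DISCARD (TEMP_BUFFERS, TEMP_FILES, ADVISORY_LOCKS);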
On 08.08.2019 5:40, Craig Ringer wrote:
On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
New version of the patch with several fixes is attached.
Many thanks to Roman Zharkov for testing.
FWIW I still don't understand your argument with regards to using shared_buffers for temp tables having connection pooling benefits. Are you assuming the presence of some other extension in your extended version of PostgreSQL? In community PostgreSQL a temp table's contents in one backend will not be visible in another backend. So if your connection pooler in transaction pooling mode runs txn 1 on backend 42 and it populates temp table X, then the pooler runs the same app session's txn 2 on backend 45, the contents of temp table X are not visible anymore.
Certainly here I mean the built-in connection pooler, which is not currently present in Postgres
but is part of PgPRO-EE; there is also my patch for vanilla at the commitfest:
https://commitfest.postgresql.org/24/2067
Sure, there is no such problem with temporary tables now.
Can you explain? Because AFAICS so long as temp table contents are backend-private there's absolutely no point ever using shared buffers for their contents.
There is another problem: you cannot use temporary tables with any existing connection pooler (pgbouncer, ...) at a pooling level other than session, unless the temporary table is used within a single transaction.
One of the advantages of a built-in connection pooler is that it can provide session semantics (GUCs, prepared statements, temporary tables, ...) with a limited number of backends (smaller than the number of sessions).
In PgPRO-EE this problem was solved by binding a session to a backend. I.e. one backend can manage multiple sessions,
but a session cannot migrate to another backend. The drawback of such a solution is obvious: one long-living transaction can block transactions of all other sessions scheduled to this backend.
The possibility to migrate a session to another backend is one of the obvious solutions to the problem. But the main showstopper for it is temporary tables.
This is why I consider moving temporary tables to shared buffers a very important step.
In the vanilla version of the built-in connection pooler the situation is slightly different.
Right now if a client is using temporary tables without the "ON COMMIT DROP" clause, the backend is marked as "tainted" and is pinned for this session.
So it is actually excluded from the connection pool and serves only this session. Once again - if I were able to access temporary table data from another backend, there would be no need to mark the backend as tainted in this case.
Certainly it also requires shared metadata. And here we come to the concept of global temp tables (if we forget for a moment that global temp tables were "invented" a long time ago by Oracle and many other DBMSes:)
Perhaps you mean that in a connection pooling case, each backend may land up filling up temp buffers with contents from *multiple different temp tables*? If so, sure, I get that, but the answer there seems to be to improve eviction and memory accounting, not make backends waste precious shared_buffers space on non-shareable data.
Anyhow, I strongly suggest you simplify the feature to add the basic global temp table feature so the need to change pg_class, pg_attribute etc to use temp tables is removed, but separate changes to temp table memory handling etc into a follow-up patch. That'll make it smaller and easier to review and merge too. The two changes are IMO logically quite separate anyway.
I agree that they are separate.
But even if we forget about the built-in connection pooler, don't you think that the possibility to use parallel query plans with temporary tables is by itself strong enough motivation to access global temp tables through shared buffers
(while still supporting a private page pool for local temp tables)? So both approaches (shared vs. private buffers) have their pros and cons. This is why it seems reasonable to support both of them and let the user make the choice most suitable for the concrete application. Certainly it is possible to provide both "global shared temp tables" and "global private temp tables". But IMHO it is overkill.
Come to think of it, I think connection poolers might benefit from an extension to the DISCARD command, say "DISCARD TEMP_BUFFERS", which evicts temp table buffers from memory *without* dropping the temp tables. If they're currently in-memory tuplestores they'd be written out and evicted. That way a connection pooler could "clean" the backend, at the cost of some probably pretty cheap buffered writes to the system buffer cache. The kernel might not even bother to write out the buffercache and it won't be forced to do so by fsync, checkpoints, etc, nor will the writes go via WAL so such evictions could be pretty cheap - and if not under lots of memory pressure the backend could often read the temp table back in from system buffer cache without disk I/O.
Yes, this is one of the possible solutions for session migration. But frankly speaking, flushing local buffers on each session reschedule does not seem such a good solution. Even if the OS file cache is large enough and the flushed buffers are still present in memory, they will be written to disk in this case even though the temp table data is not intended to be persistent.
That's my suggestion for how to solve your pooler problem, assuming I've understood it correctly. Along these lines I suggest adding the following to DISCARD at some point, obviously not as part of your patch:
* DISCARD TEMP_BUFFERS
* DISCARD SHARED_BUFFERS
* DISCARD TEMP_FILES
* DISCARD CATALOG_CACHE
* DISCARD HOLD_CURSORS
* DISCARD ADVISORY_LOCKS
where obviously DISCARD SHARED_BUFFERS would be superuser-only and evict only clean buffers. (Also, if we extend DISCARD, let's also allow it to be written as DISCARD (LIST, OF, THINGS, TO, DISCARD) so that we can make the syntax extensible for plugins in future). Thoughts? Would DISCARD TEMP_BUFFERS meet your needs?
Actually I have already implemented a DropLocalBuffers function (three lines of code):
void
DropLocalBuffers(void)
{
    RelFileNode rnode;

    /* InvalidOid relNode is treated as a wildcard: drop all local buffers */
    rnode.relNode = InvalidOid;
    DropRelFileNodeAllLocalBuffers(rnode);
}
for yet another Postgres extension, which is not yet included even in PgPRO-EE - SnapFS: support for database snapshots.
I do not think that we need such a command at user level (i.e. a corresponding SQL command).
But, as I already wrote above, I do not consider flushing all buffers on session reschedule an acceptable solution.
Moreover, just flushing buffers is not enough: there is still some smgr state associated with the relation which is local to the backend.
We in any case have to make some changes to be able to access temporary data from another backend, even if the data has been flushed to the file system.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Thu, 8 Aug 2019 at 15:03, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
On 08.08.2019 5:40, Craig Ringer wrote:
[...]
Certainly here I mean the built-in connection pooler, which is not currently present in Postgres
but is part of PgPRO-EE; there is also my patch for vanilla at the commitfest:
https://commitfest.postgresql.org/24/2067
OK, that's what I assumed.
You're trying to treat this change as if it's a given that the other functionality you want/propose is present in core or will be present in core. That's far from given. My suggestion is to split it up so that the parts can be reviewed and committed separately.
In PgPRO-EE this problem was solved by binding a session to a backend. I.e. one backend can manage multiple sessions,
but a session cannot migrate to another backend. The drawback of such a solution is obvious: one long-living transaction can block transactions of all other sessions scheduled to this backend.
The possibility to migrate a session to another backend is one of the obvious solutions to the problem. But the main showstopper for it is temporary tables.
This is why I consider moving temporary tables to shared buffers a very important step.
I can see why it's important for your use case.
I am not disagreeing.
I am however strongly suggesting that your patch has two fairly distinct functional changes in it, and you should separate them out.
* Introduce global temp tables, a new relkind that works like a temp table but doesn't require catalog changes. Uses per-backend relfilenode and cleanup like existing temp tables. You could extend the relmapper to handle the mapping of relation oid to per-backend relfilenode.
* Associate global temp tables with session state and manage them in shared_buffers so they can work with the in-core connection pooler (if committed)
Historically we've had a few efforts to get in-core connection pooling that haven't gone anywhere. Without your pooler patch the changes you make to use shared_buffers etc are going to be unhelpful at best, if not actively harmful to performance, and will add unnecessary complexity. So I think there's a logical series of patches here:
* global temp table relkind and support for it
* session state separation
* connection pooling
* pooler-friendly temp tables in shared_buffers
Make sense?
But even if we forget about the built-in connection pooler, don't you think that the possibility to use parallel query plans with temporary tables is by itself strong enough motivation to access global temp tables through shared buffers?
I can see a way to share temp tables across parallel query backends being very useful for DW/OLAP workloads, yes. But I don't know if putting them in shared_buffers is the right answer for that. We have DSM/DSA, we have shm_mq, various options for making temp buffers share-able with parallel worker backends.
I'm suggesting that you not tie the whole (very useful) global temp tables feature to this, but instead split it up into logical units that can be understood, reviewed and committed separately.
I would gladly participate in review.
Would DISCARD TEMP_BUFFERS meet your needs?
Actually I have already implemented a DropLocalBuffers function (three lines of code):
[...]
I do not think that we need such a command at user level (i.e. a corresponding SQL command).
I'd be very happy to have it personally, but don't think it needs to be tied in with your patch set here. Maybe I can cook up a patch soon.
On 09.08.2019 8:34, Craig Ringer wrote:
[...]
I am however strongly suggesting that your patch has two fairly distinct functional changes in it, and you should separate them out.
[...]
I'm suggesting that you not tie the whole (very useful) global temp tables feature to this, but instead split it up into logical units that can be understood, reviewed and committed separately. I would gladly participate in review.
Ok, here it is: global_private_temp-1.patch
Also I have attached an updated version of the global temp tables patch with shared buffers - global_shared_temp-1.patch
It is certainly larger (~2k lines vs. 1.5k lines) because it changes BufferTag and related functions.
But I do not think that this difference is so critical.
I still have a wish to kill two birds with one stone:)
--
Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Ok, here it is: global_private_temp-1.patch
Fantastic.
I'll put that high on my queue.
I'd love to see something like this get in.
Doubly so if it brings us closer to being able to use temp tables on physical read replicas, though I know there are plenty of other barriers there (not least of which being temp tables using persistent txns not vtxids)
Does it have a CF entry?
Also I have attached updated version of the global temp tables with shared buffers - global_shared_temp-1.patch
Nice to see that split out. In addition to giving the first patch more hope of being committed this time around, it'll help with readability and testability too.
To be clear, I have long wanted to see PostgreSQL have the "session" state abstraction you have implemented. I think it's really important for high client count OLTP workloads, working with the endless collection of ORMs out there, etc. So I'm all in favour of it in principle so long as it can be made to work reliably with limited performance impact on existing workloads and without making life lots harder when adding new core functionality, for extension authors etc. The same goes for built-in pooling. I think PostgreSQL has needed some sort of separation of "connection", "backend", "session" and "executor" for a long time and I'm glad to see you working on it.
With that said: How do you intend to address the likelihood that this will cause performance regressions for existing workloads that use temp tables *without* relying on your session state and connection pooler? Consider workloads that use temp tables for mid-long txns where txn pooling is unimportant, where they also do plenty of read and write activity on persistent tables. Classic OLAP/DW stuff. e.g.:
* four clients, four backends, four connections, session-level connections that stay busy with minimal client sleeps
* All sessions run the same bench code
* transactions all read plenty of data from a medium to large persistent table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate data for its medium-length transactions
* transactions also write to some persistent relations, say to record their summarised results
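Something like the following per transaction, say (sketch only, hypothetical table names):

BEGIN;
-- big read from persistent fact/dimension tables:
CREATE TEMP TABLE tmp_summary ON COMMIT DROP AS
  SELECT region_id,
         sum(amount) AS total,
         rank() OVER (ORDER BY sum(amount) DESC) AS rnk
  FROM fact_sales
  JOIN dim_region USING (region_id)
  GROUP BY region_id;
-- ... more joins/filtering against tmp_summary here ...
-- small write to a persistent relation recording summarised results:
INSERT INTO result_log SELECT region_id, total FROM tmp_summary;
COMMIT;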
How does it perform with and without your patch? I'm concerned that:
* the extra buffer locking and various IPC may degrade performance of temp tables
* the temp table data in shared_buffers may put pressure on shared_buffers space, cached pages for persistent tables all sessions are sharing;
* the temp table data in shared_buffers may put pressure on shared_buffers space for dirty buffers, forcing writes of persistent tables out earlier therefore reducing write-combining opportunities;
On 10.08.2019 5:12, Craig Ringer wrote:
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Ok, here it is: global_private_temp-1.patch
Fantastic. I'll put that high on my queue. I'd love to see something like this get in.
Doubly so if it brings us closer to being able to use temp tables on physical read replicas, though I know there are plenty of other barriers there (not least of which being temp tables using persistent txns not vtxids).
Does it have a CF entry?
https://commitfest.postgresql.org/24/2233/
[...]
* the temp table data in shared_buffers may put pressure on shared_buffers space for dirty buffers, forcing writes of persistent tables out earlier therefore reducing write-combining opportunities;
I agree that access to local buffers is cheaper than access to shared buffers because there is no lock overhead.
And the fact that access to local tables cannot affect cached data of persistent tables is also important.
But most Postgres tables are still normal (persistent) tables accessed through shared buffers.
And a huge amount of effort was made to make this access as efficient as possible (a clock algorithm which doesn't require a global lock,
atomic operations, ...). Also, using the same replacement discipline for all tables may be preferable for some workloads.
So it is not so obvious to me that in the described scenario a local buffer cache for temporary tables will really provide significant advantages.
It will be interesting to perform some benchmarking - I am going to do it.
What I have observed right now is that in the typical scenario of dumping the results of a huge query into a temporary table with a subsequent traversal of this table,
old (local) temporary tables provide better performance (maybe because of the small size of the local buffer cache and its different eviction policy).
But subsequent accesses to the global shared table are faster (because it completely fits in the large shared buffer cache).
There is one more problem with global temporary tables for which I do not know a good solution yet: collecting statistics.
Since each backend has its own data, different backends may generally need different query plans.
Right now if you perform ANALYZE of the table in one backend, it will affect plans in all backends.
This can be considered not a bug but a feature, if we assume that the distribution of data is similar in all backends.
But if this assumption is not true, it can be a problem.
There is one more problem with global temporary tables for which I do not know a good solution yet: collecting statistics.
Since each backend has its own data, different backends may generally need different query plans.
Right now if you perform ANALYZE of the table in one backend, it will affect plans in all backends.
This can be considered not a bug but a feature, if we assume that the distribution of data is similar in all backends.
But if this assumption is not true, it can be a problem.
The last point is probably the most difficult issue, and I have been thinking about it for years.
My experience with customers is that 99% of temp table usage is without statistics - with good information only about row counts. Only a few customers know that a manual ANALYZE is necessary for temp tables (when it is really necessary).
Sharing metadata about global temporary tables can be a real problem - probably not the statistics, but surely the number of pages and number of rows.
There are two requirements:
a) we need some special metadata for any instance (per session) of a global temporary table (rows, pages, statistics, maybe multicolumn statistics, ...)
b) we do not want to use the persistent global catalog (against catalog bloating)
I see two possible solutions:
1. hold these data only in memory in special buffers
2. hold these data in global temporary tables - similar to normal tables: we can use global temp tables for metadata the way classic persistent tables are used for metadata of classic persistent tables. The syscache can then be enhanced to work with the union of two system tables.
I prefer @2 because the changes can be implemented at a deeper level.
Sharing metadata for global temp tables (the current state, if I understand well) is good enough for the development stage, but it is hard to expect it to work generally in a production environment.
Regards
p.s. I am very happy that you are working on this topic. It is an interesting and important problem.
Pavel
On 11.08.2019 10:14, Pavel Stehule wrote:
Hi
[...]
Sharing metadata about global temporary tables can be a real problem - probably not the statistics, but surely the number of pages and number of rows.
But Postgres does not store this information anywhere else except the statistics, does it?
There was a proposal to cache relation size, but it is not implemented yet. If such a cache existed, then we could use it to store local information about global temporary tables.
So if 99% of users do not perform ANALYZE on temporary tables, they will not be faced with this problem, right?
[...]
I see two possible solutions:
1. hold these data only in memory in special buffers
2. hold these data in global temporary tables
[...]
I think it is not possible to assume that temporary data will always fit in memory.
So 1) does not seem to be an acceptable solution.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, 12 Aug 2019 at 18:19, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Hi,
On 11.08.2019 10:14, Pavel Stehule wrote:
Hi
[...]
But Postgres does not store this information anywhere else except the statistics, does it?
Not only - the critical numbers are reltuples and relpages from pg_class.
There was a proposal to cache relation size, but it is not implemented yet. If such a cache existed, then we could use it to store local information about global temporary tables.
So if 99% of users do not perform ANALYZE on temporary tables, they will not be faced with this problem, right?
They use default statistics based on relpages. But for 1% of applications statistics are critical - almost always for OLAP applications.
[...]
I think it is not possible to assume that temporary data will always fit in memory.
So 1) does not seem to be an acceptable solution.
I spoke only about metadata. Data should be stored in temp buffers (and possibly in temp files).
Pavel
On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
But Postgres does not store this information anywhere else except the statistics, does it?
Not only - the critical numbers are reltuples and relpages from pg_class.
That's a very good point. relallvisible too. How's the global temp table impl handling that right now, since you won't be changing the pg_class row? AFAICS relpages doesn't need to be up to date (and reltuples certainly doesn't) so presumably you're just leaving them as zero?
What happens right now if you ANALYZE or VACUUM ANALYZE a global temp table? Is it just disallowed?
I'll need to check, but I wonder if periodically updating those fields in pg_class impacts logical decoding too. Logical decoding must treat transactions with catalog changes as special cases where it creates custom snapshots and does other expensive additional work. (See ReorderBufferXidSetCatalogChanges in reorderbuffer.c and its callsites). We don't actually need to know relpages and reltuples during logical decoding. It can probably ignore relfrozenxid and relminmxid changes too, maybe even pg_statistic changes though I'd be less confident about that one.
At some point I need to patch in a bunch of extra tracepoints and do some perf tracing to see how often we do potentially unnecessary snapshot related work in logical decoding.
There was a proposal to cache relation size, but it is not implemented yet. If such a cache existed, then we could use it to store local information about global temporary tables.
So if 99% of users do not perform ANALYZE on temporary tables, they will not be faced with this problem, right?
They use default statistics based on relpages. But for 1% of applications statistics are critical - almost always for OLAP applications.
Agreed. It's actually quite a common solution to user problem reports / support queries about temp table performance: "Run ANALYZE. Consider creating indexes too."
Which reminds me - if global temp tables do get added, it'll further increase the desirability of exposing a feature to let users disable+invalidate and then later reindex+enable indexes without icky catalog hacking. So they can disable indexes for their local copy, load data, re-enable indexes. That'd be "interesting" to implement for global temp tables given that index state is part of the pg_index row associated with an index rel though.
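A sketch of the sort of user-facing workflow that implies; note that the DISABLE/ENABLE syntax does not exist today, it's purely hypothetical, as are the object names:

ALTER INDEX big_gtt_idx DISABLE;    -- hypothetical: mark invalid, stop maintenance
COPY big_gtt FROM '/tmp/load.csv';  -- bulk load without index maintenance
REINDEX INDEX big_gtt_idx;          -- rebuild the index contents
ALTER INDEX big_gtt_idx ENABLE;     -- hypothetical: mark valid again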
1. hold these data only in memory in special buffers
I don't see that working well for pg_statistic or anything else that holds nontrivial user data though.
2. hold these data in global temporary tables - similar to normal tables: we can use global temp tables for metadata the way classic persistent tables are used for metadata of classic persistent tables. The syscache can then be enhanced to work with the union of two system tables.
Very meta. Syscache and relcache are extremely performance critical but could probably skip scans entirely on backends that haven't used any global temp tables.
I don't know the relevant caches well enough to have a useful opinion here.
I think it is not possible to assume that temporary data will always fit in memory.
So 1) does not seem to be an acceptable solution.
It'd only be the metadata, but if it includes things like column histograms and most frequent value data that'd still be undesirable to have pinned in backend memory.
On 13.08.2019 8:34, Craig Ringer wrote:
As far as I understand, relpages and reltuples are set only when you perform "analyze" of the table.
What happens right now if you ANALYZE or VACUUM ANALYZE a global temp table? Is it just disallowed?
No, it is not disallowed now.
It updates the statistics and also the fields in pg_class, which are shared by all backends.
So all backends will now build plans according to these statistics. Certainly it may lead to less efficient plans if there are large differences in the number of tuples stored in this table in different backends.
But that seems to me to be critical mostly when there are indexes on the temporary table. And it seems to me that users create indexes on temporary tables even more rarely than they analyze them.
I'll need to check, but I wonder if periodically updating those fields in pg_class impacts logical decoding too. Logical decoding must treat transactions with catalog changes as special cases where it creates custom snapshots and does other expensive additional work.
Temporary tables (both local and global) as well as unlogged tables are not subject to logical replication, are they?
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Ok, here it is: global_private_temp-1.patch
Initial pass review follows.
Relation name "SESSION" is odd. I guess you're avoiding "global" because the data is session-scoped, not globally temporary. But I'm not sure "session" fits either - after all regular temp tables are also session temporary tables. I won't focus on naming further beyond asking that it be consistent though, right now there's a mix of "global" in some places and "session" in others.
Why do you need to do all this indirection with changing RelFileNode to RelFileNodeBackend in the bufmgr, changing BufferGetTag etc? Similarly, your changes of RelFileNodeBackendIsTemp to RelFileNodeBackendIsLocalTemp . Did you look into my suggestion of extending the relmapper so that global temp tables would have a relfilenode of 0 like pg_class etc, and use a backend-local map of oid-to-relfilenode mappings? I'm guessing you did it the way you did instead to lay the groundwork for cross-backend sharing, but if so it should IMO be in that patch. Maybe my understanding of the existing temp table mechanics is just insufficient as I see RelFileNodeBackendIsTemp is already used in some aspects of existing temp relation handling.
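To illustrate that suggestion, here is a minimal sketch (invented names, not code from either patch) of a backend-local oid-to-relfilenode map:

/*
 * Hypothetical sketch only: a backend-private map from relation OID to this
 * session's relfilenode, in the spirit of relmapper. pg_class would store
 * relfilenode = 0 for global temp tables and lookups would be redirected here.
 */
#include "postgres.h"
#include "utils/hsearch.h"

typedef struct SessionRelFileNodeEntry
{
    Oid         relid;          /* hash key: relation OID */
    Oid         relfilenode;    /* session-private relfilenode */
} SessionRelFileNodeEntry;

static HTAB *session_relfilenode_map = NULL;

/* Return this session's relfilenode for relid, or InvalidOid if none yet. */
static Oid
SessionRelFileNodeLookup(Oid relid)
{
    SessionRelFileNodeEntry *entry;

    if (session_relfilenode_map == NULL)
        return InvalidOid;
    entry = hash_search(session_relfilenode_map, &relid, HASH_FIND, NULL);
    return entry ? entry->relfilenode : InvalidOid;
}

/* Remember the relfilenode allocated for relid on first access. */
static void
SessionRelFileNodeAssign(Oid relid, Oid relfilenode)
{
    SessionRelFileNodeEntry *entry;
    bool        found;

    if (session_relfilenode_map == NULL)
    {
        HASHCTL     ctl;

        MemSet(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(SessionRelFileNodeEntry);
        session_relfilenode_map = hash_create("session relfilenode map",
                                              64, &ctl,
                                              HASH_ELEM | HASH_BLOBS);
    }
    entry = hash_search(session_relfilenode_map, &relid, HASH_ENTER, &found);
    entry->relfilenode = relfilenode;
}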
Similarly, TruncateSessionRelations probably shouldn't need to exist in this patch in its current form; there's no shared_buffers use to clean and the same file cleanup mechanism should handle both session-temp and local-temp relfilenodes.
A number of places make a change like this:
  rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
+ rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION
and I'd like to see a test macro or inline static for it since it's repeated so much. Mostly to make the intent clear: "is this a relation with temporary backend-scoped data?"
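For instance, a hypothetical inline helper (the name is invented here; RELPERSISTENCE_SESSION is the constant the patch introduces) might look like:

/*
 * Hypothetical sketch, not from the patch: does this relation hold
 * backend-private data, whether classic local temp or global temp?
 */
static inline bool
RelationHasSessionScope(Relation rel)
{
    return rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
           rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION;
}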
This test:
+ if (blkno == BTREE_METAPAGE && PageIsNew(BufferGetPage(buf)) && IsSessionRelationBackendId(rel->rd_backend))
+ _bt_initmetapage(BufferGetPage(buf), P_NONE, 0);
seems sensible but I'm wondering if there's a way to channel initialization of global-temp objects through a bit more of a common path, so it reads more obviously as a common test applying to all global-temp tables. "Global temp table not initialized in session yet? Initialize it." So we don't have to have every object type do an object type specific test for "am I actually uninitialized?" in all paths it might hit. I guess I expected to see something more like a
if (RelGlobalTempUninitialized(rel))
but maybe I've been doing too much Java ;)
A similar test reappears here for sequences:
+ if (rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION && PageIsNew(page))
Why is this written differently?
Sequence initialization ignores sequence startval/firstval settings. Why?
- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)
Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?
In PreCommit_on_commit_actions, in the ONCOMMIT_DELETE_ROWS case, is there any way to still respect the XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag if the oid is for a backend-temp table, not a global-temp table?
+ bool isLocalBuf = SmgrIsTemp(smgr) && relpersistence == RELPERSISTENCE_TEMP;
Won't session-temp tables have local buffers too? Until your next patch that adds shared_buffers storage for them anyway?
+ * There is no need to separate them at file system level, so just subtract SessionRelFirstBackendId
+ * to avoid too long file names.
I agree that there's no reason to be able to differentiate between local-temp and session-temp relfilenodes at the filesystem level.
Also, I have attached an updated version of the global temp tables with shared buffers - global_shared_temp-1.patch.
It is certainly larger (~2k lines vs. 1.5k lines) because it changes BufferTag and related functions.
But I do not think that this difference is so critical.
Still have a wish to kill two birds with one stone :)
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, 13 Aug 2019 at 16:19, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
As far as I understand, relpages and reltuples are set only when you perform "analyze" of the table.
Also autovacuum's autoanalyze.
What happens right now if you ANALYZE or VACUUM ANALYZE a global temp table? Is it just disallowed?
No, it is not disallowed now.
It updates the statistics and also the fields in pg_class, which are shared by all backends.
So all backends will now build plans according to these statistics. Certainly it may lead to less efficient plans if there are large differences in the number of tuples stored in this table in different backends.
But that seems to me to be critical mostly when there are indexes on the temporary table. And it seems to me that users create indexes on temporary tables even more rarely than they analyze them.
That doesn't seem too bad TBH. Hacky but it doesn't seem dangerously wrong and as likely to be helpful as not if clearly documented.
Temporary tables (both local and global) as well as unlogged tables are not subject to logical replication, are they?
Right. But in the same way that they're still present in the catalogs, I think they still affect catalog snapshots maintained by logical decoding's historic snapshot manager as temp table creation/drop will still AFAIK cause catalog invalidations to be written on commit. I need to double check that.
On 13.08.2019 11:21, Craig Ringer wrote:
I have supported both forms: "create session table" and "create global temp".
Relation name "SESSION" is odd. I won't focus on naming further beyond asking that it be consistent though; right now there's a mix of "global" in some places and "session" in others.
Both "session" and "global" are expected PostgreSQL keywords so we do not need to introduce new one (unlike "public" or "shared").
The form "global temp" is used in Oracle so it seems to be natural to provide similar syntax.
I am not insisting on this syntax and will consider all other suggestions.
But IMHO almost any SQL keyword is overloaded and have different meaning.
Temporary tables has session as living area (or transaction if created with "ON COMMIT DROP" clause).
So calling them "session tables" is actually more clear than just "temporary tables".
But "local temp tables" can be also considered as session tables. So may be it is really not so good idea
to use "session" keyword for them.
Why do you need to do all this indirection with changing RelFileNode to RelFileNodeBackend in the bufmgr, changing BufferGetTag etc? Similarly, your changes of RelFileNodeBackendIsTemp to RelFileNodeBackendIsLocalTemp.
Sorry, are you really speaking about global_private_temp-1.patch?
This patch doesn't change bufmgr file at all.
Maybe you looked at the other patch - global_shared_temp-1.patch -
which accesses shared tables through shared buffers and so has to change the buffer tag to include the backend ID.
Similarly, TruncateSessionRelations probably shouldn't need to exist in this patch in its current form; there's no shared_buffers use to clean and the same file cleanup mechanism should handle both session-temp and local-temp relfilenodes.
In global_private_temp-1.patch, TruncateSessionRelations does nothing with shared buffers; it just deletes relation files.
A number of places make a change like this:
  rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
+ rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION
and I'd like to see a test macro or inline static for it since it's repeated so much. Mostly to make the intent clear: "is this a relation with temporary backend-scoped data?"
I considered calling such a macro IsSessionRelation() or something like this, but you do not like the notion of "session".
Is IsBackendScopedRelation() a better choice?
This test:
+ if (blkno == BTREE_METAPAGE && PageIsNew(BufferGetPage(buf)) && IsSessionRelationBackendId(rel->rd_backend))
+     _bt_initmetapage(BufferGetPage(buf), P_NONE, 0);
seems sensible, but a similar test reappears for sequences:
+ if (rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION && PageIsNew(page))
Why is this written differently?
Just because I wrote them at different moments of time :)
I think that adding RelGlobalTempUninitialized is a really good idea.
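As a rough sketch (hypothetical code, using Craig's suggested name and the patch's RELPERSISTENCE_SESSION constant), the common test could be:

/*
 * Hypothetical sketch: has this backend initialized its private storage of
 * the global temp relation yet? A new (all-zero) page means it has not,
 * so object-specific initialization is still needed.
 */
static inline bool
RelGlobalTempUninitialized(Relation rel, Page page)
{
    return rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION &&
           PageIsNew(page);
}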
Sequence initialization ignores sequence startval/firstval settings. Why?
I am handling only the case of implicitly created sequences for SERIAL/BIGSERIAL columns.
Is it possible to explicitly specify the initial value and step for them?
If so, this place should definitely be rewritten.
- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)
Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?
The RELPERSISTENCE_UNLOGGED case is handled in the previous IF branch.
In PreCommit_on_commit_actions, in the ONCOMMIT_DELETE_ROWS case, is there any way to still respect the XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag if the oid is for a backend-temp table, not a global-temp table?
If it is a local temp table, then XACT_FLAGS_ACCESSEDTEMPNAMESPACE is set, so there is no need to check this flag.
And since XACT_FLAGS_ACCESSEDTEMPNAMESPACE is not set now for global temp tables, I had to remove this check.
So for local temp tables the original behavior is preserved.
The question is why I do not set XACT_FLAGS_ACCESSEDTEMPNAMESPACE for global temp tables?
I wanted to avoid the current limitation on using temp tables in prepared transactions.
Shared metadata eliminates some of the problems related to using temp tables in 2PC.
But I am not sure that it eliminates ALL problems and that there are no strange effects related to
prepared transactions with global temp tables.
+ bool isLocalBuf = SmgrIsTemp(smgr) && relpersistence == RELPERSISTENCE_TEMP;
Won't session-temp tables have local buffers too? Until your next patch that adds shared_buffers storage for them anyway?
Once again, this is a change from global_shared_temp-1.patch, not from global_private_temp-1.patch.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 13.08.2019 11:27, Craig Ringer wrote:
As far as I understand, relpages and reltuples are set only when you perform "analyze" of the table.
Also autovacuum's autoanalyze.
When does it happen?
I created a normal table, populated it with some data and then waited several hours, but pg_class was not updated for this table.
I attach to this mail slightly refactored versions of these patches with fixes for the issues reported in your review.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, 13 Aug 2019 at 21:50, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
As far as I understand, relpages and reltuples are set only when you perform "analyze" of the table.
Also autovacuum's autoanalyze.
When does it happen?
I created a normal table, populated it with some data and then waited several hours, but pg_class was not updated for this table.
heap_vacuum_rel() in src/backend/access/heap/vacuumlazy.c below
* Update statistics in pg_class.
which I'm pretty sure is common to explicit vacuum and autovacuum. I haven't run up a test to verify 100% but most DBs would never have relpages etc set if autovac didn't do it since most aren't explicitly VACUUMed at all.
I thought it was done when autovac ran an analyze, but it looks like it's all autovac. Try setting very aggressive autovac thresholds and inserting + deleting a bunch of tuples maybe.
I attach to this mail slightly refactored versions of these patches with fixes for the issues reported in your review.
Thanks.
Did you have a chance to consider my questions too? I see a couple of things where there's no patch change, which is fine, but I'd be interested in your thoughts on the question/issue in those cases.
On 16.08.2019 9:25, Craig Ringer wrote:
heap_vacuum_rel() in src/backend/access/heap/vacuumlazy.c below
* Update statistics in pg_class.
which I'm pretty sure is common to explicit vacuum and autovacuum.
Sorry, I already understood it myself.
But to make vacuum process the table, it is necessary to remove or update some rows in it.
It seems to be yet another Postgres problem, which was noticed by Darafei Praliaskouski some time ago: append-only tables are never processed by autovacuum.
Did you have a chance to consider my questions too? I see a couple of things where there's no patch change, which is fine, but I'd be interested in your thoughts on the question/issue in those cases.
Sorry, maybe I didn't notice some of your questions. I have a feeling that I have replied to all your comments/questions.
Right now I have reread this whole thread and see two open issues:
1. Statistics for global temporary tables (including number of tuples, number of pages and the all-visible flag).
My position is the following: while in most cases it should not be a problem, because users rarely create indexes on or analyze temporary tables,
there can be situations when differences in the data sets of global temporary tables in different backends can really be a problem.
Unfortunately I can not propose a good solution for this problem. It is certainly possible to create some private (per-backend) cache for this metadata.
But it seems to require changes in many places.
2. Your concerns about the performance penalty of global temp tables accessed through shared buffers compared with local temp tables accessed through local buffers.
I think that this concern is no longer relevant because there is an implementation of global temp tables using local buffers.
But my experiments don't show a significant difference in access speed between shared and local buffers. Since shared buffers are usually much larger than local buffers,
there are more chances to hold the whole temp relation in memory without spilling it to disk. In this case access to a global temp table will be much faster than access to
local temp tables. But the fact is that right now, in the most frequent scenario of temp table usage:
SELECT ... INTO TempTable FROM PersistentTable WHERE ...;
SELECT * FROM TempTable;
local temp tables are more efficient than global temp tables accessed through shared buffers.
I think it is explained by caching and eviction policies.
If all the content of the temp table is pulled into memory (pg_prewarm), the global temp table with shared buffers becomes faster.
If I forgot or did not notice some of your questions, would you be so kind as to repeat them?
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, 16 Aug 2019 at 15:30, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
1. Statistics for global temporary tables (including number of tuples, number of pages and the all-visible flag).
My position is the following: while in most cases it should not be a problem, because users rarely create indexes on or analyze temporary tables,
there can be situations when differences in the data sets of global temporary tables in different backends can really be a problem.
Unfortunately I can not propose a good solution for this problem. It is certainly possible to create some private (per-backend) cache for this metadata.
But it seems to require changes in many places.
Yeah. I don't really like just sharing them but it's not that bad either.
2. Your concerns about the performance penalty of global temp tables accessed through shared buffers compared with local temp tables accessed through local buffers.
I think that this concern is no longer relevant because there is an implementation of global temp tables using local buffers.
But my experiments don't show a significant difference in access speed between shared and local buffers. Since shared buffers are usually much larger than local buffers,
there are more chances to hold the whole temp relation in memory without spilling it to disk. In this case access to a global temp table will be much faster than access to
local temp tables.
You ignore the costs of evicting non-temporary data from shared_buffers, i.e. contention for space. Also increased chance of backends being forced to do direct write-out due to lack of s_b space for dirty buffers.
> If all the content of the temp table is pulled into memory (pg_prewarm), the global temp table with shared buffers becomes faster.
Who would ever do that?
If I forgot or did not notice some of your questions, would you be so kind as to repeat them?
Repeating questions:
Why do you need to do all this indirection with changing RelFileNode to RelFileNodeBackend in the bufmgr, changing BufferGetTag etc? Similarly, your changes of RelFileNodeBackendIsTemp to RelFileNodeBackendIsLocalTemp . I'm guessing you did it the way you did instead to lay the groundwork for cross-backend sharing, but if so it should IMO be in your second patch that adds support for using shared_buffers for temp tables, not in the first patch that adds a minimal global temp tables implementation. Maybe my understanding of the existing temp table mechanics is just insufficient as I see RelFileNodeBackendIsTemp is already used in some aspects of existing temp relation handling.
Did you look into my suggestion of extending the relmapper so that global temp tables would have a relfilenode of 0 like pg_class etc, and use a backend-local map of oid-to-relfilenode mappings?
Similarly, TruncateSessionRelations probably shouldn't need to exist in this patch in its current form; there's no shared_buffers use to clean and the same file cleanup mechanism should handle both session-temp and local-temp relfilenodes.
Sequence initialization ignores sequence startval/firstval settings. Why?
+ value[SEQ_COL_LASTVAL-1] = Int64GetDatumFast(1); /* start sequence with 1 */
Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?:
- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)
On 16.08.2019 11:37, Craig Ringer wrote:
If I forgot or did not notice some of your questions, would you be so kind as to repeat them?
Sent early by accident. Repeating questions:
Sorry, but I have answered them (my e-mail from 13.08)!
Looks like you have looked at the wrong version of the patch:
global_shared_temp-1.patch instead of global_private_temp-1.patch, which implements global tables accessed through local buffers.
Why do you need to do all this indirection with changing RelFileNode to RelFileNodeBackend in the bufmgr, changing BufferGetTag etc?
Sorry, are you really speaking about global_private_temp-1.patch?
This patch doesn't change bufmgr file at all.
Maybe you looked at the other patch - global_shared_temp-1.patch -
which accesses shared tables through shared buffers and so has to change the buffer tag to include the backend ID.
Did you look into my suggestion of extending the relmapper so that global temp tables would have a relfilenode of 0 like pg_class etc, and use a backend-local map of oid-to-relfilenode mappings?
Similarly, TruncateSessionRelations probably shouldn't need to exist in this patch in its current form; there's no shared_buffers use to clean and the same file cleanup mechanism should handle both session-temp and local-temp relfilenodes.
In global_private_temp-1.patch, TruncateSessionRelations does nothing with shared buffers; it just deletes relation files.
Sequence initialization ignores sequence startval/firstval settings. Why?
+ value[SEQ_COL_LASTVAL-1] = Int64GetDatumFast(1); /* start sequence with 1 */
I am handling only the case of implicitly created sequences for SERIAL/BIGSERIAL columns.
Is it possible to explicitly specify the initial value and step for them?
If so, this place should definitely be rewritten.
Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?:
- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)
The RELPERSISTENCE_UNLOGGED case is handled in the previous IF branch.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 16.08.2019 11:32, Craig Ringer wrote:
You ignore the costs of evicting non-temporary data from shared_buffers, i.e. contention for space. Also increased chance of backends being forced to do direct write-out due to lack of s_b space for dirty buffers.
> In case of pulling all content of temp table in memory (pg_prewarm) global temp table with shared buffers becomes faster.
Who would ever do that?
I decided to redo my experiments and now get different results, which illustrate the advantages of global temp tables with shared buffers.
I performed the following test on my desktop with an SSD and 16GB of RAM, and Postgres with the default configuration except shared_buffers increased to 1GB. (buffer_usage below is a helper view built on the pg_buffercache extension, showing per-relation buffer cache usage.)
postgres=# create table big(pk bigint primary key, val bigint);
CREATE TABLE
postgres=# insert into big values (generate_series(1,100000000),generate_series(1,100000000)/100);
INSERT 0 100000000
postgres=# select * from buffer_usage limit 3;
relname | buffered | buffer_percent | percent_of_relation
----------------+------------+----------------+---------------------
big | 678 MB | 66.2 | 16.1
big_pkey | 344 MB | 33.6 | 16.1
pg_am | 8192 bytes | 0.0 | 20.0
postgres=# create temp table lt(key bigint, count bigint);
postgres=# \timing
Timing is on.
postgres=# insert into lt (select count(*),val as key from big group by val);
INSERT 0 1000001
Time: 43265.491 ms (00:43.265)
postgres=# select sum(count) from lt;
sum
--------------
500000500000
(1 row)
Time: 94.194 ms
postgres=# create global temp table gt(key bigint, count bigint);
postgres=# insert into gt (select count(*),val as key from big group by val);
INSERT 0 1000001
Time: 42952.671 ms (00:42.953)
postgres=# select sum(count) from gt;
sum
--------------
500000500000
(1 row)
Time: 35.906 ms
postgres=# select * from buffer_usage limit 3;
relname | buffered | buffer_percent | percent_of_relation
----------+----------+----------------+---------------------
big | 679 MB | 66.3 | 16.1
big_pkey | 300 MB | 29.3 | 14.0
gt | 42 MB | 4.1 | 100.0
So the time to store the result in the global temp table is slightly smaller than the time to store it in the local temp table, and the time to scan the global temp table is half as large!
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
I did more investigation of the performance of global temp tables with shared buffers vs. vanilla (local) temp tables.
1. Combination of persistent and temporary tables in the same query.
Preparation:
create table big(pk bigint primary key, val bigint);
insert into big values (generate_series(1,100000000),generate_series(1,100000000));
create temp table lt(key bigint, count bigint);
create global temp table gt(key bigint, count bigint);
The size of the table is about 6GB; I ran this test on a desktop with 16GB of RAM and Postgres with 1GB of shared buffers.
I ran two queries:
insert into T (select count(*),pk/P as key from big group by key);
select sum(count) from T;
where P is 100, 10, or 1 and T is the name of the temp table (lt or gt).
The table below contains times of both queries in msec:
Percent of selected data           | 1%         | 10%        | 100%         |
Local temp table (insert, select)  | 44610, 90  | 47920, 891 | 63414, 21612 |
Global temp table (insert, select) | 44669, 35  | 47939, 298 | 59159, 26015 |
As you can see, the time of insertion into the temporary table is almost the same,
and the time to scan the temporary table is about half as large for the global temp table
when it fits in RAM together with the persistent table, and slightly worse when it doesn't fit.
2. Temporary table only access.
The same system, but Postgres is configured with shared_buffers=10GB, max_parallel_workers = 4, max_parallel_workers_per_gather = 4
Local temp tables:
create temp table local_temp(x1 bigint, x2 bigint, x3 bigint, x4 bigint, x5 bigint, x6 bigint, x7 bigint, x8 bigint, x9 bigint);
insert into local_temp values (generate_series(1,100000000),0,0,0,0,0,0,0,0);
select sum(x1) from local_temp;
Global temp tables:
create global temporary table global_temp(x1 bigint, x2 bigint, x3 bigint, x4 bigint, x5 bigint, x6 bigint, x7 bigint, x8 bigint, x9 bigint);
insert into global_temp values (generate_series(1,100000000),0,0,0,0,0,0,0,0);
select sum(x1) from global_temp;
Results (msec):
                  | Insert | Select |
Local temp table  | 37489  | 48322  |
Global temp table | 44358  | 3003   |
So insertion into the local temp table is slightly faster, but select is 16 times slower!
Conclusion:
Assuming the temp table fits in memory, global temp tables with shared buffers provide better performance than local temp tables.
I didn't consider global temp tables with local buffers here because their results should be similar to those of local temp tables.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pá 16. 8. 2019 v 16:12 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
I did more investigation of the performance of global temp tables with shared buffers vs. vanilla (local) temp tables. [...]
Assuming the temp table fits in memory, global temp tables with shared buffers provide better performance than local temp tables.
There is probably no reason why shared buffers should be slower than local buffers when the system is under low load.
Access to shared memory is protected by spin locks (which are cheap for a few processes), so tests with one or a few processes are not too important (or show just one side of the space).
Another topic can be performance on Microsoft systems - there are stories about imperfect shared memory performance there.
Regards
Pavel
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 16.08.2019 20:17, Pavel Stehule wrote:
One more test, which simulates access to temp tables under high load.
I am using "upsert" into temp table in multiple connections.
create global temp table gtemp (x integer primary key, y bigint);
upsert.sql:
insert into gtemp values (random() * 1000000, 0) on conflict(x) do update set y=gtemp.y+1;
pgbench -c 10 -M prepared -T 100 -P 1 -n -f upsert.sql postgres
I failed to find a standard way in pgbench to perform per-session initialization to create the local temp table,
so I just inserted this code into pgbench:
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 570cf33..af6a431 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5994,6 +5994,7 @@ threadRun(void *arg)
{
if ((state[i].con = doConnect()) == NULL)
goto done;
+ executeStatement(state[i].con, "create temp table ltemp(x integer primary key, y bigint)");
}
}
Results are the following:
Global temp table: 117526 TPS
Local temp table: 107802 TPS
So even for this workload, global temp tables with shared buffers are a little bit faster.
I will be pleased if you can propose some other testing scenario.
ne 18. 8. 2019 v 9:02 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
So even for this workload, global temp tables with shared buffers are a little bit faster.
I will be pleased if you can propose some other testing scenario.
Please try to increase the number of connections.
Regards
Pavel
On 18.08.2019 11:28, Pavel Stehule wrote:
Please try to increase the number of connections.
With 20 connections and 4 pgbench threads, the results are similar: 119k TPS for global temp tables and 115k TPS for local temp tables.
I have tried yet another scenario: read-only access to temp tables:
\set id random(1,10000000)
select sum(y) from ltemp where x=:id;
Tables are created and initialized at pgbench session startup:
knizhnik@knizhnik:~/postgresql$ git diff
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 570cf33..95295b0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5994,6 +5994,8 @@ threadRun(void *arg)
{
if ((state[i].con = doConnect()) == NULL)
goto done;
+ executeStatement(state[i].con, "create temp table ltemp(x integer primary key, y bigint)");
+ executeStatement(state[i].con, "insert into ltemp values (generate_series(1,1000000), generate_series(1,1000000))");
}
}
Results for 10 connections with 10 million inserted records per table, and for 100 connections with 1 million inserted records per table:
#connections: | 10 | 100 |
local temp | 68k | 90k |
global temp, shared_buffers=1G | 63k | 61k |
global temp, shared_buffers=10G | 150k | 150k |
So temporary tables with local buffers are slightly faster when data doesn't fit in shared buffers, but significantly slower when it fits.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 19.08.2019 11:51, Konstantin Knizhnik wrote:
So temporary tables with local buffers are slightly faster when the data doesn't fit in shared buffers, but significantly slower when it fits.
All previously reported results were produced on my desktop.
I also ran this read-only test on a huge IBM server (POWER9, 2 NUMA nodes, 176 CPUs, 1TB RAM).
Here the difference between local and global tables is not so large:
Local temp: 739k TPS
Global temp: 924k TPS
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 19.08.2019 14:25, Pavel Stehule wrote:
> Isn't the difference between local temp buffers and global temp buffers caused by a too low value of temp_buffers?
Certainly, the default (small) temp buffer size plays a role.
But on this IBM host the difference is not so important.
Result with local temp tables and temp_buffers = 1GB: 859k TPS.
--
Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 19.08.2019 18:53, Pavel Stehule wrote:
> It is a little bit unexpected result. I understand that it is partially a generic problem of access to smaller dedicated caches versus access to a bigger shared cache.
> But it is hard to imagine that access to a local cache is 10% slower than access to a shared cache. Maybe there is some bottleneck - maybe our implementation of local buffers is suboptimal.
It may be caused by the system memory allocator - when using shared buffers we do not need to ask the OS to allocate more memory.
> Using local buffers for global temporary tables can be interesting for another reason - they use temporary files, and temporary files can be placed on ephemeral IO in the Amazon cloud (with much better performance than persistent IO).
My assumption is that temporary tables almost always fit in memory. So in most cases there is no need to write data to a file at all.
As I wrote at the beginning of this thread, one of the problems with temporary tables is that it is not possible to use them at a replica.
Global temp tables allow sharing metadata between master and replica.
I performed a small investigation of how difficult it would be to support inserts into temp tables at a replica.
My first impression was that it can be done in a tricky but simple way, by making small changes in just three places:
1. Prohibit non-select statements in read-only transactions
2. XID assignment (return FrozenTransactionId)
3. Transaction commit/abort
With this I managed to make global temp tables work normally at a replica.
But there is one problem with this approach: it is not possible to undo changes in temp tables, so rollback doesn't work.
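To make point 2 above concrete, here is a minimal sketch of the frozen-XID idea, assuming a hypothetical wrapper function and call site (the real patch modifies the existing XID-assignment path instead):

#include "postgres.h"
#include "access/transam.h"
#include "access/xact.h"
#include "access/xlog.h"

/*
 * Sketch only: stamp tuples of global temp tables written at a replica
 * with FrozenTransactionId.  RecoveryInProgress(), FrozenTransactionId
 * and GetCurrentTransactionId() are real PostgreSQL primitives; the
 * function itself is hypothetical.
 */
static TransactionId
GetXidForGlobalTempWrite(void)
{
    if (RecoveryInProgress())
    {
        /*
         * At a replica we cannot allocate a real XID.  A frozen XID makes
         * the tuples immediately visible, but the change can no longer be
         * undone, which is exactly why ROLLBACK stops working.
         */
        return FrozenTransactionId;
    }

    /* on the primary, fall back to normal XID assignment */
    return GetCurrentTransactionId();
}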
I tried another solution, assigning some dummy XIDs to standby transactions.
But this approach requires many more changes:
- Initialize a page for such a transaction in the CLOG
- Mark the transaction as committed/aborted in the CLOG
- Change the snapshot checks in the visibility functions
And still I didn't find a safe way to clean up the CLOG space.
An alternative solution is to implement a "local CLOG" for such transactions.
The straightforward solution is to use a hashtable, but it may cause memory overflow if we have a long-living backend which performs a huge number of transactions.
Also, in this case we would need to change the visibility check functions.
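For illustration, a backend-private hashtable variant could look roughly like this - a sketch only, with hypothetical names, and not what the attached patch actually does:

#include "postgres.h"
#include "access/clog.h"
#include "utils/hsearch.h"

/* Sketch of a backend-private "local CLOG" for standby-local XIDs. */
typedef struct LocalClogEntry
{
    TransactionId xid;      /* hash key */
    XidStatus     status;   /* TRANSACTION_STATUS_COMMITTED or _ABORTED */
} LocalClogEntry;

static HTAB *local_clog;

static void
LocalClogInit(void)
{
    HASHCTL ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(TransactionId);
    ctl.entrysize = sizeof(LocalClogEntry);
    /* lives as long as the backend, so it can grow without bound */
    local_clog = hash_create("local clog", 1024, &ctl,
                             HASH_ELEM | HASH_BLOBS);
}

static void
LocalClogSetStatus(TransactionId xid, XidStatus status)
{
    bool found;
    LocalClogEntry *entry;

    entry = hash_search(local_clog, &xid, HASH_ENTER, &found);
    entry->status = status;
}

The unbounded growth of this table in a long-living backend is exactly the memory-overflow risk mentioned above.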
So I have implemented the simplest solution, with frozen XIDs and forced backend termination in case of transaction rollback (so the user will not see inconsistent behavior).
Attached please find global_private_temp_replica.patch, which implements this approach.
It will be nice if somebody can suggest a better solution for temporary tables at a replica.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
út 20. 8. 2019 v 16:51 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
> It may be caused by the system memory allocator - when using shared buffers we do not need to ask the OS to allocate more memory.
Maybe, but with shared buffers you have overhead from searching for free buffers and some synchronization overhead between processes.
> Global temp tables allow sharing metadata between master and replica.
I am not sure I understand the last sentence. Global temp tables should be replicated on replica servers, but their content should not be replicated - it should be session specific.
> It will be nice if somebody can suggest a better solution for temporary tables at a replica.
This is another hard issue. Probably backend termination is an acceptable solution. I don't understand this area well, but if a replica allows writing (to global temp tables), then the replica has to have a local CLOG.
A CLOG for global temp tables can be simpler than the standard CLOG: the data are not shared, and the lifetime of the data (and the number of transactions) can be low.
Another solution is to wait for ZHeap storage, so that a replica can have its own UNDO log.
On 20.08.2019 19:06, Pavel Stehule wrote:
> I am not sure I understand the last sentence. Global temp tables should be replicated on replica servers, but their content should not be replicated - it should be session specific.
Obviously.
When we run OLAP queries at a replica, it would be great if we could do:
insert into temp_table (select ...);
With local temp tables it is not possible just because you cannot create a temp table at a replica.
But a global temp table can be created at the master and populated with data at the replica.
> This is another hard issue. Probably backend termination is an acceptable solution. I don't understand this area well, but if a replica allows writing (to global temp tables), then the replica has to have a local CLOG.
There are several problems:
1. How to choose XIDs for writing transactions at the standby. The simplest solution is to just add 0x7fffffff to the current XID.
This eliminates the possibility of conflicts with normal XIDs (received from the master).
But it requires changes in the visibility functions: a visibility check function does not know the OID of the tuple's owner, just the XID stored in the tuple header, so it has to make its decision based on this XID alone (see the sketch after this list).
2. How to perform cleanup of no-longer-needed XIDs. Right now there is quite complex logic for freeing CLOG pages.
3. How to implement visibility rules for such XIDs.
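The numbering scheme from point 1 could be sketched as follows; the constant and function names are illustrative, not taken from the patch:

#include "postgres.h"
#include "access/transam.h"

/* Sketch: standby-local XIDs live in the upper half of the XID space. */
#define StandbyXidOffset 0x7fffffffU

static inline bool
IsStandbyLocalXid(TransactionId xid)
{
    /*
     * The visibility code sees only the XID from the tuple header, so this
     * test alone must be enough to classify a tuple as belonging to a
     * standby-local transaction on a global temp table.
     */
    return xid >= StandbyXidOffset;
}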
I thought about implementing a special table access method for temporary tables.
I am trying to understand now whether that is the only possible approach or there are simpler solutions.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
út 20. 8. 2019 v 18:42 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
> But a global temp table can be created at the master and populated with data at the replica.
yes
> 2. How to perform cleanup of no-longer-needed XIDs. Right now there is quite complex logic for freeing CLOG pages.
In theory every session can have its own CLOG. When you finish the session, you can truncate this file.
> I thought about implementing a special table access method for temporary tables.
+1
On 20.08.2019 20:01, Pavel Stehule wrote:
> > I thought about implementing a special table access method for temporary tables.
> +1
Unfortunately, implementing a special table access method for temporary tables doesn't solve all the problems.
XID generation is not part of the table access method API.
So we still need to assign to a write transaction at the replica some XID which will not conflict with the XIDs received from the master.
Actually, only global temp tables can be updated at a replica, so the assigned XIDs will be stored only in tuples of such relations.
But I am still not sure that we can use arbitrary XIDs for such transactions at the replica.
Also, I am upset by the amount of functionality which has to be reimplemented for global temp tables if we really want to provide a table access method for them:
1. CLOG
2. vacuum
3. MVCC visibility
And still it is not possible to encapsulate inside a table access method all the changes needed to support writes to temp tables at a replica.
XID assignment, transaction commit and abort, subtransactions - all these places need to be patched.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
I was able to fully support work with global temp tables at a replica (including subtransactions).
The patch is attached. You can also find this version at https://github.com/postgrespro/postgresql.builtin_pool/tree/global_temp_hot
Right now, transactions at a replica which update a global temp table are assigned a special kind of XIDs which are not related to the XIDs received from the master.
So special visibility rules are used for such tables at the replica. I also had to patch the TransactionIdIsInProgress, TransactionIdDidCommit and TransactionIdGetCurrent
functions to correctly handle such XIDs. In principle it is possible to implement global temp tables as a special heap access method, but it would require copying a lot of code (heapam.c),
so I prefer to add a few checks to the existing functions.
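Building on the hypothetical helpers from the earlier sketches, those "few checks" would be of roughly this shape (the names are stand-ins for the patch's real functions):

/*
 * Sketch of the check added in front of functions like
 * TransactionIdDidCommit(): XIDs of replica-local transactions on global
 * temp tables are resolved against the backend's in-memory CLOG analog
 * instead of the shared CLOG.
 */
static bool
GttXidDidCommit(TransactionId xid)
{
    if (IsStandbyLocalXid(xid))
    {
        bool found;
        LocalClogEntry *entry = hash_search(local_clog, &xid,
                                            HASH_FIND, &found);

        return found && entry->status == TRANSACTION_STATUS_COMMITTED;
    }

    return TransactionIdDidCommit(xid);     /* normal shared-CLOG path */
}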
There are still some limitations:
- The number of transactions at a replica which update temp tables is limited to 2^32 (the wraparound problem is not addressed).
- I have to maintain an in-memory analog of the CLOG for such transactions, which is also never cropped. It means that for 2^32 transactions the size of this bitmap can grow up to 0.5GB (2^32 bits = 512MB).
I am trying to understand what the next steps in global temp tables support should be.
This is why I want to perform a short survey - what are people expecting from global temp tables:
1. I do not need them at all.
2. Eliminate catalog bloating.
3. Mostly needed for compatibility with Oracle (simplify porting,...).
4. Parallel query execution.
5. Can be used at replica.
6. More efficient use of resources (first of all memory).
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
I have added support for all built-in indexes (brin, btree, gin, gist, hash, spgist) for global temp tables (before, only the B-Tree index was supported). It would be nice to have some generic mechanism for this, but I do not understand what it could look like. The problem is that normal relations are initialized at the moment of their creation, while for global temp relations the metadata already exists when the data is still absent. We should somehow catch access to such a not-yet-initialized page (not all pages - just the first page of the relation) and perform initialization on demand; a sketch of this idea follows the attachment note. A new patch for global temp tables with shared buffers is attached. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
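Conceptually, the on-demand initialization could look like this - a sketch with hypothetical helper names; note that the signature of index_build() differs between PostgreSQL versions:

#include "postgres.h"
#include "catalog/index.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Sketch: before an index on a global temp table is used, check whether
 * this backend has initialized it yet and build it on demand.
 */
static void
GttIndexEnsureBuilt(Relation heap, Relation index)
{
    /* a zero-block main fork means this backend never touched the index */
    if (RelationGetNumberOfBlocks(index) == 0)
    {
        IndexInfo  *indexInfo = BuildIndexInfo(index);

        /* populate the index from this backend's private heap data */
        index_build(heap, index, indexInfo, false, false);
    }
}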
st 18. 9. 2019 v 12:04 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
> 6. More efficient use of resources (first of all memory).
There can be another point, important for the cloud. Inside some clouds there are usually two types of discs - persistent (slow) and ephemeral (fast). We effectively used temp tables there because we moved the temp tablespace to ephemeral discs.
I am missing one point in your list - developer's comfort: using temp tables is just much more comfortable, because you don't need to create them again and again. Thanks to this behavior it is possible to reduce @2, and @3 can be a nice side effect. If you reduce @2 to zero, then @5 should be possible without anything else.
Pavel
On 20.09.2019 19:43, Pavel Stehule wrote:
> There can be another point, important for the cloud. Inside some clouds there are usually two types of discs - persistent (slow) and ephemeral (fast). We effectively used temp tables there because we moved the temp tablespace to ephemeral discs.
Yes, I already heard this argument and agree with it.
I just want to notice two things:
1. My assumption is that in most cases the data of a temporary table can fit in memory (certainly if we do not limit it to temp_buffers = 8MB but store it in shared buffers), and so there is no need to write it to persistent media at all.
2. Global temp tables do not substitute for local temp tables accessed through local buffers. So if you want to use temporary storage, you will always have a way to do it.
The question is whether we need to support two kinds of global temp tables (with shared or private buffers) or just implement one of them.
> I am missing one point in your list - developer's comfort: using temp tables is just much more comfortable, because you don't need to create them again and again. Thanks to this behavior it is possible to reduce @2, and @3 can be a nice side effect. If you reduce @2 to zero, then @5 should be possible without anything else.
Sorry, I do not completely understand your point here.
You can use a normal (permanent) table and you will not have to create it again and again. It is also possible to use it for storing temporary data - just truncate the table when the data is not needed any more.
Certainly you cannot use the same table in more than one backend. Here is the main advantage of temp tables: you have storage for per-session data and do not have to worry about possible name conflicts.
From the other side, there are many cases where the format of temporary data is not statically known: it is determined dynamically during program execution.
In this case a local temp table provides the most convenient mechanism for working with such data.
This is why I think that we need to have both local and global temp tables.
Also, I do not agree with your statement "If you reduce @2 to zero, then @5 should be possible without any other".
In the solution implemented by Aleksander Alekseev, the metadata of temporary tables is kept in memory and does not affect the catalog at all.
But such tables still can not be used at a replica.
There are still some serious problems which need to be fixed to enable that:
allow insert/update/delete statements in read-only transactions, somehow assign XIDs to them, and implement savepoints and rollback of such transactions.
All this was done in the last version of my patch.
And yes, it doesn't depend on whether we are using shared or private buffers for temporary tables. The same approach can be implemented for both of them.
The question is whether we really need temp tables at a replica and, if so, whether we need full transaction support for them, including rollbacks and subtransactions.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
po 23. 9. 2019 v 9:57 odesílatel Konstantin Knizhnik <k.knizhnik@postgrespro.ru> napsal:
> 1. My assumption is that in most cases the data of a temporary table can fit in memory, and so there is no need to write it to persistent media at all.
> The question is whether we need to support two kinds of global temp tables (with shared or private buffers) or just implement one of them.
That is valid only for OLTP; the OLAP world is totally different. Moreover, if all users use temporary tables - and you should account for that, which is one reason for global temp tables - then you need to multiply the size by max_connections.
It is hard to say what is best from the implementation perspective, but it could be unfortunate if global temporary tables had different performance characteristics and configuration than local temporary tables.
> Certainly you cannot use the same table in more than one backend. Here is the main advantage of temp tables: you have storage for per-session data and do not have to worry about possible name conflicts.
You use temporary tables because you know you never share the data between sessions. I don't remember any situation when I designed temp tables with a different schema for different sessions.
Using a normal table that way is not effective - you work with large tables, you need to use DELETE, ... - so you cannot effectively use a classic table like a temp table.
> The question is whether we really need temp tables at a replica and, if so, whether we need full transaction support for them, including rollbacks and subtransactions.
Temporary tables (of any type) on a replica are an interesting feature that opens some possibilities. Some queries cannot be optimized and should be divided, with intermediate results stored in temporary tables, analyzed (to get correct statistics), maybe indexed, and after that the calculation can continue. Now you can do this only on the master. Moreover, on a hot standby the data are read only and there is no direct impact on the master (production), so you can run some heavier calculations there. And temporary tables are a technique used to fix estimation errors.
I don't think subtransactions, transactions and rollbacks are necessary for these tables. On the other hand, without them it is a half-cooked feature and can look pretty strange in the Postgres environment.
I am very happy with how much work you are doing in this area; I did not have the courage to start this job. But I don't think this work can be reduced to just some supported scenarios - and I hope a correct implementation is possible, although it is not simple work.
This broke recently. Can you please rebase? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25.09.2019 23:28, Alvaro Herrera wrote:
> This broke recently. Can you please rebase?
Rebased version of the patch is attached.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
As both Robert and Pavel think that the aspects of using GTT in parallel queries and at a replica should be considered separately, I have prepared the simplest version of the patch for GTT, which introduces minimal differences from the current (local) temporary tables. So GTTs are stored in private buffers and can not be accessed at a replica, in prepared transactions or in parallel queries. But the patch supports all existing built-in indexes (brin, btree, gin, gist, hash, spgist) and per-backend statistics. There are no DDL limitations for GTT. Also, I have not yet introduced the pg_statistic view (as proposed by Pavel): I am afraid that it may break compatibility with some existing extensions and applications. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Yet another version of my GTT patch, addressing issues reported by 曾文旌(义从) <wenjing.zwj@alibaba-inc.com>:
* Bug in TRUNCATE is fixed
* ON COMMIT DELETE ROWS option is supported
* ALTER TABLE is correctly handled
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Now pg_gtt_statistic view is provided for global temp tables. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Nov 20, 2019 at 07:32:14PM +0300, Konstantin Knizhnik wrote:
> Now pg_gtt_statistic view is provided for global temp tables.
Latest patch fails to apply, per Mr Robot's report. Could you please rebase and send an updated version? For now I have moved the patch to the next CF, waiting on author.
-- Michael
Attachment
On 01.12.2019 4:56, Michael Paquier wrote:
> Latest patch fails to apply, per Mr Robot's report. Could you please rebase and send an updated version? For now I have moved the patch to the next CF, waiting on author.
Rebased version of the patch is attached.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi all, I am not aware enough of the Postgres internals to give advice about the implementation. But my feeling is that there is another big interest in this feature: simplifying the Oracle to PostgreSQL migration of applications that use global temporary tables. And this is quite common when stored procedures are used. In such a case, we currently need to modify the logic of the code, always implementing an ugly solution (either add CREATE TEMP TABLE statements in the code everywhere they are needed, or use a regular table with additional TRUNCATE statements if we can ensure that only a single connection uses the table at a time). So, Konstantin and all, thanks in advance for all that could be done on this feature :-) Best regards.
Hi, this patch was marked as waiting on author since the beginning of the CF, most likely because it no longer applies (not sure). As there has been very little activity since then, I've marked it as returned with feedback. Feel free to re-submit an updated patch for 2020-03.
This definitely does not mean the feature is not desirable, but my feeling is most of the discussion happens on the other thread dealing with global temp tables [1], so maybe we should keep just that one and combine the efforts.
[1] https://commitfest.postgresql.org/26/2349/
-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01.02.2020 14:49, Tomas Vondra wrote:
> Feel free to re-submit an updated patch for 2020-03.
A new version of the patch with a new method of GTT index construction is attached. Now GTT indexes are checked before query execution and are initialized using the AM build method. So now GTT is supported for all indexes, including custom indexes.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Fix GTT index initialization. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Sorry, small typo in the last patch. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi,
I am very interested in this feature, which will conform to the SQL standard, and I read this exchange in the thread:
> Session 1:
> create global temp table gtt(x integer);
> insert into gtt values (generate_series(1,100000));
> Session 2:
> insert into gtt values (generate_series(1,200000));
> Session 1:
> create index on gtt(x);
> explain select * from gtt where x = 1;
> Session 2:
> explain select * from gtt where x = 1;
> ??? Should we use index here?
> My answer is - yes.
> Just because:
> - Such behavior is compatible with regular tables. So it will not
> confuse users and doesn't require some complex explanations.
> - It is compatible with Oracle.
There is a confusion. Sadly, it does not work like that at all in Oracle. Their implementation is buggy in my opinion.
Here is a very simple test case to prove it with the latest version (January 2020):
-- session 1
create global temporary table gtt(x integer);
Table created.
-- session 2
insert into gtt SELECT level FROM dual CONNECT BY LEVEL <= 100000;
100000 rows created.
-- session 1
create index igtt on gtt(x);
Index created.
-- session 2
select * from gtt where x = 9;
no rows selected
select /*+ FULL(gtt) */ * from gtt where x = 9;
X
----------
9
What happened? The optimizer (planner) knows the new index igtt can be efficient via dynamic sampling. Hence, igtt is used at execution time... but it is NOT populated. By default I obtained no rows. If I force a full scan of the table with the /*+ FULL */ hint, you can see that I obtain my row 9. Different results with different execution plans: it's a WRONG RESULT bug, the worst kind of bug.
Please don't consider Oracle as a reference for your implementation. I am 100% sure you can implement and document this better than Oracle. E.g. the index is populated and considered only for transactions that started after the index creation, or something like that. It would be far better than this misleading behaviour.
Regards,
Phil