Re: Global temporary tables - Mailing list pgsql-hackers
From: Craig Ringer
Subject: Re: Global temporary tables
Msg-id: CAMsr+YFh5N0TCtaMmUXZ85wJx1EkMwV+cjWWbQvMbtyJsc2ONg@mail.gmail.com
In response to: Re: Global temporary tables (Pavel Stehule <pavel.stehule@gmail.com>)
List: pgsql-hackers
On Tue, 13 Aug 2019 at 00:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > But Postgres is not storing this information now anywhere else except statistics, isn't it?
>
> not only - critical numbers are reltuples, relpages from pg_class
That's a very good point, and relallvisible too. How's the global temp table implementation handling that right now, since you won't be changing the pg_class row? AFAICS relpages doesn't need to be up to date (and reltuples certainly doesn't), so presumably you're just leaving them as zero?
What happens right now if you ANALYZE or VACUUM ANALYZE a global temp table? Is it just disallowed?
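For context, the pg_class fields in question are easy to inspect on an ordinary (local) temp table; this is just a sketch with a made-up table name, since how a *global* temp table would behave here is exactly what's under discussion:

```sql
-- Ordinary local temp table: the pg_class fields start out unset and
-- are only maintained by VACUUM / ANALYZE.
CREATE TEMP TABLE demo AS SELECT g AS x FROM generate_series(1, 1000) g;

SELECT relpages, reltuples, relallvisible
FROM pg_class
WHERE relname = 'demo';   -- not meaningful until vacuum/analyze runs

ANALYZE demo;             -- writes the pg_class row; for a global temp
                          -- table that per-session write is the problem
```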
I'll need to check, but I wonder if periodically updating those fields in pg_class impacts logical decoding too. Logical decoding must treat transactions with catalog changes as special cases, creating custom snapshots and doing other expensive additional work. (See ReorderBufferXidSetCatalogChanges in reorderbuffer.c and its call sites.) We don't actually need to know relpages and reltuples during logical decoding. It can probably ignore relfrozenxid and relminmxid changes too, maybe even pg_statistic changes, though I'd be less confident about that one.
At some point I need to patch in a bunch of extra tracepoints and do some perf tracing to see how often we do potentially unnecessary snapshot related work in logical decoding.
> There was a proposal to cache relation size, but it is not implemented yet. If such a cache exists, then we can use it to store local information about global temporary tables.
> > So if 99% of users do not perform analyze for temporary tables, then they will not be faced with this problem, right?
>
> they use default statistics based on relpages. But for 1% of applications statistics are critical - almost always for OLAP applications.
Agreed. It's actually quite a common solution to user problem reports / support queries about temp table performance: "Run ANALYZE. Consider creating indexes too."
Which reminds me - if global temp tables do get added, it'll further increase the desirability of exposing a feature that lets users disable+invalidate, and later reindex+enable, indexes without icky catalog hacking. Then they could disable indexes for their local copy, load data, and re-enable the indexes. That'd be "interesting" to implement for global temp tables, though, given that index state is part of the pg_index row associated with the index relation.
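For reference, the "icky catalog hacking" people resort to today looks roughly like this (index name is hypothetical, and direct pg_index updates are unsupported; shown only to illustrate what a real feature would replace):

```sql
-- Unsupported hack: mark an index not-ready and invalid so bulk loads
-- skip maintaining it and the planner stops using it.
UPDATE pg_index
SET indisready = false, indisvalid = false
WHERE indexrelid = 'bulk_load_idx'::regclass;

-- ... bulk load data ...

-- Rebuild the index, which marks it valid and ready again.
REINDEX INDEX bulk_load_idx;
```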
> 1. hold these data only in memory in special buffers
I don't see that working well for pg_statistic or anything else that holds nontrivial user data though.
> 2. hold these data in global temporary tables - it is similar to normal tables - we can use global temp tables for metadata, like classic persistent tables are used for metadata of classic persistent tables. Next, syscache can be enhanced to work with a union of two system tables.
Very meta. Syscache and relcache are extremely performance critical but could probably skip scans entirely on backends that haven't used any global temp tables.
I don't know the relevant caches well enough to have a useful opinion here.
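Pavel's "union of two system tables" idea could be sketched in SQL terms like this; the session-local table name is hypothetical, and the real change would live inside syscache in C, but the lookup semantics would be: local rows shadow shared ones for the same key.

```sql
-- Hypothetical: a per-backend statistics overlay merged with the shared
-- catalog. A session-local row wins over the shared pg_statistic row.
SELECT loc.* FROM session_pg_statistic loc
WHERE loc.starelid = 'gtt'::regclass
UNION ALL
SELECT shr.* FROM pg_statistic shr
WHERE shr.starelid = 'gtt'::regclass
  AND NOT EXISTS (
    SELECT 1 FROM session_pg_statistic loc
    WHERE loc.starelid = shr.starelid
      AND loc.staattnum = shr.staattnum
  );
```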
> I think it is not possible to assume that temporary data will always fit in memory. So 1) seems not to be an acceptable solution.
It'd only be the metadata, but if that includes things like column histograms and most-common-value data, it'd still be undesirable to have it pinned in backend memory.
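To give a sense of the size of that metadata, the per-column statistics in question are visible through the pg_stats view (table name here is hypothetical):

```sql
-- MCV lists and histogram bounds can be sizeable for wide or
-- high-cardinality columns; under option 1 this is the data that would
-- have to live pinned in backend-local memory.
SELECT attname,
       most_common_vals,
       most_common_freqs,
       histogram_bounds
FROM pg_stats
WHERE tablename = 'gtt_example';
```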