Temporary tables under hot standby - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Temporary tables under hot standby |
Date | |
Msg-id | 20120425035515.GA2236@tornado.leadboat.com Whole thread Raw |
Responses |
Re: Temporary tables under hot standby
Re: Temporary tables under hot standby Re: Temporary tables under hot standby Re: Temporary tables under hot standby Re: Temporary tables under hot standby |
List | pgsql-hackers |
A key barrier to migrations from trigger-based replication to WAL-based replication is the lack of temporary tables under hot standby. I'd like to close that gap; the changes needed will also reduce the master-side cost of temporary table usage. Here is a high-level design for your advice and comments. Much of this builds on ideas from past postings noted below. Our temporary tables are cataloged and filled like permanent tables. This has the major advantage of making most code operate on tables with minimal regard for their relpersistence. It also brings disadvantages: 1. High catalog turnover in rapid create/drop workloads. Heavy temporary table users often need to REINDEX relation-orientedcatalogs. Hot standby cannot assign OIDs or modify system catalogs at all. 2. Consumption of permanent XIDs for DML on the table. This increases COMMIT cost on the master and is a non-starter underhot standby. 3. autovacuum cannot VACUUM temporary tables, but their relfrozenxid values delay pg_clog truncation and can trigger a wraparound-preventionshutdown. 4. sinval traffic from every CREATE TEMP TABLE et al. 5. CREATE TEMP TABLE records AccessExclusiveLock in the global lock table. 6. We don't automatically drop temporary tables that existed at the point of a crash, because they look much like permanenttables. To resolve points 2 and 3, let's change the XID values stored in temporary tables from regular TransactionId to LocalTransactionId. This entails an lxid counterpart for clog.c. Functions like GetCurrentTransactionId() and HeapTupleSatisfiesVisibility() will take a Relation argument to identify the XID type. One open question is whether to add conditional logic to functions like HeapTupleSatisfiesMVCC() or to have parallel implementations like HeapTupleSatisfiesMVCCLocal(). I lean toward the latter, perhaps with the help of some code generation. I don't think a counterpart for pg_subtrans will be necessary; the backend knows its own XID tree, and the TRANSACTION_STATUS_SUB_COMMITTED interlock is superfluous with only one backend as reader and writer. I'm also thinking the local clog can live strictly in memory; a session that retains a temporary table across 2B local transactions can afford 512 MiB of RAM. With this change, VACUUM can ignore relfrozenxid of temporary tables when calculating a new datfrozenxid. This change can form an independent patch. I do not see a clean behind-the-scenes fix for points 1, 4 and 5. We can resolve those by adding a new variety of temporary table, one coincidentally matching the SQL standard's notion of a temporary table. The developer will declare it once, after which all sessions observe it as an initially-empty table whose contents remain local to the session. Most relation catalog entries, including all OIDs, are readily sharable among sessions. The exceptions are relpages, reltuples, relallvisible, relfrozenxid, and pg_statistic rows. I will handle the pg_class columns by introducing new backend APIs abstracting them. Those APIs will consult the relcache for permanent tables and a local-memory hash for temporary tables. For statistics, add a new catalog pg_temp_statistic, an inheritance child of pg_statistic and itself one of these new-variety temporary tables. Past discussions have raised the issue of interaction between commands like ALTER TABLE and sessions using the new-variety temporary table. As a first cut, let's keep this simple and have ongoing use of the table block operations requiring AccessExclusiveLock. Note that you can always just make a new temporary table with a different name to deploy a change quickly. Implement this with a heavyweight lock having a novel life cycle. When a session first takes an ordinary relation lock on the table, acquire the longer-term lock and schedule it for release on transaction abort. On TRUNCATE, schedule a release on transaction commit. Of course, also release the lock at session end. For new-variety temporary tables, change file names from "relfilenode[_fork]" to "refilenode[_fork].pid.localnode". During crash recovery, delete all files conforming to that pattern for refilenodes of known temporary tables. This also lets backends share pg_class.relfilenode. The "localnode" field would distinguish multiple generations of a table across VACUUM FULL, CLUSTER, and TRUNCATE. We could build on this strategy to safely resolve point 6 for the existing kind of temporary table, but that entails enough other details to probably not mix it into the same patch. A third patch will permit the following commands in read-only transactions, where they will throw an error if the subject is not a temporary table: INSERT UPDATE DELETE COPY ... FROM TRUNCATE ANALYZE VACUUM (including VACUUM FULL) CLUSTER (without USING clause) REINDEX I considered whether to instead separate the set of commands allowed in a read-only transaction from the set allowed under hot standby. This proposal is closer to the SQL standard, which explicitly allows INSERT, UPDATE and DELETE on temporary tables during read-only transactions. Only the first five commands are essential; support for the rest could wait for follow-on patches. Concerning everyone's favorite topic, how to name the new type of table, I liked Tom's proposal[1] to make CREATE TEMP TABLE retain current behavior and have CREATE GLOBAL TEMP TABLE and/or CREATE LOCAL TEMP TABLE request the new SQL-standard variety. (I'd vote for using CREATE GLOBAL and retaining CREATE LOCAL for future expansion.) As he mentions, to get there, we'd ideally start by producing a warning instead of silently accepting GLOBAL as a noise word. Should we put such a warning into 9.2? How important is support for VACUUM on these tables under hot standby? The alternative is to fail when a session retains a temporary table across 2B local transactions. I do not currently see any challenges sufficient to motivate not supporting VACUUM, but it might be a useful simplification to keep in mind. What about ANALYZE support; how important is the ability to collect statistics on temporary tables? Again, I tentatively expect to support it regardless of the answer. Key past threads: http://archives.postgresql.org/message-id/7903.1050417344@sss.pgh.pa.us http://archives.postgresql.org/message-id/162867790707011431u71e53543x19e64e5bb160b124@mail.gmail.com http://archives.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc7806f@mail.gmail.com http://archives.postgresql.org/message-id/u2o603c8f071004231952i36642ae6u9d6a7eae6eb6ff32@mail.gmail.com http://archives.postgresql.org/message-id/BANLkTin1Gha0SS77E64jczPfAPn6Oxb8hQ%40mail.gmail.com Thanks, nm [1] http://archives.postgresql.org/message-id/5422.1240936705@sss.pgh.pa.us
pgsql-hackers by date: