PROPOSAL: Fast temporary tables - Mailing list pgsql-hackers
From | Aleksander Alekseev |
---|---|
Subject | PROPOSAL: Fast temporary tables |
Date | |
Msg-id | 20160301182500.2c81c3dc@fujitsu Whole thread Raw |
Responses |
Re: PROPOSAL: Fast temporary tables
Re: PROPOSAL: Fast temporary tables |
List | pgsql-hackers |
Hello There are applications that create and delete a lot of temporary tables. Currently PostgreSQL doesn't handle such a use case well. Consider the following benchmark/example. postgresql.conf: ``` autovacuum = on log_min_messages = debug2 ``` temp-table.pgbench: ``` create temporary table tt1(x jsonb); drop table tt1; ``` Benchmark: # pgbench-server: pgbench -h (ip) -f temp-table.pgbench -T 600 -P 1 -c 40 -j 12 test # postgresql-server: tail -f path/to/logfile | grep 'DEBUG: vacuuming' At first everything is OK, PostgreSQL handles ~ 970 TPS. But after some time this value starts to drop to 10-100 TPS, then return to normal. In logfile we see: ``` DEBUG: vacuuming "pg_catalog.pg_class" DEBUG: vacuuming "pg_catalog.pg_type" DEBUG: vacuuming "pg_catalog.pg_index" DEBUG: vacuuming "pg_catalog.pg_class" DEBUG: vacuuming "pg_catalog.pg_type" DEBUG: vacuuming "pg_catalog.pg_index" DEBUG: vacuuming "pg_catalog.pg_depend" DEBUG: vacuuming "pg_catalog.pg_type" DEBUG: vacuuming "pg_catalog.pg_index" DEBUG: vacuuming "pg_catalog.pg_class" ... ``` Catalog tables are bloating. But there was no real reason to write anything to these tables in the first place since temporary tables could be seen only from one session. Except to make possible for autovacuum to find these tables. I propose to solve described issue by introducing a new entity - fast temporary tables (or maybe lightweight temporary tables? - name is discussable): create fast temporary table tt1(x jsonb); Fast temporary tables work almost as usual temporary tables but they are not present in the catalog. Information about tables is stored in shared memory instead. This way we solve a bloating problem. We should use *shared* memory so autovacuum could find these tables. Note that state should be restored properly and acquired locks should be released if one of backends terminates abnormally while accessing shared memory. Usually memory is considered an expensive resource. For these reason we can't just change current behaviour of temporary tables. It would cause a lot of problems for existing users. Also introducing a new type of tables allows us to make some other changes. For instance, we could drop trigger support for these tables if it would give us some sort of benefit, e.g. better performance. As I understand this feature is not too hard to implement. Basically all usages (read and write) of catalog in context of temporary tables should be found and carefully modified as described above. It reminds me how interception of system API works. All procedures should receive and return exact the same types of values as before, but implementation should be changed a little bit. Frankly so far I don't have a clear understanding which files exactly would be modified, but I believe it would be at least: * relcache.c * genam.c --- systable_* procedures in particular * heapam.c --- I would like to avoid this, but as I understand \d will not see temporary tables otherwise A few hints from people more experienced in this area would be appreciated. Then we carefully check that everything works as expected (indexes, autovacuum of temporary tables, etc), write regression tests and we are done. Here is what makes suggested approach in particular so interesting. I think that using similar method in the future we could implement writable temporary tables on replicas. This feature is very helpful in OLAP tasks. What do you think regarding described problem and proposed method of solving it? Best regards, Aleksander
pgsql-hackers by date: