Thread: Feature request (or at least discussion): enable autovaccum on temp tables

Hello,

The reason why we are using temp tables is to allow concurrent runs on some very large reports which involve creating and heavily churning dozens of very large tables.

The problem we're facing is that if we DON'T use temp tables, the reports finish in a couple of hours. If we DO use temp tables, they finish in a couple of weeks. While using regular tables, autovacuum runs pretty much continuously.

We'll try manually vacuuming the tables after major operations, but that's kind of kludgy. What would help is a GUC setting which toggles this behaviour.

Since AFAIK temp tables are very close to unlogged ordinary tables, what would stop this feature from being implemented?

Hi

pá 31. 5. 2019 v 17:26 odesílatel Ivan Voras <ivoras@gmail.com> napsal:
Hello,

The reason why we are using temp tables is to allow concurrent runs on some very large reports which involve creating and heavily churning dozens of very large tables.

The problem we're facing is that if we DON'T use temp tables, the reports finish in a couple of hours. If we DO use temp tables, they finish in a couple of weeks. While using regular tables, autovacuum runs pretty much continuously.

We'll try manually vacuuming the tables after major operations, but that's kind of kludgy. What would help is a GUC setting which toggles this behaviour.

Since AFAIK temp tables are very close to unlogged ordinary tables, what would stop this feature from being implemented?

I agree so this is not consistent behave. On second hand, for almost temp table usage using autovacuum has not benefit.

autovacuum check tables once per minute, and working on closed transactions. Lot of times temporary tables are filled inside some batch, and immediately some query is started over this table. It is very low probability to analyze temp table at good time automatically. So only explicit call of ANALYZE is correct solution.

Regards

Pavel


On Fri, 31 May 2019 at 17:34, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

pá 31. 5. 2019 v 17:26 odesílatel Ivan Voras <ivoras@gmail.com> napsal:
Hello,


autovacuum check tables once per minute, and working on closed transactions. Lot of times temporary tables are filled inside some batch, and immediately some query is started over this table. It is very low probability to analyze temp table at good time automatically. So only explicit call of ANALYZE is correct solution.


There are two types of workloads we're talking about: short ones, for which autovacuum is probably not worth it, and long-running ones (in our case, hours), where it is. So that's why I'm advocating a GUC toggle.
 

Re: Feature request (or at least discussion): enable autovaccum ontemp tables

From
Adam Brusselback
Date:
Temp tables are not visibile outside of a single connection, so the autovacuum worker connection isn't able to see it.

Are you sure that it's actually an issue with accumulating dead tuples, and not an issue with bad statistics?

In my processes which are heavy on temp tables, I have to manually run analyze after populating any temp tables.

Just a thought.
-Adam
On Fri, 31 May 2019 at 17:38, Adam Brusselback <adambrusselback@gmail.com> wrote:
Temp tables are not visibile outside of a single connection, so the autovacuum worker connection isn't able to see it.

I can connect as a superuser via psql while temp tables are being used and I can examine tables in the pg_temp_* schemas, so that particular separation isn't really a strong one. If there's a reason why autovacuum isn't doable on temp tables, it's not that one. (maybe autovacuum would work on the tables while the session disconnects, preventing the temp tables from being dropped?)

 
>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes:

 Ivan> Since AFAIK temp tables are very close to unlogged ordinary
 Ivan> tables, what would stop this feature from being implemented?

The key difference between temp tables and other tables is that temp
table data does not use the shared_buffers but is buffered only in
backend-local memory.

This means that other processes (like, say, an autovacuum process) can
not access the content of temp tables. So what you want is not possible.

-- 
Andrew (irc:RhodiumToad)



Re: Feature request (or at least discussion): enable autovaccum ontemp tables

From
Christopher Browne
Date:
On Fri, 31 May 2019 at 11:26, Ivan Voras <ivoras@gmail.com> wrote:
Hello,

The reason why we are using temp tables is to allow concurrent runs on some very large reports which involve creating and heavily churning dozens of very large tables.

The problem we're facing is that if we DON'T use temp tables, the reports finish in a couple of hours. If we DO use temp tables, they finish in a couple of weeks. While using regular tables, autovacuum runs pretty much continuously.

We'll try manually vacuuming the tables after major operations, but that's kind of kludgy. What would help is a GUC setting which toggles this behaviour.

Since AFAIK temp tables are very close to unlogged ordinary tables, what would stop this feature from being implemented?


I do hit some situations similar to this; it does NOT lead me to wanting to VACUUM a lot; rather, after any query where I populate a lot of data in a temporary table, I need to add an explicit ANALYZE, which tends to be more than sufficient.

It is problematic for autovacuum to do this for two reasons:

1. Frequently, the moment that the ANALYZE *needs* to be run is the instant after the table is populated, because if it waits any longer, that will be too late to help the Immediately Following Query where the freshly populated temp table's data was used.

In effect, the only "safe" answer for those processes is for the SQL code to include an ANALYZE in the crucial spot.   If autovacuum tried to do this work, it might get to the table 5 seconds late, and thereby not do the good that you need.

2.  Temp tables are only associated (and visible) in the session in which you are doing the work.  autovacuum operates inside an ordinary session context, and in a separate connected session, so it can't see your temp tables; they are not handled in shared memory that crosses contexts to the one in which those temp tables exist.  Changing that would be way more difficult than meets the eye.

I would actually suggest that it is likely that VACUUM never needs to be done, as there's something wrong if a lot of garbage is being generated in temp tables.  All that should be needed are ANALYZE requests, and it makes sense to annotate the code with an ANALYZE after any time a table is massively updated.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
hi,

On Fri, 31 May 2019 at 17:54, Christopher Browne <cbbrowne@gmail.com> wrote:
 
2.  Temp tables are only associated (and visible) in the session in which you are doing the work.  autovacuum operates inside an ordinary session context, and in a separate connected session, so it can't see your temp tables; they are not handled in shared memory that crosses contexts to the one in which those temp tables exist.  Changing that would be way more difficult than meets the eye.

Hmm yeah, I forgot about that aspect - temp tables not going though shared buffers.

Thanks!

Hi,
On Fri, 31 May 2019 at 17:43, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes:

 Ivan> Since AFAIK temp tables are very close to unlogged ordinary
 Ivan> tables, what would stop this feature from being implemented?

The key difference between temp tables and other tables is that temp
table data does not use the shared_buffers but is buffered only in
backend-local memory.

This means that other processes (like, say, an autovacuum process) can
not access the content of temp tables. So what you want is not possible.

Understood.

So, a related question, since we have dozens of temp tables and a lot of code, is there a way to look up what temp tables are being created by the current session, so I can do a VACUUM or ANALYZE on all of them in bulk? I know I can inspect pg_temp_* schema, but how to figure out which one is from the current session?


Re: Feature request (or at least discussion): enable autovaccum ontemp tables

From
Christopher Browne
Date:


On Fri, May 31, 2019, 12:26 PM Ivan Voras <ivoras@gmail.com> wrote:
Hi,
On Fri, 31 May 2019 at 17:43, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Ivan" == Ivan Voras <ivoras@gmail.com> writes:

 Ivan> Since AFAIK temp tables are very close to unlogged ordinary
 Ivan> tables, what would stop this feature from being implemented?

The key difference between temp tables and other tables is that temp
table data does not use the shared_buffers but is buffered only in
backend-local memory.

This means that other processes (like, say, an autovacuum process) can
not access the content of temp tables. So what you want is not possible.

Understood.

So, a related question, since we have dozens of temp tables and a lot of code, is there a way to look up what temp tables are being created by the current session, so I can do a VACUUM or ANALYZE on all of them in bulk? I know I can inspect pg_temp_* schema, but how to figure out which one is from the current session?


I think you might want to look at information_schema.tables; I seem to recall that view restricting itself to the tables actually visible to your connection.
So, a related question, since we have dozens of temp tables and a lot of code, is there a way to look up what temp tables are being created by the current session, so I can do a VACUUM or ANALYZE on all of them in bulk? I know I can inspect pg_temp_* schema, but how to figure out which one is from the current session?

Any table in pg_temp schema would belong to the current session. Technically the schema has some number appended on the end, but to the creating connection of those temp tables, it is just pg_temp. Probably you should be analyzing the temp tables immediately after insert/update/delete, OR just before first use. As others have said, you have no stats of most common values or historgrams or any other info unless you analyze. If your temp tables have many columns which are only selected and not used for joins or ordering, then you may not need stats on those columns and could save some cycles by specifying the columns, eg. analyze pg_temp.temp_table_name( join_field1, join_field2, order_by_field );