Re: Global temporary tables - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Global temporary tables
Date
Msg-id CAFj8pRAJ_fAZrwcp9FgKsS+mGk3g5t2iZnqpcDiPXp=O6DmLFQ@mail.gmail.com
In response to Re: Global temporary tables  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: Global temporary tables
List pgsql-hackers


On Mon, 12 Aug 2019 at 18:19, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Hi,

On 11.08.2019 10:14, Pavel Stehule wrote:

Hi


There is one more problem with global temporary tables for which I do not know a good solution yet: collecting statistics.
Since each backend has its own data, the backends may generally need different query plans.
Right now, if you perform ANALYZE on the table in one backend, it will affect plans in all backends.
This can be considered a feature rather than a bug if we assume that the distribution of data is similar in all backends.
But if this assumption is not true, then it can be a problem.
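To make the problem concrete, a minimal sketch (the CREATE GLOBAL TEMPORARY TABLE syntax is assumed from the proposed patch, and the table name gtt is invented): per-table statistics live in the shared pg_statistic catalog, so an ANALYZE issued by one session changes the estimates every session sees.

    -- Session 1: populate a private instance of the table and analyze it.
    -- (CREATE GLOBAL TEMPORARY TABLE is the syntax proposed by the patch,
    -- not something vanilla PostgreSQL accepts today.)
    CREATE GLOBAL TEMPORARY TABLE gtt(id int, payload text);
    INSERT INTO gtt SELECT g, repeat('x', 100) FROM generate_series(1, 1000000) g;
    ANALYZE gtt;

    -- Session 2: its own instance of gtt is empty, but with shared
    -- statistics the planner bases its estimates on the million rows
    -- seen in session 1.
    EXPLAIN SELECT * FROM gtt WHERE id < 10;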

This last point is probably the most difficult issue, and I have been thinking about it for years.

My experience with customers is that 99% of temp table usage is without statistics - just with good information about the number of rows. Only a few customers know that manual ANALYZE is necessary for temp tables (when it really is necessary).
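For background: autovacuum cannot access temporary tables, so they are never analyzed automatically; statistics for them exist only when the session collects them explicitly (the table name below is invented).

    -- Temp tables are invisible to autovacuum, so an explicit ANALYZE is
    -- the only way the planner ever gets real column statistics for them.
    CREATE TEMPORARY TABLE session_data(id int, val numeric);
    INSERT INTO session_data SELECT g, g * 1.5 FROM generate_series(1, 50000) g;
    ANALYZE session_data;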

Sharing metadata about global temporary tables can be a real problem - probably not the statistics themselves, but surely the number of pages and the number of rows.

But Postgres does not currently store this information anywhere except in the statistics, does it?

Not only there - the critical numbers are reltuples and relpages from pg_class.
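These live in the relation's row in the shared pg_class catalog, so with shared metadata every session sees the same values no matter how much data its own instance holds:

    -- reltuples and relpages are per-relation fields of the shared catalog;
    -- there is currently no per-backend copy of them.
    SELECT relname, reltuples, relpages
    FROM pg_class
    WHERE relname = 'gtt';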

There was a proposal to cache relation sizes, but it is not implemented yet. If such a cache existed, then we could use it to store per-backend information about global temporary tables.
So if 99% of users do not perform ANALYZE on temporary tables, then they will not be faced with this problem, right?

They use the default statistics based on relpages. But for 1% of applications statistics are critical - almost always OLAP applications.
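That default is the planner's fallback: with no pg_statistic entry it estimates the row count from the relation's current size in pages and the average tuple width, which is often good enough when only the row count matters. A minimal illustration (the table name is invented):

    -- No ANALYZE has run, so there are no column statistics; the row
    -- estimate in the plan comes from the table's size on disk.
    CREATE TEMPORARY TABLE t(id int);
    INSERT INTO t SELECT generate_series(1, 100000);
    EXPLAIN SELECT * FROM t;  -- row estimate derived from pages, not stats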

There are two requirements:

a) we need some special metadata for each instance (per session) of a global temporary table (rows, pages, statistics, maybe multicolumn statistics, ...)

b) we do not want to use the persistent global catalogue for it (to avoid catalogue bloat)

I see two possible solutions:

1. hold this data only in memory, in special buffers

2. hold this data in global temporary tables - analogous to normal tables: we can use global temp tables for the metadata of global temp tables, just as classic persistent tables are used for the metadata of classic persistent tables. The syscache could then be enhanced to work with a union of two system tables.

I prefer @2 because the changes can be implemented at a deeper level. A rough sketch of the idea follows.
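A purely hypothetical sketch of @2 (all names are invented; nothing like this exists today): the per-session numbers live in a global temporary catalog table, and a lookup prefers the session-local row, falling back to the shared pg_class entry.

    -- Invented per-session catalog: one row per global temp table the
    -- session has touched, holding its private size estimates.
    CREATE GLOBAL TEMPORARY TABLE pg_gtt_class (
        gttrelid  oid PRIMARY KEY,   -- OID of the described relation
        reltuples real,              -- per-session row count estimate
        relpages  integer            -- per-session page count
    );

    -- The "union of two system tables" the syscache would implement:
    -- session-local values win, shared pg_class values are the fallback.
    SELECT c.oid,
           COALESCE(g.reltuples, c.reltuples) AS reltuples,
           COALESCE(g.relpages,  c.relpages)  AS relpages
    FROM pg_class c
    LEFT JOIN pg_gtt_class g ON g.gttrelid = c.oid
    WHERE c.relname = 'gtt';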

Sharing metadata for global temp tables (the current state of the patch, if I understand it correctly) is good enough for the development stage, but it is hard to expect it to work well in a general production environment.


I think it is not possible to assume that temporary data will always fit in memory.
So 1) does not seem to be an acceptable solution.

I spoke only about metadata. The data itself should be stored in temp buffers (and possibly spilled to temp files).

Pavel

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
