Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [Proposal] Global temporary tables
Date
Msg-id 3f360d7b-a46a-79f6-27b6-1b1df8c3ee82@postgrespro.ru
In response to Re: [Proposal] Global temporary tables  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [Proposal] Global temporary tables  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers

On 06.01.2020 14:01, Tomas Vondra wrote:
> On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>> In the previous communication
>>
>> 1 we agreed on the general direction
>> 1.1 gtt use local (private) buffer
>> 1.2 no replica access in first version
>>
>
> OK, good.
>
>> 2 We feel that gtt needs to maintain statistics, but there is no
>> agreement on what it will be done.
>>
>
> I certainly agree GTT needs to maintain statistics, otherwise it'll lead
> to poor query plans. AFAIK the current patch stores the info in a hash
> table in a backend private memory, and I don't see how else to do that
> (e.g. storing it in a catalog would cause catalog bloat).
>
> FWIW this is a reason why I think just using shared buffers (instead of
> local ones) is not sufficient to support parallel queries as proposed
> by Alexander. The workers would not know the stats, breaking planning of
> queries in PARALLEL SAFE plpgsql functions etc.


I do not think that an "all or nothing" approach is as good for software 
development as it is for database transactions.
Yes, if we have a PL/pgSQL function which performs queries on temporary 
tables, then parallel workers may build inefficient plans for these 
queries due to the lack of statistics.
 From my point of view this is not a pitfall of GTT but a consequence of 
the lack of a global plan cache in Postgres. And it should be fixed not 
at the GTT level.
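Just to make the scenario concrete, a hypothetical function of the kind 
discussed might look like this (all names are invented for illustration; 
in stock Postgres parallel workers cannot access temp tables at all, so 
this only becomes relevant under the shared-buffers GTT proposal):

```sql
-- A session fills a temporary table and defines a PARALLEL SAFE
-- function over it. A parallel worker running this function would
-- have no statistics for tmp_orders, so the planner inside the
-- worker could choose a poor plan.
CREATE TEMP TABLE tmp_orders(id int, amount numeric);
INSERT INTO tmp_orders
    SELECT g, g % 100 FROM generate_series(1, 100000) g;
CREATE INDEX ON tmp_orders(id);

CREATE FUNCTION sum_order(p_id int) RETURNS numeric
LANGUAGE plpgsql PARALLEL SAFE AS $$
BEGIN
    RETURN (SELECT sum(amount) FROM tmp_orders WHERE id = p_id);
END;
$$;
```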

Also, I have never seen real use cases with such functions, even in 
systems which make heavy use of temporary tables and stored procedures.
But there are many other real problems with temp tables (besides those 
already mentioned in this thread).
In PgPro/EE we have fixes for some of them, for example:

1. Do not reserve space in the file for temp relations. Right now, 
extending a relation causes mdextend to write a zero page to disk.
This is useless disk I/O for temp tables, which in most cases fit in 
memory and should not be written to disk at all.

2. Implicitly analyze a temp table immediately after storing data in 
it. Usually tables are analyzed by autovacuum in the background.
But that doesn't work for temp tables, which are not processed by 
autovacuum and are typically accessed immediately after being filled 
with data; the lack of statistics may cause a very inefficient plan to 
be built. We have an online_analyze extension which forces analyze of a 
table after a bulk of data has been appended to it.
It can be used for normal tables, but it is most useful for temp 
relations.
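The effect of such an implicit analyze can be reproduced manually; this 
is a plain illustrative session (table name invented), showing the 
explicit ANALYZE that an extension like online_analyze would issue 
automatically after a bulk load:

```sql
-- Autovacuum never processes temp tables, so right after a bulk
-- INSERT the planner has no statistics for tmp_ids and falls back
-- to default estimates.
CREATE TEMP TABLE tmp_ids(id int);
INSERT INTO tmp_ids SELECT g FROM generate_series(1, 1000000) g;

ANALYZE tmp_ids;  -- what an implicit post-load analyze would do

-- With statistics in place, row-count estimates become accurate:
EXPLAIN SELECT * FROM tmp_ids WHERE id < 10;
```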

Unlike the hypothetical example with a parallel safe function working 
with temp tables, these are real problems observed by some of our 
customers.
They are applicable both to local and global temp tables, which is why 
I do not want to discuss them in the context of GTT.


>
>> 3 Still no one commented on GTT's transaction information processing, 
>> they include
>> 3.1 Should gtt's frozenxid need to be care?
>> 3.2 gtt’s clog clean
>> 3.3 How to deal with "too old" gtt data
>>
>
> No idea what to do about this.
>

I wonder what is specific to GTT here?
The same problem exists for normal (local) temp tables, doesn't it?


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



