Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [Proposal] Global temporary tables |
Date | |
Msg-id | 20200113163253.ja34iqsors3o44sv@development Whole thread Raw |
In response to | Re: [Proposal] Global temporary tables (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Responses |
Re: [Proposal] Global temporary tables
|
List | pgsql-hackers |
On Mon, Jan 13, 2020 at 11:08:40AM +0300, Konstantin Knizhnik wrote: > > >On 12.01.2020 4:51, Tomas Vondra wrote: >>On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote: >>> >>> >>>On 09.01.2020 19:48, Tomas Vondra wrote: >>>> >>>>>The most complex and challenged task is to support GTT for all >>>>>kind of indexes. Unfortunately I can not proposed some good >>>>>universal solution for it. >>>>>Just patching all existed indexes implementation seems to be >>>>>the only choice. >>>>> >>>> >>>>I haven't looked at the indexing issue closely, but IMO we need to >>>>ensure that every session sees/uses only indexes on GTT that were >>>>defined before the seesion started using the table. >>> >>>Why? It contradicts with behavior of normal tables. >>>Assume that you have active clients and at some point of time DBA >>>recognizes that them are spending to much time in scanning some >>>GTT. >>>It cab create index for this GTT but if existed client will not be >>>able to use this index, then we need somehow make this clients to >>>restart their sessions? >>>In my patch I have implemented building indexes for GTT on demand: >>>if accessed index on GTT is not yet initialized, then it is filled >>>with local data. >> >>Yes, I know the behavior would be different from behavior for regular >>tables. And yes, it would not allow fixing slow queries in sessions >>without interrupting those sessions. >> >>I proposed just ignoring those new indexes because it seems much simpler >>than alternative solutions that I can think of, and it's not like those >>other solutions don't have other issues. > >Quit opposite: prohibiting sessions to see indexes created before >session start to use GTT requires more efforts. We need to somehow >maintain and check GTT first access time. > Hmmm, OK. I'd expect such check to be much simpler than the on-demand index building, but I admit I haven't tried implementing either of those options. >> >>For example, I've looked at the "on demand" building as implemented in >>global_private_temp-8.patch, I kinda doubt adding a bunch of index build >>calls into various places in index code seems somewht suspicious. > >We in any case has to initialize GTT indexes on demand even if we >prohibit usages of indexes created after first access by session to >GTT. >So the difference is only in one thing: should we just initialize >empty index or populate it with local data (if rules for index >usability are the same for GTT as for normal tables). >From implementation point of view there is no big difference. Actually >building index in standard way is even simpler than constructing empty >index. Originally I have implemented >first approach (I just forgot to consider case when GTT was already >user by a session). Then I rewrited it using second approach and patch >even became simpler. > >> >>* brinbuild is added to brinRevmapInitialize, which is meant to >> initialize state for scanning. It seems wrong to build the index we're >> scanning from this function (layering and all that). >> >>* btbuild is called from _bt_getbuf. That seems a bit ... suspicious? > > >As I already mentioned - support of indexes for GTT is one of the most >challenged things in my patch. >I didn't find good and universal solution. So I agreed that call of >btbuild from _bt_getbuf may be considered as suspicious. >I will be pleased if you or sombody else can propose better >elternative and not only for B-Tree, but for all other indexes. > >But as I already wrote above, prohibiting session to used indexes >created after first access to GTT doesn't solve the problem. >For normal tables (and for local temp tables) indexes are initialized >at the time of their creation. >With GTT it doesn't work, because each session has its own local data >of GTT. >We should either initialize/build index on demand (when it is first >accessed), either at the moment of session start initialize indexes >for all existed GTTs. >Last options seem to be much worser from my point of view: there may >me huge number of GTT and session may not need to access GTT at all. >> >>... and so on for other index types. Also, what about custom indexes >>implemented in extensions? It seems a bit strange each of them has to >>support this separately. > >I have already complained about it: my patch supports GTT for all >built-in indexes, but custom indexes has to handle it themselves. >Looks like to provide some generic solution we need to extend index >API, providing two diffrent operations: creation and initialization. >But extending index API is very critical change... And also it doesn't >solve the problem with all existed extensions: them in any case have >to be rewritten to implement new API version in order to support GTT. > Why not to allow creating only indexes implementing this new API method (on GTT)? >> >>IMHO if this really is the right solution, we need to make it work for >>existing indexes without having to tweak them individually. Why don't we >>track a flag whether an index on GTT was initialized in a given session, >>and if it was not then call the build function before calling any other >>function from the index AM? >>But let's talk about other issues caused by "on demand" build. Imagine >>you have 50 sessions, each using the same GTT with a GB of per-session >>data. Now you create a new index on the GTT, which forces the sessions >>to build it's "local" index. Those builds will use maintenance_work_mem >>each, so 50 * m_w_m. I doubt that's expected/sensible. > >I do not see principle difference here with scenario when 50 sessions >create (local) temp table, >populate it with GB of data and create index for it. > I'd say the high memory consumption is pretty significant. >> >>So I suggest we start by just ignoring the *new* indexes, and improve >>this in the future (by building the indexes on demand or whatever). > >Sorry, but still do not agree with this suggestions: >- it doesn't simplify things >- it makes behavior of GTT incompatible with normal tables. >- it doesn't prevent some bad or unexpected behavior which can't be >currently reproduced with normal (local) temp tables. > >> >>>> >>>>I think someone pointed out pushing stuff directly into the cache is >>>>rather problematic, but I don't recall the details. >>>> >>>I have not encountered any problems, so if you can point me on >>>what is wrong with this approach, I will think about alternative >>>solution. >>> >> >>I meant this comment by Robert: >> >>https://www.postgresql.org/message-id/CA%2BTgmoZFWaND4PpT_CJbeu6VZGZKi2rrTuSTL-Ykd97fexTN-w%40mail.gmail.com >> >> >"if any code tried to access the statistics directly from the table, >rather than via the caches". > >Currently optimizer is accessing statistic though caches. So this >approach works. If somebody will rewrite optimizer or provide own >custom optimizer in extension which access statistic directly >then it we really be a problem. But I wonder why bypassing catalog >cache may be needed. > I don't know, but it seems extensions like hypopg do it. >Moreover, if we implement alternative solution - for example make >pg_statistic a view which combines results for normal tables and GTT, >then existed optimizer has to be rewritten >because it can not access statistic in the way it is doing now. And >there will be all problem with all existed extensions which are >accessing statistic in most natural way - through system cache. > Perhaps. I don't know enough about this part of the code to have a strong opinion. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: