Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [Proposal] Global temporary tables |
Date | |
Msg-id | 20200112015109.caab6qdr2ob6d5jv@development Whole thread Raw |
In response to | Re: [Proposal] Global temporary tables (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Responses |
Re: [Proposal] Global temporary tables
Re: [Proposal] Global temporary tables |
List | pgsql-hackers |
On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote: > > >On 09.01.2020 19:48, Tomas Vondra wrote: >> >>>The most complex and challenged task is to support GTT for all >>>kind of indexes. Unfortunately I can not proposed some good >>>universal solution for it. >>>Just patching all existed indexes implementation seems to be the >>>only choice. >>> >> >>I haven't looked at the indexing issue closely, but IMO we need to >>ensure that every session sees/uses only indexes on GTT that were >>defined before the seesion started using the table. > >Why? It contradicts with behavior of normal tables. >Assume that you have active clients and at some point of time DBA >recognizes that them are spending to much time in scanning some GTT. >It cab create index for this GTT but if existed client will not be >able to use this index, then we need somehow make this clients to >restart their sessions? >In my patch I have implemented building indexes for GTT on demand: if >accessed index on GTT is not yet initialized, then it is filled with >local data. Yes, I know the behavior would be different from behavior for regular tables. And yes, it would not allow fixing slow queries in sessions without interrupting those sessions. I proposed just ignoring those new indexes because it seems much simpler than alternative solutions that I can think of, and it's not like those other solutions don't have other issues. For example, I've looked at the "on demand" building as implemented in global_private_temp-8.patch, I kinda doubt adding a bunch of index build calls into various places in index code seems somewht suspicious. * brinbuild is added to brinRevmapInitialize, which is meant to initialize state for scanning. It seems wrong to build the index we're scanning from this function (layering and all that). * btbuild is called from _bt_getbuf. That seems a bit ... suspicious? ... and so on for other index types. Also, what about custom indexes implemented in extensions? It seems a bit strange each of them has to support this separately. IMHO if this really is the right solution, we need to make it work for existing indexes without having to tweak them individually. Why don't we track a flag whether an index on GTT was initialized in a given session, and if it was not then call the build function before calling any other function from the index AM? But let's talk about other issues caused by "on demand" build. Imagine you have 50 sessions, each using the same GTT with a GB of per-session data. Now you create a new index on the GTT, which forces the sessions to build it's "local" index. Those builds will use maintenance_work_mem each, so 50 * m_w_m. I doubt that's expected/sensible. So I suggest we start by just ignoring the *new* indexes, and improve this in the future (by building the indexes on demand or whatever). >> >>Can't we track which indexes a particular session sees, somehow? >> >>>Statistic is another important case. >>>But once again I do not completely understand why we want to >>>address all this issues with statistic in first version of the >>>patch? >> >>I think the question is which "issues with statistic" you mean. I'm sure >>we can ignore some of them, e.g. the one with parallel workers not >>having any stats (assuming we consider functions using GTT to be >>parallel restricted). > >If we do not use shared buffers for GTT then parallel processing of >GTT is not possible at all, so there is no problem with statistic for >parallel workers. > Right. >> >>I think someone pointed out pushing stuff directly into the cache is >>rather problematic, but I don't recall the details. >> >I have not encountered any problems, so if you can point me on what is >wrong with this approach, I will think about alternative solution. > I meant this comment by Robert: https://www.postgresql.org/message-id/CA%2BTgmoZFWaND4PpT_CJbeu6VZGZKi2rrTuSTL-Ykd97fexTN-w%40mail.gmail.com regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: