Re: [Proposal] Global temporary tables - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [Proposal] Global temporary tables |
Date | |
Msg-id | CA+TgmoaNYmuW7oUSeOviXM_8pFfgroDJbY5TXx7j=P7eXhpHqw@mail.gmail.com Whole thread Raw |
In response to | Re: [Proposal] Global temporary tables (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Responses |
Re: [Proposal] Global temporary tables
|
List | pgsql-hackers |
On Thu, Jan 30, 2020 at 4:33 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > On 29.01.2020 21:16, Robert Haas wrote: > > On Wed, Jan 29, 2020 at 10:30 AM Konstantin Knizhnik > > <k.knizhnik@postgrespro.ru> wrote: > > > > I think that the idea of calling ambuild() on the fly is not going to > > work, because, again, I don't think that calling that from random > > places in the code is safe. > > It is not a random place in the code. > Actually it is just one place - _bt_getbuf > Why it can be unsafe if it affects only private backends data? Because, as I already said, not every operation is safe at every point in the code. This is true even when there's no concurrency involved. For example, executing user-defined code is not safe while holding a buffer lock, because the user-defined code might try to do something that locks the same buffer, which would cause an undetected, uninterruptible deadlock. > But GTT case is different. Heap and indexes can be easily initialized by > backend using existed functions. That would be nice if we could make it work. Someone would need to show, however, that it's safe. > You say that it not safe. But you have not explained why it is unsafe. > Yes, I agree that it is my responsibility to prove that it is safe. > And as I already wrote, I can not provide such proof now. I will be > pleased if you or anybody else can help to convince that this approach > is safe or demonstrate problems with this approach. That's fair, but nobody's obliged to spend time on that. > But I really like to receive more constructive critics rather than "this > approach is wrong because it is unsafe". I'm sure, and that's probably valid. Equally, however, I'd like to receive more analysis of why it is safe than "I don't see anything wrong with it so it's probably fine." And I think that's pretty valid, too. > Actually index is not created on the fly. > Index is created is usual way, by executing "create index" command. > So all components of the Postgres (planner, executor,...) treat GTT > indexes in the same way as regular indexes. > Locking and invalidations policies are exactly the same for them. > The only difference is that content of GTT index is constructed on > demand using private backend data. > Is it safe or not? We are just reading data from local buffers/files and > writing them here. > May be I missed something but I do not see any unsafety here. > There are issues with updating statistic but them can be solved. But that's not all you are doing. To build the index, you'll have to sort the data. To sort the data, you'll have to call btree support functions. Those functions can be user-defined, and can do complex operations like catalog access that depend on a good transaction state, no buffer locks being held, and nothing already in progress within this backend that can get confused as a result of this operation. Just as a quick test, I tried doing this in _bt_getbuf: + if (InterruptHoldoffCount != 0) + elog(WARNING, "INTERRUPTS ARE HELD OFF"); That causes 103 out of 196 regression tests to fail, which means that it's pretty common to arrive in _bt_getbuf() with interrupts held off. At the very least, that means that the index build would be uninterruptible, which already seems unacceptable. Probably, it means that the calling code is already holding an LWLock, because that's normally what causes HOLD_INTERRUPTS() to happen. And as I have already explained, that is super-problematic, because of deadlock risks, and because it risks putting other backends into non-interruptible waits if they should happen to need the LWLock we're holding here. I really don't understand why the basic point here remains obscure. In general, it tends to be unsafe to call high-level code from low-level code, not just in PostgreSQL but in pretty much any program. Do you think that we can safely add a GUC that executes a user-defined SQL query every time an LWLock is acquired? If you do, why don't you try adding code to do that to LWLockAcquire and testing it out a little bit? Try making the SQL query do something like query pg_class, find a table name that's not in use, and create a table by that name. Then run the regression tests with the GUC set to run that query and see how it goes. I always hate to say that things are "obvious," because what's obvious to me may not be obvious to somebody else, but it is clear to me, at least, that this has no chance of working. Even though I can't say exactly what will break, or what will break first, I'm very sure that a lot of things will break and that most of them are unfixable. Now, your idea is not quite as crazy as that, but it has the same basic problem: you can't insert code into a low-level facility that uses a high level facility which may in turn use and depend on that very same low-level facility to not be in the middle of an operation. If you do, it's going to break somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: