Thread: TIME QUALIFICATION
It's time: as we've seen from the recent question, the time
qualification code has to be modified again. I think a small
extension to Vadim's SnapshotData could do the trick. The following
is what I have in mind so far to support deferred queries later in
v6.6.

1.  Add a command counter field to the SnapshotData struct.
    Heap scans use the command counter in the snapshot
    instead of the global command counter.

2.  Add QueryId fields to the querytree and rangetable entry
    structures.

3.  Create a new global memory context "Snapshot". The
    lifetime of this memory context is one transaction (at
    every transaction end/abort an AllocSetReset() is issued
    on it).

4.  Create a new internal counter, the QueryCounter. The
    counter is also reset between transactions. At parse
    time, the query and all its initial RTEs get the same,
    new QueryCounter value as their QueryId. When the rule
    system generates new queries, only the RTEs coming with
    the rule (except NEW and OLD) get the QueryId of the new
    query. All others remain as they are. For every QueryId
    an entry in the "Snapshot" context is created, which
    holds the number of RTEs using this snapshot. RTEs in
    different queries (copied by rules) count multiple times.

5.  On ExecutorStart(), the actual QuerySnapshot data is
    copied into the "Snapshot" context and held in the array
    of Snapshot pointers. The CommandId of the snapshot is
    set to the current command Id.

6.  The executor uses the saved snapshots on
    heap_beginscan(). The RTE's QueryId tells which of the
    snapshots to use. This way, every Scan node in a plan can
    have a different snapshot and command Id, so we have
    different visibilities in one query execution.

7.  On ExecutorEnd() the snapshot's reference count is
    decremented and unused snapshots are thrown away.

In v6.6 we could also implement the suggested named
snapshots. This only requires that a query Id can be
associated with a name. The CREATE SNAPSHOT utility's query
Id is that of the snapshot and during parse this Id is placed
into the RTEs. Named snapshots are never freed until
transaction end or FREE SNAPSHOT.

This should be the core functionality that makes deferred
queries possible at all. And it must solve the problem with
the portal where inserts/updates inside the fetch loop become
visible too. Since the portal's heapgettup() will use the
command counter from the CREATE CURSOR instead of the current
command counter, the portal will not see them. The portal
will see the database exactly in the state at CREATE CURSOR
time. But another SELECT issued after an UPDATE in the same
transaction will see the changes, as it is supposed to.

Have I forgotten something? Vadim, please comment on this.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
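For readers following the seven steps, here is a minimal C sketch of the bookkeeping they describe. It is not backend code: every name (SnapshotSlot, new_query_id(), executor_start_sketch() and so on) is hypothetical, the fixed-size array stands in for the per-transaction "Snapshot" memory context, and the visibility test is reduced to "a tuple written by our own transaction is visible if its command Id is lower than the scan's command Id", which is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_QUERY_IDS 16            /* arbitrary bound for this sketch */

typedef unsigned int CommandIdSketch;

/* One slot per QueryId (hypothetical layout). */
typedef struct SnapshotSlot {
    int             rte_refs;   /* number of RTEs tagged with this QueryId */
    bool            valid;      /* true between executor start and release */
    CommandIdSketch scan_cid;   /* command Id frozen at executor start */
} SnapshotSlot;

static SnapshotSlot slots[MAX_QUERY_IDS];
static int          query_counter = 0;

/* Steps 3 and 4: everything is reset at transaction end/abort. */
void reset_snapshot_context(void)
{
    memset(slots, 0, sizeof(slots));
    query_counter = 0;
}

/* Step 4: the parser or the rule system asks for a fresh QueryId. */
int new_query_id(void)
{
    assert(query_counter < MAX_QUERY_IDS);
    return query_counter++;
}

/* Step 4: called once for every RTE that carries this QueryId. */
void reference_query_id(int qid)
{
    slots[qid].rte_refs++;
}

/* Step 5: freeze the command Id when execution of the query starts. */
void executor_start_sketch(int qid, CommandIdSketch current_cid)
{
    slots[qid].scan_cid = current_cid;
    slots[qid].valid = true;
}

/* Step 6: a scan looks up its snapshot through the RTE's QueryId. */
bool own_tuple_visible(int qid, CommandIdSketch tuple_cmin)
{
    assert(slots[qid].valid);
    return tuple_cmin < slots[qid].scan_cid;
}

/* Step 7: drop one reference; the snapshot goes away with the last one. */
void release_query_id(int qid)
{
    assert(slots[qid].rte_refs > 0);
    if (--slots[qid].rte_refs == 0)
        slots[qid].valid = false;
}

bool snapshot_held(int qid)
{
    return slots[qid].valid;
}
```

With this model, a cursor whose QueryId was frozen at command Id 1 does not see a row inserted at command Id 2 inside the fetch loop, while a SELECT started at command Id 3 does.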
Jan Wieck wrote:
>
> 1.  Add a command counter field to the SnapshotData struct.
>     Heap scans use the command counter in the snapshot
>     instead of the global command counter.

OK. For SnapshotNow and SnapshotSelf, used in catalog scans, the
global command counter will be used, as now.

>
> 2.  Add QueryId fields to the querytree and rangetable entry
>     structures.
>
> 3.  Create a new global memory context "Snapshot". The
>     lifetime of this memory context is one transaction (at
>     every transaction end/abort an AllocSetReset() is issued
>     on it).
>
> 4.  Create a new internal counter, the QueryCounter. The
>     counter is also reset between transactions. At parse
>     time, the query and all its initial RTEs get the same,
>     new QueryCounter value as their QueryId. When the rule
>     system generates new queries, only the RTEs coming with
>     the rule (except NEW and OLD) get the QueryId of the new
>     query. All others remain as they are. For every QueryId
>     an entry in the "Snapshot" context is created, which
>     holds the number of RTEs using this snapshot. RTEs in
>     different queries (copied by rules) count multiple times.
>
> 5.  On ExecutorStart(), the actual QuerySnapshot data is
>     copied into the "Snapshot" context and held in the array
>     of Snapshot pointers. The CommandId of the snapshot is
>     set to the current command Id.
>
> 6.  The executor uses the saved snapshots on
>     heap_beginscan(). The RTE's QueryId tells which of the
>     snapshots to use. This way, every Scan node in a plan can
>     have a different snapshot and command Id, so we have
>     different visibilities in one query execution.
>
> 7.  On ExecutorEnd() the snapshot's reference count is
>     decremented and unused snapshots are thrown away.

It seems too complex to me. I again propose to use refcount
inside snapshot itself to prevent freeing of snapshots.
Benefits: no copying in Executor, no QueryId --> Snapshot
lookup. Just add pointer to RTE. Parser will put NULL there:
as flag that current snapshot has to be used. ExecutorStart
and deferred rules will increment refcount of current snapshot.
Deferred rules will also set snapshot pointers of appropriate
RTEs (to the current snapshot).

> In v6.6 we could also implement the suggested named
> snapshots. This only requires that a query Id can be
> associated with a name. The CREATE SNAPSHOT utility's query
> Id is that of the snapshot and during parse this Id is placed
                                 ^^^^^^^^^^^^
Snapshot names have to be resolved by Executor or just before
execution: someday we'll implement stored procedures/functions:
no parsing before execution...
We could add bool to RTE: use name from snapshot pointer
to get real snapshot. Or something like this.

> into the RTEs. Named snapshots are never freed until
> transaction end or FREE SNAPSHOT.
>
> This should be the core functionality that makes deferred
> queries possible at all. And it must solve the problem with
> the portal where inserts/updates inside the fetch loop become
> visible too. Since the portal's heapgettup() will use the
> command counter from the CREATE CURSOR instead of the current
> command counter, the portal will not see them. The portal
> will see the database exactly in the state at CREATE CURSOR
> time. But another SELECT issued after an UPDATE in the same
> transaction will see the changes, as it is supposed to.

Nice.

Vadim
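The refcount idea above can be pictured with a few lines of C. This is only a sketch under assumptions: RefSnapshot, snapshot_create(), snapshot_pin() and snapshot_unpin() are made-up names, plain malloc()/free() replace the backend's memory contexts, and live_snapshots exists only so that freeing can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical snapshot that carries its own reference count. */
typedef struct RefSnapshot {
    int          refcount;  /* holders: current query, deferred rules, ... */
    unsigned int curcid;    /* command Id used for scans under this snapshot */
} RefSnapshot;

static int live_snapshots = 0;  /* demo only: snapshots not yet freed */

/* The creator (the "current" snapshot owner) holds the first reference. */
RefSnapshot *snapshot_create(unsigned int curcid)
{
    RefSnapshot *s = malloc(sizeof(*s));

    if (s == NULL)
        abort();
    s->refcount = 1;
    s->curcid = curcid;
    live_snapshots++;
    return s;
}

/* ExecutorStart or a deferred rule keeps the snapshot alive. */
RefSnapshot *snapshot_pin(RefSnapshot *s)
{
    assert(s != NULL && s->refcount > 0);
    s->refcount++;
    return s;
}

/* Dropping the last reference frees the snapshot; no lookup table needed. */
void snapshot_unpin(RefSnapshot *s)
{
    assert(s != NULL && s->refcount > 0);
    if (--s->refcount == 0) {
        free(s);
        live_snapshots--;
    }
}
```

A deferred rule that pinned the snapshot can still read it after the query that created it has let go; the memory is released only when the rule unpins it as well.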
Vadim wrote:

> It seems too complex to me. I again propose to use refcount
> inside snapshot itself to prevent freeing of snapshots.
> Benefits: no copying in Executor, no QueryId --> Snapshot
> lookup. Just add pointer to RTE. Parser will put NULL there:
> as flag that current snapshot has to be used. ExecutorStart
> and deferred rules will increment refcount of current snapshot.
> Deferred rules will also set snapshot pointers of appropriate
> RTEs (to the current snapshot).

Yes, the parser could always put NULL into the RTE's
snapshot name field (or the name later for named snapshots).
But it's the rewrite system that has to tell for unnamed
snapshots, which ones have to be used on which RTEs.

Let's have two simple tables with a rule (and assume in the
following that snapshot includes scan command Id):

    create table t1 (a int4);
    create table t2 (b int4);

    create rule r1 as on delete to t1
        do delete from t2 where b = old.a;

We execute the following commands:

    begin;
    delete from t1 where a = 5;
    insert into t2 values (5);
    commit;

Whether 5 is in t2 after commit depends on whether the rule is
deferred or not. If it isn't deferred, 5 should be there, otherwise
not.

The rule will create a parsetree like this:

    delete from t2 where t1.a = 5 and b = t1.a;

So the tree has a rangetable containing t2 and t1 (along with
some other unused entries). But only the rule system knows
that the RTE for t2 came from the rule and must be scanned
with the visibility of commit time while t1 came from the
original query and must be scanned with the visibility that
was in effect when the original delete from t1 was executed (the
rows are already deleted, but the rule action's scan must find them).

And there could also be rules fired on t2. This results in
recursive rewriting and it's not that easy to foresee the
order in which all these commands will then get executed.
During recursion there is no difference between a command
coming from the user and one that is already generated by
another rule.

The problem here is, that the RTEs in a rule generated query
resulting from the former command (that fired them) must get
scanned against the snapshot of the time when the former
command gets executed. But the RTEs coming from the rule
action itself must get the snapshot when the rule's command is
executed. Only this way the quals added to the rule from the
former command will see what the former command saw.

The executor cannot know where all the RTEs were coming
from, unless we have a QueryId and associate the QueryId with
a snapshot at the time of execution. And I think we must do
this lookup, because the order in which commands are executed
will not be the same as the order in which they were created.
The executor only has to override the RTE's snapshot if the
RTE's snapshot name isn't NULL.

> > In v6.6 we could also implement the suggested named
> > snapshots. This only requires that a query Id can be
> > associated with a name. The CREATE SNAPSHOT utility's query
> > Id is that of the snapshot and during parse this Id is placed
>                                  ^^^^^^^^^^^^
> Snapshot names have to be resolved by Executor or just before
> execution: someday we'll implement stored procedures/functions:
> no parsing before execution...
> We could add bool to RTE: use name from snapshot pointer
> to get real snapshot. Or something like this.

That's a point I forgot, and I will already have that problem
with prepared SPI plans. One SPI query could also fire rules
resulting in multiple plans. So I must change it.

The parser/rewrite combo always puts QueryIds starting with 0
into the queries and their RTEs, telling which RTEs belong to
which query's execution time. These must then be offset when
the plans actually get added to either the current execution
tree list or the deferred execution tree list.

And SPI must know about deferred queries, because for prepared
plans, one must get deferred for every SPI_execp() call. It's
not the rewriter that manages the deferred tree list. It must
be its caller. So the deferred information is part of the
querytree.

Better now, thanks Vadim.

Jan
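The "QueryIds start at 0 and are offset later" idea from this message could look roughly like the C sketch below. All names are hypothetical (RteSketch, QuerySketch, offset_query_ids()), and it assumes the caller works on a fresh copy of the prepared querytrees for each execution, so the same relative Ids can be rebased again on the next SPI_execp() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down rangetable entry: only the QueryId matters here. */
typedef struct RteSketch {
    int query_id;       /* relative Id assigned by parser/rewriter */
} RteSketch;

/* Hypothetical querytree: its own QueryId plus its rangetable. */
typedef struct QuerySketch {
    int        query_id;
    RteSketch *rtes;
    size_t     nrtes;
} QuerySketch;

/*
 * Rebase the relative QueryIds (0, 1, 2, ...) of one rewrite result onto
 * the transaction-wide numbering starting at "base". Returns the next free
 * QueryId, to be used as the base for the next batch of plans.
 */
int offset_query_ids(QuerySketch *queries, size_t nqueries, int base)
{
    int max_rel = -1;

    for (size_t q = 0; q < nqueries; q++) {
        if (queries[q].query_id > max_rel)
            max_rel = queries[q].query_id;
        queries[q].query_id += base;

        for (size_t i = 0; i < queries[q].nrtes; i++) {
            if (queries[q].rtes[i].query_id > max_rel)
                max_rel = queries[q].rtes[i].query_id;
            queries[q].rtes[i].query_id += base;
        }
    }
    return base + max_rel + 1;
}
```

In the t1/t2 example, the rule action would carry relative Id 1 for its own t2 entry and relative Id 0 for the t1 entry inherited from the original delete; after rebasing onto 5 they become 6 and 5, and the relationship between the two queries is preserved.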
Jan Wieck wrote:
>
> Vadim wrote:
>
> > It seems too complex to me. I again propose to use refcount
> > inside snapshot itself to prevent freeing of snapshots.
> > Benefits: no copying in Executor, no QueryId --> Snapshot
> > lookup. Just add pointer to RTE. Parser will put NULL there:
> > as flag that current snapshot has to be used. ExecutorStart
                 ^^^^^^^^^^^^^^^^
Note: "current" here is "when actual execution will start", not
"when query was parsed/rewritten". ExecutorStart will substitute
QuerySnapshot for NULL snapshot pointers.

> > and deferred rules will increment refcount of current snapshot.
                                                  ^^^^^^^^^^^^^^^^
I.e. - QuerySnapshot - as it was when rewriting/execution starts.
Sorry, I think that my explanation was bad, hope to fix this -:)

> > Deferred rules will also set snapshot pointers of appropriate
> > RTEs (to the current snapshot).
>
> Yes, the parser could always put NULL into the RTE's
> snapshot name field (or the name later for named snapshots).
> But it's the rewrite system that has to tell for unnamed
> snapshots, which ones have to be used on which RTEs.

Of course!

> Let's have two simple tables with a rule (and assume in the
> following that snapshot includes scan command Id):
>
>     create table t1 (a int4);
>     create table t2 (b int4);
>
>     create rule r1 as on delete to t1
>         do delete from t2 where b = old.a;
>
> We execute the following commands:
>
>     begin;
>     delete from t1 where a = 5;
>     insert into t2 values (5);
>     commit;
>
> Whether 5 is in t2 after commit depends on whether the rule is
> deferred or not. If it isn't deferred, 5 should be there, otherwise
> not.
>
> The rule will create a parsetree like this:
>
>     delete from t2 where t1.a = 5 and b = t1.a;
>
> So the tree has a rangetable containing t2 and t1 (along with
> some other unused entries). But only the rule system knows
> that the RTE for t2 came from the rule and must be scanned
> with the visibility of commit time while t1 came from the
> original query and must be scanned with the visibility that
> was in effect when the original delete from t1 was executed (the
> rows are already deleted, but the rule action's scan must find them).

And so for deferred rules rewrite system will:

1. set t2's RTE snapshot pointer to NULL - this will guarantee
   that snapshot of execution time (commit or set immediate time)
   will be used;
2. set t1's RTE snapshot pointer to current QuerySnapshot
   (and increment its refcount).

> And there could also be rules fired on t2. This results in
> recursive rewriting and it's not that easy to foresee the
> order in which all these commands will then get executed.
> During recursion there is no difference between a command
> coming from the user and one that is already generated by
> another rule.
>
> The problem here is, that the RTEs in a rule generated query
> resulting from the former command (that fired them) must get
> scanned against the snapshot of the time when the former
> command gets executed. But the RTEs coming from the rule
  ^^^^^^^^^^^^^^^^^^^^^^
So - you use QuerySnapshot as it was at this time.

> action itself must get the snapshot when the rule's command is

Set the RTE's snapshot pointer to NULL.

> executed. Only this way the quals added to the rule from the
> former command will see what the former command saw.
>
> The executor cannot know where all the RTEs were coming
> from, unless we have a QueryId and associate the QueryId with
> a snapshot at the time of execution. And I think we must do
> this lookup, because the order in which commands are executed
> will not be the same as the order in which they were created.
> The executor only has to override the RTE's snapshot if the
> RTE's snapshot name isn't NULL.

+ set NULL snapshot pointers to QuerySnapshot.

Vadim
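The two rewrite-time steps and the ExecutorStart substitution described in this message can be sketched in C as follows. The sketch only illustrates the proposal as stated here; SnapSketch, RteSketch, pin_rte_to_snapshot() and resolve_null_snapshots() are invented names, and the snapshots are plain stack objects rather than anything the backend would allocate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted snapshot. */
typedef struct SnapSketch {
    int          refcount;
    unsigned int curcid;
} SnapSketch;

/* Hypothetical RTE with the proposed snapshot pointer. */
typedef struct RteSketch {
    const char *relname;
    SnapSketch *snapshot;   /* NULL means "use the snapshot of execution time" */
} RteSketch;

/*
 * Rewrite time, deferred rule: an RTE inherited from the firing query keeps
 * the QuerySnapshot that is current now (step 2 in the message above).
 */
void pin_rte_to_snapshot(RteSketch *rte, SnapSketch *query_snapshot)
{
    rte->snapshot = query_snapshot;
    query_snapshot->refcount++;
}

/*
 * ExecutorStart: every RTE still carrying NULL gets the QuerySnapshot of
 * execution time (step 1, plus the "+ set NULL snapshot pointers" remark).
 */
void resolve_null_snapshots(RteSketch *rtes, size_t nrtes,
                            SnapSketch *query_snapshot)
{
    for (size_t i = 0; i < nrtes; i++) {
        if (rtes[i].snapshot == NULL) {
            rtes[i].snapshot = query_snapshot;
            query_snapshot->refcount++;
        }
    }
}
```

For the rule action "delete from t2 where t1.a = 5 and b = t1.a", the t1 entry would be pinned to the snapshot of the original delete, while the t2 entry stays NULL until the deferred action runs and then picks up the commit-time snapshot.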
Vadim wrote:

> > > inside snapshot itself to prevent freeing of snapshots.
> > > Benefits: no copying in Executor, no QueryId --> Snapshot
> > > lookup. Just add pointer to RTE. Parser will put NULL there:
> > > as flag that current snapshot has to be used. ExecutorStart
>                  ^^^^^^^^^^^^^^^^
> Note: "current" here is "when actual execution will start", not
> "when query was parsed/rewritten". ExecutorStart will substitute
> QuerySnapshot for NULL snapshot pointers.

Yepp - snapshots are always built just before execution
starts. The reason why I need the QueryId and its lookup is
that the time of ExecutorStart() for one query has nothing
to do with where it came from or when it was
parsed/rewritten. Due to the rewriting, RTEs in different
queries have relationships. Only the rewrite system knows
them, and the only place where this information could be
stored is the RTE. All RTEs that are related to each other
across queries must use the same snapshot when they get
scanned.

> And so for deferred rules rewrite system will:
>
> 1. set t2's RTE snapshot pointer to NULL - this will guarantee
>    that snapshot of execution time (commit or set immediate time)
>    will be used;
> 2. set t1's RTE snapshot pointer to current QuerySnapshot
>    (and increment its refcount).

At parse/rewrite time there is no actual snapshot. And for
SPI prepared plans, the snapshot to use will be different for
each execution. The RTE cannot hold the snapshot itself. It
could only tell which of all the snapshots created during a
transaction to use for it.

> > And there could also be rules fired on t2. This results in
> > recursive rewriting and it's not that easy to foresee the
> > order in which all these commands will then get executed.
> > During recursion there is no difference between a command
> > coming from the user and one that is already generated by
> > another rule.
> >
> > The problem here is, that the RTEs in a rule generated query
> > resulting from the former command (that fired them) must get
> > scanned against the snapshot of the time when the former
> > command gets executed. But the RTEs coming from the rule
>   ^^^^^^^^^^^^^^^^^^^^^^
> So - you use QuerySnapshot as it was at this time.
>
> > action itself must get the snapshot when the rule's command is
>
> Set the RTE's snapshot pointer to NULL.
>
> > executed. Only this way the quals added to the rule from the
> > former command will see what the former command saw.
> >
> > The executor cannot know where all the RTEs were coming
> > from, unless we have a QueryId and associate the QueryId with
> > a snapshot at the time of execution. And I think we must do
> > this lookup, because the order in which commands are executed
> > will not be the same as the order in which they were created.
> > The executor only has to override the RTE's snapshot if the
> > RTE's snapshot name isn't NULL.
>
> + set NULL snapshot pointers to QuerySnapshot.

That way, the executor would also have to set all the snapshot
pointers in related RTEs of other queries (not yet executed)
so they point to the same snapshot. The only way I can think
of to link all the related RTEs to each other is some kind of
ordered set over them, but I would get into deep trouble
keeping these sets alive when copying rangetables during
rewrite or SPI_saveplan().

Maybe I'm not able to explain exactly enough what I have
vaguely in mind for how it could work. But after you've helped
me not to forget prepared plans I think I have all the odds and
ends to build it.

I'll hack around a little. Then let's discuss the final
details while having a prototype to look at.

Jan
Jan Wieck wrote:
>
> > And so for deferred rules rewrite system will:
> >
> > 1. set t2's RTE snapshot pointer to NULL - this will guarantee
> >    that snapshot of execution time (commit or set immediate time)
> >    will be used;
> > 2. set t1's RTE snapshot pointer to current QuerySnapshot
> >    (and increment its refcount).
>
> At parse/rewrite time there is no actual snapshot. And for
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oh, you're right. This is true for prepared plans.

> SPI prepared plans, the snapshot to use will be different for
> each execution. The RTE cannot hold the snapshot itself. It
> could only tell which of all the snapshots created during a
> transaction to use for it.
>
...
>
> Maybe I'm not able to explain exactly enough what I have
> vaguely in mind for how it could work. But after you've helped
> me not to forget prepared plans I think I have all the odds and
> ends to build it.
>
> I'll hack around a little. Then let's discuss the final
> details while having a prototype to look at.

OK. If you feel that QueryIds are the easier way to go, then do it.
In any case some preprocessing of the plan tree just before execution
will be required.
BTW, why not use CommandIds?

Vadim
Vadim wrote:

> OK. If you feel that QueryIds are the easier way to go, then do it.
> In any case some preprocessing of the plan tree just before execution
> will be required.
> BTW, why not use CommandIds?

CommandId reflects the order in which plans get executed and
snapshots created. But it isn't the order in which the plans
were created. There could easily have been hundreds of
CommandIds created by the time a deferred query executes. Some
of its RTEs must get the QuerySnapshot and scanCommandId of an
earlier executed plan. But at the time it is saved for deferred
execution, I cannot foresee the CommandId its parents will get.

And the case of cascaded rules? The initial query fires rule
action 1, which in turn fires rule action 2. Now the initial
query executes and fires a trigger which executes its own
commands. Thus, the parent of action 2 will not get the second
CommandId of the transaction.

A plan gets associated with a CommandId at the time its
execution starts. So CommandIds are useless for telling the
relationship between RTEs.

Jan
Just a little question:

>
> Vadim wrote:
>
> > OK. If you feel that QueryIds are the easier way to go, then do it.
> > In any case some preprocessing of the plan tree just before execution
> > will be required.
> > BTW, why not use CommandIds?
>

For that preprocessing required to associate RTEs with
snapshots I need the exact output of the rewrite system at
ExecutorStart() time, so I'm able to find all the RTEs.

So far I've found that the planner NULLs out subselects in
_make_subplan(). A first (very small) test showed that it
seems to work without doing so too. Who coded that and why
is it done there?

Does anyone know about other places in the code mucking with
the parsetrees after I'm done with them in the rewrite
system?

Jan
> For that preprocessing required to associate RTEs with
> snapshots I need the exact output of the rewrite system at
> ExecutorStart() time, so I'm able to find all the RTEs.
>
> So far I've found that the planner NULLs out subselects in
> _make_subplan(). A first (very small) test showed that it
> seems to work without doing so too. Who coded that and why
> is it done there?
>
> Does anyone know about other places in the code mucking with
> the parsetrees after I'm done with them in the rewrite
> system?

Vadim originally wrote the file. Is he splitting up the subqueries?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Jan Wieck wrote:
>
> For that preprocessing required to associate RTEs with
> snapshots I need the exact output of the rewrite system at
> ExecutorStart() time, so I'm able to find all the RTEs.
>
> So far I've found that the planner NULLs out subselects in
> _make_subplan(). A first (very small) test showed that it
> seems to work without doing so too. Who coded that and why
> is it done there?

Do you mean subselect.c:147 line:

    slink->subselect = NULL;    /* cool ?! */

?

As I remember, this is done to avoid copying the subquery's Query
node in copyObject when copying the plan. Actually, using the
Query node in the Executor has been annoying me for ~1 year:
the Executor does not need the entire Query node, only the range table!
Using Query in the Executor is bad for prepared plans: we do
copying and storing of a mostly useless Query node...
It would be nice to have some new TopPlan node with the
topmost plan, range table and some other things (I had to
add some fields to the Plan node itself for subqueries, while
these fields should be only in the topmost plan node) and get rid
of using Query in the Executor. Query is the result of parsing,
plan is the result of planning and the source for execution.

If you don't want to implement a TopPlan node then you could
allocate a new SubLink node in subselect.c:_make_subplan()
to be used in node->sublink...

Vadim
> Do you mean subselect.c:147 line:
>
>     slink->subselect = NULL;    /* cool ?! */
>
> ?

Exactly that!

>
> As I remember, this is done to avoid copying the subquery's Query
> node in copyObject when copying the plan. Actually, using the
> Query node in the Executor has been annoying me for ~1 year:
> the Executor does not need the entire Query node, only the range table!
> Using Query in the Executor is bad for prepared plans: we do
> copying and storing of a mostly useless Query node...
> It would be nice to have some new TopPlan node with the
> topmost plan, range table and some other things (I had to
> add some fields to the Plan node itself for subqueries, while
> these fields should be only in the topmost plan node) and get rid
> of using Query in the Executor. Query is the result of parsing,
> plan is the result of planning and the source for execution.
>
> If you don't want to implement a TopPlan node then you could
> allocate a new SubLink node in subselect.c:_make_subplan()
> to be used in node->sublink...

Ah - I see.

So I assume the sublink->subselect that's copied into the
plan is totally obsolete too at that point. The subplan has
its own rangetable, which is the same as the (not used) one
in the subselect.

I think I should tidy all that up to finally pass only the plan
into the executor before going ahead with the deferred query
stuff. It doesn't make sense to spend much effort now to
prepare the system for deferred queries. It depends too much
on where the RTEs are and how we organize them.

A new TopPlan could be passed down the executor instead of the
querytree. It might hold a List of rangetables. Plan and
SubPlan then have an index telling which nth() rangetable of
TopPlan to use for it.

This would make execution preprocessing for snapshot->RTE
assignment very easy because there's only one place to find
ALL RTEs (no need to traverse down a tree). And it would
substantially lower the amount of data to copy in SPI, since it
need not save the querytree at all.

Jan
Jan Wieck wrote:
>
> So I assume the sublink->subselect that's copied into the
> plan is totally obsolete too at that point. The subplan has
> its own rangetable, which is the same as the (not used) one
> in the subselect.

Exactly.

> I think I should tidy all that up to finally pass only the plan
> into the executor before going ahead with the deferred query
> stuff. It doesn't make sense to spend much effort now to
> prepare the system for deferred queries. It depends too much
> on where the RTEs are and how we organize them.
>
> A new TopPlan could be passed down the executor instead of the
> querytree. It might hold a List of rangetables. Plan and
> SubPlan then have an index telling which nth() rangetable of
> TopPlan to use for it.
>
> This would make execution preprocessing for snapshot->RTE
> assignment very easy because there's only one place to find
> ALL RTEs (no need to traverse down a tree). And it would
> substantially lower the amount of data to copy in SPI, since it
> need not save the querytree at all.

Nice!

Vadim
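To illustrate why a TopPlan holding all rangetables would simplify the preprocessing, here is a small C sketch. Nothing in it is existing backend code: TopPlanSketch, RangeTableSketch, PlanSketch and both functions are hypothetical, arrays stand in for the backend's List type, and a per-QueryId command Id stands in for the full snapshot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RTE: the QueryId from the rewriter plus the resolved scan Id. */
typedef struct RteSketch {
    int          query_id;
    unsigned int scan_cid;
} RteSketch;

typedef struct RangeTableSketch {
    RteSketch *rtes;
    size_t     nrtes;
} RangeTableSketch;

/* Hypothetical TopPlan: the single home of every rangetable of a plan. */
typedef struct TopPlanSketch {
    RangeTableSketch *rangetables;
    size_t            nrangetables;
} TopPlanSketch;

/* Plan and SubPlan nodes only remember which rangetable they use. */
typedef struct PlanSketch {
    size_t rtable_index;
} PlanSketch;

/*
 * Execution preprocessing: one flat pass over all rangetables assigns each
 * RTE the command Id recorded for its QueryId. No plan tree traversal is
 * needed. Returns the number of RTEs visited.
 */
size_t assign_scan_cids(TopPlanSketch *top, const unsigned int *cid_by_query_id)
{
    size_t visited = 0;

    for (size_t t = 0; t < top->nrangetables; t++) {
        RangeTableSketch *rt = &top->rangetables[t];

        for (size_t i = 0; i < rt->nrtes; i++) {
            rt->rtes[i].scan_cid = cid_by_query_id[rt->rtes[i].query_id];
            visited++;
        }
    }
    return visited;
}

/* What a Plan or SubPlan node would do to reach its own rangetable. */
const RangeTableSketch *plan_rangetable(const TopPlanSketch *top,
                                        const PlanSketch *plan)
{
    assert(plan->rtable_index < top->nrangetables);
    return &top->rangetables[plan->rtable_index];
}
```

The point of the design is visible in assign_scan_cids(): because subplans reference rangetables by index instead of owning them, the snapshot assignment never has to descend into the plan tree.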