cursors, temp tables, dynamic sql technique (was and is: looping over a small record set over and over in a function) - Mailing list pgsql-general
From | Ivan Sergio Borgonovo |
---|---|
Subject | cursors, temp tables, dynamic sql technique (was and is: looping over a small record set over and over in a function) |
Date | |
Msg-id | 20090621145716.51aec2b7@dawn.webthatworks.it Whole thread Raw |
In response to | Re: looping over a small record set over and over in a function (Craig Ringer <craig@postnewspapers.com.au>) |
Responses |
Re: cursors, temp tables, dynamic sql technique (was and
is: looping over a small record set over and over in a function)
|
List | pgsql-general |
On Sun, 21 Jun 2009 10:57:51 +0800 Craig Ringer <craig@postnewspapers.com.au> wrote: > On Fri, 2009-06-19 at 20:23 +0200, Ivan Sergio Borgonovo wrote: > > > If I could easily load all the dataset into an array, loop > > through it and then just update the computed field it would be > > nice... but how? > > Are you sure you _really_ need a PL/PgSQL function for this? Not really sure. I'm investigating. But well I noticed that even working with a fully fledged procedural language I'll have to loop 2 times to replace/add to the "original" field the computed one, so it's surely not a big loss if I "loop" 2 times with SQL. I think everything could be summed up as: select into t myaggregate1(field) from dataset where condition1; if(t>10) then update dataset set field=myfunc1(a,b,c) where condition1; end if; select into t myaggregate2(field) from dataset where condition2; if(t>44) then update dataset set field=myfunc2(a,b,c) where condition2; end if; Actually there is a trick I could use to "skip" the update loop and pass the result to the next loop but it looks convenient in a more general case if the aggregate should be computed on the "updated" value. I'm still not sure if the class of function I'm working with are always of the above class where myaggregate works on the "original" field. Of course myaggregate could compute myfunc... but then I'll have to compute myfunc twice. In a loop I could create temp table t as select *, field as field1, field as field2, field as result from t1 join t2 on... join t3 on... where ... on commit drop; flipflop='field1'; flopflip='field2'; foreach(row) { if(condition on row) { flopflip=myfunc(row[flipflop]); } else { flopflip=row[flipflop]; } agg+=flopflip; } if(condition on agg) { switch flipflop } foreach(row) { if(condition on row) { flopflip=myfunc(row[flipflop]); } else { flopflip=row[flipflop]; } agg+=flopflip; } if(condition on agg) { switch flipflop } execute 'update t set result=' || flipflop; I think this could be obtained using cursors. This is going to be much more verbose since to use the "flipflop" technique I'll have to use dynamic statements and EXECUTE. EXECUTE 'UPDATE t SET' || flipflop || '=' || newfield || 'WHERE CURRENT OF cursor'; even "condition on row" should be expressed as a dynamic statement if condition involve the computed field. Compared to the select + update solution it is going to avoid looping and checking the condition twice but it is going to work on a larger dataset and run the update even for unchanged row (maybe the else condition could be omitted??). I'm not sure that actual set of functions have the update condition equal to the select condition anyway. But I think wasting updates has a larger cost if they require disk IO. I think stuff may look more "convenient" if there was a simple and cheap way to load a record set into an array, do stuff with the procedural language and substitute the old record set with the computed one. Currently I'll get most of the speed up from looping through a smaller data set avoiding to join over a very large table. Considering the very small data set I'm expecting (no more than 20 record) I don't think dynamic statement, lack of indexes etc... are going to have an impact. As soon as clients will increase, optimizing the loops may be necessary. I'll have to see if all the cases I'm dealing with could be solved by the SELECT + conditional UPDATE technique since it looks much easier to maintain. Furthermore I suspect that for my real case the conditional check will be made on the original value so that myfunction will be computed at most once and I can delay an aggregate computation on the computed field after the last loop has been executed. The remaining costs will be: - looping twice on the same table (getting the aggregate + updating the table) - updating if it causes disk IO I even suspect that the test will mostly fail so that updates will be rare but I still would like to understand how this could work in the most general case and how the temp table + cursor + flipflop technique is going to work internally. I think I really don't have a clear picture of how temp tables really work. They can be seen by concurrent transactions in the same session. But if the transaction in which a temp table is created is not committed yet, other transactions won't see it. If the transaction where the temp table is created is aborted... no other transaction will ever know about the existence of the temp table. If the temp table is defined as DROP ON COMMIT... somehow there is no need to make the changes happening on the temp table land on disk. So it may actually look more as a temporary, private storage that doesn't have to be aware of concurrency. So temp tables should require less disk IO. Is it? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
pgsql-general by date: