M Simms <grim@argh.demon.co.uk> writes:
>> Create function new_function(value_data_type)
>> 'Select sum(t1.var1) from t1 where t1.var2 = $1' LANGUAGE
>> 'sql';
>>
>> update t2 set var1=t2.var1+new_function(t2.var2);
> My reply to this is the question, that as each function contains a
> select, is this technically a single pass.
I think you are right: this way will repeat the sub-select for each
tuple in t2. Now if you have an index on t1.var2, the system could
exploit it to pull out just the desired tuples in each sub-select, so
the total work wouldn't really be O(M*N). But it still looks slow.
If I knew t1 would be large, I think I'd do it with a temporary table:
SELECT var2, sum(var1) FROM t1 GROUP BY var2 INTO temptable;UPDATE t2 SET var1 = t2.var1 + temptable.sum WHERE
t2.var2= temptable.var2;DROP TABLE temptable;
(In 6.5 you can use a temporary table to avoid needing explicit DROP,
but that's just a convenience.) Perhaps there's a way to accomplish
the same thing with a subselect, rather than explicitly creating a
table, but I don't know how.
This is more like O(N log N) in the size of each table, assuming that
the join is done with mergejoin. Actually, if there's not a huge
number of distinct values of var2, then temptable should be small enough
for a hashjoin, which'd mean that there's no need to sort t2 at all.
BTW, I think your original try
update t2 set var1=t2.var1+t1.var1 where t2.var2=t1.var1;
should have worked; it's probably a bug that it doesn't work. IMHO
the semantics ought to be that the addition is repeated for each t1
tuple that matches the current t2 tuple ... but I haven't read the
SQL spec.
regards, tom lane