Thread: short-cutting if sum()>constant
Hi,

I'd like to know if

select sum(qty) from t where status=37;

is > constant.

qty is always >0.

Is there a way to skip examining further rows and return a result
ASAP?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
2009/12/22 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> Hi,

Hi :-)

> I'd like to know if
>
> select sum(qty) from t where status=37;
>
> is > constant.
>
> qty is always >0.
>
> Is there a way to skip examining further rows and return a result
> ASAP?

With plain SQL, no.

With a user defined function in PL/PgSQL, yes.

> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
--
Filip Rembiałkowski
JID/mailto: filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/
On Tue, 2009-12-22 at 13:35 +0100, Ivan Sergio Borgonovo wrote:
> Hi,
> I'd like to know if
> select sum(qty) from t where status=37;
> is > constant.
> qty is always >0.
> Is there a way to skip examining further rows and return a result
> ASAP?

SELECT SUM(object_version)
FROM date_x
WHERE owner_id = 10100
HAVING SUM(object_version) > 1000
?

--
OpenGroupware developer: awilliam@whitemice.org
<http://whitemiceconsulting.blogspot.com/>
OpenGroupware & Cyrus IMAPd documentation @
<http://docs.opengroupware.org/Members/whitemice/wmogag/file_view>
In response to Ivan Sergio Borgonovo:
> Hi,
>
> I'd like to know if
>
> select sum(qty) from t where status=37;
>
> is > constant.
>
> qty is always >0.
>
> Is there a way to skip examining further rows and return a result
> ASAP?

I think no.

But you can create a new table with 2 columns: status (primary key) and
a column for the sum of qty. And you need a TRIGGER: update this table
for every insert, update and delete on your table t.

Now you can ask _this_ new table if the sum(qty) > your constant; this
should work very fast.

Just an idea...

Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
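[For illustration, the trigger-maintained summary table Andreas describes might be
set up roughly like this. It is only a sketch: the summary table, trigger and
function names are made up, and it assumes qty and status are non-null integers.]

-- Summary table: one row per status, kept in sync by a trigger on t.
CREATE TABLE t_qty_sum (
    status  integer PRIMARY KEY,
    qty_sum bigint NOT NULL DEFAULT 0
);

-- Seed it from the existing data.
INSERT INTO t_qty_sum (status, qty_sum)
SELECT status, sum(qty) FROM t GROUP BY status;

CREATE OR REPLACE FUNCTION t_qty_sum_sync() RETURNS trigger AS $$
BEGIN
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE t_qty_sum SET qty_sum = qty_sum - OLD.qty WHERE status = OLD.status;
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE t_qty_sum SET qty_sum = qty_sum + NEW.qty WHERE status = NEW.status;
        IF NOT FOUND THEN
            INSERT INTO t_qty_sum (status, qty_sum) VALUES (NEW.status, NEW.qty);
        END IF;
    END IF;
    RETURN NULL;   -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_qty_sum_sync
AFTER INSERT OR UPDATE OR DELETE ON t
FOR EACH ROW EXECUTE PROCEDURE t_qty_sum_sync();

-- The check then reads a single row:
-- SELECT qty_sum > 1000 FROM t_qty_sum WHERE status = 37;

[The usual trade-off applies: concurrent writers to the same status serialize on the
summary row, in exchange for a constant-time check at read time.]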
Does PG support CTE? You can try it.

> In response to Ivan Sergio Borgonovo:
> > Hi,
> >
> > I'd like to know if
> >
> > select sum(qty) from t where status=37;
> >
> > is > constant.
> >
> > qty is always >0.
> >
> > Is there a way to skip examining further rows and return a result
> > ASAP?
>
> I think no.
>
> But you can create a new table with 2 columns: status (primary key) and
> a column for the sum of qty. And you need a TRIGGER: update this table
> for every insert, update and delete on your table t.
>
> Now you can ask _this_ new table if the sum(qty) > your constant; this
> should work very fast.
>
> Just an idea...
>
> Andreas
On Tue, 22 Dec 2009 14:48:55 +0100
Filip Rembiałkowski <plk.zuber@gmail.com> wrote:

> With plain SQL, no.
>
> With a user defined function in PL/PgSQL, yes.

Thanks to all.

I'm on 8.3 so no CTE.
I was hoping there was some way to write it in plain SQL.

I'm planning to wrap everything in a plpgsql function using cursors
and then switch to WITH when I move to 8.4.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
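[A minimal sketch of the kind of cursor-based PL/pgSQL wrapper Ivan describes.
The function name, threshold parameter and bigint accumulator are assumptions;
the table t, column qty and status=37 are from the original question.]

-- Returns true as soon as the running total passes the threshold,
-- without fetching the remaining rows.
CREATE OR REPLACE FUNCTION sum_exceeds(threshold bigint) RETURNS boolean AS $$
DECLARE
    cur CURSOR FOR SELECT qty FROM t WHERE status = 37;
    v_qty   integer;
    running bigint := 0;
BEGIN
    OPEN cur;
    LOOP
        FETCH cur INTO v_qty;
        EXIT WHEN NOT FOUND;
        running := running + v_qty;
        IF running > threshold THEN
            CLOSE cur;
            RETURN true;
        END IF;
    END LOOP;
    CLOSE cur;
    RETURN false;
END;
$$ LANGUAGE plpgsql;

-- Usage: SELECT sum_exceeds(1000);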
msi77 <msi77@yandex.ru> wrote:
> Does PG support CTE?

Since 8.4, yes.

> You can try it.

Sorry, but I don't know how a CTE can help in this case, can you
explain that?

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                          (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.          N 51.05082°, E 13.56889°
> Sorry, but I don't know how a CTE can help in this case, can you explain

I mean a RECURSIVE CTE. You can check your condition on each iteration
and stop execution when the condition is false.

Sergey

> msi77 <msi77@yandex.ru> wrote:
> > Does PG support CTE?
>
> Since 8.4, yes.
>
> > You can try it.
>
> Sorry, but I don't know how a CTE can help in this case, can you
> explain that?
>
> Andreas
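[Purely to illustrate the idea Sergey describes, here is one possible shape of such
a query. It is a sketch, not something that would have run on the 8.3/8.4 versions
discussed in this thread: the LATERAL join needs PostgreSQL 9.3+, it assumes t has a
unique id column (ideally with an index on (status, id)), and 1000 is a placeholder
constant.]

WITH RECURSIVE walk(id, running) AS (
    (SELECT id, qty::bigint
     FROM t
     WHERE status = 37
     ORDER BY id
     LIMIT 1)
  UNION ALL
    SELECT nxt.id, walk.running + nxt.qty
    FROM walk
    CROSS JOIN LATERAL (
        SELECT id, qty
        FROM t
        WHERE status = 37 AND id > walk.id
        ORDER BY id
        LIMIT 1
    ) nxt
    WHERE walk.running < 1000   -- stop recursing once the constant is reached
)
SELECT coalesce(bool_or(running >= 1000), false) AS exceeds
FROM walk;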
Hello

I found one ugly trick. You can multiply the lines, and "SUM > const"
can be replaced by a LIMIT clause:

postgres=# select * from data;
 a
---
 3
 2
 1
 4
 2
 3
(6 rows)

Then SELECT * FROM ... WHERE ... and stop when SUM(a) = n:

postgres=# select generate_series(1,a) from data;
 generate_series
-----------------
 1
 2
 3
 1
 2
 1
 1
 2
 3
 4
 1
 2
 1
 2
 3

So if I want to check whether sum(a) >= 10, then I can use LIMIT 10. If
the query returns ten rows, the result is true, else the result is
false.

select a, (a = generate_series(1,a))::int from data limit 12;  -- stop after sum(a) = 12

postgres=# select sum(x) from (select 1 as x, (a = generate_series(1,a))::int from data limit 12) s;
 sum
-----
  12    -- 12 is eq 12, so the test is successful
(1 row)

Regards
Pavel Stehule

2009/12/22 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> Hi,
>
> I'd like to know if
>
> select sum(qty) from t where status=37;
>
> is > constant.
>
> qty is always >0.
>
> Is there a way to skip examining further rows and return a result
> ASAP?
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
On Tue, 22 Dec 2009 20:47:18 +0100
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> Hello
>
> I found one ugly trick. You can multiply the lines, and "SUM > const"
> can be replaced by a LIMIT clause:

The trick is really smart (and fun), kudos, really, it's always a
pleasure to read your solutions, thanks.

But as expected:

In a table with 100000 rows with random values [0,9]

test=# select sum(x) from (select 1 as x, (a = generate_series(1,a))::int from data limit 100000) s;
  sum
--------
 100000
(1 row)

Time: 71.423 ms

test=# create or replace function tano(a int, out b int)
returns int as
$$
declare
  row record;
begin
  b := 0;
  for row in select a as _a from data
  loop
    b := row._a + b;
    if (b >= a) then
      return;
    end if;
  end loop;
  return;
end;
$$ language plpgsql;

test=# select * from tano(100000);
   b
--------
 100000
(1 row)

Time: 0.187 ms

I ran both several times to avoid simple caching issues... anyway I
didn't really run a serious benchmark, but results were always in the
same order of magnitude. I hope I didn't make any mistake.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Wed, 23 Dec 2009 00:00:31 +0100
Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:

> On Tue, 22 Dec 2009 20:47:18 +0100
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> > Hello
> >
> > I found one ugly trick. You can multiply the lines, and "SUM > const"
> > can be replaced by a LIMIT clause:
>
> The trick is really smart (and fun), kudos, really, it's always a
> pleasure to read your solutions, thanks.
>
> But as expected:

as unexpected...

> test=# create or replace function tano(a int, out b int)
                                         ^^^
                                         should be anything but a
> returns int as
> $$
> declare
>   row record;
> begin
>   b := 0;
>   for row in select a as _a from data where a>0
>   loop
>     b := row._a + b;
>     if (b >= a) then
>       return;
>     end if;
>   end loop;
>   return;
> end;
> $$ language plpgsql;

Making it longer to better appreciate the difference: 1M rows where a
is in [0,2]

select * from tano((1000000)::int);
    b
---------
 1000001
(1 row)

Time: 1235.243 ms

select sum(x) from (select 1 as x, (a = generate_series(1,a))::int from data limit 1000000) s;
   sum
---------
 1000000
(1 row)

Time: 1309.441 ms

Being fair, once you add the where clause to the generate_series
version the difference in performance is negligible, and saying that
the plpgsql version is faster would require some more serious
benchmarking.

Surprised! If the generate_series version can compete with the plpgsql
for loop... why is the plpgsql version so "slow"?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Wed, 23 Dec 2009 01:09:40 +0100
Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:

> On Wed, 23 Dec 2009 00:00:31 +0100
> Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
>
> > On Tue, 22 Dec 2009 20:47:18 +0100
> > Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >
> > > Hello
> > >
> > > I found one ugly trick. You can multiply the lines, and "SUM >
> > > const" can be replaced by a LIMIT clause:
> >
> > The trick is really smart (and fun), kudos, really, it's always a
> > pleasure to read your solutions, thanks.
> >
> > But as expected:
>
> as unexpected...

As even more unexpected... when all rows are >0 and most of them are
equal to 1, the generate_series version performs appreciably better
(roughly 15% faster).

And I think your version can be further optimised:

select count(*) from (select (generate_series(1,a))::int from data limit 90000000) s;

This performs 30% faster.

So what's so slow in the plpgsql version?

Fortunately, as expected, when "enough" rows are >1 the for loop
solution performs much better.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
2009/12/23 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Wed, 23 Dec 2009 01:09:40 +0100
> Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
>
> As even more unexpected... when all rows are >0 and most of them are
> equal to 1, the generate_series version performs appreciably better
> (roughly 15% faster).
> And I think your version can be further optimised:
> select count(*) from (select (generate_series(1,a))::int from
> data limit 90000000) s;
> This performs 30% faster.
>
> So what's so slow in the plpgsql version?

Don't forget - plpgsql is interpreted - it is best as glue for SQL
statements. I don't think plpgsql is slow - its speed is similar to
using built-in functionality. But I am sure rewriting your function in
C could help, if you need maximal speed.

I think there is another trick, though I am not sure if it is faster.
You can create your own aggregate. In the state function you can
calculate and check the state value. If it is over your limit, then you
can raise an exception. So if your query finishes with the custom
exception, then sum(c) > n is true.

Regards
Pavel Stehule

> Fortunately, as expected, when "enough" rows are >1 the for loop
> solution performs much better.
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
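[For illustration, the exception-raising aggregate Pavel describes might look roughly
like this. It is a sketch only: the names, the choice to pass the limit as a second
aggregate argument, and catching the generic raise_exception condition are all
assumptions, not Pavel's actual code.]

-- State function: accumulate, and abort the scan once the cap is passed.
CREATE OR REPLACE FUNCTION sum_capped_state(acc bigint, val integer, cap bigint)
RETURNS bigint AS $$
BEGIN
    acc := acc + val;
    IF acc > cap THEN
        RAISE EXCEPTION 'cap exceeded';   -- stops the surrounding query immediately
    END IF;
    RETURN acc;
END;
$$ LANGUAGE plpgsql;

-- Two-argument aggregate: sum_capped(value, cap).
CREATE AGGREGATE sum_capped(integer, bigint) (
    SFUNC    = sum_capped_state,
    STYPE    = bigint,
    INITCOND = '0'
);

-- Caller: true means the scan stopped early because sum(qty) > cap.
CREATE OR REPLACE FUNCTION qty_sum_exceeds(cap bigint) RETURNS boolean AS $$
BEGIN
    PERFORM sum_capped(qty, cap) FROM t WHERE status = 37;
    RETURN false;
EXCEPTION WHEN raise_exception THEN
    RETURN true;
END;
$$ LANGUAGE plpgsql;

[The exception still has to unwind the scan, so it is not free, but no further rows
are read once the limit is hit.]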
On Wed, 23 Dec 2009 08:38:52 +0100
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> > As even more unexpected... when all rows are >0 and most of them
> > are equal to 1, the generate_series version performs appreciably
> > better (roughly 15% faster).
> > And I think your version can be further optimised:
> > select count(*) from (select (generate_series(1,a))::int from
> > data limit 90000000) s;
> > This performs 30% faster.
> >
> > So what's so slow in the plpgsql version?
>
> Don't forget - plpgsql is interpreted - it is best as glue for SQL

Yeah, but how many times is it interpreted in a 1M cycle, especially
when you run it more than one time and the interpreter should be
"hot"? I don't know how much "interpreter" work can be saved from a
previous run of the function... but still there should be something
left [1].

plpgsql isn't duck typed and that function doesn't leave too much
space for "interpretation".

Unless I'm missing something, in a typed language like plpgsql that
function could be easily turned into its C equivalent by the
interpreter.

I really thought the cost of running plpgsql was much lower in such
kind of situation.

Consider that the cost doesn't come from the initial cost of
interpreting the function but is proportional to the number of cycles.

Even hard coding the LIMIT in the plpgsql version doesn't make it
fast enough. So it is not the planner.

In a 10M rows dataset where 8332885 rows are =1 and the rest are =2:

If I'm testing for >900000
I get 940.580 ms vs 1302.162 ms
If I'm testing for >9000000
I get 8816.263 ms vs 12306.483 ms

BTW
select sum(a) from data; takes 1999.492 ms.
select count(*) from data; takes 1612.039 ms

While smart, your way is definitively more convoluted and should
require much more cpu cycles and memory.

The plperl version:

create or replace function totano(int)
returns int as
$$
  my $b = 0;
  my $row;
  my $sth = spi_query("select a from data");
  while(defined($row = spi_fetchrow($sth))) {
    $b += $row->{a};
    if($b >= $_[0]) {
      return $b;
    }
  }
  return $b;
$$ LANGUAGE plperl;

is 10 times slower than the plpgsql version.

> statements. I don't think plpgsql is slow - its speed is similar to
> using built-in functionality. But I am sure rewriting your function
> in C could help, if you need maximal speed.

In my case it is not worth it. I was just looking into something that
didn't have to scan all the rows.

With the real distribution of data and no real workload on the box
your solution is faster, but the difference, while unexpectedly
appreciable, is nearly negligible.

Anyway, I'm not able to justify the difference in speed between
plpgsql and your solution in such a case unless plpgsql is switching
back and forward between binary data and their text representation.
It would be nice if someone who knows plpgsql internals explained
where the cost comes from.

> I think there is another trick, though I am not sure if it is faster.
> You can create your own aggregate. In the state function you can
> calculate and check the state value. If it is over your limit, then
> you can raise an exception. So if your query finishes with the custom
> exception, then sum(c) > n is true.

I may test it later.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
2009/12/23 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Wed, 23 Dec 2009 08:38:52 +0100
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> > As even more unexpected... when all rows are >0 and most of them
>> > are equal to 1, the generate_series version performs appreciably
>> > better (roughly 15% faster).
>> > And I think your version can be further optimised:
>> > select count(*) from (select (generate_series(1,a))::int from
>> > data limit 90000000) s;
>> > This performs 30% faster.
>> >
>> > So what's so slow in the plpgsql version?
>>
>> Don't forget - plpgsql is interpreted - it is best as glue for SQL
>
> Yeah, but how many times is it interpreted in a 1M cycle, especially
> when you run it more than one time and the interpreter should be
> "hot"? I don't know how much "interpreter" work can be saved from a
> previous run of the function... but still there should be something
> left [1].
> plpgsql isn't duck typed and that function doesn't leave too much
> space for "interpretation".
> Unless I'm missing something, in a typed language like plpgsql that
> function could be easily turned into its C equivalent by the
> interpreter.
> I really thought the cost of running plpgsql was much lower in such
> kind of situation.

The problem is the massive cycle. Plpgsql really isn't a language for
this. The interpreter is missing its own arithmetic unit. So every
expression is translated to a SELECT statement:

IF a > c   ~  SELECT a > c
a = a + 1  ~  SELECT a + 1

These queries are executed in some special mode, but it is still more
expensive than the C  a = a + 1.

> Consider that the cost doesn't come from the initial cost of
> interpreting the function but is proportional to the number of
> cycles.
>
> Even hard coding the LIMIT in the plpgsql version doesn't make it
> fast enough. So it is not the planner.
>
> In a 10M rows dataset where 8332885 rows are =1 and the rest are =2:
>
> If I'm testing for >900000
> I get 940.580 ms vs 1302.162 ms
> If I'm testing for >9000000
> I get 8816.263 ms vs 12306.483 ms
>
> BTW
> select sum(a) from data; takes 1999.492 ms.
> select count(*) from data; takes 1612.039 ms

It is slower because there is one NULL test more.

> While smart, your way is definitively more convoluted and should
> require much more cpu cycles and memory.
>
> The plperl version:
>
> create or replace function totano(int)
> returns int as
> $$
>   my $b = 0;
>   my $row;
>   my $sth = spi_query("select a from data");
>   while(defined($row = spi_fetchrow($sth))) {
>     $b += $row->{a};
>     if($b >= $_[0]) {
>       return $b;
>     }
>   }
>   return $b;
> $$ LANGUAGE plperl;
>
> is 10 times slower than the plpgsql version.

PLpgSQL quietly uses cursors for FOR SELECT; your plperl uses a normal
standard recordset - it means your query generates a full scan. In your
plperl code you have to use explicit CURSORS. Then the speed should be
comparable. Plperlu is a little bit faster, but plpgsql has direct
cursor support.

>> statements. I don't think plpgsql is slow - its speed is similar to
>> using built-in functionality. But I am sure rewriting your function
>> in C could help, if you need maximal speed.
>
> In my case it is not worth it. I was just looking into something that
> didn't have to scan all the rows.
> With the real distribution of data and no real workload on the box
> your solution is faster, but the difference, while unexpectedly
> appreciable, is nearly negligible.
>
> Anyway, I'm not able to justify the difference in speed between
> plpgsql and your solution in such a case unless plpgsql is switching
> back and forward between binary data and their text representation.
> It would be nice if someone who knows plpgsql internals explained
> where the cost comes from.

I think the most expensive part of your plpgsql code is the two
expressions.

Regards
Pavel

>> I think there is another trick, though I am not sure if it is faster.
>> You can create your own aggregate. In the state function you can
>> calculate and check the state value. If it is over your limit, then
>> you can raise an exception. So if your query finishes with the custom
>> exception, then sum(c) > n is true.
>
> I may test it later.
>
> thanks
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
On Wed, 23 Dec 2009 12:52:38 +0100
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> The problem is the massive cycle. Plpgsql really isn't a language for
> this. The interpreter is missing its own arithmetic unit. So every
> expression is translated to a SELECT statement:
>
> IF a > c   ~  SELECT a > c
> a = a + 1  ~  SELECT a + 1
>
> These queries are executed in some special mode, but it is still more
> expensive than the C  a = a + 1.

I didn't get it.
I'd expect that since plpgsql shares SQL data types it could easily
map something like

if (a>b) then
or
a := a + b

and something like a "for in query" loop was going to be highly
optimised as well.
plpgsql should be the language most tightly bound to the internals of
pg SQL.

> > select sum(a) from data; takes 1999.492 ms.
> > select count(*) from data; takes 1612.039 ms
>
> It is slower because there is one NULL test more.

That didn't come as a surprise. It was there for comparison.

> PLpgSQL quietly uses cursors for FOR SELECT; your plperl uses

I didn't notice. That was my first plperl function in my life.
Anyway that means that that test didn't say anything about
interpreter speed and duck typing.

> > Anyway, I'm not able to justify the difference in speed between
> > plpgsql and your solution in such a case unless plpgsql is
> > switching back and forward between binary data and their text
> > representation. It would be nice if someone who knows plpgsql
> > internals explained where the cost comes from.
>
> I think the most expensive part of your plpgsql code is the two
> expressions.

I didn't understand.

What's the reason the interpreter can't translate the if and the
b := row._a + b; into something that very much resembles compiled C?

plpgsql is not doing anything different than:

select count(*) from (select case when a>3 then 1 else 2 end, a+a
from data limit 9000000) as f;

One of the advantages of using plpgsql (and one annoyance too) is
that variables are declared, and plpgsql should know how to operate
on them with native C code.

BTW the best performer considering all constraints and data
distribution seems to be the simplest one:

select sum(s.a) from (select a from data where a>0 limit 9000000) s;
Time: 2620.677 ms

We've no negative int... and they are int, so they have to be no
less than 1 if they are not 0. If I know there are no 0, the
simplest version becomes even faster.

When I randomly pick values in [1,3] plpgsql and generate_series
start to perform similarly, but still the simplest version is leading.

When the interval is in [1,6] the plpgsql becomes faster than the
generate_series, but the simplest version is still leading.

Only when the interval is in [1,10] do the plpgsql version and the
simplest one become comparable.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
2009/12/23 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Wed, 23 Dec 2009 12:52:38 +0100
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> The problem is the massive cycle. Plpgsql really isn't a language for
>> this. The interpreter is missing its own arithmetic unit. So every
>> expression is translated to a SELECT statement:
>>
>> IF a > c   ~  SELECT a > c
>> a = a + 1  ~  SELECT a + 1
>>
>> These queries are executed in some special mode, but it is still more
>> expensive than the C  a = a + 1.
>
> I didn't get it.
> I'd expect that since plpgsql shares SQL data types it could easily
> map something like
>
> if (a>b) then
> or
> a := a + b
>
> and something like a "for in query" loop was going to be highly
> optimised as well.
> plpgsql should be the language most tightly bound to the internals of
> pg SQL.

No, there is not any internal optimisation. PostgreSQL can optimize
only the execution plan - not individual expressions.

>> > select sum(a) from data; takes 1999.492 ms.
>> > select count(*) from data; takes 1612.039 ms
>>
>> It is slower because there is one NULL test more.
>
> That didn't come as a surprise. It was there for comparison.
>
>> PLpgSQL quietly uses cursors for FOR SELECT; your plperl uses
>
> I didn't notice. That was my first plperl function in my life.
> Anyway that means that that test didn't say anything about
> interpreter speed and duck typing.

You don't compare equal things.

>> > Anyway, I'm not able to justify the difference in speed between
>> > plpgsql and your solution in such a case unless plpgsql is
>> > switching back and forward between binary data and their text
>> > representation. It would be nice if someone who knows plpgsql
>> > internals explained where the cost comes from.
>
>> I think the most expensive part of your plpgsql code is the two
>> expressions.

one expression:  if >> loop_var > parameter <<
second expression:  loop_var + 1

> I didn't understand.
>
> What's the reason the interpreter can't translate the if and the
> b := row._a + b; into something that very much resembles compiled C?

a) simplicity. The PLpgSQL interpreter is very simple.
b) mostly the types behave a little bit differently than natural C
types - domains are different, C types don't know the NULL value, ...

So if you would like maximum speed, then you can use the C language. It
is relatively simple, much simpler than C procedures in T-SQL or Oracle.

> plpgsql is not doing anything different than:
> select count(*) from (select case when a>3 then 1 else 2 end, a+a
> from data limit 9000000) as f;

No - it doesn't have any intelligence - it doesn't join expressions
together. It does exactly what you write. See
http://okbob.blogspot.com/2009/12/how-dont-write-plpgsql-functions.html

> One of the advantages of using plpgsql (and one annoyance too) is
> that variables are declared, and plpgsql should know how to operate
> on them with native C code.

It is a little bit nonsense. For 99% plpgsql uses the SPI api and works
with variables via a query interface. PLpgSQL can do a loop statement,
an if statement, but for all the rest it uses an internal query based
API.

> BTW the best performer considering all constraints and data
> distribution seems to be the simplest one:
>
> select sum(s.a) from (select a from data where a>0 limit 9000000) s;
> Time: 2620.677 ms

This query is a little bit different from your original request, but it
could work for you.

Regards
Pavel Stehule

> We've no negative int... and they are int, so they have to be no
> less than 1 if they are not 0. If I know there are no 0, the
> simplest version becomes even faster.
>
> When I randomly pick values in [1,3] plpgsql and generate_series
> start to perform similarly, but still the simplest version is leading.
>
> When the interval is in [1,6] the plpgsql becomes faster than the
> generate_series, but the simplest version is still leading.
>
> Only when the interval is in [1,10] do the plpgsql version and the
> simplest one become comparable.
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
On Wed, 23 Dec 2009 14:35:27 +0100
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> a) simplicity. The PLpgSQL interpreter is very simple.
> b) mostly the types behave a little bit differently than natural C
> types - domains are different, C types don't know the NULL value, ...

But well... there should be a mapping somewhere between SQL int and
C, and it can be plainly reused.

$b = 5 + "10";

is going to be far more problematic to execute... but I guess that's
exactly the problem.

create or replace function tano(z int, out b int) returns int as
$$
declare
  x varchar(4) = '12';
  y int = 10;
begin
  b := x + y;
  return;
end;
$$ language plpgsql;

select * from tano(1);
ERROR:  operator does not exist: character varying + integer
LINE 1: SELECT  $1  +  $2
                    ^

From what I can guess... the interpreter is sending SQL commands to
pg and waiting for results back. So actually the "interpreter" isn't
really aware of data types... it just relies on SQL to spot data type
mismatches.

That means... that everything is converted back and forward to
*text*.

On the other side, once a SQL command is interpreted it actually
knows what the data types are and can pretty quickly sum int to int,
just taking care of nulls and overflows that would anyway be problems
for any other language trying to sum "SQL ints".

Even if it had to sum int and bigint the code could be optimised for
just that, and the need for a cast should be known in advance before
every "loop".

Did I get it?

That's what you were trying to make me understand with:

> It is a little bit nonsense. For 99% plpgsql uses the SPI api and
> works with variables via a query interface. PLpgSQL can do a loop
> statement, an if statement, but for all the rest it uses an internal
> query based API.

> This query is a little bit different from your original request, but
> it could work for you.

Yep... making clear a is an int simplifies the problem quite a lot.
But you couldn't use generate_series if a was not an int.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Pavel Stehule wrote:
> The problem is the massive cycle. Plpgsql really isn't a language for
> this. The interpreter is missing its own arithmetic unit. So every
> expression is translated to a SELECT statement:
>
> IF a > c   ~  SELECT a > c
> a = a + 1  ~  SELECT a + 1
>
> These queries are executed in some special mode, but it is still more
> expensive than the C  a = a + 1.

... and may have different rules, so you can't just write a simple "map
expressions to C equivalents" arithmetic evaluator.

It's also far from easy to just translate PL/PgSQL to directly
machine-executable code as you suggested, by the way. It'd really
require a just-in-time compiler akin to what Java uses, though the
ability to compile once and cache would help get rid of some of the
complexity of Java's. It'd quickly become attractive to just use
PL/Java instead, or write your own C-language function and LOAD it.

--
Craig Ringer
2009/12/23 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Wed, 23 Dec 2009 14:35:27 +0100
> Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> a) simplicity. The PLpgSQL interpreter is very simple.
>> b) mostly the types behave a little bit differently than natural C
>> types - domains are different, C types don't know the NULL value, ...
>
> But well... there should be a mapping somewhere between SQL int and
> C, and it can be plainly reused.
>
> $b = 5 + "10";
>
> is going to be far more problematic to execute... but I guess that's
> exactly the problem.
>
> create or replace function tano(z int, out b int) returns int as
> $$
> declare
>   x varchar(4) = '12';
>   y int = 10;
> begin
>   b := x + y;
>   return;
> end;
> $$ language plpgsql;
>
> select * from tano(1);
> ERROR:  operator does not exist: character varying + integer
> LINE 1: SELECT  $1  +  $2
>                     ^
>
> From what I can guess... the interpreter is sending SQL commands to
> pg and waiting for results back. So actually the "interpreter" isn't
> really aware of data types... it just relies on SQL to spot data type
> mismatches.
> That means... that everything is converted back and forward to
> *text*.

No - not everything. plpgsql holds values in PostgreSQL native types.
But these types are not 100% equal to C types. Integer is +- equal to C
int. Varchar is absolutely different from a C string.

> On the other side, once a SQL command is interpreted it actually
> knows what the data types are and can pretty quickly sum int to int,
> just taking care of nulls and overflows that would anyway be problems
> for any other language trying to sum "SQL ints".
>
> Even if it had to sum int and bigint the code could be optimised for
> just that, and the need for a cast should be known in advance before
> every "loop".

It could be, but it isn't. PLpgSQL is very simple - you have to know
that every function is "recompiled" every time it is called for the
first time in a session. So there is no time for full optimisations
like in C languages. PLpgSQL does fast non-optimised execution - like
Pascal.

If you need well optimized code, then you have to use the C language
and external stored procedures. PLpgSQL is best as glue for SQL
statements, not for numeric calculation or complex string operations.

Regards
Pavel Stehule

> Did I get it?
>
> That's what you were trying to make me understand with:
>
>> It is a little bit nonsense. For 99% plpgsql uses the SPI api and
>> works with variables via a query interface. PLpgSQL can do a loop
>> statement, an if statement, but for all the rest it uses an internal
>> query based API.
>
>> This query is a little bit different from your original request, but
>> it could work for you.
>
> Yep... making clear a is an int simplifies the problem quite a lot.
> But you couldn't use generate_series if a was not an int.
>
> thanks
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
Craig Ringer <craig@postnewspapers.com.au> writes:
> Pavel Stehule wrote:
>> These queries are executed in some special mode, but it is still more
>> expensive than the C  a = a + 1.

> ... and may have different rules, so you can't just write a simple "map
> expressions to C equivalents" arithmetic evaluator.

Yeah.  As an example, overflow is supposed to be caught in "a + 1",
unlike what would happen in C.

In principle you could map some of the builtin operators into inline
code, but it would be a great deal of work and the results would be
un-portable.

			regards, tom lane
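[Tom's point can be seen directly from psql; an illustrative snippet, not part of the
original exchange:]

-- 2147483647 is the int4 maximum; adding 1 raises an error
-- instead of silently wrapping around as C would
SELECT 2147483647::int4 + 1;
ERROR:  integer out of range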
On Wed, 23 Dec 2009 11:36:31 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Craig Ringer <craig@postnewspapers.com.au> writes:
> > Pavel Stehule wrote:
> >> These queries are executed in some special mode, but it is still
> >> more expensive than the C  a = a + 1.
>
> > ... and may have different rules, so you can't just write a
> > simple "map expressions to C equivalents" arithmetic evaluator.
>
> Yeah.  As an example, overflow is supposed to be caught in "a + 1",
> unlike what would happen in C.
>
> In principle you could map some of the builtin operators into
> inline code, but it would be a great deal of work and the results
> would be un-portable.

Thank you all for being so patient...
I really miss how procedural languages actually work internally.

Doesn't pg routinely map between SQL and C?

What is the difference between

select a+a from data;

and

a := a + a;

in a plpgsql function?

plpgsql knows that a is e.g. an int, so it could just use the same C
code that it uses when it has to sum a+a in SQL.

My guess, since I don't even know what to look for to get an idea of
the internal workings of plpgsql, is that the interpreter translates
the code into SQL (sort of...), sends it to the parser through
SPI_execute/prepare etc... (so yeah, maybe for the "data" it is not
really sending a "text" representation of the data) but still... the
"code" has to be further interpreted...

So something like:

a := a + a;

turns out to be:

SPI_prepare("SELECT $1 + $2", 2, ...);

and this is going to be called for every loop, while I thought the
SQL engine and the plpgsql interpreter were nearer, so that the
interpreter could push the values of a directly into the SQL engine.

Am I getting nearer?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
2009/12/23 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Wed, 23 Dec 2009 11:36:31 -0500
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Yeah.  As an example, overflow is supposed to be caught in "a + 1",
>> unlike what would happen in C.
>
>> In principle you could map some of the builtin operators into
>> inline code, but it would be a great deal of work and the results
>> would be un-portable.
>
> Thank you all for being so patient...
> I really miss how procedural languages actually work internally.
>
> Doesn't pg routinely map between SQL and C?

Generally yes, but not directly.

For example: the operator + for the type integer is wrapped by the
function int4pl,
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/int.c?rev=1.86

After parsing and transformations, the expression is transformed into a
sequence of calls to functions like int4pl. PostgreSQL doesn't contain
a C compiler - so it cannot compile to target code.

> What is the difference between
>
> select a+a from data;
> and
> a := a + a;
> in a plpgsql function?

Nothing. You have to know that plpgsql doesn't see the expression a+a;
the content of any expression is invisible to the plpgsql parser.
PLpgSQL doesn't understand expressions. PLpgSQL knows that somewhere
there has to be an expression, or that somewhere there has to be a
boolean expression, but the expression itself is a black box for the
plpgsql interpreter.

> plpgsql knows that a is e.g. an int, so it could just use the same C
> code that it uses when it has to sum a+a in SQL.

PLpgSQL knows it. But this knowledge isn't enough. You would have to
have a real compiler to machine code. But PostgreSQL doesn't have a
real compiler - it is only a set of specialised interpreters. There is
an SQL interpreter, there is a PLpgSQL interpreter. Nothing is
translated to machine code.

> My guess, since I don't even know what to look for to get an idea of
> the internal workings of plpgsql, is that the interpreter translates
> the code into SQL (sort of...), sends it to the parser through
> SPI_execute/prepare etc... (so yeah, maybe for the "data" it is not
> really sending a "text" representation of the data) but still... the
> "code" has to be further interpreted...

+/- plpgsql uses cached plans. So SPI_prepare is called only when the
expression is evaluated for the first time (per session).

> So something like:
> a := a + a;
> turns out to be:
> SPI_prepare("SELECT $1 + $2", 2, ...);
> and this is going to be called for every loop, while I thought the
> SQL engine and the plpgsql interpreter were nearer, so that the
> interpreter could push the values of a directly into the SQL engine.
>
> Am I getting nearer?

There are two steps:

if (cached_plan == NULL)
    cached_plan = prepare("SELECT $1 + $2", info_about_types[])

result = exec(cached_plan, values[], nulls[], &isnull)
.... something_like_move_result_to_variable(address_of_a)

Pavel
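[As a concrete illustration of the int4pl mapping Pavel mentions - not from the
original exchange - int4pl is an ordinary catalog function, so the operator and the
underlying C function can be called side by side:]

-- the integer + operator is implemented by the C function int4pl;
-- both expressions return 5
SELECT 2 + 3 AS via_operator, int4pl(2, 3) AS via_function;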