Thread: two queryes in a single tablescan
Hi everybody, suppose you have a large table tab and two (or more) queryes like this: SELECT count(*),A FROM tab WHERE C GROUP BY A; SELECT count(*),B FROM tab WHERE C GROUP BY B; is there any way to get both results in a single query, eventually through stored procedure? The retrieved [count(*),A] ; [count(*),B)] data couldnt fit on a single table, of course. The main goal would be to get multiple results while scanning the table[s] once only thus getting results in a faster way. This seems to me quite a common situation but i have no clue whether a neat solution can be implemented through stored procedure. Any hint? Thank you Stefano
Stefano Dal Pra wrote: > suppose you have a large table tab and two (or more) queryes like this: > > SELECT count(*),A FROM tab WHERE C GROUP BY A; > SELECT count(*),B FROM tab WHERE C GROUP BY B; > > is there any way to get both results in a single query, > eventually through stored procedure? > The retrieved [count(*),A] ; [count(*),B)] data couldnt fit > on a single table, of course. > > The main goal would be to get multiple results while scanning the > table[s] once only > thus getting results in a faster way. > > This seems to me quite a common situation but i have no clue whether a neat > solution can be implemented through stored procedure. With a temp table: CREATE TEMPORARY TABLE tmp AS SELECT COUNT(*) as rows, a,b FROM WHERE C GROUP BY a,b; SELECT SUM(rows), a FROM tmp GROUP BY a; SELECT SUM(rows), b FROM tmp GROUP BY b; DROP TABLE tmp; (Using temp tables in plpgsql procedures doesn't quite work until 8.3. But you can use dynamic EXECUTE as a work-around. There used to be a FAQ entry about that, but apparently it's been removed because the problem has been fixed in the upcoming release.) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Oct 17, 2007 at 02:30:52PM +0200, Stefano Dal Pra wrote: > The main goal would be to get multiple results while scanning the > table[s] once only > thus getting results in a faster way. In 8.3, Postgres will do this for you itself -- if you already have a sequential scan running against a given table, another one starting in parallel will simply piggyback it. /* Steinar */ -- Homepage: http://www.sesse.net/
On 10/17/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > Stefano Dal Pra wrote: > > suppose you have a large table tab and two (or more) queryes like this: > > > > SELECT count(*),A FROM tab WHERE C GROUP BY A; > > SELECT count(*),B FROM tab WHERE C GROUP BY B; > > > > is there any way to get both results in a single query, > > eventually through stored procedure? > > The retrieved [count(*),A] ; [count(*),B)] data couldnt fit > > on a single table, of course. > > > > The main goal would be to get multiple results while scanning the > > table[s] once only > > thus getting results in a faster way. > > > > This seems to me quite a common situation but i have no clue whether a neat > > solution can be implemented through stored procedure. > > With a temp table: > > CREATE TEMPORARY TABLE tmp AS SELECT COUNT(*) as rows, a,b FROM WHERE C > GROUP BY a,b; > SELECT SUM(rows), a FROM tmp GROUP BY a; > SELECT SUM(rows), b FROM tmp GROUP BY b; > DROP TABLE tmp; > Thank You. I actually already do something like that: in a stored procedure i do create a md5 hash using passed parameters converted to TEXT and get a unix_like timestamp using now()::abstime::integer. This gets me a string like: 9ffeb60e9e6581726f7f5027b42c7942_1192443215 which i do use to EXECUTE CREATE TABLE 9ffeb60e9e6581726f7f5027b42c7942_1192443215 AS SELECT * FROM getjd('''||param1||''','''||param2||''','||param3||','||param4||')' The 9ffeb60e9e6581726f7f5027b42c7942_1192443215 is what i called 'tab' in my first post, and i need to perform about 7 queryes on that. (after a while i will drop the table using the timestamp part of the name, but that's another point). Here is where i would like to scan once only that table. Depending on parameters it may get as big as 50Mb (this actually is the tablespace size growth) or more with about 10^6 tuples. Stefano > (Using temp tables in plpgsql procedures doesn't quite work until 8.3. > But you can use dynamic EXECUTE as a work-around. There used to be a FAQ > entry about that, but apparently it's been removed because the problem > has been fixed in the upcoming release.) > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com >
Steinar H. Gunderson wrote: > On Wed, Oct 17, 2007 at 02:30:52PM +0200, Stefano Dal Pra wrote: >> The main goal would be to get multiple results while scanning the >> table[s] once only >> thus getting results in a faster way. > > In 8.3, Postgres will do this for you itself -- if you already have a > sequential scan running against a given table, another one starting in > parallel will simply piggyback it. You'd have to run the seq scans at the same time, from two different backends, so it's not going to help here. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
I remember when I was using SQL server we did like like that: SELECT count(CASE WHEN A THEN 1 END) AS cnt_a, count(CASE WHEN B THEN 1 END) AS cnt_b FROM tab WHERE C; I did a little test with pg_bench data, also works in PostgreSQL: test=# select count(*) from history where tid = 1; count ------- 574 (1 行) 时间: 9.553 ms test=# select count(*) from history where tid = 2; count ------- 1107 (1 行) 时间: 8.949 ms test=# select count(CASE WHEN tid = 1 then 1 END) as t1_cont, count(case when tid=2 then 1 end) as t2_cnt from history ; t1_cont | t2_cnt ---------+-------- 574 | 1107 (1 行) 时间: 17.182 ms Hope that helps. Regards Stefano Dal Pra wrote: > Hi everybody, > > suppose you have a large table tab and two (or more) queryes like this: > > SELECT count(*),A FROM tab WHERE C GROUP BY A; > SELECT count(*),B FROM tab WHERE C GROUP BY B; > > is there any way to get both results in a single query, > eventually through stored procedure? > The retrieved [count(*),A] ; [count(*),B)] data couldnt fit > on a single table, of course. > > The main goal would be to get multiple results while scanning the > table[s] once only > thus getting results in a faster way. > > This seems to me quite a common situation but i have no clue whether a neat > solution can be implemented through stored procedure. > > Any hint? > > Thank you > > Stefano
Hi, Stefano, "Stefano Dal Pra" <s.dalpra@gmail.com> wrote: > suppose you have a large table tab and two (or more) queryes like this: > > SELECT count(*),A FROM tab WHERE C GROUP BY A; > SELECT count(*),B FROM tab WHERE C GROUP BY B; > > is there any way to get both results in a single query, > eventually through stored procedure? > The retrieved [count(*),A] ; [count(*),B)] data couldnt fit > on a single table, of course. > > The main goal would be to get multiple results while scanning the > table[s] once only > thus getting results in a faster way. PostgreSQL 8.3 contains great improvements in this area, you can simply start the selects from concurrent connections, and the backend will synchronize the scans. Regards, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
Markus Schaber <schabi@logix-tt.com> schrieb: > > is there any way to get both results in a single query, > > eventually through stored procedure? > > The retrieved [count(*),A] ; [count(*),B)] data couldnt fit > > on a single table, of course. > > > > The main goal would be to get multiple results while scanning the > > table[s] once only > > thus getting results in a faster way. > > PostgreSQL 8.3 contains great improvements in this area, you can simply > start the selects from concurrent connections, and the backend will > synchronize the scans. works this right across different transactions? I mean, for instance, TX a insert rows and TX b insert other rows and both clients (with different transactions) starts a seq-scan? Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
On Oct 20, 2007, at 12:19 PM, Andreas Kretschmer wrote: > Markus Schaber <schabi@logix-tt.com> schrieb: >>> is there any way to get both results in a single query, >>> eventually through stored procedure? >>> The retrieved [count(*),A] ; [count(*),B)] data couldnt fit >>> on a single table, of course. >>> >>> The main goal would be to get multiple results while scanning the >>> table[s] once only >>> thus getting results in a faster way. >> >> PostgreSQL 8.3 contains great improvements in this area, you can >> simply >> start the selects from concurrent connections, and the backend will >> synchronize the scans. > > works this right across different transactions? I mean, for > instance, TX > a insert rows and TX b insert other rows and both clients (with > different transactions) starts a seq-scan? If you are in read-committed mode and both backends start their scans after the other has made its insert, then yes. Note Markus's point that both queries must be initiated by concurrent connections. Since Postgres doesn't have any kind of shared transaction mechanism across connections then this is inherent. Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com