Thread: two queryes in a single tablescan

two queryes in a single tablescan

From

"Stefano Dal Pra"

Date:

17 October 2007, 09:37:22

Hi everybody,

suppose you have a large table tab and two (or more) queryes like this:

SELECT count(*),A FROM tab WHERE C GROUP BY A;
SELECT count(*),B FROM tab WHERE C GROUP BY B;

is there any way to get both results in a single query,
eventually through stored procedure?
The retrieved [count(*),A] ; [count(*),B)] data couldnt fit
on a single table, of course.

The main goal would be to get multiple results while scanning the
table[s] once only
thus getting results in a faster  way.

This seems to me quite a common situation but i have no clue whether a neat
solution can be implemented through stored procedure.

Any hint?

Thank you

Stefano

Re: two queryes in a single tablescan

From

Heikki Linnakangas

Date:

17 October 2007, 10:02:21

Stefano Dal Pra wrote:
> suppose you have a large table tab and two (or more) queryes like this:
>
> SELECT count(*),A FROM tab WHERE C GROUP BY A;
> SELECT count(*),B FROM tab WHERE C GROUP BY B;
>
> is there any way to get both results in a single query,
> eventually through stored procedure?
> The retrieved [count(*),A] ; [count(*),B)] data couldnt fit
> on a single table, of course.
>
> The main goal would be to get multiple results while scanning the
> table[s] once only
> thus getting results in a faster  way.
>
> This seems to me quite a common situation but i have no clue whether a neat
> solution can be implemented through stored procedure.

With a temp table:

CREATE TEMPORARY TABLE tmp AS SELECT COUNT(*) as rows, a,b FROM WHERE C
GROUP BY a,b;
SELECT SUM(rows), a FROM tmp GROUP BY a;
SELECT SUM(rows), b FROM tmp GROUP BY b;
DROP TABLE tmp;

(Using temp tables in plpgsql procedures doesn't quite work until 8.3.
But you can use dynamic EXECUTE as a work-around. There used to be a FAQ
entry about that, but apparently it's been removed because the problem
has been fixed in the upcoming release.)

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: two queryes in a single tablescan

From

"Steinar H. Gunderson"

Date:

17 October 2007, 10:15:37

On Wed, Oct 17, 2007 at 02:30:52PM +0200, Stefano Dal Pra wrote:
> The main goal would be to get multiple results while scanning the
> table[s] once only
> thus getting results in a faster  way.

In 8.3, Postgres will do this for you itself -- if you already have a
sequential scan running against a given table, another one starting in
parallel will simply piggyback it.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: two queryes in a single tablescan

From

"Stefano Dal Pra"

Date:

17 October 2007, 10:22:01

On 10/17/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> Stefano Dal Pra wrote:
> > suppose you have a large table tab and two (or more) queryes like this:
> >
> > SELECT count(*),A FROM tab WHERE C GROUP BY A;
> > SELECT count(*),B FROM tab WHERE C GROUP BY B;
> >
> > is there any way to get both results in a single query,
> > eventually through stored procedure?
> > The retrieved [count(*),A] ; [count(*),B)] data couldnt fit
> > on a single table, of course.
> >
> > The main goal would be to get multiple results while scanning the
> > table[s] once only
> > thus getting results in a faster  way.
> >
> > This seems to me quite a common situation but i have no clue whether a neat
> > solution can be implemented through stored procedure.
>
> With a temp table:
>
> CREATE TEMPORARY TABLE tmp AS SELECT COUNT(*) as rows, a,b FROM WHERE C
> GROUP BY a,b;
> SELECT SUM(rows), a FROM tmp GROUP BY a;
> SELECT SUM(rows), b FROM tmp GROUP BY b;
> DROP TABLE tmp;
>

Thank You.

I actually already do something like that:
in a stored procedure i do create a md5 hash using passed parameters
converted to TEXT
and get a unix_like timestamp using now()::abstime::integer.
This gets me a string like: 9ffeb60e9e6581726f7f5027b42c7942_1192443215
which i do use to
EXECUTE
 CREATE TABLE 9ffeb60e9e6581726f7f5027b42c7942_1192443215 AS
    SELECT * FROM
getjd('''||param1||''','''||param2||''','||param3||','||param4||')'


The 9ffeb60e9e6581726f7f5027b42c7942_1192443215 is what i called 'tab'
in my first post,
and i need to perform about 7 queryes on that. (after a while i will
drop the table using the timestamp part of the name, but that's
another point).

Here is where i would like to scan once only that table. Depending on
parameters it may get as big as 50Mb (this actually is the tablespace
size growth) or more with about 10^6 tuples.

 Stefano


> (Using temp tables in plpgsql procedures doesn't quite work until 8.3.
> But you can use dynamic EXECUTE as a work-around. There used to be a FAQ
> entry about that, but apparently it's been removed because the problem
> has been fixed in the upcoming release.)
>
> --
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com
>

Re: two queryes in a single tablescan

From

Heikki Linnakangas

Date:

17 October 2007, 10:41:03

Steinar H. Gunderson wrote:
> On Wed, Oct 17, 2007 at 02:30:52PM +0200, Stefano Dal Pra wrote:
>> The main goal would be to get multiple results while scanning the
>> table[s] once only
>> thus getting results in a faster  way.
>
> In 8.3, Postgres will do this for you itself -- if you already have a
> sequential scan running against a given table, another one starting in
> parallel will simply piggyback it.

You'd have to run the seq scans at the same time, from two different
backends, so it's not going to help here.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: two queryes in a single tablescan

From

李彦 Ian Li

Date:

17 October 2007, 23:32:05

I remember when I was using SQL server we did like like that:

SELECT count(CASE WHEN A THEN 1 END) AS cnt_a, count(CASE WHEN B
THEN 1 END) AS cnt_b FROM tab WHERE C;

I did a little test with pg_bench data, also works in PostgreSQL:

test=# select count(*) from history where tid = 1;
  count
-------
    574
(1 行)

时间: 9.553 ms
test=# select count(*) from history where tid = 2;
  count
-------
   1107
(1 行)

时间: 8.949 ms
test=# select count(CASE WHEN tid = 1 then 1 END) as t1_cont,
count(case when tid=2 then 1 end) as t2_cnt from history ;
  t1_cont | t2_cnt
---------+--------
      574 |   1107
(1 行)

时间: 17.182 ms

Hope that helps.

Regards

Stefano Dal Pra wrote:
> Hi everybody,
>
> suppose you have a large table tab and two (or more) queryes like this:
>
> SELECT count(*),A FROM tab WHERE C GROUP BY A;
> SELECT count(*),B FROM tab WHERE C GROUP BY B;
>
> is there any way to get both results in a single query,
> eventually through stored procedure?
> The retrieved [count(*),A] ; [count(*),B)] data couldnt fit
> on a single table, of course.
>
> The main goal would be to get multiple results while scanning the
> table[s] once only
> thus getting results in a faster  way.
>
> This seems to me quite a common situation but i have no clue whether a neat
> solution can be implemented through stored procedure.
>
> Any hint?
>
> Thank you
>
> Stefano

Re: [SQL] two queryes in a single tablescan

From

Markus Schaber

Date:

20 October 2007, 13:31:47

Hi, Stefano,

"Stefano Dal Pra" <s.dalpra@gmail.com> wrote:

> suppose you have a large table tab and two (or more) queryes like this:
>
> SELECT count(*),A FROM tab WHERE C GROUP BY A;
> SELECT count(*),B FROM tab WHERE C GROUP BY B;
>
> is there any way to get both results in a single query,
> eventually through stored procedure?
> The retrieved [count(*),A] ; [count(*),B)] data couldnt fit
> on a single table, of course.
>
> The main goal would be to get multiple results while scanning the
> table[s] once only
> thus getting results in a faster  way.

PostgreSQL 8.3 contains great improvements in this area, you can simply
start the selects from concurrent connections, and the backend will
synchronize the scans.



Regards,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: [SQL] two queryes in a single tablescan

From

Andreas Kretschmer

Date:

20 October 2007, 14:14:02

Markus Schaber <schabi@logix-tt.com> schrieb:
> > is there any way to get both results in a single query,
> > eventually through stored procedure?
> > The retrieved [count(*),A] ; [count(*),B)] data couldnt fit
> > on a single table, of course.
> >
> > The main goal would be to get multiple results while scanning the
> > table[s] once only
> > thus getting results in a faster  way.
>
> PostgreSQL 8.3 contains great improvements in this area, you can simply
> start the selects from concurrent connections, and the backend will
> synchronize the scans.

works this right across different transactions? I mean, for instance, TX
a insert rows and TX b insert other rows and both clients (with
different transactions) starts a seq-scan?


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: [SQL] two queryes in a single tablescan

From

Erik Jones

Date:

20 October 2007, 19:02:33

On Oct 20, 2007, at 12:19 PM, Andreas Kretschmer wrote:

> Markus Schaber <schabi@logix-tt.com> schrieb:
>>> is there any way to get both results in a single query,
>>> eventually through stored procedure?
>>> The retrieved [count(*),A] ; [count(*),B)] data couldnt fit
>>> on a single table, of course.
>>>
>>> The main goal would be to get multiple results while scanning the
>>> table[s] once only
>>> thus getting results in a faster  way.
>>
>> PostgreSQL 8.3 contains great improvements in this area, you can
>> simply
>> start the selects from concurrent connections, and the backend will
>> synchronize the scans.
>
> works this right across different transactions? I mean, for
> instance, TX
> a insert rows and TX b insert other rows and both clients (with
> different transactions) starts a seq-scan?

If you are in read-committed mode and both backends start their scans
after the other has made its insert, then yes.  Note Markus's point
that both queries must be initiated by concurrent connections.  Since
Postgres doesn't have any kind of shared transaction mechanism across
connections then this is inherent.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com