Thread: efficient query help

efficient query help

From
Laurette Cisneros
Date:
Hi all,

I have crafted the following query which returns the results I want but
runs extremely slow (I'm sure it's the "not in" part that does it) since
the tables can contain large amounts of data.

Is there a better, more efficient way to build this query?  

It's important that I return zero counts for id flag combinations that do not 
have an entry in the log table hence the union with the "not in":

id is a text field and flgtime is a timestamp.  flags is a table that contains 
a single column with a row for each unique flag (text).

select id, flag, count(*) as count from logwhere date(flgtime) = '2002-04-16' and flag is not null
group by id, flag
union
select distinct l.id, f.flag, 0 as count from log l, flags fwhere (l.id, f.flag) not in      (select id, flag
fromlog        where date(flgtime) = '2002-04-16'          and fag is not null)
 
group by l.id, f.flag
;

Thanks for any suggestions.

-- 
Laurette Cisneros
Database Roadie
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Where's my....bus?




Re: efficient query help

From
Clinton Adams
Date:
> I have crafted the following query which returns the results I want but
> runs extremely slow (I'm sure it's the "not in" part that does it) since
> the tables can contain large amounts of data.
>
> Is there a better, more efficient way to build this query?

Why not try NOT EXISTS? I find that NOT IN bogs down on even simple queries.


Re: efficient query help

From
"Joel Burton"
Date:
Hmmm... Is this better? Please let me know; it looks better with a small set
of sample data, but I'd be curious how it does with your real table:

select id,      flag,      sum(count)

from   ( select id,               flag,               1 as count        from log        where flgtime='2002-04-16'
        union all
        select id,               f.flag,               0 as count        from log l,             flags f
   /* don't think this is neccessary    where flgtime <> '2002-04-16'  */
      ) as t

where flag is not null

group by id, flag;


[ for others: the point of the query is to count the number of unique
id/flag combos on 4/16/2002, and union this with all possible combos of
ids/flags that aren't present on non-4/16/2002 ]


Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
Knowledge Management & Technology Consultant

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Laurette Cisneros
> Sent: Monday, April 22, 2002 2:11 PM
> To: pgsql-sql@postgresql.org
> Subject: [SQL] efficient query help
>
>
>
> Hi all,
>
> I have crafted the following query which returns the results I want but
> runs extremely slow (I'm sure it's the "not in" part that does it) since
> the tables can contain large amounts of data.
>
> Is there a better, more efficient way to build this query?
>
> It's important that I return zero counts for id flag combinations
> that do not
> have an entry in the log table hence the union with the "not in":
>
> id is a text field and flgtime is a timestamp.  flags is a table
> that contains
> a single column with a row for each unique flag (text).
>
> select id, flag, count(*) as count
>   from log
>  where date(flgtime) = '2002-04-16'
>   and flag is not null
> group by id, flag
> union
> select distinct l.id, f.flag, 0 as count
>   from log l, flags f
>  where (l.id, f.flag) not in
>        (select id, flag
>           from log
>          where date(flgtime) = '2002-04-16'
>            and fag is not null)
> group by l.id, f.flag
> ;
>
> Thanks for any suggestions.
>
> --
> Laurette Cisneros
> Database Roadie
> (510) 420-3137
> NextBus Information Systems, Inc.
> www.nextbus.com
> Where's my....bus?
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>



Re: efficient query help

From
"Joel Burton"
Date:
P.S. It might be faster if your change the second half of the inner query to
"SELECT DISTINCT ON(id, f.flag)" and case the numbers to int2 (assuming your
data won't overflow this); this improves things about 20% with my dataset
(flags n=26, log n=896).

By my tests, this is about 4-5x faster than your original query.

Is there anything better than this?


Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
Knowledge Management & Technology Consultant

> -----Original Message-----
> From: Joel Burton [mailto:joel@joelburton.com]
> Sent: Wednesday, April 24, 2002 10:06 AM
> To: Laurette Cisneros; Pgsql-Sql@Postgresql. Org
> Subject: RE: [SQL] efficient query help
>
>
> Hmmm... Is this better? Please let me know; it looks better with
> a small set of sample data, but I'd be curious how it does with
> your real table:
>
> select id,
>        flag,
>        sum(count)
>
> from   ( select id,
>                 flag,
>                 1::int2 as count                <- change here
>          from log
>          where flgtime='2002-04-16'
>          and flag is not null                   <- change here
>
>          union all
>
>          select DISTINCT ON (id, f.flag) id,     <- change here
>                 f.flag,
>                 0::int2 as count                 <- change here
>          from log l,
>               flags f
>
>     /* don't think this is neccessary, but might make it faster:    where
flgtime <> '2002-04-16' and l.flag is not null  */
>
>        ) as t
>
> where flag is not null
>
> group by id, flag;
>
>
> [ for others: the point of the query is to count the number of
> unique id/flag combos on 4/16/2002, and union this with all
> possible combos of ids/flags that aren't present on non-4/16/2002 ]
>
>
> Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
> Knowledge Management & Technology Consultant
>
> > -----Original Message-----
> > From: pgsql-sql-owner@postgresql.org
> > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Laurette Cisneros
> > Sent: Monday, April 22, 2002 2:11 PM
> > To: pgsql-sql@postgresql.org
> > Subject: [SQL] efficient query help
> >
> >
> >
> > Hi all,
> >
> > I have crafted the following query which returns the results I want but
> > runs extremely slow (I'm sure it's the "not in" part that does it) since
> > the tables can contain large amounts of data.
> >
> > Is there a better, more efficient way to build this query?
> >
> > It's important that I return zero counts for id flag combinations
> > that do not
> > have an entry in the log table hence the union with the "not in":
> >
> > id is a text field and flgtime is a timestamp.  flags is a table
> > that contains
> > a single column with a row for each unique flag (text).
> >
> > select id, flag, count(*) as count
> >   from log
> >  where date(flgtime) = '2002-04-16'
> >   and flag is not null
> > group by id, flag
> > union
> > select distinct l.id, f.flag, 0 as count
> >   from log l, flags f
> >  where (l.id, f.flag) not in
> >        (select id, flag
> >           from log
> >          where date(flgtime) = '2002-04-16'
> >            and fag is not null)
> > group by l.id, f.flag
> > ;
> >
> > Thanks for any suggestions.
> >
> > --
> > Laurette Cisneros
> > Database Roadie
> > (510) 420-3137
> > NextBus Information Systems, Inc.
> > www.nextbus.com
> > Where's my....bus?
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >