Thread: Trouble with massive select statement.

Trouble with massive select statement.

From

Darren Greer

Date:

21 June 1999, 17:52:44

Hello all, first a bit of information.

I have the following tables:
listeners (username text, first_name text, last_name text, email text, station
char(10) );  
listeners_data (username text, signup_date, zip_text text); 
testmaster (username text, test_date date, station char(10) );

Now what I need to do is select all the users who have taken a test (for a
particular station) within, say the last so many number of days.  The following
select statement does that.  However I need the following select statement to
work inside of another select statement.  

select distinct username, station from testmaster t1 where 1 < (    select count(t2.username)         from testmaster
t2       where t2.test_date > '05-14-1999'         and t2.station = 'WZZZ'         and t1.username = t2.username    )
 
;

I already have this select statement:select first_name, email from listeners l, listeners_data ld where l.username =
ld.usernamea
 
nd l.station = 'XXXX';
Now this select statement must only grab data if l.username exists in the data
retrieved from the first select statement I listed.  This seems awefully
convoluded to me, and I expect on a large table, this could take a long time. 
Therefore, I am open to any thoughts on a better way to do this.  Please let me
know if you need any more information and I will be glad to provide it.

Darren

Re: [SQL] Trouble with massive select statement.

From

Tom Lane

Date:

21 June 1999, 19:30:35

Darren Greer <dgreer@websightsolutions.com> writes:
> Now what I need to do is select all the users who have taken a test
> (for a particular station) within, say the last so many number of
> days.  The following select statement does that.

> select distinct username, station 
>     from testmaster t1 
>     where 1 < (
>         select count(t2.username) 
>             from testmaster t2
>             where t2.test_date > '05-14-1999' 
>             and t2.station = 'WZZZ' 
>             and t1.username = t2.username
>         )
> ;

The above doesn't seem to me to do what you claim you want to do ---
it seems to be looking for users who have taken the test *more than
once* in the given interval.

Assuming that the code is right and the comment wrong ;-), I agree
that this is the hard way to do it.  The inner select will be
re-evaluated from scratch for every tuple scanned by the outer select,
so your run time is proportional to the square of the number of tuples
in testmaster.  Not good.  I think you want to use a HAVING clause:

SELECT username, station FROM testmaster WHERE test_date > '05-14-1999' and station = 'WZZZ'  GROUP BY username,
stationHAVING count(*) > 1;

The WHERE selects only the tuples you care about, the GROUP BY collects
them into groups with the same username & station (thus serving the
purpose you were using DISTINCT for; you don't need DISTINCT with GROUP
BY); and finally the HAVING selects only the groups you care about, ie
those containing more than one tuple.  (When you use GROUP BY, count(*)
and other aggregates only aggregate over a group, not the whole table.)
        regards, tom lane

Re: [SQL] Trouble with massive select statement.

From

Darren Greer

Date:

22 June 1999, 09:05:33

See below

> > select distinct username, station 
> >     from testmaster t1 
> >     where 1 < (
> >         select count(t2.username) 
> >             from testmaster t2
> >             where t2.test_date > '05-14-1999' 
> >             and t2.station = 'WZZZ' 
> >             and t1.username = t2.username
> >         )
> > ;
> 
> The above doesn't seem to me to do what you claim you want to do ---
> it seems to be looking for users who have taken the test *more than
> once* in the given interval.
That is correct......my mistake :)

> Assuming that the code is right and the comment wrong ;-), I agree
Good assumption :)

> SELECT username, station FROM testmaster
>   WHERE test_date > '05-14-1999' and station = 'WZZZ' 
>   GROUP BY username, station
>   HAVING count(*) > 1;
This works wonderfully, thanks.  Now comes the other twist.  I already have a
select statement (select first_name, email from listeners l, listeners_data ld
where l.username = ld.username a nd $filter;).  The $filter is a bunch of and
statements that limit the data coming from the table listeners.  I need to get
the select statement you told me to work as a filter on the data that this
select statement grabs.  I cant see an easy way to do this without making this
thing incredibly slow having to chech every user.  Any thoughts?

Darren

Re: [SQL] Trouble with massive select statement.

From

Tom Lane

Date:

22 June 1999, 11:26:14

Darren Greer <dgreer@websightsolutions.com> writes:
> Now comes the other twist.  I already have a select statement (select
> first_name, email from listeners l, listeners_data ld where l.username
> = ld.username and $filter;).  The $filter is a bunch of and
> statements that limit the data coming from the table listeners.  I
> need to get the select statement you told me to work as a filter on
> the data that this select statement grabs.  I cant see an easy way to
> do this without making this thing incredibly slow having to chech
> every user.  Any thoughts?

This isn't very clear to me, but a couple of thoughts:

1. Rather than a subselect that gets re-executed for every tuple,
consider a temporary table:SELECT * FROM my_table INTO temp_table WHERE .... ;SELECT * FROM temp_table WHERE ... ;DROP
TABLEtemp_table ;
 
This is currently the only way to achieve results that require multiple
levels of grouping.

2. The system knows how to exploit indexes to avoid scanning all of a
table, if you have WHERE conditions of the right form.  For example,
given an index on test_date, "WHERE test_date > '05-14-1999'" won't
bother to scan tuples older than the specified date, and a two-way
constraint likeWHERE test_date > '05-14-1999' AND test_date < '05-21-1999'
is even more effective.  So, look at the filter conditions you expect
to use and consider making an index or two.  (Don't go overboard making
indexes, since you pay for them in extra work in table updates...)
        regards, tom lane