Thread: Trouble with massive select statement.
Hello all, first a bit of information. I have the following tables: listeners (username text, first_name text, last_name text, email text, station char(10) ); listeners_data (username text, signup_date, zip_text text); testmaster (username text, test_date date, station char(10) ); Now what I need to do is select all the users who have taken a test (for a particular station) within, say the last so many number of days. The following select statement does that. However I need the following select statement to work inside of another select statement. select distinct username, station from testmaster t1 where 1 < ( select count(t2.username) from testmaster t2 where t2.test_date > '05-14-1999' and t2.station = 'WZZZ' and t1.username = t2.username ) ; I already have this select statement:select first_name, email from listeners l, listeners_data ld where l.username = ld.usernamea nd l.station = 'XXXX'; Now this select statement must only grab data if l.username exists in the data retrieved from the first select statement I listed. This seems awefully convoluded to me, and I expect on a large table, this could take a long time. Therefore, I am open to any thoughts on a better way to do this. Please let me know if you need any more information and I will be glad to provide it. Darren
Darren Greer <dgreer@websightsolutions.com> writes: > Now what I need to do is select all the users who have taken a test > (for a particular station) within, say the last so many number of > days. The following select statement does that. > select distinct username, station > from testmaster t1 > where 1 < ( > select count(t2.username) > from testmaster t2 > where t2.test_date > '05-14-1999' > and t2.station = 'WZZZ' > and t1.username = t2.username > ) > ; The above doesn't seem to me to do what you claim you want to do --- it seems to be looking for users who have taken the test *more than once* in the given interval. Assuming that the code is right and the comment wrong ;-), I agree that this is the hard way to do it. The inner select will be re-evaluated from scratch for every tuple scanned by the outer select, so your run time is proportional to the square of the number of tuples in testmaster. Not good. I think you want to use a HAVING clause: SELECT username, station FROM testmaster WHERE test_date > '05-14-1999' and station = 'WZZZ' GROUP BY username, stationHAVING count(*) > 1; The WHERE selects only the tuples you care about, the GROUP BY collects them into groups with the same username & station (thus serving the purpose you were using DISTINCT for; you don't need DISTINCT with GROUP BY); and finally the HAVING selects only the groups you care about, ie those containing more than one tuple. (When you use GROUP BY, count(*) and other aggregates only aggregate over a group, not the whole table.) regards, tom lane
See below > > select distinct username, station > > from testmaster t1 > > where 1 < ( > > select count(t2.username) > > from testmaster t2 > > where t2.test_date > '05-14-1999' > > and t2.station = 'WZZZ' > > and t1.username = t2.username > > ) > > ; > > The above doesn't seem to me to do what you claim you want to do --- > it seems to be looking for users who have taken the test *more than > once* in the given interval. That is correct......my mistake :) > Assuming that the code is right and the comment wrong ;-), I agree Good assumption :) > SELECT username, station FROM testmaster > WHERE test_date > '05-14-1999' and station = 'WZZZ' > GROUP BY username, station > HAVING count(*) > 1; This works wonderfully, thanks. Now comes the other twist. I already have a select statement (select first_name, email from listeners l, listeners_data ld where l.username = ld.username a nd $filter;). The $filter is a bunch of and statements that limit the data coming from the table listeners. I need to get the select statement you told me to work as a filter on the data that this select statement grabs. I cant see an easy way to do this without making this thing incredibly slow having to chech every user. Any thoughts? Darren
Darren Greer <dgreer@websightsolutions.com> writes: > Now comes the other twist. I already have a select statement (select > first_name, email from listeners l, listeners_data ld where l.username > = ld.username and $filter;). The $filter is a bunch of and > statements that limit the data coming from the table listeners. I > need to get the select statement you told me to work as a filter on > the data that this select statement grabs. I cant see an easy way to > do this without making this thing incredibly slow having to chech > every user. Any thoughts? This isn't very clear to me, but a couple of thoughts: 1. Rather than a subselect that gets re-executed for every tuple, consider a temporary table:SELECT * FROM my_table INTO temp_table WHERE .... ;SELECT * FROM temp_table WHERE ... ;DROP TABLEtemp_table ; This is currently the only way to achieve results that require multiple levels of grouping. 2. The system knows how to exploit indexes to avoid scanning all of a table, if you have WHERE conditions of the right form. For example, given an index on test_date, "WHERE test_date > '05-14-1999'" won't bother to scan tuples older than the specified date, and a two-way constraint likeWHERE test_date > '05-14-1999' AND test_date < '05-21-1999' is even more effective. So, look at the filter conditions you expect to use and consider making an index or two. (Don't go overboard making indexes, since you pay for them in extra work in table updates...) regards, tom lane