Thread: Expensive query
I have two tables, structured as: users: id varchar(70) not null unique events: userid varchar(70) not null, code char(10) not null, when datetime not null The query I need to perform answers the question, "Which users do NOT have a particular event (selected by code), and which do not have ANY event for the last day?" The query I use is: select id from users where id not in ( select unique id from events where code = 'some code' ) and id not in ( select unique id from events where age('now',when) <= '1 day' ); This query is *very* expensive. With 10,000 users and 40,000 events, it can take up to 20 minutes (!) to execute, and the postgres process grows to 40 megabytes (!!) of memory (on a Pentium II system running BSDI). This seems surprising, since each of the subqueries only needs to be evaluated once, rather than once per row of users. Is there a way to reformulate this query to make it less expensive?
On Fri, 30 Oct 1998, Christophe Pettus wrote: That users table looks kinda useless. I'd recommend changing id to name and making id an int, then putting your events users in as ints. Currently, it doesn't do anything at all, and you might as well be doing this query: select distinct id from events where code = 'Whatever' and age('now',when) <= ' 1 day'; I would also recommend not making code a char(10). If you put an index on code and an index on date, you should be able to get your results pretty quick. If you change the userid and the code to integers and do a three table join, it might be faster, but it would use *much* less disk space. # I have two tables, structured as: # # users: # # id varchar(70) not null unique # # events: # # userid varchar(70) not null, # code char(10) not null, # when datetime not null # # The query I need to perform answers the question, "Which users do NOT # have a particular event (selected by code), and which do not have ANY # event for the last day?" The query I use is: # # select id from users # where # id not in ( # select unique id from events # where code = 'some code' # ) # and # id not in ( # select unique id from events # where age('now',when) <= '1 day' # ); # # This query is *very* expensive. With 10,000 users and 40,000 events, # it can take up to 20 minutes (!) to execute, and the postgres process # grows to 40 megabytes (!!) of memory (on a Pentium II system running # BSDI). This seems surprising, since each of the subqueries only needs # to be evaluated once, rather than once per row of users. Is there a # way to reformulate this query to make it less expensive? # # -- SA, beyond.com My girlfriend asked me which one I like better. pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L_______________________ I hope the answer won't upset her. ____________
At 02:13 PM 10/30/98 , you wrote: > That users table looks kinda useless. I'd recommend changing id >to name and making id an int, then putting your events users in as ints. >Currently, it doesn't do anything at all, and you might as well be doing >this query [...] Actually, the real-life 'users' table has lots of fields, none of them relevant to the question at hand. > I would also recommend not making code a char(10). If you put an >index on code and an index on date, you should be able to get your results >pretty quick. They are indexed. Using ints is not unreasonable, but does not explain the pathological behavior.
On Fri, 30 Oct 1998, Christophe Pettus wrote: // At 02:13 PM 10/30/98 , you wrote: // > That users table looks kinda useless. I'd recommend changing id // >to name and making id an int, then putting your events users in as ints. // >Currently, it doesn't do anything at all, and you might as well be doing // >this query [...] // // Actually, the real-life 'users' table has lots of fields, none of // them relevant to the question at hand. Right, the point was the table you were using in the query the way you were using it in the query wouldn't do anything but slow down the query at best. You were doing three queries with no chance of the optimizer doing its job. Also using ``is not in'' on those search results just sounds like it'd be horribly slow when you could just ask the whole thing in one query. -- SA, beyond.com The world is watching America, pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L________________________________________ and America is watching TV. __
At 0:13 +0200 on 31/10/98, Dustin Sallings wrote: > > That users table looks kinda useless. I'd recommend changing id > to name and making id an int, then putting your events users in as ints. > Currently, it doesn't do anything at all, and you might as well be doing > this query: > > select distinct id from events where code = 'Whatever' > and age('now',when) <= ' 1 day'; > > I would also recommend not making code a char(10). If you put an > index on code and an index on date, you should be able to get your results > pretty quick. If you change the userid and the code to integers and do a > three table join, it might be faster, but it would use *much* less disk > space. I don't get your solution. Doesn't it return the exact opposite of what is needed? He needed the users which did NOT have code 'Whatever' and also didn't have an event the last day - and your query asks which users have a 'Whatever' event less than a day ago... If what you intended is to put this query into a NOT IN subquery, then it's still not what was required, because we didn't want to exclude all 'Whatever' events of the last day. We wanted to exclude both 'Whatever' events and events of the last day. No, it's either two subqueries or a subquery with an OR, which is not optimized. Perhaps changing to a NOT EXISTS rather than a NOT IN query would help: select id from users where NOT EXISTS ( select * from events where userid = id and code = 'some code' ) and NOT EXISTS ( select * from events where userid = id and age('now',when) <= '1 day' ); Herouth -- Herouth Maoz, Internet developer. Open University of Israel - Telem project http://telem.openu.ac.il/~herutma