Thread: NOT IN query takes forever
Hi,

I have 2 tables like this:

CREATE TABLE query (
    query_id      int not null,
    dat           varchar(64) null,
    sub_acc_id    int null,
    query_ip      varchar(64) null,
    osd_user_type varchar(64) null
);

CREATE TABLE trans (
    transaction_id varchar(64) not null,
    date           varchar(64) null,
    query_id       int not null,
    sub_acc_id     int null,
    reg_acc_id     int null
);

CREATE UNIQUE INDEX query_query_id_idx ON query (query_id);
CREATE INDEX trans_reg_acc_id_idx ON trans (reg_acc_id);
CREATE INDEX trans_query_id_idx ON trans (query_id);

osd=> select count(*) from trans;
 count
--------
 598809
(1 row)

osd=> select count(*) from query;
 count
--------
 137042
(1 row)

I just vacuum analyse'd the database.

Trying to run this query:

EXPLAIN ANALYSE
select * FROM trans
WHERE query_id NOT IN (select query_id FROM query)

but it will remain like that forever (cancelled after 30 min).

My postgresql.conf is the default:

# - Memory -

shared_buffers = 1000        # min 16, at least max_connections*2, 8KB each
#sort_mem = 1024             # min 64, size in KB
#vacuum_mem = 8192           # min 1024, size in KB

Should I adjust something? I'm using PostgreSQL 7.4.2; the release notes say that IN/NOT IN queries are now at least as fast as EXISTS.

Thank you!
--
Marius Andreiana
Galuna - Solutii Linux in Romania
http://www.galuna.ro
> Trying to run this query:
> EXPLAIN ANALYSE
> select * FROM trans
> WHERE query_id NOT IN (select query_id FROM query)
>
> but it will remain like that forever (cancelled after 30 min).

EXPLAIN ANALYZE actually runs the query to do the timings. Just run
EXPLAIN and see what you come up with. More than likely there is a
nestloop in there which is causing the long query time.

Try bumping up shared_buffers some and sort_mem as much as you safely
can.

Merlin
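For reference, the non-timing form Merlin suggests is the same statement
without ANALYZE (a minimal sketch):

EXPLAIN
SELECT * FROM trans
WHERE query_id NOT IN (SELECT query_id FROM query);
-- EXPLAIN alone only plans the statement, so it returns immediately and
-- shows whether the planner picked a hashed subplan or a plain subplan
-- that is re-run for every row of trans.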
On Tue, 3 Aug 2004, Marius Andreiana wrote:

> I just vacuum analyse'd the database.
>
> Trying to run this query:
> EXPLAIN ANALYSE
> select * FROM trans
> WHERE query_id NOT IN (select query_id FROM query)
>
> but it will remain like that forever (cancelled after 30 min).
>
> My postgresql.conf is the default:
>
> # - Memory -
>
> shared_buffers = 1000        # min 16, at least max_connections*2, 8KB each
> #sort_mem = 1024             # min 64, size in KB
> #vacuum_mem = 8192           # min 1024, size in KB
>
> Should I adjust something?

Probably sort_mem. The planner is most likely estimating that it cannot
hash the subquery result into the 1 MB of sort_mem, so it falls back to
some sort of nested execution.
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > Try bumping up shared buffers some and sort mem as much as you safely > can. sort_mem is probably the issue here. The only reasonable way to do NOT IN is with a hash table, and the default setting of sort_mem is probably too small to support a 137042-element table. regards, tom lane
On Tue, 2004-08-03 at 08:05 -0400, Merlin Moncure wrote:
> > Trying to run this query:
> > EXPLAIN ANALYSE
> > select * FROM trans
> > WHERE query_id NOT IN (select query_id FROM query)
> >
> > but it will remain like that forever (cancelled after 30 min).
>
> EXPLAIN ANALYZE actually runs the query to do the timings. Just run
> EXPLAIN and see what you come up with. More than likely there is a
> nestloop in there which is causing the long query time.
>
> Try bumping up shared_buffers some and sort_mem as much as you safely
> can.

Thank you, that did it!

With

shared_buffers = 3000        # min 16, at least max_connections*2, 8KB each
sort_mem = 128000            # min 64, size in KB

it takes <3 seconds (my hardware is not server-class).
--
Marius Andreiana
Galuna - Solutii Linux in Romania
http://www.galuna.ro
> > Try bumping up shared_buffers some and sort_mem as much as you safely
> > can.
> Thank you, that did it!
>
> With
>
> shared_buffers = 3000        # min 16, at least max_connections*2, 8KB each
> sort_mem = 128000            # min 64, size in KB
>
> it takes <3 seconds (my hardware is not server-class).

Be careful... sort_mem applies to each connection, and (IIRC) in some
cases more than once per connection. Of all the configuration
parameters, sort_mem is (IMO) the most important and the hardest to get
right. 128000 (i.e. 128 MB) is awfully high unless you have a ton of
memory (you don't) or you are running in single-connection scenarios.
Do some experimentation, lowering the value until you get a good balance
between potential memory consumption and speed.

Merlin
On Tue, 2004-08-03 at 10:10, Merlin Moncure wrote:
> > > Try bumping up shared_buffers some and sort_mem as much as you safely
> > > can.
> > Thank you, that did it!
> >
> > With
> >
> > shared_buffers = 3000        # min 16, at least max_connections*2, 8KB each
> > sort_mem = 128000            # min 64, size in KB
> >
> > it takes <3 seconds (my hardware is not server-class).
>
> Be careful... sort_mem applies to each connection, and (IIRC) in some
> cases more than once per connection. Of all the configuration
> parameters, sort_mem is (IMO) the most important and the hardest to get
> right. 128000 (i.e. 128 MB) is awfully high unless you have a ton of
> memory (you don't) or you are running in single-connection scenarios.
> Do some experimentation, lowering the value until you get a good balance
> between potential memory consumption and speed.

Minor nit: sort_mem actually applies to EACH sort individually, so a
query that had to run three sorts could use 3 x sort_mem.

Note that one can set sort_mem per backend connection with
"set sort_mem = 128000" if need be, so as not to use up all the memory
with other backends.
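A per-session override along those lines would look something like this
(a sketch; the figure is the one from the thread, and whether it is safe
depends on how much RAM the machine actually has free):

SET sort_mem = 128000;    -- raise the limit only for this backend

-- ... run the big NOT IN query here ...

RESET sort_mem;           -- back to the postgresql.conf default;
                          -- other connections were never affected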
Marius Andreiana wrote:
> On Tue, 2004-08-03 at 08:05 -0400, Merlin Moncure wrote:
>
>>> Trying to run this query:
>>> EXPLAIN ANALYSE
>>> select * FROM trans
>>> WHERE query_id NOT IN (select query_id FROM query)
>>>
>>> but it will remain like that forever (cancelled after 30 min).
>>
>> EXPLAIN ANALYZE actually runs the query to do the timings. Just run
>> EXPLAIN and see what you come up with. More than likely there is a
>> nestloop in there which is causing the long query time.
>>
>> Try bumping up shared_buffers some and sort_mem as much as you safely
>> can.
>
> Thank you, that did it!
>
> With
>
> shared_buffers = 3000        # min 16, at least max_connections*2, 8KB each
> sort_mem = 128000            # min 64, size in KB

128 MB for sort_mem is too much; keep in mind that every backend can use
128 MB for sort operations that way. Also, shared_buffers = 3000 means
only 24 MB, which is not balanced against the 128 MB allowed for sorts.
Try bumping shared_buffers up to 128 MB (you may need to instruct your
OS to allow that amount of shared memory) and 24 MB for sort_mem.

Regards
Gaetano Mendola
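In postgresql.conf terms, Gaetano's suggestion works out roughly as
follows (a sketch; shared_buffers is counted in 8 KB pages and sort_mem
in KB on 7.4, and the exact figures are only illustrative):

shared_buffers = 16384    # 16384 x 8 KB = 128 MB of shared buffer cache
sort_mem = 24576          # 24 MB per sort, per backend

# On Linux, the kernel's shared memory limit (kernel.shmmax) may need to
# be raised before the server will start with 128 MB of shared buffers.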
> EXPLAIN ANALYZE actually runs the query to do the timings. Just run
> EXPLAIN and see what you come up with. More than likely there is a
> nestloop in there which is causing the long query time.
>
> Try bumping up shared_buffers some and sort_mem as much as you safely
> can.

Just use an EXISTS query, I suggest.

Chris
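Since the original predicate is NOT IN, the rewrite Chris has in mind is
presumably the NOT EXISTS form; a minimal sketch against the schema from
the first message:

SELECT t.*
FROM trans t
WHERE NOT EXISTS (
    SELECT 1
    FROM query q
    WHERE q.query_id = t.query_id
);

Because query.query_id is declared NOT NULL, this returns the same rows
as the NOT IN version; with a nullable key the two forms can differ,
since NOT IN yields no rows once the subquery produces a NULL.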