Thread: NOT IN query takes forever

NOT IN query takes forever

From
Marius Andreiana
Date:
Hi

I have 2 tables like this:
CREATE TABLE query (
    query_id     int not null,
    dat     varchar(64)  null ,
    sub_acc_id     int  null ,
    query_ip     varchar(64)  null ,
    osd_user_type     varchar(64)  null
)
;

CREATE TABLE trans (
    transaction_id     varchar(64)  not null ,
    date     varchar(64)  null ,
    query_id     int not  null ,
    sub_acc_id     int  null ,
    reg_acc_id     int  null
)
;

CREATE UNIQUE INDEX query_query_id_idx
ON query (query_id)
;

CREATE INDEX trans_reg_acc_id_idx
ON trans (reg_acc_id)
;

CREATE INDEX trans_query_id_idx
ON trans(query_id)
;
osd=> select count(*) from trans
osd-> ;
 count
--------
 598809
(1 row)

osd=>
osd=> select count(*) from query
osd-> ;
 count
--------
 137042
(1 row)

I just vacuum analyse'd the database.

Trying to run this query:
EXPLAIN ANALYSE
select * FROM trans
WHERE query_id NOT IN (select query_id FROM query)

but it will remain like that forever (cancelled after 30 min).

My postgresql.conf is the default:
# - Memory -

shared_buffers = 1000           # min 16, at least max_connections*2,
8KB each
#sort_mem = 1024                # min 64, size in KB
#vacuum_mem = 8192              # min 1024, size in KB

Should I adjust something?

Using postgresql 7.4.2, saw in release notes that IN/NOT IN queries are
at least as faster than EXISTS.

Thank you!
--
Marius Andreiana
Galuna - Solutii Linux in Romania
http://www.galuna.ro


Re: NOT IN query takes forever

From
"Merlin Moncure"
Date:
> Trying to run this query:
> EXPLAIN ANALYSE
> select * FROM trans
> WHERE query_id NOT IN (select query_id FROM query)
>
> but it will remain like that forever (cancelled after 30 min).

explain analyze actually runs the query to do timings.  Just run explain
and see what you come up with.  More than likely there is a nestloop in
there which is causing the long query time.

Try bumping up shared buffers some and sort mem as much as you safely
can.

Merlin

Re: NOT IN query takes forever

From
Stephan Szabo
Date:
On Tue, 3 Aug 2004, Marius Andreiana wrote:

> I just vacuum analyse'd the database.
>
> Trying to run this query:
> EXPLAIN ANALYSE
> select * FROM trans
> WHERE query_id NOT IN (select query_id FROM query)
>
> but it will remain like that forever (cancelled after 30 min).
>
> My postgresql.conf is the default:
> # - Memory -
>
> shared_buffers = 1000           # min 16, at least max_connections*2,
> 8KB each
> #sort_mem = 1024                # min 64, size in KB
> #vacuum_mem = 8192              # min 1024, size in KB
>
> Should I adjust something?

Probably sort_mem.  It's probably estimating that it can't hash the result
into the 1MB of sort_mem so it's probably falling back to some sort of
nested execution.


Re: NOT IN query takes forever

From
Tom Lane
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> Try bumping up shared buffers some and sort mem as much as you safely
> can.

sort_mem is probably the issue here.  The only reasonable way to do NOT
IN is with a hash table, and the default setting of sort_mem is probably
too small to support a 137042-element table.

            regards, tom lane

Re: NOT IN query takes forever

From
Marius Andreiana
Date:
On Tue, 2004-08-03 at 08:05 -0400, Merlin Moncure wrote:
> > Trying to run this query:
> > EXPLAIN ANALYSE
> > select * FROM trans
> > WHERE query_id NOT IN (select query_id FROM query)
> >
> > but it will remain like that forever (cancelled after 30 min).
>
> explain analyze actually runs the query to do timings.  Just run explain
> and see what you come up with.  More than likely there is a nestloop in
> there which is causing the long query time.
>
> Try bumping up shared buffers some and sort mem as much as you safely
> can.
Thank you, that did it!

With
shared_buffers = 3000        # min 16, at least max_connections*2, 8KB each
sort_mem = 128000        # min 64, size in KB

it takes <3 seconds (my hardware is not server-class).

--
Marius Andreiana
Galuna - Solutii Linux in Romania
http://www.galuna.ro


Re: NOT IN query takes forever

From
"Merlin Moncure"
Date:
> > Try bumping up shared buffers some and sort mem as much as you
safely
> > can.
> Thank you, that did it!
>
> With
> shared_buffers = 3000        # min 16, at least max_connections*2,
8KB
> each
> sort_mem = 128000        # min 64, size in KB
>
> it takes <3 seconds (my hardware is not server-class).

Be careful...sort_mem applies to each connection and (IIRC) in some
cases more than once to a connection.  Of all the configuration
parameters, sort_mem (IMO) is the most important and the hardest to get
right.  128k (or 128MB) is awfully high unless you have a ton of memory
(you don't) or you are running in single connection scenarios.  Do some
experimentation by lowering the value until you get a good balance
between potential memory consumption and speed.

Merlin

Re: NOT IN query takes forever

From
"Scott Marlowe"
Date:
On Tue, 2004-08-03 at 10:10, Merlin Moncure wrote:
> > > Try bumping up shared buffers some and sort mem as much as you
> safely
> > > can.
> > Thank you, that did it!
> >
> > With
> > shared_buffers = 3000        # min 16, at least max_connections*2,
> 8KB
> > each
> > sort_mem = 128000        # min 64, size in KB
> >
> > it takes <3 seconds (my hardware is not server-class).
>
> Be careful...sort_mem applies to each connection and (IIRC) in some
> cases more than once to a connection.  Of all the configuration
> parameters, sort_mem (IMO) is the most important and the hardest to get
> right.  128k (or 128MB) is awfully high unless you have a ton of memory
> (you don't) or you are running in single connection scenarios.  Do some
> experimentation by lowering the value until you get a good balance
> between potential memory consumption and speed.

Minor nit, sort_mem actually applies to EACH sort individually, so a
query that had to run three sorts could use 3 x sort_mem.

Note that one can set sort_mem per backend connection with set
sort_mem=128000 if need be so as not to use up all the memory with other
backends.


Re: NOT IN query takes forever

From
Gaetano Mendola
Date:
Marius Andreiana wrote:

> On Tue, 2004-08-03 at 08:05 -0400, Merlin Moncure wrote:
>
>>>Trying to run this query:
>>>EXPLAIN ANALYSE
>>>select * FROM trans
>>>WHERE query_id NOT IN (select query_id FROM query)
>>>
>>>but it will remain like that forever (cancelled after 30 min).
>>
>>explain analyze actually runs the query to do timings.  Just run explain
>>and see what you come up with.  More than likely there is a nestloop in
>>there which is causing the long query time.
>>
>>Try bumping up shared buffers some and sort mem as much as you safely
>>can.
>
> Thank you, that did it!
>
> With
> shared_buffers = 3000        # min 16, at least max_connections*2, 8KB each
> sort_mem = 128000        # min 64, size in KB

128 MB for sort_mem is too much, consider that in this way each backend can
use 128 MB for sort operations...
Also shared_buffers = 3000 means 24MB  that is not balanced with the 128MB
needed for sort...
Try to bump up 128 MB for shared_buffer ( may be you need to instruct your
OS to allow that ammount of shared memory usage ) and 24MB for sort_mem.



Regards
Gaetano Mendola














Re: NOT IN query takes forever

From
Christopher Kings-Lynne
Date:
> explain analyze actually runs the query to do timings.  Just run explain
> and see what you come up with.  More than likely there is a nestloop in
> there which is causing the long query time.
>
> Try bumping up shared buffers some and sort mem as much as you safely
> can.

Just use an EXISTS query I suggest.

Chris