Thread: Very slow updates when using IN syntax subselect

Very slow updates when using IN syntax subselect

From
Bryce Nesbitt
Date:
If I do:   select event_id from event join token using                   (token_number) where token_status=50 and
reconciled=false limit 1;
Then:   update event set reconciled=true where event_id={XXX};


It returns in about a second, or less.  But If I do the same thing with
the IN syntax:
   update event set reconciled=true where event_id in   (select event_id from event join token using (token_number)
wheretoken_status=50 and reconciled=false LIMIT 1);
 

On a 4 CPU machine, 2 CPU's peg at 100%, and the request just eats CPU
forever.  Any clues what might be going on?  Help would be much
appreciated.  I'm not seeing this on all my DB's... just the important ones.

PostgreSQL 7.4.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3
20030502 (Red Hat Linux 3.2.3-20)



Re: Very slow updates when using IN syntax subselect

From
Tom Lane
Date:
Bryce Nesbitt <bryce1@obviously.com> writes:
>     update event set reconciled=true where event_id in
>     (select event_id from event join token using (token_number)
>      where token_status=50 and reconciled=false LIMIT 1);

> On a 4 CPU machine, 2 CPU's peg at 100%, and the request just eats CPU
> forever.

What does EXPLAIN show for this and for the base query?

> PostgreSQL 7.4.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3

This may be your problem right here.  You are urgently in need of an
update in any case --- there are a lot of nasty bugs fixed since 7.4.1:
http://developer.postgresql.org/docs/postgres/release.html
        regards, tom lane


Re: Very slow updates when using IN syntax subselect

From
Bryce Nesbitt
Date:
Tom Lane wrote:
> Bryce Nesbitt <bryce1@obviously.com> writes:
>   
>>     update event set reconciled=true where event_id in
>>     (select event_id from event join token using (token_number)
>>      where token_status=50 and reconciled=false LIMIT 1);
>>     
>> On a 4 CPU machine, 2 CPU's peg at 100%, and the request just eats CPU
>> forever.
>>     
> What does EXPLAIN show for this and for the base query?                                     QUERY PLAN
-------------------------------------------------------------------------------------------Nested Loop
(cost=0.00..3.04rows=1 width=8)  ->  Seq Scan on event  (cost=0.00..0.00 rows=1 width=408)        Filter: (reconciled =
false) ->  Index Scan using token_token_number_key on token 
 
(cost=0.00..3.03 rows=1 width=11)        Index Cond: (("outer".token_number)::text =
(token.token_number)::text)        Filter: (token_status = 50)
(6 rows)

                                          QUERY PLAN
-------------------------------------------------------------------------------------------------Nested Loop IN Join
(cost=0.00..3.06rows=1 width=616)  Join Filter: ("outer".event_id = "inner".event_id)  ->  Seq Scan on event
(cost=0.00..0.00rows=1 width=616)  ->  Nested Loop  (cost=0.00..3.04 rows=1 width=8)        ->  Seq Scan on event
(cost=0.00..0.00rows=1 width=408)              Filter: (reconciled = false)        ->  Index Scan using
token_token_number_keyon token 
 
(cost=0.00..3.03 rows=1 width=11)              Index Cond: (("outer".token_number)::text =
(token.token_number)::text)              Filter: (token_status = 50)
(9 rows)


select count(*) from event;
-----------116226


stage=# select count(*) from token;
------- 8948




Re: Very slow updates when using IN syntax subselect

From
Tom Lane
Date:
Bryce Nesbitt <bryce1@obviously.com> writes:
> Tom Lane wrote:
>> What does EXPLAIN show for this and for the base query?      
>    ->  Seq Scan on event  (cost=0.00..0.00 rows=1 width=408)
>          Filter: (reconciled = false)

> select count(*) from event;
> -----------
>  116226

It seems pretty clear that you've never vacuumed nor analyzed these
tables ... else the planner would have some clue about their sizes.
Do that and then see what you get.
        regards, tom lane


Re: Very slow updates when using IN syntax subselect

From
Bryce Nesbitt
Date:
Tom Lane wrote:
> Bryce Nesbitt <bryce1@obviously.com> writes:
>   
>> Tom Lane wrote:
>>     
>>> What does EXPLAIN show for this and for the base query?
>>>       
>        
>   
>>    ->  Seq Scan on event  (cost=0.00..0.00 rows=1 width=408)
>>          Filter: (reconciled = false)
>>     
>
>   
>> select count(*) from event;
>> -----------
>>  116226
>>     
>
> It seems pretty clear that you've never vacuumed nor analyzed these
> tables ... else the planner would have some clue about their sizes.
> Do that and then see what you get.
>   
They occur in fine time.  That's good, thanks.  But jeeze, can't
postgres figure this out for itself?


Re: Very slow updates when using IN syntax subselect

From
Peter Eisentraut
Date:
Bryce Nesbitt wrote:
> > It seems pretty clear that you've never vacuumed nor analyzed these
> > tables ... else the planner would have some clue about their sizes.
> > Do that and then see what you get.
>
> They occur in fine time.  That's good, thanks.  But jeeze, can't
> postgres figure this out for itself?

I'm sure you wouldn't appreciate it if PostgreSQL did a full table scan 
before each query to figure out the total size of the involved tables.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Very slow updates when using IN syntax subselect

From
Bryce Nesbitt
Date:
Peter Eisentraut wrote:
> Bryce Nesbitt wrote:
>   
>>> It seems pretty clear that you've never vacuumed nor analyzed these
>>> tables ... else the planner would have some clue about their sizes.
>>> Do that and then see what you get.
>>>       
>> They occur in fine time.  That's good, thanks.  But jeeze, can't
>> postgres figure this out for itself?
>>     
>
> I'm sure you wouldn't appreciate it if PostgreSQL did a full table scan 
> before each query to figure out the total size of the involved tables.
>   
Nope.

But let's say the query optimizer thought the table had one row.
The the query starts, and 111,000 rows later...

It seems that a mismatch between the static table size, and the actual
one counted as you go, would be a quick
check.  That could set a flag for later background processing....
                                                      -Bryce



Re: Very slow updates when using IN syntax subselect

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Bryce Nesbitt wrote:
>> They occur in fine time.  That's good, thanks.  But jeeze, can't
>> postgres figure this out for itself?

> I'm sure you wouldn't appreciate it if PostgreSQL did a full table scan 
> before each query to figure out the total size of the involved tables.

It's also less than polite to complain about the behavior of
two-year-old releases, without making any effort to ascertain
whether more-current versions are smarter.
        regards, tom lane


Re: Very slow updates when using IN syntax subselect

From
Bryce Nesbitt
Date:
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>   
>> Bryce Nesbitt wrote:
>>     
>>> They occur in finite time.  That's good, thanks.  But jeeze, can't
>>> postgres figure this out for itself?
>>>       
>> I'm sure you wouldn't appreciate it if PostgreSQL did a full table scan 
>> before each query to figure out the total size of the involved tables.
>>     
>
> It's also less than polite to complain about the behavior of
> two-year-old releases, without making any effort to ascertain
> whether more-current versions are smarter.
>   
Sorry to offend.  Are current versions smarter?
The DB I was working with still had row counts of 1 after tens of thousands
of records had been added.

With new versions, must the DB still be VACUUMED, or is that a bit
more automatic?