Thread: Problems with an update-from statement and pg-8.1.4

Problems with an update-from statement and pg-8.1.4

From
Rafael Martinez
Date:
Hello

We are having some problems with an UPDATE ... FROM sql-statement and
pg-8.1.4. It takes ages to finish. The problem is the Seq Scan of the
table 'mail', this table is over 6GB without indexes, and when we send
thousands of this type of statement, the server has a very high iowait
percent.

How can we get rid of this Seq Scan?

I send the output of an explain and table definitions:
-------------------------------------------------------------------------

mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM mail m,
mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id =
'1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..932360.78 rows=7184312 width=57)
   ->  Nested Loop  (cost=0.00..6.54 rows=1 width=0)
         ->  Index Scan using received_queue_id_index on mail_received
mr  (cost=0.00..3.20 rows=1 width=4)
               Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
               Filter: (mailhost = '129.240.10.47'::inet)
         ->  Index Scan using mail_pkey on mail m  (cost=0.00..3.32
rows=1 width=4)
               Index Cond: ("outer".mail_id = m.mail_id)
   ->  Seq Scan on mail  (cost=0.00..860511.12 rows=7184312 width=57)
(8 rows)

mailstats=# \d mail
                                Table "public.mail"
   Column   |     Type     |                       Modifiers
------------+--------------+--------------------------------------------------------
 mail_id    | integer      | not null default
nextval('mail_mail_id_seq'::regclass)
 size       | integer      |
 message_id | text         | not null
 spamscore  | numeric(6,3) |
Indexes:
    "mail_pkey" PRIMARY KEY, btree (mail_id)
    "mail_message_id_key" UNIQUE, btree (message_id)

mailstats=# \d mail_received
                                            Table "public.mail_received"
    Column     |            Type             |
Modifiers
---------------+-----------------------------+----------------------------------------------------------------------
 reception_id  | integer                     | not null default
nextval('mail_received_reception_id_seq'::regclass)
 mail_id       | integer                     | not null
 envelope_from | text                        |
 helohost      | text                        |
 from_host     | inet                        |
 protocol      | text                        |
 mailhost      | inet                        |
 received      | timestamp without time zone | not null
 completed     | timestamp without time zone |
 queue_id      | character varying(16)       | not null
Indexes:
    "mail_received_pkey" PRIMARY KEY, btree (reception_id)
    "mail_received_queue_id_key" UNIQUE, btree (queue_id, mailhost)
    "mail_received_completed_idx" btree (completed)
    "mail_received_mailhost_index" btree (mailhost)
    "mail_received_received_index" btree (received)
    "received_id_index" btree (mail_id)
    "received_queue_id_index" btree (queue_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (mail_id) REFERENCES mail(mail_id)
-------------------------------------------------------------------------

Thanks in advance.
regards,
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/



Re: Problems with an update-from statement and pg-8.1.4

From
Stephan Szabo
Date:
On Wed, 6 Dec 2006, Rafael Martinez wrote:

> We are having some problems with an UPDATE ... FROM sql-statement and
> pg-8.1.4. It takes ages to finish. The problem is the Seq Scan of the
> table 'mail', this table is over 6GB without indexes, and when we send
> thousands of this type of statement, the server has a very high iowait
> percent.
>
> How can we get rid of this Seq Scan?
>
> I send the output of an explain and table definitions:
> -------------------------------------------------------------------------
>
> mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM mail m,
> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id =
> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';

I don't think this statement does what you expect. You're ending up with
two copies of mail in the above one as "mail" and one as "m". You probably
want to remove the mail m in FROM and use mail rather than m in the
where clause.

Re: Problems with an update-from statement and pg-8.1.4

From
Ted Allen
Date:
Stephan Szabo wrote:
> On Wed, 6 Dec 2006, Rafael Martinez wrote:
>
>
>> We are having some problems with an UPDATE ... FROM sql-statement and
>> pg-8.1.4. It takes ages to finish. The problem is the Seq Scan of the
>> table 'mail', this table is over 6GB without indexes, and when we send
>> thousands of this type of statement, the server has a very high iowait
>> percent.
>>
>> How can we get rid of this Seq Scan?
>>
>> I send the output of an explain and table definitions:
>> -------------------------------------------------------------------------
>>
>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM mail m,
>> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id =
>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
>>
>
> I don't think this statement does what you expect. You're ending up with
> two copies of mail in the above one as "mail" and one as "m". You probably
> want to remove the mail m in FROM and use mail rather than m in the
> where clause.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>
Worse yet I think your setting "spamcore" for EVERY row in mail to
'-5.026'.  The above solution should fix it though.

-- Ted

*
* <http://www.blackducksoftware.com>


Re: Problems with an update-from statement and pg-8.1.4

From
Rafael Martinez
Date:
On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote:
> Stephan Szabo wrote:
> > On Wed, 6 Dec 2006, Rafael Martinez wrote:
> >>
> >> mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM mail m,
> >> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id =
> >> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
> >>
> >
> > I don't think this statement does what you expect. You're ending up with
> > two copies of mail in the above one as "mail" and one as "m". You probably
> > want to remove the mail m in FROM and use mail rather than m in the
> > where clause.
> >
> >
> Worse yet I think your setting "spamcore" for EVERY row in mail to
> '-5.026'.  The above solution should fix it though.
>
> -- Ted
>

Thanks for the answers. I think the 'problem' is explain in the
documentation:

"fromlist

A list of table expressions, allowing columns from other tables to
appear in the WHERE condition and the update expressions. This is
similar to the list of tables that can be specified in the FROMClause of
a SELECT statement. Note that the target table must not appear in the
fromlist, unless you intend a self-join (in which case it must appear
with an alias in the fromlist)".

And as you said, we can not have 'mail m' in the FROM clause. I have
contacted the developers and they will change the statement. I gave then
these 2 examples:

-------------------------------------------------------------------------------
mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM
mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id =
'1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..6.54 rows=1 width=57)
   ->  Index Scan using received_queue_id_index on mail_received mr
(cost=0.00..3.20 rows=1 width=4)
         Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
         Filter: (mailhost = '129.240.10.47'::inet)
   ->  Index Scan using mail_pkey on mail  (cost=0.00..3.32 rows=1
width=57)
         Index Cond: ("outer".mail_id = mail.mail_id)
(6 rows)

mailstats=# explain  update mail SET spamscore = '-5.026' where mail_id
= (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1'
and mailhost = '129.240.10.47');
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Index Scan using mail_pkey on mail  (cost=3.20..6.52 rows=1 width=57)
   Index Cond: (mail_id = $0)
   InitPlan
     ->  Index Scan using received_queue_id_index on mail_received
(cost=0.00..3.20 rows=1 width=4)
           Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
           Filter: (mailhost = '129.240.10.47'::inet)
(6 rows)
-------------------------------------------------------------------------------

regards,
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/



Re: Problems with an update-from statement and pg-8.1.4

From
Erik Jones
Date:
Rafael Martinez wrote:
> On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote:
>
>> Stephan Szabo wrote:
>>
>>> On Wed, 6 Dec 2006, Rafael Martinez wrote:
>>>
>>>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM mail m,
>>>> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id =
>>>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
>>>>
>>>>
>>> I don't think this statement does what you expect. You're ending up with
>>> two copies of mail in the above one as "mail" and one as "m". You probably
>>> want to remove the mail m in FROM and use mail rather than m in the
>>> where clause.
>>>
>>>
>>>
>> Worse yet I think your setting "spamcore" for EVERY row in mail to
>> '-5.026'.  The above solution should fix it though.
>>
>> -- Ted
>>
>>
>
> Thanks for the answers. I think the 'problem' is explain in the
> documentation:
>
> "fromlist
>
> A list of table expressions, allowing columns from other tables to
> appear in the WHERE condition and the update expressions. This is
> similar to the list of tables that can be specified in the FROMClause of
> a SELECT statement. Note that the target table must not appear in the
> fromlist, unless you intend a self-join (in which case it must appear
> with an alias in the fromlist)".
>
> And as you said, we can not have 'mail m' in the FROM clause. I have
> contacted the developers and they will change the statement. I gave then
> these 2 examples:
>
> -------------------------------------------------------------------------------
> mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM
> mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id =
> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..6.54 rows=1 width=57)
>    ->  Index Scan using received_queue_id_index on mail_received mr
> (cost=0.00..3.20 rows=1 width=4)
>          Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
>          Filter: (mailhost = '129.240.10.47'::inet)
>    ->  Index Scan using mail_pkey on mail  (cost=0.00..3.32 rows=1
> width=57)
>          Index Cond: ("outer".mail_id = mail.mail_id)
> (6 rows)
>
> mailstats=# explain  update mail SET spamscore = '-5.026' where mail_id
> = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1'
> and mailhost = '129.240.10.47');
>                                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Index Scan using mail_pkey on mail  (cost=3.20..6.52 rows=1 width=57)
>    Index Cond: (mail_id = $0)
>    InitPlan
>      ->  Index Scan using received_queue_id_index on mail_received
> (cost=0.00..3.20 rows=1 width=4)
>            Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
>            Filter: (mailhost = '129.240.10.47'::inet)
> (6 rows)
> -------------------------------------------------------------------------------
>
Look again at the estimated costs of those two query plans.  You haven't
gained anything there.  Try this out:

EXPLAIN UPDATE mail
SET spamscore = '-5.026'
FROM mail_received mr
WHERE mail.mail_id = mr.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' ;

--
erik jones <erik@myemma.com>
software development
emma(r)


Re: Problems with an update-from statement and pg-8.1.4

From
Rafael Martinez
Date:
On Wed, 2006-12-06 at 14:19 -0600, Erik Jones wrote:
> Rafael Martinez wrote:
> > On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote:
> >
> >> Stephan Szabo wrote:
> >>
> >>> On Wed, 6 Dec 2006, Rafael Martinez wrote:
> >>>
> >>>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM mail m,
> >>>> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id =
> >>>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
> >>>>
> >>>>
> >>> I don't think this statement does what you expect. You're ending up with
> >>> two copies of mail in the above one as "mail" and one as "m". You probably
> >>> want to remove the mail m in FROM and use mail rather than m in the
> >>> where clause.
> >>>
> >>>
> >>>
> >> Worse yet I think your setting "spamcore" for EVERY row in mail to
> >> '-5.026'.  The above solution should fix it though.
> >>
> >> -- Ted
> >>
> >>
> >
> > Thanks for the answers. I think the 'problem' is explain in the
> > documentation:
> >
> > "fromlist
> >
> > A list of table expressions, allowing columns from other tables to
> > appear in the WHERE condition and the update expressions. This is
> > similar to the list of tables that can be specified in the FROMClause of
> > a SELECT statement. Note that the target table must not appear in the
> > fromlist, unless you intend a self-join (in which case it must appear
> > with an alias in the fromlist)".
> >
> > And as you said, we can not have 'mail m' in the FROM clause. I have
> > contacted the developers and they will change the statement. I gave then
> > these 2 examples:
> >
> > -------------------------------------------------------------------------------
> > mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM
> > mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id =
> > '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
> >                                               QUERY PLAN
> > ------------------------------------------------------------------------------------------------------
> >  Nested Loop  (cost=0.00..6.54 rows=1 width=57)
> >    ->  Index Scan using received_queue_id_index on mail_received mr
> > (cost=0.00..3.20 rows=1 width=4)
> >          Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
> >          Filter: (mailhost = '129.240.10.47'::inet)
> >    ->  Index Scan using mail_pkey on mail  (cost=0.00..3.32 rows=1
> > width=57)
> >          Index Cond: ("outer".mail_id = mail.mail_id)
> > (6 rows)
> >
> > mailstats=# explain  update mail SET spamscore = '-5.026' where mail_id
> > = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1'
> > and mailhost = '129.240.10.47');
> >                                              QUERY PLAN
> > -----------------------------------------------------------------------------------------------------
> >  Index Scan using mail_pkey on mail  (cost=3.20..6.52 rows=1 width=57)
> >    Index Cond: (mail_id = $0)
> >    InitPlan
> >      ->  Index Scan using received_queue_id_index on mail_received
> > (cost=0.00..3.20 rows=1 width=4)
> >            Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
> >            Filter: (mailhost = '129.240.10.47'::inet)
> > (6 rows)
> > -------------------------------------------------------------------------------
> >
> Look again at the estimated costs of those two query plans.  You haven't
> gained anything there.  Try this out:
>
> EXPLAIN UPDATE mail
> SET spamscore = '-5.026'
> FROM mail_received mr
> WHERE mail.mail_id = mr.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' ;
>

Haven't we?

* In the statement with problems we got this:
Nested Loop  (cost=0.00..932360.78 rows=7184312 width=57)

* In the ones I sent:
Nested Loop  (cost=0.00..6.54 rows=1 width=57)
Index Scan using mail_pkey on mail  (cost=3.20..6.52 rows=1 width=57)

* And in the last one you sent me:
------------------------------------------------------
Nested Loop  (cost=0.00..6.53 rows=1 width=57)
   ->  Index Scan using received_queue_id_index on mail_received mr
(cost=0.00..3.20 rows=1 width=4)
         Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
   ->  Index Scan using mail_pkey on mail  (cost=0.00..3.32 rows=1
width=57)
         Index Cond: (mail.mail_id = "outer".mail_id)
(5 rows)
------------------------------------------------------

I can not see the different.

regards,
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/



Re: Problems with an update-from statement and pg-8.1.4

From
Erik Jones
Date:
Rafael Martinez wrote:
> On Wed, 2006-12-06 at 14:19 -0600, Erik Jones wrote:
>
>> Rafael Martinez wrote:
>>
>>> On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote:
>>>
>>>
>>>> Stephan Szabo wrote:
>>>>
>>>>
>>>>> On Wed, 6 Dec 2006, Rafael Martinez wrote:
>>>>>
>>>>>
>>>>>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM mail m,
>>>>>> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id =
>>>>>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
>>>>>>
>>>>>>
>>>>>>
>>>>> I don't think this statement does what you expect. You're ending up with
>>>>> two copies of mail in the above one as "mail" and one as "m". You probably
>>>>> want to remove the mail m in FROM and use mail rather than m in the
>>>>> where clause.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Worse yet I think your setting "spamcore" for EVERY row in mail to
>>>> '-5.026'.  The above solution should fix it though.
>>>>
>>>> -- Ted
>>>>
>>>>
>>>>
>>> Thanks for the answers. I think the 'problem' is explain in the
>>> documentation:
>>>
>>> "fromlist
>>>
>>> A list of table expressions, allowing columns from other tables to
>>> appear in the WHERE condition and the update expressions. This is
>>> similar to the list of tables that can be specified in the FROMClause of
>>> a SELECT statement. Note that the target table must not appear in the
>>> fromlist, unless you intend a self-join (in which case it must appear
>>> with an alias in the fromlist)".
>>>
>>> And as you said, we can not have 'mail m' in the FROM clause. I have
>>> contacted the developers and they will change the statement. I gave then
>>> these 2 examples:
>>>
>>> -------------------------------------------------------------------------------
>>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM
>>> mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id =
>>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';
>>>                                               QUERY PLAN
>>> ------------------------------------------------------------------------------------------------------
>>>  Nested Loop  (cost=0.00..6.54 rows=1 width=57)
>>>    ->  Index Scan using received_queue_id_index on mail_received mr
>>> (cost=0.00..3.20 rows=1 width=4)
>>>          Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
>>>          Filter: (mailhost = '129.240.10.47'::inet)
>>>    ->  Index Scan using mail_pkey on mail  (cost=0.00..3.32 rows=1
>>> width=57)
>>>          Index Cond: ("outer".mail_id = mail.mail_id)
>>> (6 rows)
>>>
>>> mailstats=# explain  update mail SET spamscore = '-5.026' where mail_id
>>> = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1'
>>> and mailhost = '129.240.10.47');
>>>                                              QUERY PLAN
>>> -----------------------------------------------------------------------------------------------------
>>>  Index Scan using mail_pkey on mail  (cost=3.20..6.52 rows=1 width=57)
>>>    Index Cond: (mail_id = $0)
>>>    InitPlan
>>>      ->  Index Scan using received_queue_id_index on mail_received
>>> (cost=0.00..3.20 rows=1 width=4)
>>>            Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
>>>            Filter: (mailhost = '129.240.10.47'::inet)
>>> (6 rows)
>>> -------------------------------------------------------------------------------
>>>
>>>
>> Look again at the estimated costs of those two query plans.  You haven't
>> gained anything there.  Try this out:
>>
>> EXPLAIN UPDATE mail
>> SET spamscore = '-5.026'
>> FROM mail_received mr
>> WHERE mail.mail_id = mr.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' ;
>>
>>
>
> Haven't we?
>
> * In the statement with problems we got this:
> Nested Loop  (cost=0.00..932360.78 rows=7184312 width=57)
>
> * In the ones I sent:
> Nested Loop  (cost=0.00..6.54 rows=1 width=57)
> Index Scan using mail_pkey on mail  (cost=3.20..6.52 rows=1 width=57)
>
> * And in the last one you sent me:
> ------------------------------------------------------
> Nested Loop  (cost=0.00..6.53 rows=1 width=57)
>    ->  Index Scan using received_queue_id_index on mail_received mr
> (cost=0.00..3.20 rows=1 width=4)
>          Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text)
>    ->  Index Scan using mail_pkey on mail  (cost=0.00..3.32 rows=1
> width=57)
>          Index Cond: (mail.mail_id = "outer".mail_id)
> (5 rows)
> ------------------------------------------------------
>
> I can not see the different.
>
> regards,
>
Ah, sorry, I was just looking at the two that you sent in your last
message thinking that they were 'old' and 'new',  not both 'new'.  My bad...

--
erik jones <erik@myemma.com>
software development
emma(r)