Thread: Problems with an update-from statement and pg-8.1.4
Hello We are having some problems with an UPDATE ... FROM sql-statement and pg-8.1.4. It takes ages to finish. The problem is the Seq Scan of the table 'mail', this table is over 6GB without indexes, and when we send thousands of this type of statement, the server has a very high iowait percent. How can we get rid of this Seq Scan? I send the output of an explain and table definitions: ------------------------------------------------------------------------- mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..932360.78 rows=7184312 width=57) -> Nested Loop (cost=0.00..6.54 rows=1 width=0) -> Index Scan using received_queue_id_index on mail_received mr (cost=0.00..3.20 rows=1 width=4) Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) Filter: (mailhost = '129.240.10.47'::inet) -> Index Scan using mail_pkey on mail m (cost=0.00..3.32 rows=1 width=4) Index Cond: ("outer".mail_id = m.mail_id) -> Seq Scan on mail (cost=0.00..860511.12 rows=7184312 width=57) (8 rows) mailstats=# \d mail Table "public.mail" Column | Type | Modifiers ------------+--------------+-------------------------------------------------------- mail_id | integer | not null default nextval('mail_mail_id_seq'::regclass) size | integer | message_id | text | not null spamscore | numeric(6,3) | Indexes: "mail_pkey" PRIMARY KEY, btree (mail_id) "mail_message_id_key" UNIQUE, btree (message_id) mailstats=# \d mail_received Table "public.mail_received" Column | Type | Modifiers ---------------+-----------------------------+---------------------------------------------------------------------- reception_id | integer | not null default nextval('mail_received_reception_id_seq'::regclass) mail_id | integer | not null envelope_from | text | helohost | text | from_host | inet | protocol | text | mailhost | inet | received | timestamp without time zone | not null completed | timestamp without time zone | queue_id | character varying(16) | not null Indexes: "mail_received_pkey" PRIMARY KEY, btree (reception_id) "mail_received_queue_id_key" UNIQUE, btree (queue_id, mailhost) "mail_received_completed_idx" btree (completed) "mail_received_mailhost_index" btree (mailhost) "mail_received_received_index" btree (received) "received_id_index" btree (mail_id) "received_queue_id_index" btree (queue_id) Foreign-key constraints: "$1" FOREIGN KEY (mail_id) REFERENCES mail(mail_id) ------------------------------------------------------------------------- Thanks in advance. regards, -- Rafael Martinez, <r.m.guerrero@usit.uio.no> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
On Wed, 6 Dec 2006, Rafael Martinez wrote: > We are having some problems with an UPDATE ... FROM sql-statement and > pg-8.1.4. It takes ages to finish. The problem is the Seq Scan of the > table 'mail', this table is over 6GB without indexes, and when we send > thousands of this type of statement, the server has a very high iowait > percent. > > How can we get rid of this Seq Scan? > > I send the output of an explain and table definitions: > ------------------------------------------------------------------------- > > mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, > mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = > '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; I don't think this statement does what you expect. You're ending up with two copies of mail in the above one as "mail" and one as "m". You probably want to remove the mail m in FROM and use mail rather than m in the where clause.
Stephan Szabo wrote: > On Wed, 6 Dec 2006, Rafael Martinez wrote: > > >> We are having some problems with an UPDATE ... FROM sql-statement and >> pg-8.1.4. It takes ages to finish. The problem is the Seq Scan of the >> table 'mail', this table is over 6GB without indexes, and when we send >> thousands of this type of statement, the server has a very high iowait >> percent. >> >> How can we get rid of this Seq Scan? >> >> I send the output of an explain and table definitions: >> ------------------------------------------------------------------------- >> >> mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, >> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = >> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; >> > > I don't think this statement does what you expect. You're ending up with > two copies of mail in the above one as "mail" and one as "m". You probably > want to remove the mail m in FROM and use mail rather than m in the > where clause. > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > > Worse yet I think your setting "spamcore" for EVERY row in mail to '-5.026'. The above solution should fix it though. -- Ted * * <http://www.blackducksoftware.com>
On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote: > Stephan Szabo wrote: > > On Wed, 6 Dec 2006, Rafael Martinez wrote: > >> > >> mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, > >> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = > >> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; > >> > > > > I don't think this statement does what you expect. You're ending up with > > two copies of mail in the above one as "mail" and one as "m". You probably > > want to remove the mail m in FROM and use mail rather than m in the > > where clause. > > > > > Worse yet I think your setting "spamcore" for EVERY row in mail to > '-5.026'. The above solution should fix it though. > > -- Ted > Thanks for the answers. I think the 'problem' is explain in the documentation: "fromlist A list of table expressions, allowing columns from other tables to appear in the WHERE condition and the update expressions. This is similar to the list of tables that can be specified in the FROMClause of a SELECT statement. Note that the target table must not appear in the fromlist, unless you intend a self-join (in which case it must appear with an alias in the fromlist)". And as you said, we can not have 'mail m' in the FROM clause. I have contacted the developers and they will change the statement. I gave then these 2 examples: ------------------------------------------------------------------------------- mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; QUERY PLAN ------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..6.54 rows=1 width=57) -> Index Scan using received_queue_id_index on mail_received mr (cost=0.00..3.20 rows=1 width=4) Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) Filter: (mailhost = '129.240.10.47'::inet) -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 width=57) Index Cond: ("outer".mail_id = mail.mail_id) (6 rows) mailstats=# explain update mail SET spamscore = '-5.026' where mail_id = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1' and mailhost = '129.240.10.47'); QUERY PLAN ----------------------------------------------------------------------------------------------------- Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57) Index Cond: (mail_id = $0) InitPlan -> Index Scan using received_queue_id_index on mail_received (cost=0.00..3.20 rows=1 width=4) Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) Filter: (mailhost = '129.240.10.47'::inet) (6 rows) ------------------------------------------------------------------------------- regards, -- Rafael Martinez, <r.m.guerrero@usit.uio.no> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez wrote: > On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote: > >> Stephan Szabo wrote: >> >>> On Wed, 6 Dec 2006, Rafael Martinez wrote: >>> >>>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, >>>> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = >>>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; >>>> >>>> >>> I don't think this statement does what you expect. You're ending up with >>> two copies of mail in the above one as "mail" and one as "m". You probably >>> want to remove the mail m in FROM and use mail rather than m in the >>> where clause. >>> >>> >>> >> Worse yet I think your setting "spamcore" for EVERY row in mail to >> '-5.026'. The above solution should fix it though. >> >> -- Ted >> >> > > Thanks for the answers. I think the 'problem' is explain in the > documentation: > > "fromlist > > A list of table expressions, allowing columns from other tables to > appear in the WHERE condition and the update expressions. This is > similar to the list of tables that can be specified in the FROMClause of > a SELECT statement. Note that the target table must not appear in the > fromlist, unless you intend a self-join (in which case it must appear > with an alias in the fromlist)". > > And as you said, we can not have 'mail m' in the FROM clause. I have > contacted the developers and they will change the statement. I gave then > these 2 examples: > > ------------------------------------------------------------------------------- > mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM > mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id = > '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; > QUERY PLAN > ------------------------------------------------------------------------------------------------------ > Nested Loop (cost=0.00..6.54 rows=1 width=57) > -> Index Scan using received_queue_id_index on mail_received mr > (cost=0.00..3.20 rows=1 width=4) > Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) > Filter: (mailhost = '129.240.10.47'::inet) > -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 > width=57) > Index Cond: ("outer".mail_id = mail.mail_id) > (6 rows) > > mailstats=# explain update mail SET spamscore = '-5.026' where mail_id > = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1' > and mailhost = '129.240.10.47'); > QUERY PLAN > ----------------------------------------------------------------------------------------------------- > Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57) > Index Cond: (mail_id = $0) > InitPlan > -> Index Scan using received_queue_id_index on mail_received > (cost=0.00..3.20 rows=1 width=4) > Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) > Filter: (mailhost = '129.240.10.47'::inet) > (6 rows) > ------------------------------------------------------------------------------- > Look again at the estimated costs of those two query plans. You haven't gained anything there. Try this out: EXPLAIN UPDATE mail SET spamscore = '-5.026' FROM mail_received mr WHERE mail.mail_id = mr.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' ; -- erik jones <erik@myemma.com> software development emma(r)
On Wed, 2006-12-06 at 14:19 -0600, Erik Jones wrote: > Rafael Martinez wrote: > > On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote: > > > >> Stephan Szabo wrote: > >> > >>> On Wed, 6 Dec 2006, Rafael Martinez wrote: > >>> > >>>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, > >>>> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = > >>>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; > >>>> > >>>> > >>> I don't think this statement does what you expect. You're ending up with > >>> two copies of mail in the above one as "mail" and one as "m". You probably > >>> want to remove the mail m in FROM and use mail rather than m in the > >>> where clause. > >>> > >>> > >>> > >> Worse yet I think your setting "spamcore" for EVERY row in mail to > >> '-5.026'. The above solution should fix it though. > >> > >> -- Ted > >> > >> > > > > Thanks for the answers. I think the 'problem' is explain in the > > documentation: > > > > "fromlist > > > > A list of table expressions, allowing columns from other tables to > > appear in the WHERE condition and the update expressions. This is > > similar to the list of tables that can be specified in the FROMClause of > > a SELECT statement. Note that the target table must not appear in the > > fromlist, unless you intend a self-join (in which case it must appear > > with an alias in the fromlist)". > > > > And as you said, we can not have 'mail m' in the FROM clause. I have > > contacted the developers and they will change the statement. I gave then > > these 2 examples: > > > > ------------------------------------------------------------------------------- > > mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM > > mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id = > > '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------ > > Nested Loop (cost=0.00..6.54 rows=1 width=57) > > -> Index Scan using received_queue_id_index on mail_received mr > > (cost=0.00..3.20 rows=1 width=4) > > Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) > > Filter: (mailhost = '129.240.10.47'::inet) > > -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 > > width=57) > > Index Cond: ("outer".mail_id = mail.mail_id) > > (6 rows) > > > > mailstats=# explain update mail SET spamscore = '-5.026' where mail_id > > = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1' > > and mailhost = '129.240.10.47'); > > QUERY PLAN > > ----------------------------------------------------------------------------------------------------- > > Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57) > > Index Cond: (mail_id = $0) > > InitPlan > > -> Index Scan using received_queue_id_index on mail_received > > (cost=0.00..3.20 rows=1 width=4) > > Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) > > Filter: (mailhost = '129.240.10.47'::inet) > > (6 rows) > > ------------------------------------------------------------------------------- > > > Look again at the estimated costs of those two query plans. You haven't > gained anything there. Try this out: > > EXPLAIN UPDATE mail > SET spamscore = '-5.026' > FROM mail_received mr > WHERE mail.mail_id = mr.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' ; > Haven't we? * In the statement with problems we got this: Nested Loop (cost=0.00..932360.78 rows=7184312 width=57) * In the ones I sent: Nested Loop (cost=0.00..6.54 rows=1 width=57) Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57) * And in the last one you sent me: ------------------------------------------------------ Nested Loop (cost=0.00..6.53 rows=1 width=57) -> Index Scan using received_queue_id_index on mail_received mr (cost=0.00..3.20 rows=1 width=4) Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 width=57) Index Cond: (mail.mail_id = "outer".mail_id) (5 rows) ------------------------------------------------------ I can not see the different. regards, -- Rafael Martinez, <r.m.guerrero@usit.uio.no> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez wrote: > On Wed, 2006-12-06 at 14:19 -0600, Erik Jones wrote: > >> Rafael Martinez wrote: >> >>> On Wed, 2006-12-06 at 14:55 -0500, Ted Allen wrote: >>> >>> >>>> Stephan Szabo wrote: >>>> >>>> >>>>> On Wed, 6 Dec 2006, Rafael Martinez wrote: >>>>> >>>>> >>>>>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM mail m, >>>>>> mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id = >>>>>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; >>>>>> >>>>>> >>>>>> >>>>> I don't think this statement does what you expect. You're ending up with >>>>> two copies of mail in the above one as "mail" and one as "m". You probably >>>>> want to remove the mail m in FROM and use mail rather than m in the >>>>> where clause. >>>>> >>>>> >>>>> >>>>> >>>> Worse yet I think your setting "spamcore" for EVERY row in mail to >>>> '-5.026'. The above solution should fix it though. >>>> >>>> -- Ted >>>> >>>> >>>> >>> Thanks for the answers. I think the 'problem' is explain in the >>> documentation: >>> >>> "fromlist >>> >>> A list of table expressions, allowing columns from other tables to >>> appear in the WHERE condition and the update expressions. This is >>> similar to the list of tables that can be specified in the FROMClause of >>> a SELECT statement. Note that the target table must not appear in the >>> fromlist, unless you intend a self-join (in which case it must appear >>> with an alias in the fromlist)". >>> >>> And as you said, we can not have 'mail m' in the FROM clause. I have >>> contacted the developers and they will change the statement. I gave then >>> these 2 examples: >>> >>> ------------------------------------------------------------------------------- >>> mailstats=# EXPLAIN update mail SET spamscore = '-5.026' FROM >>> mail_received mr where mr.mail_id = mail.mail_id AND mr.queue_id = >>> '1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47'; >>> QUERY PLAN >>> ------------------------------------------------------------------------------------------------------ >>> Nested Loop (cost=0.00..6.54 rows=1 width=57) >>> -> Index Scan using received_queue_id_index on mail_received mr >>> (cost=0.00..3.20 rows=1 width=4) >>> Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) >>> Filter: (mailhost = '129.240.10.47'::inet) >>> -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 >>> width=57) >>> Index Cond: ("outer".mail_id = mail.mail_id) >>> (6 rows) >>> >>> mailstats=# explain update mail SET spamscore = '-5.026' where mail_id >>> = (select mail_id from mail_received where queue_id = '1GrxLs-0004N9-I1' >>> and mailhost = '129.240.10.47'); >>> QUERY PLAN >>> ----------------------------------------------------------------------------------------------------- >>> Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57) >>> Index Cond: (mail_id = $0) >>> InitPlan >>> -> Index Scan using received_queue_id_index on mail_received >>> (cost=0.00..3.20 rows=1 width=4) >>> Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) >>> Filter: (mailhost = '129.240.10.47'::inet) >>> (6 rows) >>> ------------------------------------------------------------------------------- >>> >>> >> Look again at the estimated costs of those two query plans. You haven't >> gained anything there. Try this out: >> >> EXPLAIN UPDATE mail >> SET spamscore = '-5.026' >> FROM mail_received mr >> WHERE mail.mail_id = mr.mail_id AND mr.queue_id = '1GrxLs-0004N9-I1' ; >> >> > > Haven't we? > > * In the statement with problems we got this: > Nested Loop (cost=0.00..932360.78 rows=7184312 width=57) > > * In the ones I sent: > Nested Loop (cost=0.00..6.54 rows=1 width=57) > Index Scan using mail_pkey on mail (cost=3.20..6.52 rows=1 width=57) > > * And in the last one you sent me: > ------------------------------------------------------ > Nested Loop (cost=0.00..6.53 rows=1 width=57) > -> Index Scan using received_queue_id_index on mail_received mr > (cost=0.00..3.20 rows=1 width=4) > Index Cond: ((queue_id)::text = '1GrxLs-0004N9-I1'::text) > -> Index Scan using mail_pkey on mail (cost=0.00..3.32 rows=1 > width=57) > Index Cond: (mail.mail_id = "outer".mail_id) > (5 rows) > ------------------------------------------------------ > > I can not see the different. > > regards, > Ah, sorry, I was just looking at the two that you sent in your last message thinking that they were 'old' and 'new', not both 'new'. My bad... -- erik jones <erik@myemma.com> software development emma(r)