Thread: Slow update with simple query

Slow update with simple query

From
Arnaud Lesauvage
Date:
Hi list !

I am running a query to update the boolean field of a table based on
another table's fields.

The query is (changed names for readability):
UPDATE t1
SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
FROM t2
WHERE t1.uid = t2.uid

t2.uid is the PRIMARY KEY.
t2 only has ~1000 rows, so I think it fits fully in memory.
t1 has ~2,000,000 rows.
There is an index on t1.uid also.
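
(For reference, one way to confirm which indexes actually exist on the table is to ask the catalog; 't1' here is the placeholder table name used above.)

```sql
-- Sketch: list the indexes PostgreSQL knows about on t1
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 't1';
```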

The explain (sorry, not explain analyze available yet) is :

Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
   Hash Cond: ("outer".uid= "inner".uid)
   ->  Seq Scan on t1 (cost=0.00..261792.01 rows=2033001 width=340)
   ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
         ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)

My query has been running for more than 1.5 hour now, and it is still running.
Nothing else is running on the server.
There are two multicolumn indexes involving this column (both are 3-column
indexes). One of them includes a functional column (date_trunc('month', datefield)).

Do you think the problem is with the indexes ?

The hardware is not great, but the database is on a RAID1 array, so it's not bad either.
I am surprised that it takes more than 3 seconds per row to be updated.

Thanks for your opinion on this !

--
Arnaud

Re: Slow update with simple query

From
Ragnar
Date:
On mið, 2006-12-13 at 11:51 +0100, Arnaud Lesauvage wrote:
> Hi list !
>
> I am running a query to update the boolean field of a table based on
> another table's fields.
>
> The query is (changed names for readability):
> UPDATE t1
> SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
> FROM t2
> WHERE t1.uid = t2.uid
>
> t2.uid is the PRIMARY KEY.
> t2 only has ~1000 rows, so I think it fits fully in memory.
> t1 has ~2,000,000 rows.
> There is an index on t1.uid also.
>
> The explain (sorry, not explain analyze available yet) is :
>
> Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
>    Hash Cond: ("outer".uid= "inner".uid)
>    ->  Seq Scan on t1 (cost=0.00..261792.01 rows=2033001 width=340)
>    ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
>          ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)
>
> My query has been running for more than 1.5 hour now, and it is still running.
> Nothing else is running on the server.
> There are two multicolumn-indexes on this column (both are 3-columns indexes). One of them has a
> functional column (date_trunc('month', datefield)).
>
> Do you think the problem is with the indexes ?

I guess so. are you sure about the index on t1.uid?
what are the column definitions for t1.uid and t2.uid ?
are they the same ?
you should be able to get a plan similar to:
Merge Join  (cost=0.00..43.56 rows=1000 width=11)
   Merge Cond: ("outer".uid = "inner".uid)
   ->  Index Scan using t1i on t1  (cost=0.00..38298.39 rows=2000035
width=10)
   ->  Index Scan using t2i on t2  (cost=0.00..26.73 rows=1000 width=5)
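
(A quick way to answer the column-definition question is to query the information schema; a type mismatch between the two join keys can keep the planner from using an index.)

```sql
-- Sketch: compare the definitions of the two join columns
SELECT table_name, column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_name IN ('t1', 't2')
  AND column_name = 'uid';
```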

what postgres version are you using ?

gnari




>
> The hardware is not great, but the database is on a RAID1 array, so its not bad either.
> I am surprised that it takes more than 3 seconds per row to be updated.
>
> Thanks for your opinion on this !
>
> --
> Arnaud
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>


Re: Slow update with simple query

From
Arnaud Lesauvage
Date:
Ragnar a écrit :
>> Do you think the problem is with the indexes ?
>
> I guess so. are you sure about the index on t1.uid?
> what are the column definitions for t1.uid and t2.uid ?
> are they the same ?

Man, no !!!
I just checked and indeed, there is no index on this column. I probably
dropped it recently.
Thanks Ragnar.
(t1.uid and t2.uid were the same, character(32) columns)

> you should ba able to get a plan similar to:
> Merge Join  (cost=0.00..43.56 rows=1000 width=11)
>    Merge Cond: ("outer".uid = "inner".uid)
>    ->  Index Scan using t1i on t1  (cost=0.00..38298.39 rows=2000035
> width=10)
>    ->  Index Scan using t2i on t2  (cost=0.00..26.73 rows=1000 width=5)
>
> what postgres version are you using ?

Ooops, forgot that too : 8.1.4

I am creating the index right now, I'll tell you if this fixes the problem.
Thanks for your help !

--
Arnaud

Re: Slow update with simple query

From
"Jens Schipkowski"
Date:
Hi,

the problem is a combination of badly formed SQL and maybe missing indexes.
try this:
UPDATE t1
SET booleanfield = foo.bar
FROM (SELECT uid,
             (field IN ('some','other') AND field2 = 'Y') AS bar
      FROM t2) AS foo
WHERE t1.uid = foo.uid;

and index t1.uid, t2.uid, t2.field, t2.field2
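
(Spelled out, the suggested indexes would look something like the sketch below; the index names are made up for illustration, and since t2.uid was said to be the PRIMARY KEY, it is already indexed implicitly.)

```sql
-- Sketch of the indexes suggested above (names are illustrative)
CREATE INDEX idx_t1_uid ON t1 (uid);
CREATE INDEX idx_t2_field_field2 ON t2 (field, field2);
-- t2.uid is already covered by its PRIMARY KEY index

-- refresh statistics so the planner sees the new indexes
ANALYZE t1;
ANALYZE t2;
```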

regards,
Jens Schipkowski

On Wed, 13 Dec 2006 11:51:10 +0100, Arnaud Lesauvage <thewild@freesurf.fr>
wrote:

> Hi list !
>
> I am running a query to update the boolean field of a table based on
> another table's fields.
>
> The query is (changed names for readability):
> UPDATE t1
> SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
> FROM t2
> WHERE t1.uid = t2.uid
>
> t2.uid is the PRIMARY KEY.
> t2 only has ~1000 rows, so I think it fits fully in memory.
> t1 has ~2,000,000 rows.
> There is an index on t1.uid also.
>
> The explain (sorry, not explain analyze available yet) is :
>
> Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
>    Hash Cond: ("outer".uid= "inner".uid)
>    ->  Seq Scan on t1 (cost=0.00..261792.01 rows=2033001 width=340)
>    ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
>          ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)
>
> My query has been running for more than 1.5 hour now, and it is still
> running.
> Nothing else is running on the server.
> There are two multicolumn-indexes on this column (both are 3-columns
> indexes). One of them has a functional column (date_trunc('month',
> datefield)).
>
> Do you think the problem is with the indexes ?
>
> The hardware is not great, but the database is on a RAID1 array, so its
> not bad either.
> I am surprised that it takes more than 3 seconds per row to be updated.
>
> Thanks for your opinion on this !
>
> --
> Arnaud
>



--
**
APUS Software GmbH

Re: Slow update with simple query

From
Arnaud Lesauvage
Date:
Jens Schipkowski a écrit :
> the problem is a combination of bad formed SQL and maybe missing indexes.
> try this:
> UPDATE t1
> SET booleanfield = foo.bar
>  FROM (SELECT uid,(field IN ('some','other') AND field2 = 'Y') AS bar FROM
> t2) AS foo
> WHERE t1.uid=foo.uid;


Hi Jens,
Why is this query better than the other one ? Because it runs the
"(field IN ('some','other') AND field2 = 'Y')" once and then executes
the join with the resulting set ?

> and index t1.uid, t2.uid, t2.field, t2.field2

t2.field can only take 3 or 4 values (I don't remember exactly), and
field2 only 2 ('Y' or 'N'), so these fields have a very low cardinality.
Won't the planner choose to do a table scan in such a case ?

Thanks for your advices !

--
Arnaud

Re: Slow update with simple query

From
"Jens Schipkowski"
Date:
On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage <thewild@freesurf.fr>
wrote:

> Jens Schipkowski a écrit :
>> the problem is a combination of bad formed SQL and maybe missing
>> indexes.
>> try this:
>> UPDATE t1
>> SET booleanfield = foo.bar
>>  FROM (SELECT uid,(field IN ('some','other') AND field2 = 'Y') AS bar
>> FROM  t2) AS foo
>> WHERE t1.uid=foo.uid;
>
>
> Hi Jens,
> Why is this query better than the other one ? Because it runs the
> "(field IN ('some','other') AND field2 = 'Y')" once and then executes
> the join with the resulting set ?
True. The subselect in the FROM clause will be executed once, and its result
joined using the condition in the WHERE clause. So your condition on t2 is not
evaluated for each row in t1 (2 million records) but only for each row in t2
(1,000 records), and the boolean value is already computed when the update runs.

regards,
Jens

>
>> and index t1.uid, t2.uid, t2.field, t2.field2
>
> t1.field can only take 3 or 4 values (don't remember exactly), and
> field2 only 2 ('Y' or 'N'). So this fields have a very low cardinality.
> Won't the planner chose to do a table scan in such a case ?
>
> Thanks for your advices !
>
> --
> Arnaud



--
**
APUS Software GmbH

Re: Slow update with simple query

From
Arnaud Lesauvage
Date:
Jens Schipkowski a écrit :
> On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage <thewild@freesurf.fr>
>> Why is this query better than the other one ? Because it runs the
>> "(field IN ('some','other') AND field2 = 'Y')" once and then executes
>> the join with the resulting set ?
> True. The Subselect in FROM clause will be executed once and will be
> joined using the condition at where clause. So your condition at t2 is not
> executed for each row in t1(2mio records) but for each row in t2(1k
> records). And the boolean value is already set during update.

OK Jens, thanks for clarifying this.
I thought the planner could guess what to do in such cases.

Regards
--
Arnaud

Re: Slow update with simple query

From
Ragnar
Date:
On mið, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:
> Jens Schipkowski a écrit :
> > On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage <thewild@freesurf.fr>
> >> Why is this query better than the other one ? Because it runs the
> >> "(field IN ('some','other') AND field2 = 'Y')" once and then executes
> >> the join with the resulting set ?
> > True. The Subselect in FROM clause will be executed once and will be
> > joined using the condition at where clause. So your condition at t2 is not
> > executed for each row in t1(2mio records) but for each row in t2(1k
> > records). And the boolean value is already set during update.
>
> OK Jens, thanks for clarifying this.
> I thought the planner could guess what to do in such cases.

don't worry, it will.
this is not your problem

gnari



Re: Slow update with simple query

From
Arnaud Lesauvage
Date:
Ragnar a écrit :
> On mið, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:
>> Jens Schipkowski a écrit :
>> > On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage <thewild@freesurf.fr>
>> >> Why is this query better than the other one ? Because it runs the
>> >> "(field IN ('some','other') AND field2 = 'Y')" once and then executes
>> >> the join with the resulting set ?
>> > True. The Subselect in FROM clause will be executed once and will be
>> > joined using the condition at where clause. So your condition at t2 is not
>> > executed for each row in t1(2mio records) but for each row in t2(1k
>> > records). And the boolean value is already set during update.
>>
>> OK Jens, thanks for clarifying this.
>> I thought the planner could guess what to do in such cases.
>
> don't worry, it will.
> this is not your problem

Indeed, the new query does not perform that well :

"Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)"
"  Hash Cond: ("outer".uid = "inner".uid)"
"  ->  Seq Scan on t1 (cost=0.00..261792.01 rows=2033001 width=338) (actual time=19.342..234304.499 rows=2033001
loops=1)"
"  ->  Hash  (cost=110.20..110.20 rows=1020 width=53) (actual time=4.853..4.853 rows=1020 loops=1)"
"        ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53) (actual time=0.017..2.586 rows=1020 loops=1)"
"Total runtime: 2777844.892 ms"

I removed all unnecessary indexes on t1 before running the query (I left the
index on uid and the multicolumn index containing the updated field).
I believe the multicolumn-functional-index computation is taking some time here, isn't it ?

Regards
--
Arnaud

Re: Slow update with simple query

From
asif ali
Date:
Arnaud,
Have you run "ANALYZE" on the table after creating the index?
Also make sure that "effective_cache_size" is set properly. A higher value makes index scans more likely to be used.
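
(For reference, a minimal sketch of both suggestions; on 8.1, effective_cache_size is expressed in 8 kB pages, and the value below is only an example.)

```sql
-- refresh planner statistics after creating the index
ANALYZE t1;

-- check the current setting
SHOW effective_cache_size;

-- example only: ~4 GB of OS cache = 500000 pages of 8 kB
-- SET effective_cache_size = 500000;
```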

Thanks
asif ali

Arnaud Lesauvage <thewild@freesurf.fr> wrote:
Ragnar a écrit :
> On mið, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:
>> Jens Schipkowski a écrit :
>> > On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage
>> >> Why is this query better than the other one ? Because it runs the
>> >> "(field IN ('some','other') AND field2 = 'Y')" once and then executes
>> >> the join with the resulting set ?
>> > True. The Subselect in FROM clause will be executed once and will be
>> > joined using the condition at where clause. So your condition at t2 is not
>> > executed for each row in t1(2mio records) but for each row in t2(1k
>> > records). And the boolean value is already set during update.
>>
>> OK Jens, thanks for clarifying this.
>> I thought the planner could guess what to do in such cases.
>
> don't worry, it will.
> this is not your problem

Indeed, the new query does not perform that well :

"Hash Join (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)"
" Hash Cond: ("outer".uid = "inner".uid)"
" -> Seq Scan on t1 (cost=0.00..261792.01 rows=2033001 width=338) (actual time=19.342..234304.499 rows=2033001 loops=1)"
" -> Hash (cost=110.20..110.20 rows=1020 width=53) (actual time=4.853..4.853 rows=1020 loops=1)"
" -> Seq Scan on t2 (cost=0.00..110.20 rows=1020 width=53) (actual time=0.017..2.586 rows=1020 loops=1)"
"Total runtime: 2777844.892 ms"

I removed all unnecessary indexes on t1 before running the query (I left the index on uid and the multicolumn index containing the updated field).
I believe the multicolumn-functional-index computation is taking some time here, isn't it ?

Regards
--
Arnaud




Re: Slow update with simple query

From
Tom Lane
Date:
Arnaud Lesauvage <thewild@freesurf.fr> writes:
> Indeed, the new query does not perform that well :

> "Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)"
> ...
> "Total runtime: 2777844.892 ms"

> I removed all unnecessary indexes on t1 before running the query (I left the
> index on uid and the multicolumn index containing the updated field).
> I believe the multicolumn-functional-index computation is taking some time here, isn't it ?

Given that the plan itself only takes 246 sec, there's *something*
associated with row insertion that's eating the other 2500+ seconds.
Either index entry computation or constraint checking ...
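
(One way to test this for a one-shot bulk update is to drop the secondary indexes first and rebuild them afterwards; on 8.1 every updated row creates a new row version, so every index on the table gets a new entry per row. The names below are hypothetical.)

```sql
-- Sketch: avoid per-row index maintenance during a one-shot bulk update
BEGIN;
DROP INDEX some_secondary_index;       -- hypothetical index name
UPDATE t1 SET booleanfield = true;     -- stand-in for the real bulk update
COMMIT;
-- rebuild in a single pass afterwards
CREATE INDEX some_secondary_index ON t1 (somecolumn);
```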

            regards, tom lane

Re: Slow update with simple query

From
Arnaud Lesauvage
Date:
asif ali a écrit :
> Arnaud,
>  Have you run "ANALYZE" on the table after creating index?

Yes, I have !

>  Also make sure that "effective_cache_size" is set properly. A higher value
> makes it more likely to use index scans.

It is set to 50000. I thought this would be enough, maybe even too much !

Thanks for your advice !

--
Arnaud

Re: Slow update with simple query

From
Mark Lewis
Date:
But he's using 8.1.4-- in that version, an explain analyze would list
the time taken to go through triggers, so the fact that we don't see any
of those lines means that it can't be constraint checking, so wouldn't
it have to be the index update overhead?

-- Mark

On Wed, 2006-12-13 at 11:46 -0500, Tom Lane wrote:
> Arnaud Lesauvage <thewild@freesurf.fr> writes:
> > Indeed, the new query does not perform that well :
>
> > "Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)"
> > ...
> > "Total runtime: 2777844.892 ms"
>
> > I removed all unnecessary indexes on t1 before running the query (I left the
> > index on uid and the multicolumn index containing the updated field).
> > I believe the multicolumn-functional-index computation is taking some time here, isn't it ?
>
> Given that the plan itself only takes 246 sec, there's *something*
> associated with row insertion that's eating the other 2500+ seconds.
> Either index entry computation or constraint checking ...
>
>             regards, tom lane
>

Re: Slow update with simple query

From
Tom Lane
Date:
Mark Lewis <mark.lewis@mir3.com> writes:
> But he's using 8.1.4-- in that version, an explain analyze would list
> the time taken to go through triggers, so the fact that we don't see any
> of those lines means that it can't be constraint checking, so wouldn't
> it have to be the index update overhead?

Well, it can't be foreign key checking.  Could have been an expensive
function in a CHECK constraint, maybe...

            regards, tom lane

Re: Slow update with simple query

From
Arnaud Lesauvage
Date:
Tom Lane a écrit :
> Arnaud Lesauvage <thewild@freesurf.fr> writes:
>> Indeed, the new query does not perform that well :
>
>> "Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)"
>> ...
>> "Total runtime: 2777844.892 ms"
>
>> I removed all unnecessary indexes on t1 before running the query (I left the
>> index on uid and the multicolumn index containing the updated field).
>> I believe the multicolumn-functional-index computation is taking some time here, isn't it ?
>
> Given that the plan itself only takes 246 sec, there's *something*
> associated with row insertion that's eating the other 2500+ seconds.
> Either index entry computation or constraint checking ...

There is an insert trigger (but here I am only updating the data), and a
multicolumn functional index. That's all I can think of.

I must be missing something, so here is the full table description.
The field I am updating is incluredansstats.
The field I am joining on is userinternalid.


CREATE TABLE statistiques.log
(
    gid serial NOT NULL,
    userinternalid character(32),
    ip character varying(255),
    browser character varying(255),
    fichier character varying(255),
    querystring text,
    page character varying(255),
    useridentity character varying(100),
    incluredansstats boolean NOT NULL DEFAULT true,
    date character varying,
    heure character varying,
    dateformatee timestamp without time zone,
    sessionid character(32),
    sortindex integer,
    CONSTRAINT log_pkey PRIMARY KEY (gid)
)
WITHOUT OIDS;
ALTER TABLE statistiques.log OWNER TO postgres;

CREATE INDEX idx_page_datemonth_incluredansstats
    ON statistiques.log
    USING btree
    (page, date_trunc('month'::text, dateformatee), incluredansstats);

CREATE INDEX idx_userinternalid
    ON statistiques.log
    USING btree
    (userinternalid);

CREATE INDEX idx_userinternalid_page_datemonth
    ON statistiques.log
    USING btree
    (userinternalid, page, date_trunc('month'::text, dateformatee));

ALTER TABLE statistiques.log
    ADD CONSTRAINT log_pkey PRIMARY KEY(gid);

CREATE TRIGGER parse_log_trigger
    BEFORE INSERT
    ON statistiques.log
    FOR EACH ROW
    EXECUTE PROCEDURE statistiques.parse_log_trigger();


This was a "one-shot" query, so I don't really mind it being slow, but if you
want I can still troubleshoot it !

Re: Slow update with simple query

From
Tom Lane
Date:
Arnaud Lesauvage <thewild@freesurf.fr> writes:
> I must be missing something, so here is the full table description.

It looks pretty harmless, except for

> CREATE TRIGGER parse_log_trigger
>     BEFORE INSERT
>     ON statistiques.log
>     FOR EACH ROW
>     EXECUTE PROCEDURE statistiques.parse_log_trigger();

It seems the time must be going into this trigger function.  What
does it do?

            regards, tom lane

Re: Slow update with simple query

From
Arnaud Lesauvage
Date:
Tom Lane a écrit :
> Arnaud Lesauvage <thewild@freesurf.fr> writes:
>> I must be missing something, so here is the full table description.
>
> It looks pretty harmless, except for
>
>> CREATE TRIGGER parse_log_trigger
>>     BEFORE INSERT
>>     ON statistiques.log
>>     FOR EACH ROW
>>     EXECUTE PROCEDURE statistiques.parse_log_trigger();
>
> It seems the time must be going into this trigger function.  What
> does it do?

A lot of things ! Indeed, if it runs it will hurt performance very badly (table
lookups, string manipulation, etc.) !
But it should only be triggered on INSERTs, and I am doing an UPDATE !

I can post the function's body if you want.

Regards
--
Arnaud


Re: Slow update with simple query

From
Tom Lane
Date:
Arnaud Lesauvage <thewild@freesurf.fr> writes:
> Tom Lane a écrit :
>> It seems the time must be going into this trigger function.  What
>> does it do?

> A lot of things ! Indeed, if it runs it will hurt performance very badly (table
> lookups, string manipulation, etc.) !
> But it should only be triggered on INSERTs, and I am doing an UPDATE !

Doh, right, I obviously still need to ingest more caffeine this morning.

I think the conclusion must be that there was just too much I/O to be
done to update all the rows.  Have you done any tuning of shared_buffers
and so forth?  I recall having seen cases where update performance went
bad as soon as the upper levels of a large index no longer fit into
shared_buffers ...
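
(A rough way to check that hypothesis is to compare shared_buffers with the size of the indexes being maintained; pg_relation_size() is available from 8.1 on, and the LIKE pattern below assumes the index names posted earlier in the thread.)

```sql
SHOW shared_buffers;   -- on 8.1 this is a number of 8 kB pages

SELECT relname, pg_relation_size(oid) AS size_bytes
FROM pg_class
WHERE relkind = 'i'
  AND relname LIKE 'idx_%';
```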

            regards, tom lane

Re: Slow update with simple query

From
Arnaud Lesauvage
Date:
Tom Lane a écrit :
> I think the conclusion must be that there was just too much I/O to be
> done to update all the rows.  Have you done any tuning of shared_buffers
> and so forth?  I recall having seen cases where update performance went
> bad as soon as the upper levels of a large index no longer fit into
> shared_buffers ...

Yes, that's probably it.
I think my RAID1 array's performance is very bad.
I am switching to a brand new hardware next week, I am quite confident that this
will solve many problems.

Thanks for helping !

Regards
--
Arnaud


Re: Slow update with simple query

From
Mark Lewis
Date:
Out of curiosity, how hard would it be to modify the output of EXPLAIN
ANALYZE when doing an insert/update to include the index update times
and/or non-FK constraint checking times and/or the table row update
times?  Or any other numbers that might be useful in circumstances like
this.  I'm wondering if it's possible to shed some light on the
remaining dark shadows of PG performance troubleshooting.

-- Mark Lewis

On Thu, 2006-12-14 at 11:19 -0500, Tom Lane wrote:
> Arnaud Lesauvage <thewild@freesurf.fr> writes:
> > Tom Lane a écrit :
> >> It seems the time must be going into this trigger function.  What
> >> does it do?
>
> > A lot of things ! Indeed, if it runs it will hurt performance very badly (table
> > lookups, string manipulation, etc.) !
> > But it should only be triggered on INSERTs, and I am doing an UPDATE !
>
> Doh, right, I obviously still need to ingest more caffeine this morning.
>
> I think the conclusion must be that there was just too much I/O to be
> done to update all the rows.  Have you done any tuning of shared_buffers
> and so forth?  I recall having seen cases where update performance went
> bad as soon as the upper levels of a large index no longer fit into
> shared_buffers ...
>
>             regards, tom lane
>

Re: Slow update with simple query

From
Tom Lane
Date:
Mark Lewis <mark.lewis@mir3.com> writes:
> Out of curiosity, how hard would it be to modify the output of EXPLAIN
> ANALYZE when doing an insert/update to include the index update times
> and/or non-FK constraint checking times and/or the table row update
> times?

I don't think it'd help much --- in an example like this, other tools
like vmstat would be more likely to be useful in investigating the
problem.  There's also the problem that EXPLAIN ANALYZE overhead is
already too high --- another dozen gettimeofday() calls per row
processed doesn't sound appetizing.

            regards, tom lane