Optimizer failure on update w/integer column

From
nolan@celery.tssi.com
Date:
Is there a documented problem with optimizing integer key fields
in 7.3.3?

I have two tables.  One was extracted from the other and has
the following four columns.

    mtranmemid   varchar(8),
    mtranseq     integer,
    mtransts     char,
    mtranmemtp   varchar(2)

There is a unique index on mtranseq in both tables, and the column
contains no nulls.

When I try to do an update, it takes several hours.  Here's what
explain says about the query:

explain update missing_ids
set mtransts = a.mtransts,
mtranmemtp = a.mtranmemtp
from memtran as a
where a.mtranmemid = missing_ids.mtranmemid
and a.mtranseq = missing_ids.mtranseq

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=9231.64..58634.93 rows=1 width=48)
   Hash Cond: ("outer".mtranseq = "inner".mtranseq)
   Join Filter: ("outer".mtranmemid = "inner".mtranmemid)
   ->  Seq Scan on memtran a  (cost=0.00..22282.57 rows=714157 width=26)
   ->  Hash  (cost=6289.91..6289.91 rows=351891 width=22)
         ->  Seq Scan on missing_ids  (cost=0.00..6289.91 rows=351891 width=22)
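
(EXPLAIN shows only the planner's estimates.  To see actual row counts
and per-node timings, EXPLAIN ANALYZE executes the statement; wrapping
it in a transaction and rolling back keeps the update from sticking.
A sketch:)

    begin;
    explain analyze update missing_ids
    set mtransts = a.mtransts,
    mtranmemtp = a.mtranmemtp
    from memtran as a
    where a.mtranmemid = missing_ids.mtranmemid
    and a.mtranseq = missing_ids.mtranseq;
    rollback;  -- discard the update; we only wanted the timings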
--
Mike Nolan




Re: Optimizer failure on update w/integer column

From
Tom Lane
Date:
nolan@celery.tssi.com writes:
> Is there a documented problem with optimizing integer key fields
> in 7.3.3?

I see nothing particularly wrong with that query plan.

> When I try to do an update, it takes several hours.

That seems excessive for a 700K-row update, but I doubt that the query
plan is the problem.  I'm betting there's a lot of per-update overhead
due to triggers or foreign keys.  What have you got in that line?  Do
you have indexes on both sides of any foreign-key relationships that
missing_ids participates in?
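
One quick way to check is a catalog query along these lines (pg_constraint
is present in 7.3 and later; it lists foreign-key constraints in which
missing_ids participates on either side):

    select conname,
           conrelid::regclass  as referencing_table,
           confrelid::regclass as referenced_table
    from pg_constraint
    where contype = 'f'
      and (conrelid = 'missing_ids'::regclass
           or confrelid = 'missing_ids'::regclass);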

            regards, tom lane

Re: Optimizer failure on update w/integer column

From
nolan@celery.tssi.com
Date:
> That seems excessive for a 700K-row update, but I doubt that the query
> plan is the problem.  I'm betting there's a lot of per-update overhead
> due to triggers or foreign keys.  What have you got in that line?  Do
> you have indexes on both sides of any foreign-key relationships that
> missing_ids participates in?

There are no triggers or foreign key relationships.  There are unique
indexes on mtranseq on both tables, and there is just the one index on
missing_ids.  There is another index on the memtran table on the
mtranmemid and mtranseq columns.

This gets stranger and stranger.  I moved the missing_ids table over to
a faster computer so I could do some timings without it taking all night.
(That increased the size of the memtran table, but the general behavior
is similar.  Most postgres tasks on this computer run 5-10 times faster
than on the other one.)

Here's the revised plan:

explain update missing_ids
set mtransts = a.mtransts,
mtranmemtp = a.mtranmemtp
from memtran as a
where a.mtranmemid = missing_ids.mtranmemid
and a.mtranseq = missing_ids.mtranseq

 Hash Join  (cost=60271.38..88098.75 rows=1 width=48)
   Hash Cond: ("outer".mtranseq = "inner".mtranseq)
   Join Filter: ("inner".mtranmemid = "outer".mtranmemid)
   ->  Seq Scan on missing_ids  (cost=0.00..6289.91 rows=351891 width=22)
   ->  Hash  (cost=27070.30..27070.30 rows=941530 width=26)
         ->  Seq Scan on memtran a  (cost=0.00..27070.30 rows=941530 width=26)

The first time I ran it, it took 318 seconds on this machine.

That's MUCH better than on the other machine, but here's where things
get a bit weird.

If I drop the index on missing_ids completely, it runs much faster, 35
seconds the first time, 38 seconds when I ran it a second time.

I then recreated the index on missing_ids(mtranseq); the execution time
slowed down to 48 seconds the first time I reran the update, and it took
262 seconds when I ran the update again.  Subsequent passes got progressively
slower: 371 seconds, then 764 seconds.

I dropped the index again, here are consecutive running times for the
query:

54 seconds, 45 seconds, 42 seconds, 43 seconds, 43 seconds, 45 seconds.
(I am the only user on the system this afternoon.)
--
Mike Nolan

Re: Optimizer failure on update w/integer column

From
Tom Lane
Date:
nolan@celery.tssi.com writes:
> If I drop the index on missing_ids completely, it runs much faster, 35
> seconds the first time, 38 seconds when I ran it a second time.

> I then recreated the index on missing_ids(mtranseq); the execution time
> slowed down to 48 seconds the first time I reran the update, and it took
> 262 seconds when I ran the update again.  Subsequent passes got progressively
> slower: 371 seconds, then 764 seconds.

This is a unique index, right?  Seems like the cost must be related to
checking for uniqueness violations --- the index code has to plow
through all the index entries with the same key, visit their associated
heap tuples, confirm those tuples are dead (or being deleted by the
current transaction).  You could check this by seeing what the cost
profile looks like with a nonunique index in place.
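
Something along these lines would do it (the index names here are
hypothetical; substitute whatever yours are actually called):

    -- drop the unique index and put a plain one in its place
    drop index missing_ids_mtranseq_key;
    create index missing_ids_mtranseq_idx on missing_ids (mtranseq);
    -- then re-run the UPDATE a few times and compare the timings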

I'm not quite sure *why* it's happening though.  7.3 is supposed to have
code in it to forestall indefinite growth of the number of heap visits
that have to be made.  Hmm ... were you doing the repeated passes all in
a single transaction block, or were you allowing the previous updates to
commit before you tried again?

            regards, tom lane

Re: Optimizer failure on update w/integer column

From
nolan@celery.tssi.com
Date:
> This is a unique index, right?  Seems like the cost must be related to
> checking for uniqueness violations --- the index code has to plow
> through all the index entries with the same key, visit their associated
> heap tuples, confirm those tuples are dead (or being deleted by the
> current transaction).  You could check this by seeing what the cost
> profile looks like with a nonunique index in place.

No, when I rebuilt the index it was NOT as a unique index.

> I'm not quite sure *why* it's happening though.  7.3 is supposed to have
> code in it to forestall indefinite growth of the number of heap visits
> that have to be made.  Hmm ... were you doing the repeated passes all in
> a single transaction block, or were you allowing the previous updates to
> commit before you tried again?

I was not in a transaction block.

I tried it again, this time exiting psql between runs.

72 seconds the first time, 351 seconds the 2nd time, 420 the 3rd time,
351 the 4th time.

Can I do anything further to help track this down?
--
Mike Nolan



Re: Optimizer failure on update w/integer column

From
Tom Lane
Date:
nolan@celery.tssi.com writes:
> No, when I rebuilt the index it was NOT as a unique index.

Hmm, so much for that theory.

> 72 seconds the first time, 351 seconds the 2nd time, 420 the 3rd time,
> 351 the 4th time.

What exactly are you defining as "the first time" --- the first time
after creating a fresh index?  What percentage of table tuples actually
get updated in each command?  I'm wondering if maybe it's just a matter
of the first time not incurring very many btree page splits while the
later runs incur lots.  But that theory seems weak as well.

> Can I do anything further to help track this down?

Perhaps rebuild the backend with profiling enabled and get a runtime
profile in both the faster and slower cases?
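
Roughly, a gprof-instrumented build looks like this (paths are just
examples):

    # rough sketch: rebuild the server with profiling instrumentation
    ./configure --prefix=/usr/local/pgprof --enable-debug
    make PROFILE="-pg" all
    make install
    # each backend then writes a gmon.out in its working directory
    # (the data directory), which gprof can read against the postgres
    # binary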

            regards, tom lane

Re: Optimizer failure on update w/integer column

From
nolan@celery.tssi.com
Date:
> What exactly are you defining as "the first time" --- the first time
> after creating a fresh index?  What percentage of table tuples actually
> get updated in each command?  I'm wondering if maybe it's just a matter
> of the first time not incurring very many btree page splits while the
> later runs incur lots.  But that theory seems weak as well.

I created the index, ran the update, ran it again, etc.  No other changes
are being made to the data in between consecutive update runs, but every
row is being updated on each pass.

I'm thinking this is related to how the index tracks updated rows
for concurrency, because even though disk space for the table grows
with each update run, the execution times remain similar without an
index on the table.
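
A cheap way to watch that growth between runs is the page counts kept in
pg_class (a sketch; add the index's actual name to the list):

    -- relpages/reltuples are maintained by VACUUM and ANALYZE,
    -- so run one of those first to get current numbers
    select relname, relkind, relpages, reltuples
    from pg_class
    where relname = 'missing_ids';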

The performance tips in the docs suggest that problems can occur when
indexed columns are updated, but in this case I'm not updating an
indexed column.

> > Can I do anything further to help track this down?
>
> Perhaps rebuild the backend with profiling enabled and get a runtime
> profile in both the faster and slower cases?

I'll see if I can transfer the data over to my prototyping system and
if it exhibits the same behavior then I can try profiling.  I'll try
playing around with it later this week, and also will see if doing a
vacuum in between runs has any impact.
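
Since every pass updates every row, each run leaves behind a full
table's worth of dead tuples, so something as simple as this between
passes may be enough to keep the timings flat:

    -- between UPDATE passes: reclaim dead tuples and refresh stats
    vacuum analyze missing_ids;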
--
Mike Nolan