Re: Updates are slow.. - Mailing list pgsql-general

From Tom Burke
Subject Re: Updates are slow..
Date
Msg-id NDBBIIDKBFNMNPDNLCDDKEALDCAA.lists@spamex.com
Whole thread Raw
In response to Re: Updates are slow..  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Updates are slow..
Re: Updates are slow..
List pgsql-general
I think I spoke too soon, and it appears the problem is not solved.
An index on the email column slows down an update on the dept column,
for some unkown reason.

> Do you have enough of a log of your tests
> so far to be able to recreate the pre-reindex situation?

Here are the stats on emp from before the reindexing:
2002-06-11 09:03:01 DEBUG:  Index emp_ix01: Pages 18568; Tuples 1230703: Deleted
597153.
        CPU 2.75s/5.07u sec elapsed 77.46 sec.
2002-06-11 09:06:48 DEBUG:  Index emp_ix02: Pages 27572; Tuples 1230703: Deleted
597153.
        CPU 5.73s/10.16u sec elapsed 227.15 sec.
2002-06-11 09:06:59 DEBUG:  Removed 597153 tuples in 8227 pages.
        CPU 1.42s/1.51u sec elapsed 10.84 sec.
2002-06-11 09:06:59 DEBUG:  Pages 24552: Changed 16249, Empty 0; Tup 1230703:
Vac 597153, Keep 0, UnUsed 4084.
        Total CPU 14.64s/23.74u sec elapsed 399.48 sec.

After doing the Reindex table emp, the stats were:
2002-06-11 15:44:37 NOTICE:  --Relation emp--
2002-06-11 15:45:15 NOTICE:  Index emp_pk: Pages 4018; Tuples 1230703: Deleted
597153.
        CPU 0.87s/9.20u sec elapsed 15.27 sec.
2002-06-11 15:45:33 NOTICE:  Index emp_ix01: Pages 5445; Tuples 1230703: Deleted
597153.
        CPU 0.96s/6.62u sec elapsed 17.83 sec.
2002-06-11 15:46:51 NOTICE:  Index emp_ix02: Pages 13972; Tuples 1230703:
Deleted 597153.
        CPU 2.73s/8.95u sec elapsed 78.56 sec.
2002-06-11 15:47:13 NOTICE:  Removed 1194306 tuples in 16048 pages.
        CPU 3.28s/2.91u sec elapsed 21.83 sec.
2002-06-11 15:47:13 NOTICE:  Pages 32576: Changed 16469, Empty 0; Tup
1230703: Vac 1194306, Keep 0, UnUsed 5501.
        Total CPU 11.32s/28.43u sec elapsed 155.81 sec.

> It would be interesting to try making a third index on just email and
> see where the update time goes to.

Very interesting indeed.

eppend=# create index emp_ix03 on emp(email);

After creating this index, the update time jumped up again to 21 mins!
real    21m3.536s
user    0m0.010s
sys    0m0.000s

This makes no sense to me because the new index is not related to
the column being updated at all.

After this update, I did a vacuum :
eppend=# vacuum verbose analyze clientdata;
NOTICE:  --Relation emp--
NOTICE:  Index emp_pk: Pages 4012; Tuples 1230703: Deleted 597153.
    CPU 0.82s/9.40u sec elapsed 20.02 sec.
NOTICE:  Index emp_ix01: Pages 5463; Tuples 1230703: Deleted 597153.
    CPU 0.93s/6.77u sec elapsed 19.54 sec.
NOTICE:  Index emp_ix03: Pages 12602; Tuples 1230703: Deleted 597153.
    CPU 2.73s/8.51u sec elapsed 63.37 sec.
NOTICE:  Removed 1194306 tuples in 17145 pages.
    CPU 3.44s/2.98u sec elapsed 46.65 sec.
NOTICE:  Pages 40578: Changed 17679, Empty 0; Tup 1230703: Vac 1194306, Keep 0,
UnUsed 606138.
    Total CPU 12.22s/28.50u sec elapsed 177.13 sec.
NOTICE:  Analyzing emp
VACUUM

I reran the update, and it still took 18 minutes.

> Also, could I trouble you for some
> stats on the email column?  Min, max, and average length of the entries
> would be useful to know.

eppend=# select min(length(email)), max(length(email)), avg(length(email))
eppend-# from emp;
 min | max |      avg
-----+-----+---------------
   1 |  50 | 19.6686072919

> And I assume it's unique, or nearly so?

Actually, it is not unique.
This is not really an emp table, and in fact ~600K of the email
addresses are duplicates.

eppend=# select fk_dept_id, count(*) from emp group by fk_dept_id;
 fk_dept_id | count
------------+--------
         2  | 597152
         3  |  36398
         4  | 597153

That is, there are 3 departments right now, and departments 2 and 4
have the exact same row count and have exactly the same email addresses.
Could that be a factor?

I truly have no idea why Postgres would behave this way.
Help!

Thanks,

Tom Burke
Eppend, Inc.
http://www.eppend.com/


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: automatic time zone conversion
Next
From: Håkon Clausen
Date:
Subject: how to get current database name in plpgsql function