Re: UPDATE 66k rows too slow - Mailing list pgsql-performance

From Greg Smith
Subject Re: UPDATE 66k rows too slow
Msg-id Pine.GSO.4.64.0803102350540.18872@westnet.com
In response to Re: UPDATE 66k rows too slow  (Miguel Arroz <arroz@guiamac.com>)
List pgsql-performance
On Mon, 10 Mar 2008, Miguel Arroz wrote:

> I deleted the DB, created a new one and generated new test data. I now
> have 72k rows, and the same query finishes in... 9 seconds.

This seems like more evidence that your problem here is related to dead
rows, which is what Andrew suggested.  If a fresh copy of the database
runs fast but quickly degrades as you run additional tests that do many
updates on it, dead rows are a popular suspect.

Are you familiar with dead rows?  When you update something, the original
copy doesn't go away; it stays behind until VACUUM gets to cleaning it up.
If you update the same rows, say, 10 times you'll have 9 dead copies of
every row in the way of doing reports on the ones still alive.
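If you want to watch that happen, here's a quick demonstration you can
run against a scratch database (the table name and counts are made up
for illustration, and the n_dead_tup column assumes a version new enough
to track it in pg_stat_user_tables):

   CREATE TABLE t (id int PRIMARY KEY, val int);
   INSERT INTO t SELECT i, 0 FROM generate_series(1, 66268) AS i;
   UPDATE t SET val = val + 1;  -- leaves a dead copy of every row behind
   UPDATE t SET val = val + 1;  -- and a second one
   -- The stats collector keeps an approximate count of both kinds:
   SELECT n_live_tup, n_dead_tup
     FROM pg_stat_user_tables WHERE relname = 't';

Until a VACUUM comes along, every sequential scan of t has to wade
through the dead copies as well as the live ones.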

Let's go back to your original post a second:

Seq Scan on text_answer_mapping_ebt  (cost=0.00..13945.72 rows=265072
width=92) (actual time=21.123..1049.054 rows=66268 loops=1)

That shows the database estimating there are exactly 4 times your 66268
rows there (4 x 66268 = 265072).  That sounds like one active copy of your
data and 3 dead ones left behind from earlier tests.  In that case, it
would take much longer to do that full scan than when the database was
fresh.
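If you want to poke at where that estimate comes from, the planner works
from the reltuples and relpages statistics, scaled by the table's current
physical size.  Comparing those against a live count (a sketch only,
using your table name) makes any bloat visible:

   SELECT reltuples, relpages
     FROM pg_class WHERE relname = 'text_answer_mapping_ebt';
   SELECT count(*) FROM text_answer_mapping_ebt;

If the table has grown to many more pages than one copy of 66268 rows
needs, that gap is the dead rows again.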

> 1) My working_mem is 2 MB. Does an UPDATE query like mine depend on
> working_mem?

Nope.  That's used for sorting and that sort of thing.
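(The setting is actually spelled work_mem in the configuration, by the
way.)  If you want to look at it or experiment with it:

   SHOW work_mem;            -- what this session is currently using
   SET work_mem = '8MB';     -- change it for this session only

But for a plain UPDATE like yours, raising it won't buy you anything.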

> Well, the DB itself knows what is doing. Isn't there any way to make it
> tell us that?

Well, the database server relies on a lot of operating system and
hardware components, and it has no idea how any of those are working.
So it's unreasonable to expect that in every case the database has a
clue what's going on.

In your case, my stronger suspicion is that the report that will say
something interesting here is the 4th item on the list I sent before:
looking at the VACUUM VERBOSE ANALYZE output for signs of a problem.

Here's the educational exercise I'd suggest that might help you track down
what's going on here (the whole sequence is sketched out after the list):

1) Recreate a fresh copy of the database.  Run VACUUM VERBOSE ANALYZE and
save a copy of the output so you know what that looks like with no dead
rows.
2) Run your query with EXPLAIN ANALYZE and save that too.  Should be fast.
3) Do whatever testing it is you do that seems to result in the system
running much slower.
4) Save the EXPLAIN ANALYZE output once you've reached slowness.
5) Run a VACUUM VERBOSE ANALYZE, save that for comparison to the earlier
one.
6) Run the EXPLAIN ANALYZE again to see if (5) did anything useful.
7) Run VACUUM FULL VERBOSE and save that output.
8) Run the EXPLAIN ANALYZE again to see if (7) did anything useful.
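Strung together, the sequence looks roughly like this, with <your UPDATE
statement> standing in for the query you've been testing (a sketch, not
something to paste in verbatim):

   -- 1) on the freshly created database:
   VACUUM VERBOSE ANALYZE;
   -- 2) baseline plan, should be fast:
   EXPLAIN ANALYZE <your UPDATE statement>;
   -- 3) ...run your usual tests here until things slow down...
   -- 4) the plan once it's slow:
   EXPLAIN ANALYZE <your UPDATE statement>;
   -- 5) clean up dead rows and report on them:
   VACUUM VERBOSE ANALYZE;
   -- 6) did (5) help?
   EXPLAIN ANALYZE <your UPDATE statement>;
   -- 7) compact the table as well:
   VACUUM FULL VERBOSE;
   -- 8) did (7) help?
   EXPLAIN ANALYZE <your UPDATE statement>;

One note on saving the output: if you run each step through psql and
redirect to a file, the VERBOSE reports come back as INFO messages rather
than query results, so capture stderr too.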

Comparing the VACUUM reports and the EXPLAIN plans to see what changes
along the way should give you some good insight into what's happening
here.  That is what you're asking for--asking the database to tell you
what it's doing--but actually understanding those reports takes a certain
amount of study.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
