Re: Slow updates, poor IO - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Slow updates, poor IO
Date
Msg-id dcc563d10809251235h1fc04552v9b2a0ce7844f1e6f@mail.gmail.com
Whole thread Raw
In response to Slow updates, poor IO  (John Huttley <John@mib-infotech.co.nz>)
List pgsql-performance
On Thu, Sep 25, 2008 at 1:24 PM, John Huttley <John@mib-infotech.co.nz> wrote:
> I've just had an interesting encounter with the slow full table update
> problem that is inherent with MVCC
>
> The system is 64 bit linux with 2.6.25 kernel feeding scsi disks.
>
> the table is
>
> CREATE TABLE file (
>   fileid integer NOT NULL,
>   fileindex integer DEFAULT 0 NOT NULL,
>   jobid integer NOT NULL,
>   pathid integer NOT NULL,
>   filenameid integer NOT NULL,
>   markid integer DEFAULT 0 NOT NULL,
>   lstat text NOT NULL,
>   md5 text NOT NULL,
>   perms text
> );
>
> ALTER TABLE ONLY file
>   ADD CONSTRAINT file_pkey PRIMARY KEY (fileid);
>
> CREATE INDEX file_fp_idx ON file USING btree (filenameid, pathid);
> CREATE INDEX file_jobid_idx ON file USING btree (jobid);
>
> There are 2.7M rows.
>
> running    update file set perms='0664' took about 10 mins

So, how many rows would already be set to 0664?  Would adding a where
clause speed it up?

update file set perms='0664' where perms <> '0664';

> during this period, vmstat reported Blocks Out holding in the 4000 to 6000
> range.
>
>
> When I dropped the indexes this query ran in 48sec.
> Blocks out peaking at 55000.
>
> So there is a double whammy.
> MVCC requires more work to be done when indexes are defined and then this
> work
> results in much lower IO, compounding the problem.

That's because it becomes more random and less sequential.  If you had
a large enough drive array you could get that kind of performance for
updating indexes, since the accesses would tend to hit different
drives most the time.

Under heavy load on the production servers at work we can see 30 to 60
Megs a second random access with 12 drives, meaning 2.5 to 5Megs per
second per drive.  Sequential throughput is about 5 to 10 times
higher.

What you're seeing are likely the effects of running a db on
insufficient drive hardware.

pgsql-performance by date:

Previous
From: John Huttley
Date:
Subject: Slow updates, poor IO
Next
From: Alan Hodgson
Date:
Subject: Re: Slow updates, poor IO