Thread: Slow updates, poor IO

From: John Huttley
I've just had an interesting encounter with the slow full-table update
problem that is inherent in MVCC.

The system is 64-bit Linux with a 2.6.25 kernel feeding SCSI disks.

The table is:

CREATE TABLE file (
    fileid integer NOT NULL,
    fileindex integer DEFAULT 0 NOT NULL,
    jobid integer NOT NULL,
    pathid integer NOT NULL,
    filenameid integer NOT NULL,
    markid integer DEFAULT 0 NOT NULL,
    lstat text NOT NULL,
    md5 text NOT NULL,
    perms text
);

ALTER TABLE ONLY file
    ADD CONSTRAINT file_pkey PRIMARY KEY (fileid);

CREATE INDEX file_fp_idx ON file USING btree (filenameid, pathid);
CREATE INDEX file_jobid_idx ON file USING btree (jobid);

There are 2.7M rows.

Running "update file set perms='0664'" took about 10 minutes.

During this period, vmstat reported Blocks Out holding in the 4000 to
6000 range.


When I dropped the indexes, this query ran in 48 seconds, with Blocks
Out peaking at 55000.

So there is a double whammy: MVCC requires more work to be done when
indexes are defined, and that extra work results in much lower IO
throughput, compounding the problem.


Comments anyone?


--john



Re: Slow updates, poor IO

From: Scott Marlowe
On Thu, Sep 25, 2008 at 1:24 PM, John Huttley <John@mib-infotech.co.nz> wrote:
> I've just had an interesting encounter with the slow full table update
> problem that is inherent with MVCC
>
> The system is 64 bit linux with 2.6.25 kernel feeding scsi disks.
>
> the table is
>
> CREATE TABLE file (
>   fileid integer NOT NULL,
>   fileindex integer DEFAULT 0 NOT NULL,
>   jobid integer NOT NULL,
>   pathid integer NOT NULL,
>   filenameid integer NOT NULL,
>   markid integer DEFAULT 0 NOT NULL,
>   lstat text NOT NULL,
>   md5 text NOT NULL,
>   perms text
> );
>
> ALTER TABLE ONLY file
>   ADD CONSTRAINT file_pkey PRIMARY KEY (fileid);
>
> CREATE INDEX file_fp_idx ON file USING btree (filenameid, pathid);
> CREATE INDEX file_jobid_idx ON file USING btree (jobid);
>
> There are 2.7M rows.
>
> running    update file set perms='0664' took about 10 mins

So, how many rows would already be set to 0664?  Would adding a where
clause speed it up?

update file set perms='0664' where perms <> '0664';

> during this period, vmstat reported Blocks Out holding in the 4000 to 6000
> range.
>
>
> When I dropped the indexes this query ran in 48sec.
> Blocks out peaking at 55000.
>
> So there is a double whammy.
> MVCC requires more work to be done when indexes are defined and then this
> work
> results in much lower IO, compounding the problem.

That's because it becomes more random and less sequential.  If you had
a large enough drive array you could get that kind of performance for
updating indexes, since the accesses would tend to hit different
drives most of the time.

Under heavy load on the production servers at work we can see 30 to 60
Megs a second of random access across 12 drives, meaning 2.5 to 5 Megs
per second per drive.  Sequential throughput is about 5 to 10 times
higher.

What you're seeing are likely the effects of running a db on
insufficient drive hardware.

Re: Slow updates, poor IO

From: Alan Hodgson
On Thursday 25 September 2008, John Huttley <John@mib-infotech.co.nz> wrote:
>
> Comments anyone?

Don't do full table updates? This is not exactly a news flash.


--
Alan

Re: Slow updates, poor IO

From: Greg Smith
On Fri, 26 Sep 2008, John Huttley wrote:

> running    update file set perms='0664' took about 10 mins

What do you have checkpoint_segments and shared_buffers set to?  If you
want something that's doing lots of updates to perform well, you need to
let PostgreSQL have a decent size chunk of memory to buffer the index
writes with, so it's more likely they'll get combined into larger and
therefore more easily sorted blocks rather than as more random ones.  The
randomness of the writes is why your write rate is so slow.  You also need
to cut down on the frequency of checkpoints which are very costly on this
type of statement.
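For concreteness, the sort of settings being discussed might look like this in postgresql.conf (a sketch for a machine of this size, not values anyone in this thread has endorsed):

```
shared_buffers = 256MB              # room to buffer dirty index blocks
checkpoint_segments = 64            # fewer, less frequent checkpoints
checkpoint_completion_target = 0.9  # 8.3+: spread checkpoint writes out
```

Changing shared_buffers requires a server restart, and on a kernel of that era may also require raising the SHMMAX sysctl.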

Also:  which version of PostgreSQL?  8.3 includes an improvement aimed at
updates like this you might benefit from.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Slow updates, poor IO

From: Andrew Sullivan
Hi,

On Fri, Sep 26, 2008 at 07:24:55AM +1200, John Huttley wrote:
> I've just had an interesting encounter with the slow full table update
> problem that is inherent with MVCC

Quite apart from the other excellent observations in this thread, what
makes you think this is an MVCC issue exactly?

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: Slow updates, poor IO

From: John Huttley
Hi Andrew,
There are two problems.

The first is that if there is a table with an index and an update is
performed on a non-indexed field, the index is still re-indexed; this is
part of the trade-offs of MVCC. Apparently this is documented under
'MVCC' in the manual. It should be documented under 'performance'.

We should reasonably expect that the total amount of IO will go up, over
a non-indexed table.

The second thing is that the disk IO throughput goes way down.

This is not an issue with MVCC as such, except that it exposes the
effect of a write to an indexed field -- even if you don't expect it.

--john

Andrew Sullivan wrote:
> Hi,
>
> On Fri, Sep 26, 2008 at 07:24:55AM +1200, John Huttley wrote:
>> I've just had an interesting encounter with the slow full table update
>> problem that is inherent with MVCC
>
> Quite apart from the other excellent observations in this thread, what
> makes you think this is an MVCC issue exactly?
>
> A

Re: Slow updates, poor IO

From: John Huttley
Hi Greg,

I've got 32M shared on a 1G machine and 16 checkpoint segments.
I'll run some tests against 64 segments and see what happens.

Your previous postings were extremely helpful wrt the MVCC issue.
I thank you!

-john


Greg Smith wrote:
> On Fri, 26 Sep 2008, John Huttley wrote:
>
>> running    update file set perms='0664' took about 10 mins
>
> What do you have checkpoint_segments and shared_buffers set to?  If
> you want something that's doing lots of updates to perform well, you
> need to let PostgreSQL have a decent size chunk of memory to buffer
> the index writes with, so it's more likely they'll get combined into
> larger and therefore more easily sorted blocks rather than as more
> random ones.  The randomness of the writes is why your write rate is
> so slow.  You also need to cut down on the frequency of checkpoints
> which are very costly on this type of statement.
>
> Also:  which version of PostgreSQL?  8.3 includes an improvement aimed
> at updates like this you might benefit from.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>

Re: Slow updates, poor IO

From: Greg Smith
On Sat, 27 Sep 2008, John Huttley wrote:

> I've got 32M shared on a 1G machine and 16 checkpoint segments.
> I'll run some tests against 64 segments and see what happens.

Increase shared_buffers to 256MB as well.  That combination should give
you much better performance with the type of update you're doing.  Right
now the database server has to write the index blocks updated to disk all
the time because it has so little working room to store them in.  If an
index block is updated but there is room to keep it memory, it doesn't
have to get written out, which considerably lowers the overhead here.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Slow updates, poor IO

From: Scott Marlowe
On Fri, Sep 26, 2008 at 5:03 PM, John Huttley <John@mib-infotech.co.nz> wrote:
> Hi Andrew,
> There are two problems.
> The first is the that if there is a table with a index and an update is
> performed on a non indexed field,
> the index is still re indexed.

I assume you mean updated, not reindexed, as reindexed has a different
meaning as regards postgresql.  Also, this is no longer true as of
version 8.3.  If you're updating non-indexed fields a lot and you're
not running 8.3 you are doing yourself a huge disservice.

>this is part of the trade-offs of MVCC.

was...  was a part of the trade-offs.

> We should reasonably expect that the total amount of IO will go up, over a
> non-indexed table.
>
> The second thing is that the disk IO throughput goes way down.
>
> This is not an issue with MVCC, as such, except that it exposes the effect
> of a write to an indexed field.

It's really an effect of parallel updates / writes / accesses, and is
always an issue for a database running on a poor storage subsystem.  A
db with a two drive mirror set is always going to be at a disadvantage
to one running on a dozen or so drives in a RAID-10

Re: Slow updates, poor IO

From: Scott Marlowe
On Sat, Sep 27, 2008 at 4:33 PM, John Huttley <John@mib-infotech.co.nz> wrote:
>>> this is part of the trade-offs of MVCC.
>>
>> was...  was a part of the trade-offs.
>
> You are thinking of HOT?
> I don't think it applies in the case of full table updates??

Sure, you just need a table with plenty of empty space in it, either
from vacuumed previous deletes / inserts or with a low fill factor
like 50%.
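Whether HOT is actually being used can be checked from the statistics views; a quick sketch (the n_tup_hot_upd counter is new in 8.3):

```sql
-- compare total row updates against HOT updates for the table
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'file';
```

If n_tup_hot_upd stays near zero during one of these full-table updates, the pages are too full for HOT to help.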

>> It's really an effect of parallel updates / writes / accesses, and is
>> always an issue for a database running on a poor storage subsystem.  A
>> db with a two drive mirror set is always going to be at a disadvantage
>> to one running on a dozen or so drives in a RAID-10
>
> Oh well, I'm forever going to be disadvantaged.

Why?  A decent caching raid controller and a set of 4 to 8 SATA drives
can make a world of difference and the cost is not that high for the
gain in performance.  Even going to 4 drives in a software RAID-10 can
make a lot of difference in these situations, and that can be done
with spare machines and hard drives.

Re: Slow updates, poor IO

From: John Huttley


Scott Marlowe wrote:
> On Fri, Sep 26, 2008 at 5:03 PM, John Huttley <John@mib-infotech.co.nz> wrote:
>> Hi Andrew,
>> There are two problems.
>> The first is the that if there is a table with a index and an update is
>> performed on a non indexed field,
>> the index is still re indexed.
>
> I assume you mean updated, not reindexed, as reindexed has a different
> meaning as regards postgresql.  Also, this is no longer true as of
> version 8.3.  If you're updating non-indexed fields a lot and you're
> not running 8.3 you are doing yourself a huge disservice.

Yes, sorry -- I mean all indexes are updated even when the updated field
is not indexed. I'm running 8.3.3.

>> this is part of the trade-offs of MVCC.
>
> was...  was a part of the trade-offs.

You are thinking of HOT?
I don't think it applies in the case of full table updates??

>> We should reasonably expect that the total amount of IO will go up, over a
>> non-indexed table.
>>
>> The second thing is that the disk IO throughput goes way down.
>>
>> This is not an issue with MVCC, as such, except that it exposes the effect
>> of a write to an indexed field.
>
> It's really an effect of parallel updates / writes / accesses, and is
> always an issue for a database running on a poor storage subsystem.  A
> db with a two drive mirror set is always going to be at a disadvantage
> to one running on a dozen or so drives in a RAID-10

Oh well, I'm forever going to be disadvantaged.

Re: Slow updates, poor IO

From: Tom Lane
John Huttley <John@mib-infotech.co.nz> writes:
> Scott Marlowe wrote:
>> was...  was a part of the trade-offs.

> You are thinking of HOT?
> I don't think it applies in the case of full table updates??

Sure, as long as there's enough free space on each page.

If you wanted to make a table that was optimized for this kind of thing,
you could try creating it with fillfactor 50.
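Using the table from this thread, that might look like the following (a sketch; the fillfactor storage parameter only governs how full newly written pages are packed):

```sql
-- leave each heap page ~50% empty so an updated row version can stay on
-- the same page, which is what lets HOT skip index maintenance
ALTER TABLE file SET (fillfactor = 50);

-- existing pages keep their current fill; rewriting the table applies it
CLUSTER file USING file_pkey;
```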

            regards, tom lane

Re: Slow updates, poor IO

From: Scott Carey
I have had great success using FILLFACTOR on certain tables where big
updates like this occur, and it improves performance.  It is still not
as fast as I would like, but there are significant gains.  A big disk
array won't help you as much as it should -- yes, it will be faster, but
it will still be chugging during one of these large updates, and very
inefficiently at that.

In some of my cases, a FILLFACTOR of 95 or 98 is enough to do the trick.
On others, 80 or 70 works.  It depends on the size of your rows versus
the size of the modifications you make.  A fillfactor of 99 holds
between ~80 bytes and one row-width worth of free space in every page,
and is all that is needed if you have larger rows and only modify small
fields such as ints.  I'm not sure why FILLFACTOR = 99 isn't the
default, to be honest.  The size difference on disk is far less than 1%
since most tables can't fit an exact number of rows in one page, and the
benefit for updates is huge in certain cases.

On the other hand, your table has a narrow row width and will fit many
rows on one page, and if you are modifying text or varchars, you may
need more space reserved for those in the fillfactor void and a smaller
FILLFACTOR setting on the table, down to about 50 for updates where the
updated rows account for a big fraction of the row width.

A second benefit of using a fillfactor is that you can CLUSTER on an index and the table will retain that ordering for longer while inserts/updates/deletes occur.  A fillfactor setting, REINDEX, then CLUSTER sequence can have a big impact.
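The fillfactor/REINDEX/CLUSTER sequence described above might be sketched like this against this thread's table (the index choice and the value 70 are illustrative, not prescriptions):

```sql
ALTER TABLE file SET (fillfactor = 70);  -- tune to row width vs. update size
REINDEX TABLE file;                      -- rebuild all of the table's indexes
CLUSTER file USING file_fp_idx;          -- rewrite the heap in index order,
                                         -- leaving the new free space per page
ANALYZE file;                            -- refresh planner statistics
```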


On Sun, Sep 28, 2008 at 7:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Huttley <John@mib-infotech.co.nz> writes:
>> Scott Marlowe wrote:
>>> was...  was a part of the trade-offs.
>
>> You are thinking of HOT?
>> I don't think it applies in the case of full table updates??


Re: Slow updates, poor IO

From: John Huttley
Ahh! I've not dealt with that before. I'll look it up.
Thanks, Tom.


Tom Lane wrote:
> John Huttley <John@mib-infotech.co.nz> writes:
>> You are thinking of HOT?
>> I don't think it applies in the case of full table updates??
>
> Sure, as long as there's enough free space on each page.
>
> If you wanted to make a table that was optimized for this kind of thing,
> you could try creating it with fillfactor 50.
>
>             regards, tom lane

Re: Slow updates, poor IO

From: John Huttley
Thanks to everyone who responded.
I've done some benchmarking.

checkpoint_segments=16 is fine; going to 64 made no improvement.
I used "update file set size=99" as the statement, changing 99 on each
run.

With 32MB of shared memory, times in seconds, leaving the system idle
long enough between runs for autovacuum to complete:

415
421
470

Then I decided to drop the DB and restore from a dump:

1150
1500
1018
1071
1077
1140

Then I tried shared_buffers=256M as suggested:

593
544

So that's made a big difference; vmstat showed a higher, more
consistent IO level.

I wondered why it slowed down after a restore. I thought it would
improve -- less fragmentation and all that. So I tried a reindex on all
three indexes:

209
228

So that's it! Lots of RAM, and reindex as part of standard operation.

Interestingly, the reindexing took about 16 sec each, and the update on
the table with no indexes took about 48 sec.  So the aggregate time for
each run would be about 230 sec.  I take that as an indicator that it is
now maximally efficient.
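For reference, the reindex step amounts to nothing more than (index names from the schema at the top of the thread):

```sql
REINDEX TABLE file;  -- rebuilds file_pkey, file_fp_idx and file_jobid_idx
```

though each index can also be rebuilt individually with REINDEX INDEX.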


The option of having more spindles for improved IO request processing
isn't feasible in most cases.  With the requirement for redundancy, we
end up with a lot of them, needing an external enclosure.  They would
have to be expensive SCSI/SAS/FC drives too, since SATA just doesn't
have the IO processing.

It will be interesting to see what happens when good performing SSD's
appear.

Meanwhile RAM is cheaper than that drive array!

It would be nice if things like

* the effect of updates on indexed tables
* fill factor
* reindex after restore

were mentioned in the 'performance' section of the manual, since that's
the part someone will go to when looking for a solution.


Again, thanks to everyone,

--John


Re: Slow updates, poor IO

From: Greg Smith
On Mon, 29 Sep 2008, John Huttley wrote:

> checkpoint _segments=16 is fine, going to 64 made no improvement.

You might find that it does *after* increasing shared_buffers.  If the
buffer cache is really small, the checkpoints can't have very much work to
do, so their impact on performance is smaller.  Once you've got a couple
of hundred MB on there, the per-checkpoint overhead can be considerable.

> It would be nice if thing like
> * The effect of updates on indexed tables
> * Fill Factor
> * reindex  after restore
> Were mentioned in the 'performance' section of the manual, since that's
> the part someone will go to when looking for a solution.

If you have to reindex after restore to get good performance, that means
what you should do instead is drop the indexes on the table during the
restore and then create them once the data is there.  The REINDEX is more
aimed at when the system has been running for a while and getting
fragmented.
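A sketch of that drop-then-recreate pattern for this thread's schema (with the load run in between):

```sql
-- before the restore/bulk load
ALTER TABLE file DROP CONSTRAINT file_pkey;
DROP INDEX file_fp_idx;
DROP INDEX file_jobid_idx;

-- ... load the data ...

-- after the load, build each index in one sequential pass
ALTER TABLE file ADD CONSTRAINT file_pkey PRIMARY KEY (fileid);
CREATE INDEX file_fp_idx ON file USING btree (filenameid, pathid);
CREATE INDEX file_jobid_idx ON file USING btree (jobid);
```

A full pg_dump restore already orders things this way on its own; this matters mainly for data-only loads into an existing schema.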

Unfortunately most of the people who know enough about those topics to
really do a good treatment of them are too busy fixing slow systems to
have time to write about it.  There are many articles on this general
topic trickling out at
http://wiki.postgresql.org/wiki/Performance_Optimization you might find
valuable in addition to the manual.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Slow updates, poor IO

From: John Huttley

Greg Smith wrote:
> On Mon, 29 Sep 2008, John Huttley wrote:
>
>> checkpoint _segments=16 is fine, going to 64 made no improvement.
>
> You might find that it does *after* increasing shared_buffers.  If the
> buffer cache is really small, the checkpoints can't have very much
> work to do, so their impact on performance is smaller.  Once you've
> got a couple of hundred MB on there, the per-checkpoint overhead can
> be considerable.
>
Ahh, bugger -- I've just trashed my test setup.
I've settled on 64MB shared memory since I've only got 1GB of RAM and
the system impact of 256MB is severe.
Also it uses FB-DIMMs, which cost arm+leg+first born.


>> It would be nice if thing like
>> * The effect of updates on indexed tables
>> * Fill Factor
>> * reindex  after restore
>> Were mentioned in the 'performance' section of the manual, since
>> that's the part someone will go to when looking for a solution.
>
> If you have to reindex after restore to get good performance, that
> means what you should do instead is drop the indexes on the table
> during the restore and then create them once the data is there.  The
> REINDEX is more aimed at when the system has been running for a while
> and getting fragmented.

I thought that the pg_dump generated files did that, so I dismissed it
initially. Maybe I did a data-only restore into an existing schema.
>
> Unfortunately most of the people who know enough about those topics to
> really do a good treatment of them are too busy fixing slow systems to
> have time to write about it.  There are many articles on this general
> topic trickling out at
> http://wiki.postgresql.org/wiki/Performance_Optimization you might
> find valuable in addition to the manual.
>
And of course this is now in the mail archive!


> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>
>

Re: Slow updates, poor IO

From: Scott Marlowe
On Sun, Sep 28, 2008 at 8:01 PM, John Huttley <John@mib-infotech.co.nz> wrote:
> Ahh bugger, I've just trashed my test setup.
> I've settled on  64Mb shared memory since I've only got 1Gb or RAM and the
> system impact of 256M is severe.
> Also it uses FB-DIMMS which cost arm+leg+first born

http://www.crucial.com/search/searchresults.aspx?keywords=buffered

Fully buffered memory there is $56.99 for a 1 Gig stick.  That's
hardly an arm and a leg.  Considering many pgsql DBAs make that in 1
to 3 hours, it's not much at all really.  A lot cheaper than pulling
your hair out trying to make a db server run on 1 Gig.

Re: Slow updates, poor IO

From: John Huttley
Ah yes... actually I can get the Kingston stuff locally.
However, at the moment I'm happily married and want to keep it that way!

Everything is in pairs too. Actually it's a lot cheaper than when it
first came out, but still a lot more than your corner-shop DDR2 stuff.

--John




Scott Marlowe wrote:
> On Sun, Sep 28, 2008 at 8:01 PM, John Huttley <John@mib-infotech.co.nz> wrote:
>> Ahh bugger, I've just trashed my test setup.
>> I've settled on  64Mb shared memory since I've only got 1Gb or RAM and the
>> system impact of 256M is severe.
>> Also it uses FB-DIMMS which cost arm+leg+first born
>
> http://www.crucial.com/search/searchresults.aspx?keywords=buffered
>
> Fully buffered memory there is $56.99 for a 1 Gig stick.  That's
> hardly an arm and a leg.  Considering many pgsql DBAs make that in 1
> to 3 hours, it's not much at all really.  A lot cheaper than pulling
> your hair out trying to make a db server run on 1 Gig.

Re: Slow updates, poor IO

From: Dan Langille
On Sep 28, 2008, at 10:01 PM, John Huttley wrote:

>
>
> Greg Smith wrote:
>> On Mon, 29 Sep 2008, John Huttley wrote:
>>
>>> checkpoint _segments=16 is fine, going to 64 made no improvement.
>>
>> You might find that it does *after* increasing shared_buffers.  If
>> the buffer cache is really small, the checkpoints can't have very
>> much work to do, so their impact on performance is smaller.  Once
>> you've got a couple of hundred MB on there, the per-checkpoint
>> overhead can be considerable.
>>
> Ahh bugger, I've just trashed my test setup.

Pardon?  How did you do that?

--
Dan Langille
http://langille.org/





Re: Slow updates, poor IO

From: John Huttley
I've canned the DB and got rid of my data.
I'm in the midst of doing some other benchmarking for a possible change
to the Bacula database.

Loading up 1M records into a table of 60M records complete with indexes.
It's still going...

--john


Dan Langille wrote:
>
> On Sep 28, 2008, at 10:01 PM, John Huttley wrote:
>
>>
>>
>> Greg Smith wrote:
>>> On Mon, 29 Sep 2008, John Huttley wrote:
>>>
>>>> checkpoint _segments=16 is fine, going to 64 made no improvement.
>>>
>>> You might find that it does *after* increasing shared_buffers.  If
>>> the buffer cache is really small, the checkpoints can't have very
>>> much work to do, so their impact on performance is smaller.  Once
>>> you've got a couple of hundred MB on there, the per-checkpoint
>>> overhead can be considerable.
>>>
>> Ahh bugger, I've just trashed my test setup.
>
> Pardon?  How did you do that?
>

Re: Slow updates, poor IO

From: Scott Marlowe
On Sun, Sep 28, 2008 at 9:08 PM, John Huttley <John@mib-infotech.co.nz> wrote:
> Ah yess... actually I can get the Kingston stuff locally.
> However at the moment I'm happily married and want to keep it that way!
>
> Everything is in pairs too. Actually its a lot cheaper than when it first
> came out, but still
> a lot more than your corner shop DDR-2 stuff.

I don't mean to keep arguing here, but it's not any more expensive
than the same-speed DDR2 667MHz memory.  For ECC memory, they're
almost the same price.

http://www.crucial.com/store/listparts.aspx?model=PowerEdge%201950

That's memory for my Dell PowerEdge web server, and it's $105.99 for
two 1 Gig sticks.  $56.99 * 2 = $113.98, so it's only $7.99 more.  I
get the point about not wanting to anger the wife, but maybe if you
ask for it nicely for Christmas?  :)