Re: a heavy duty operation on an "unused" table kills my server - Mailing list pgsql-performance

From Robert Haas
Subject Re: a heavy duty operation on an "unused" table kills my server
Date
Msg-id 603c8f071001130823l69804fe3m5c6fbb081c55c265@mail.gmail.com
In response to Re: a heavy duty operation on an "unused" table kills my server  (Eduardo Piombino <drakorg@gmail.com>)
List pgsql-performance
On Wed, Jan 13, 2010 at 2:03 AM, Eduardo Piombino <drakorg@gmail.com> wrote:
> Excellent, lots of useful information in your message.
> I will follow your advice, and keep you posted on any progress. I have yet
> to confirm some technical details of my setup with you, but I'm pretty sure
> you hit the nail on the head in any case.
>
> One last question about this IO issue I'm facing: do you think it is just a
> matter of RAID configuration speed, or a matter of queue gluttony (i.e., not
> leaving time for other processes to get into the IO queue in a reasonable
> time)?
>
> Because if it were just a matter of speed: ok, with my current RAID
> configuration let's say it takes 10 minutes to process the ALTER TABLE
> (leaving no room for other IOs until the ALTER TABLE is done). Let's say
> I then put in the fastest possible RAID setup, or even remove RAID for the
> sake of speed, and it completes in, let's say, 10 seconds (an unrealistic
> assumption). But if my table then grows 60 times larger, I would be facing
> the very same problem again, even with the best RAID configuration.
>
> The problem would seem to be in the way the OS (or hardware, or someone
> else, or all of them) is/are inserting the IO requests into the queue.
> What can I do to control the order in which these IO requests are finally
> entered into the queue?
> I mean ... what I would like to obtain is:
>
> Consider the ALTER TABLE as a sequence of 100,000 READ/WRITE OPERATIONS,
> and the SELECT * FROM xxx as a sequence of 100 READ OPERATIONS
> (totally unrelated on disk).
>
> First I run the ALTER TABLE in one thread...
> Let's say by the time it has generated 1,000 READ/WRITE OPERATIONS, the other
> thread starts with the SELECT * FROM xxx ...
> I would expect the IO system to give those 100 READ OPERATIONS a chance
> to execute immediately (with no need to wait for the remaining 99,000
> READ/WRITE OPERATIONS to finish), that is, to enter the queue at *almost* the
> very same moment the IO requests were issued.
>
> If I cannot guarantee that, I'm kinda doomed, because the larger the
> number of IO operations requested by a "heavy duty operation", the longer
> it will take any other thread to start doing anything.

One thing you can do - although it's a darn lot of work compared to
just running a DDL command - is create a new empty table with the
schema you want and then write a script that copies, say, 1000 records
from the old table to the new table.  If your table has a primary key
with a natural sort ordering, it's not too hard to keep track of where
you left off the last time and continue on from there.  Then you can
incrementally get all of your data over without swamping the system.
I realize that's a pain in the neck, of course.
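To make that concrete, here's a minimal sketch of one way to do it in SQL.
The names old_table, new_table, and the integer primary key id are made up
for illustration; the COALESCE over max(id) is one way for each run to
figure out where the previous batch left off:

-- Copy the next batch of 1000 rows into the new table.
-- old_table, new_table, and id are placeholder names; adjust the
-- select list if the new schema adds or drops columns.
INSERT INTO new_table
SELECT o.*
FROM old_table o
WHERE o.id > COALESCE((SELECT max(id) FROM new_table), 0)
ORDER BY o.id
LIMIT 1000;

Re-run that until it copies zero rows, and have the driving script sleep
for a bit between batches so the I/O queue never gets monopolized for more
than a moment.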

I'm kind of surprised that there are disk I/O subsystems that are so
bad that a single thread doing non-stop I/O can take down the whole
server.  Is that normal?  Does it happen on non-Windows operating
systems?  What kind of hardware should I not buy to make sure this
doesn't happen to me?

...Robert
