Re: COPY v. java performance comparison - Mailing list pgsql-general

From Rob Sargent
Subject Re: COPY v. java performance comparison
Date
Msg-id 533DB744.8000806@gmail.com
In response to Re: COPY v. java performance comparison  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
On 04/03/2014 01:28 PM, Jeff Janes wrote:
On Thu, Apr 3, 2014 at 9:04 AM, Rob Sargent <robjsargent@gmail.com> wrote:

 
I have to straighten out my environment, which I admit I was hoping to avoid. I reset checkpoint_segments to 12 and restarted my server.
I kicked off the COPY at 19:00. That generated a couple of the "too frequent" statements, plus 52 "WARNING:  pgstat wait timeout" lines over the next 8 hours,
starting at 00:37 (5 hours in), until it finally keeled over at 03:04 on line 37363768.

Those things are not necessarily problems.  If there is a problem, those tell you places to look, nothing more.  In particular, "pgstat wait timeout" just means "Someone is beating the snot out of your hard drives, and the stat collector just happened to notice that fact".  This is uninformative, because you already know you are beating the snot out of your hard drives.  That, after all, is the point of the exercise, right?  If you saw this message when you weren't doing anything particularly strenuous, then that would be interesting.

 
That's the last line of the input so obviously I didn't flush my last println properly. I'm beyond getting embarrassed at this point.

Is turning auto-vacuum off a reasonable way through this?

No, no, no, no!  First of all, what is the "this" you are trying to get through?  Previously you said you were not trying to get the data in as fast as possible, but only to see what you can expect.  Well, now you see what you can expect.  You can expect to load at a certain speed given a certain table size, and you can expect to see some log messages about unusual activity.  Is it fast enough, or is it not fast enough?  

If it is fast enough, and if you can ignore a few dozen messages in the log file, then you are done.  (Although you will still want to assess how queries against your tables are affected by the load process, assuming your database is used for interactive queries.)

If it is not fast enough, then randomly disabling important parts of the system which have nothing to do with the bulk load is probably not the way to improve things, but is an excellent way to shoot yourself in the foot.

Cheers,

Jeff
Points well taken.

Others in this thread have suggested that I should in fact expect higher throughput, so I've been angling at that for a bit.
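
For what it's worth, the unflushed final println mentioned above is easy to guard against. What follows is only a minimal sketch, not the generator actually used in this thread: the class name CopyInputWriter, the method writeRows and the two-column tab-separated row layout are all made up for illustration. The idea is to let try-with-resources close (and therefore flush) the writer, and to call checkError(), since PrintWriter does not throw on write failures.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical generator for a COPY input file; names and row layout are illustrative only.
public class CopyInputWriter {

    // Writes rowCount tab-separated rows, then returns the number of lines found on disk.
    static long writeRows(Path out, int rowCount) throws IOException {
        // try-with-resources closes the writer on exit, which flushes the buffer,
        // so the last println cannot be left sitting in memory.
        try (PrintWriter writer = new PrintWriter(
                Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            for (int i = 0; i < rowCount; i++) {
                writer.println(i + "\t" + "value-" + i);
            }
            // PrintWriter swallows IOExceptions; checkError() flushes and reports any failure.
            if (writer.checkError()) {
                throw new IOException("write to " + out + " failed");
            }
        }
        // Re-read the file to confirm that every row made it out.
        try (Stream<String> lines = Files.lines(out, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("copy-input", ".tsv");
        try {
            long onDisk = writeRows(out, 1000);
            System.out.println("lines on disk: " + onDisk);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

Comparing the line count on disk with the number of rows generated, before kicking off an hours-long COPY, would catch a truncated last line early.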

