Re: COPY Performance - Mailing list pgsql-general

From: Hans Zaunere
Subject: Re: COPY Performance
Msg-id: 009d01c8aeec$931682a0$b94387e0$@com
In response to: Re: COPY Performance ("Scott Marlowe" <scott.marlowe@gmail.com>)
List: pgsql-general
> > > > We're using a statement like this to dump between 500K and >5
> > > > million rows.
> > > >
> > > > COPY(SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
> > > >   TO '/dev/shm/SomeFile.csv'
> > > >
> > > > Upon a first run, this operation can take several minutes.  Upon a
> > > > second run, it generally completes in well under a minute.
> > > >
> > > Hmmm ... define "first" versus "second".  What do you do to return
> > > it to the slow state?
> >
> >  Interesting that you ask.  I haven't found a very reliable way to
> >  reproduce this.
> >
> >  Typically, just waiting a while before running the same query a
> >  second time will reproduce this behavior.  Restarting PostgreSQL
> >  reproduced it as well.  However, I can't find a way to flush
> >  buffers/etc. to reliably reproduce the slow state.
>
> what happens if you do something like:
>
> select count(*) from (select ...) as x;
>
> i.e. don't make the .csv file each time.  How's the performance
> without making the csv versus making it?

It's the same.

And regarding /dev/shm, we do watch to make sure that memory there doesn't
become a point of contention.  We've also directed the dump at a separate
set of disk spindles and seen the same performance.

So at the end of the day, it certainly does look like a read bottleneck on
the disks.  Unfortunately, from a hardware perspective, there's not much we
can do about that at the moment.
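
For what it's worth, a rough way to check whether the fast runs are simply
being served from PostgreSQL's buffer cache is the contrib pg_buffercache
view.  A sketch, assuming that module is installed (note it only covers
shared_buffers, not the OS page cache):

  -- Count the shared_buffers pages currently holding SomeTable; running it
  -- before and after the COPY shows how much of the table is cached.
  -- Unquoted identifiers fold to lower case, hence 'sometable'.
  SELECT count(*) AS buffered_pages
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
  WHERE c.relname = 'sometable';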

Does anyone have any experience they can share with using partitioning or
index tricks to speed up reading what should basically be a large,
contiguous range of rows from a table, selected on a single-column WHERE
constraint?
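
One candidate along these lines is CLUSTER, which rewrites the table in
index order so that rows matching a range constraint on SomeColumn end up
physically contiguous on disk.  A sketch, with illustrative index names:

  -- Physically reorder SomeTable by SomeColumn so that a range scan
  -- reads mostly contiguous heap pages.
  CREATE INDEX sometable_somecolumn_idx ON SomeTable (SomeColumn);

  -- 8.3 syntax; older releases spell it:
  --   CLUSTER sometable_somecolumn_idx ON SomeTable;
  CLUSTER SomeTable USING sometable_somecolumn_idx;

  -- Refresh planner statistics after the rewrite.
  ANALYZE SomeTable;

CLUSTER takes an exclusive lock and is a one-time reorder, so it would have
to be repeated as the table churns.  The inheritance-based partitioning
route (child tables with CHECK constraints, plus constraint_exclusion)
instead keeps each SomeColumn range in its own, smaller table.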

H


