Thread: Simple (hopefully) throughput question?
Hello

We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres. In the example below, it takes 1 GB to store 350 MB of nominal data. However that suggests we would expect to get 35 MB/s bulk read rates.

Observations using iostat and top during these bulk reads suggest that the queries are CPU bound, not I/O bound. In fact, repeating the queries yields similar response times. Presumably if it were an I/O issue the response times would be much shorter the second time through with the benefit of caching.

We have tried these simple queries using psql, JDBC, pl/java stored procedures, and libpq. In all cases the client code ran on the same box as the server. We have experimented with Postgres 8.1, 8.3 and 9.0.

We also tried playing around with some of the server tuning parameters such as shared_buffers to no avail.

Here is uname -a for a machine we have tested on:

Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

A sample dataset that reproduces these results looks like the following (there are no indexes):

Table "bulk_performance.counts"
 Column |  Type   | Modifiers
--------+---------+-----------
 i1     | integer |
 i2     | integer |
 i3     | integer |
 i4     | integer |

There are 22 million rows in this case.

We HAVE observed that summation queries run considerably faster. In this case,

select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts

runs at 35 MB/s.

Our business logic does operations on the resulting data such that the output is several orders of magnitude smaller than the input. So we had hoped that by putting our business logic into stored procedures (and thus drastically reducing the amount of data flowing to the client) our throughput would go way up. This did not happen.

So our questions are as follows:

Is there any way using stored procedures (maybe C code that calls SPI directly) or some other approach to get close to the expected 35 MB/s doing these bulk reads? Or is this the price we have to pay for using SQL instead of some NoSQL solution. (We actually tried Tokyo Cabinet and found it to perform quite well. However it does not measure up to Postgres in terms of replication, data interrogation, community support, acceptance, etc).

Thanks

Dan Schaffer
Paul Hamer
Nick Matheson
On 03.11.2010 17:52, Nick Matheson wrote: > We have an application that needs to do bulk reads of ENTIRE > Postgres tables very quickly (i.e. select * from table). We have > observed that such sequential scans run two orders of magnitude slower > than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is > due to the storage overhead we have observed in Postgres. In the > example below, it takes 1 GB to store 350 MB of nominal data. However > that suggests we would expect to get 35 MB/s bulk read rates. > > Observations using iostat and top during these bulk reads suggest > that the queries are CPU bound, not I/O bound. In fact, repeating the > queries yields similar response times. Presumably if it were an I/O > issue the response times would be much shorter the second time through > with the benefit of caching. > > We have tried these simple queries using psql, JDBC, pl/java stored > procedures, and libpq. In all cases the client code ran on the same > box as the server. We have experimented with Postgres 8.1, 8.3 and 9.0. Try COPY, ie. "COPY bulk_performance.counts TO STDOUT BINARY". -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
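For anyone wanting to try Heikki's suggestion from Java, a minimal sketch using the pgJDBC CopyManager (which comes up again later in this thread) might look like the following; the connection details and output file name are placeholders:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class BinaryCopyDump {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        OutputStream out = new BufferedOutputStream(
                new FileOutputStream("counts.pgcopy"));
        try {
            // CopyManager streams the COPY output directly, avoiding the
            // per-row overhead of an ordinary ResultSet.
            CopyManager copy = ((PGConnection) conn).getCopyAPI();
            long rows = copy.copyOut(
                    "COPY bulk_performance.counts TO STDOUT BINARY", out);
            System.out.println("copied " + rows + " rows");
        } finally {
            out.close();
            conn.close();
        }
    }
}

The binary stream still has to be decoded on the client side; a sketch of that appears further down the thread.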
Just some ideas that went through my mind when reading your post. On Wed, Nov 3, 2010 at 17:52, Nick Matheson <Nick.D.Matheson@noaa.gov> wrote: > than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is > due to the storage overhead we have observed in Postgres. In the > example below, it takes 1 GB to store 350 MB of nominal data. PostgreSQL 8.3 and later have 22 bytes of overhead per row, plus page-level overhead and internal fragmentation. You can't do anything about row overheads, but you can recompile the server with larger pages to reduce page overhead. > Is there any way using stored procedures (maybe C code that calls > SPI directly) or some other approach to get close to the expected 35 > MB/s doing these bulk reads? Perhaps a simpler alternative would be writing your own aggregate function with four arguments. If you write this aggregate function in C, it should have similar performance as the sum() query. Regards, Marti
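As a rough sanity check on the numbers in the original post: 22 million rows x 16 bytes of integer payload is about 350 MB of nominal data, and 1 GB / 22 million rows works out to roughly 45-50 bytes per row on disk. With a tuple header of roughly 24 bytes, a 4-byte line pointer and some page-level overhead on top of the 16-byte payload, the observed 3x blowup is close to what the on-disk format predicts rather than a sign of bloat, so 35 MB/s is a reasonable target for this table.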
On 11/3/2010 10:52 AM, Nick Matheson wrote: > Hello > > We have an application that needs to do bulk reads of ENTIRE > Postgres tables very quickly (i.e. select * from table). We have > observed that such sequential scans run two orders of magnitude slower > than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is > due to the storage overhead we have observed in Postgres. In the > example below, it takes 1 GB to store 350 MB of nominal data. However > that suggests we would expect to get 35 MB/s bulk read rates. > > Observations using iostat and top during these bulk reads suggest > that the queries are CPU bound, not I/O bound. In fact, repeating the > queries yields similar response times. Presumably if it were an I/O > issue the response times would be much shorter the second time through > with the benefit of caching. > > We have tried these simple queries using psql, JDBC, pl/java stored > procedures, and libpq. In all cases the client code ran on the same > box as the server. We have experimented with Postgres 8.1, 8.3 and 9.0. > > We also tried playing around with some of the server tuning parameters > such as shared_buffers to no avail. > > Here is uname -a for a machine we have tested on: > > Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 > 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux > > A sample dataset that reproduces these results looks like the following > (there are no indexes): > > Table "bulk_performance.counts" > Column | Type | Modifiers > --------+---------+----------- > i1 | integer | > i2 | integer | > i3 | integer | > i4 | integer | > > There are 22 million rows in this case. > > We HAVE observed that summation queries run considerably faster. In this > case, > > select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts > > runs at 35 MB/s. > > Our business logic does operations on the resulting data such that > the output is several orders of magnitude smaller than the input. So > we had hoped that by putting our business logic into stored procedures > (and thus drastically reducing the amount of data flowing to the > client) our throughput would go way up. This did not happen. > > So our questions are as follows: > > Is there any way using stored procedures (maybe C code that calls > SPI directly) or some other approach to get close to the expected 35 > MB/s doing these bulk reads? Or is this the price we have to pay for > using SQL instead of some NoSQL solution. (We actually tried Tokyo > Cabinet and found it to perform quite well. However it does not measure > up to Postgres in terms of replication, data interrogation, community > support, acceptance, etc). > > Thanks > > Dan Schaffer > Paul Hamer > Nick Matheson > > I have no idea if this would be helpful or not, never tried it, but when you fire off "select * from bigtable" pg will create the entire resultset in memory (and maybe swap?) and then send it all to the client in one big lump. You might try a cursor and fetch 100-1000 at a time from the cursor. No idea if it would be faster or slower. -Andy
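For what it's worth, a minimal JDBC sketch of the cursor approach (connection details are placeholders). pgJDBC only fetches in batches through a server-side cursor when autocommit is off, the result set is forward-only and a fetch size is set; otherwise the driver does buffer the entire result on the client before returning:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorScan {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        conn.setAutoCommit(false);  // required for pgJDBC to use a cursor
        Statement st = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(1000);      // rows pulled from the server per round trip
        ResultSet rs = st.executeQuery(
                "SELECT i1, i2, i3, i4 FROM bulk_performance.counts");
        long total = 0;
        while (rs.next()) {
            total += rs.getInt(1);  // stand-in for the real business logic
        }
        rs.close();
        st.close();
        conn.commit();
        conn.close();
        System.out.println("sum(i1) = " + total);
    }
}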
> Is there any way using stored procedures (maybe C code that calls > SPI directly) or some other approach to get close to the expected 35 > MB/s doing these bulk reads? Or is this the price we have to pay for > using SQL instead of some NoSQL solution. (We actually tried Tokyo > Cabinet and found it to perform quite well. However it does not measure > up to Postgres in terms of replication, data interrogation, community > support, acceptance, etc). Reading from the tables is very fast, what bites you is that postgres has to convert the data to wire format, send it to the client, and the client has to decode it and convert it to a format usable by your application. Writing a custom aggregate in C should be a lot faster since it has direct access to the data itself. The code path from actual table data to an aggregate is much shorter than from table data to the client...
Heikki- > > Try COPY, ie. "COPY bulk_performance.counts TO STDOUT BINARY". > Thanks for the suggestion. A preliminary test shows an improvement closer to our expected 35 MB/s. Are you familiar with any Java libraries for decoding the COPY format? The spec is clear and we could clearly write our own, but figured I would ask. ;) Nick
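If no ready-made decoder turns up, the binary COPY format is simple enough that a hand-rolled reader stays short. A rough sketch for the four-int4 table above (class and method names are made up, error handling omitted); the InputStream could be fed from CopyManager.copyOut() through a piped stream pair, or by concatenating the chunks from the pull-based CopyOut API:

import java.io.DataInputStream;
import java.io.InputStream;

public class BinaryCopyReader {

    // Unpacks a COPY BINARY stream for a table of four int4 columns.
    // All values are big-endian, which is what DataInputStream expects.
    public static long read(InputStream raw) throws Exception {
        DataInputStream in = new DataInputStream(raw);

        byte[] signature = new byte[11];        // "PGCOPY\n\377\r\n\0"
        in.readFully(signature);
        in.readInt();                           // flags field
        int extensionLength = in.readInt();     // header extension length
        in.readFully(new byte[extensionLength]);

        long rows = 0;
        while (true) {
            short fieldCount = in.readShort();
            if (fieldCount == -1) {
                break;                          // trailer: end of data
            }
            for (int f = 0; f < fieldCount; f++) {
                int len = in.readInt();         // -1 would mean SQL NULL
                if (len == 4) {
                    int value = in.readInt();   // the int4 payload
                    // ... hand value to the business logic here ...
                } else if (len > 0) {
                    in.readFully(new byte[len]); // other widths: skip
                }
            }
            rows++;
        }
        return rows;
    }
}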
Marti-

> Just some ideas that went through my mind when reading your post.

Your comments seem to confirm some of our foggy understanding of the storage 'overhead' and nudge us in the direction of C stored procedures.

> PostgreSQL 8.3 and later have 22 bytes of overhead per row, plus page-level overhead and internal fragmentation. You can't do anything about row overheads, but you can recompile the server with larger pages to reduce page overhead.

>> Is there any way using stored procedures (maybe C code that calls SPI directly) or some other approach to get close to the expected 35 MB/s doing these bulk reads?

> Perhaps a simpler alternative would be writing your own aggregate function with four arguments. If you write this aggregate function in C, it should have similar performance as the sum() query.
Do you have any results or personal experiences from moving calculations in this way? I think we are trying to get an understanding of how much we might stand to gain by the added investment.
Thanks,
Nick
Andy- > I have no idea if this would be helpful or not, never tried it, but > when you fire off "select * from bigtable" pg will create the entire > resultset in memory (and maybe swap?) and then send it all to the > client in one big lump. You might try a cursor and fetch 100-1000 at > a time from the cursor. No idea if it would be faster or slower. I am pretty sure we have tried paged datasets and didn't see any improvement. But we will put this on our list of things to double check; better safe than sorry, you know. Thanks, Nick
Pierre-

> Reading from the tables is very fast, what bites you is that postgres has to convert the data to wire format, send it to the client, and the client has to decode it and convert it to a format usable by your application. Writing a custom aggregate in C should be a lot faster since it has direct access to the data itself. The code path from actual table data to an aggregate is much shorter than from table data to the client...

I think your comments really get at what our working hypothesis was, but given that our experience is limited compared to you all here on the mailing lists we really wanted to make sure we weren't missing any alternatives. Also the writing of custom aggregators will likely leverage any improvements we make to our storage throughput.

Thanks,
Nick
04.11.10 16:31, Nick Matheson написав(ла): > Heikki- >> >> Try COPY, ie. "COPY bulk_performance.counts TO STDOUT BINARY". >> > Thanks for the suggestion. A preliminary test shows an improvement > closer to our expected 35 MB/s. > > Are you familiar with any Java libraries for decoding the COPY format? > The spec is clear and we could clearly write our own, but figured I > would ask. ;) JDBC driver has some COPY support, but I don't remember details. You'd better ask in JDBC list. Best regards, Vitalii Tymchyshyn
mark- Thanks for all the good questions/insights. > People are probably going to want more detail on the list to give alternate > ways of attacking the problem. That said.... > I am going to try and fill in some of the gaps where I can... > The copy suggestion is a good one if you are unloading to another > application for actual data processing. > > Are you watching an avg of all cores or just one core when you see this cpu > bound issue ? I would expect a simple "select * from table " to get some IO > wait. > Our application doesn't have many concurrent requests so this observation is of a single query resulting in a single cpu being maxed. We have played with making multiple requests each for a subset of the data and this scales fairly well, but again with the process seemingly very cpu bound for a conceptually simple data retrieval. Our problem is really about answering a single large question as quickly as possible vs the more typical requests per/s type throughput (more OLAP than OLTP). > Does the table your reading from have most of the data stored in TOAST? I > ask because as a default there is some compression with TOAST and maybe your > spending more time with decompression that expected. Not the first thing > that I would think to change or where I suspect the problem comes from. > No the example table isn't too far from the real thing (simple I know) just add a few int/short metadata columns and you have it. And the example table exhibits the same performance issue so I think it is a reasonable test case. > More detailed hardware list and more detailed example case will probably > help a lot with getting some of the really smart PG people on it, (like Tom > Lane or some of people who work for a postgresql paid support company) > Hardware ------------------------------------ 2 x Dual Core 2.4GHz Opteron 8G Ram 4-Drive Raid 5 > For the record: 35MB/s seq reads isn't that fast so a lot of people are > going to wonder why that number is so low. > I completely agree. (With the large RAID arrays or even SSDs arrays I have seen here on the boards 3G/s isn't crazy any longer) I think our dilemma was that we weren't seemingly able to make use of what we had for IO throughput. This was most evident when we did some tests with smaller datasets that could fit entirely into the disk cache, we saw (via iostat) that indeed the query ran almost entirely from the disk cache, but yielded nearly the same 5 MB/s throughput. This seemed to indicate that our problem was somewhere other than the storage infrastructure and lead us to the single CPU bottleneck discovery. > Anyways since I suspect that improving IO some will actually speed up your > select * case I submit the following for you to consider. > > My suggestions to improve a "select * from table" case (and copy > performance): > > First, if you haven't, bump your read ahead value, this should improve > things some - however probably not enough by itself. > > blockdev --setra 16384 /dev/<devicename> > Will check into this one. > The default on most linux installs is IMO woefully small at 256. 16384 might > be a little high for you but it has worked well for us and our hardware. > > > If your data directory is mounted as its own partition or logical disk you > might consider mounting it with the noatime flag. > > Also are you running ext3? If you can pull in XFS (and do noatime there as > well) you should see about a 20% increase. It looks like this is a redhat or > cent box. 
If this is RHEL-ES you will need to either do a custom import for > xfsdump and xfsprogs your self and risk support issues from RH, or if it is > cent you can pull in the "extras". (if Redhat and not Adv. Server you can > buy it from Redhat for ES servers) CENT/RH 6 should have XFS support by > default that might be too far off for you. > > (this 20% number came from our own inhouse testing of sequential read tests > with dd) but there are plenty of other posts on web showing how much of an > improvement XFS is over ext3. > > > If you haven't broken out the data directory to it's own partition (and > hopefully spindles) there are some problems with using noatime on a system, > be aware of what they are. > > You will probably still be annoyed with a single long running query getting > bottlenecked at a single cpu core but without a more detailed example case > people might have a hard time helping with solving that problem.. > > Anyways try these and see if that gets you anywhere. > ext3 currently, our support/IT layer may balk at non-Redhat RPM stuff, have to see on that one. I think if we can get past the seeming CPU bound then all of the above would be candidates for optimizing the IO. We had actually slated to buy a system with an SSD array separate spindles for OS, data, etc, but when I couldn't make a convincing case for the end user response time improvements it was put on hold, and hence the source of our questions here. ;) > You could always play with the greenplum singlenode addition if you want to > see a way to still sort-of be on postgres and use all cores... but that > introduces a whole host of other issues to solve. > Interesting. I had heard of Greenplum, but thought it was more about scaling to clusters rather than single node improvements. We will have to look into that. Thanks again for all the ideas, questions and things to look into I think you have opened a number of new possibilities. Cheers, Nick
> JDBC driver has some COPY support, but I don't remember details. You'd > better ask in JDBC list. As long as we're here: yes, the JDBC driver has COPY support as of 8.4(?) via the CopyManager PostgreSQL-specific API. You can call ((PGConnection)conn).getCopyManager() and do either push- or pull-based COPY IN or OUT. We've been using it for several years and it works like a charm. For more details, ask the JDBC list or check out the docs: http://jdbc.postgresql.org/documentation/publicapi/index.html --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
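A minimal sketch of the pull-based variant Maciek mentions (connection details are placeholders); instead of handing the driver a stream to push into, you ask for the next chunk whenever you are ready:

import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;
import org.postgresql.copy.CopyOut;

public class PullCopyOut {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        CopyManager copy = ((PGConnection) conn).getCopyAPI();
        CopyOut op = copy.copyOut("COPY bulk_performance.counts TO STDOUT");
        long rows = 0;
        byte[] chunk;
        while ((chunk = op.readFromCopy()) != null) {
            // In text format each chunk normally holds one tab-separated row.
            rows++;
        }
        System.out.println("rows pulled: " + rows);
        conn.close();
    }
}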
Maciek/Vitalii- Thanks for the pointers to the JDBC work. Luckily, we had already found the COPY support in the pg driver, but were wondering if anyone had already written the complementary unpacking code for the raw data returned from the copy. Again the spec is clear enough that we could write it, but we just didn't want to re-invent the wheel if it wasn't necessary. Cheers, Nick
On Thu, 04 Nov 2010 15:42:08 +0100, Nick Matheson <Nick.D.Matheson@noaa.gov> wrote:
> I think your comments really get at what our working hypothesis was, but
> given that our experience is limited compared to you all here on the
> mailing lists we really wanted to make sure we weren't missing any
> alternatives. Also the writing of custom aggregators will likely
> leverage any improvements we make to our storage throughput.

Quick test: SELECT sum(x) FROM a table with 1 INT column, 3M rows, cached
=> 244 MB/s => 6.7 M rows/s

Same on MySQL:

            size     SELECT sum(x) (cached)
postgres   107 MB    0.44 s
myisam      20 MB    0.42 s
innodb      88 MB    1.98 s

As you can see, even though myisam is much smaller (no transaction data to store!) the aggregate performance isn't any better, and for innodb it is much worse. Even though pg's per-row header is large, seq scan / aggregate performance is very good.

You can get performance in this ballpark by writing a custom aggregate in C; it isn't very difficult, the pg source code is clean and full of insightful comments:

- take a look at how contrib/intagg works
- http://www.postgresql.org/files/documentation/books/aw_pgsql/node168.html
- and the pg manual of course
On 11/03/2010 04:52 PM, Nick Matheson wrote: > We have an application that needs to do bulk reads of ENTIRE > Postgres tables very quickly (i.e. select * from table). We have > observed that such sequential scans run two orders of magnitude slower > than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is > due to the storage overhead we have observed in Postgres. In the > example below, it takes 1 GB to store 350 MB of nominal data. However > that suggests we would expect to get 35 MB/s bulk read rates. > Our business logic does operations on the resulting data such that > the output is several orders of magnitude smaller than the input. So > we had hoped that by putting our business logic into stored procedures > (and thus drastically reducing the amount of data flowing to the > client) our throughput would go way up. This did not happen. Can you disclose what kinds of manipulations you want to do on the data? I am asking because maybe there is a fancy query (possibly using windowing functions and / or aggregation functions) that gets you the speed that you need without transferring the whole data set to the client. > So our questions are as follows: > > Is there any way using stored procedures (maybe C code that calls > SPI directly) or some other approach to get close to the expected 35 > MB/s doing these bulk reads? Or is this the price we have to pay for > using SQL instead of some NoSQL solution. (We actually tried Tokyo > Cabinet and found it to perform quite well. However it does not measure > up to Postgres in terms of replication, data interrogation, community > support, acceptance, etc). Kind regards robert
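Purely as an illustration of the kind of server-side reduction Robert is asking about (the real business logic isn't described in the thread, so the query here is hypothetical), pushing the work into SQL keeps the 1 GB scan inside the server and only ships a small result:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ServerSideReduction {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        Statement st = conn.createStatement();
        // Hypothetical reduction: collapse the 22M rows to one row per i1
        // value so only the aggregate crosses the wire.
        ResultSet rs = st.executeQuery(
                "SELECT i1, count(*) AS n, sum(i2) AS total, max(i4) AS peak "
              + "FROM bulk_performance.counts GROUP BY i1");
        while (rs.next()) {
            // consume the (much smaller) reduced result
        }
        rs.close();
        st.close();
        conn.close();
    }
}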
On Fri, Nov 5, 2010 at 12:23 PM, Samuel Gendler <sgendler@ideasculptor.com> wrote:
>> Are you familiar with any Java libraries for decoding the COPY format? The spec is clear and we could clearly write our own, but figured I would ask. ;)
>
> The JDBC driver support works fine. You can pass a Reader or InputStream (if I recall correctly, the InputStream path is more efficient. Or maybe the Reader path was buggy. Regardless, I wound up using an InputStream in the driver which I then wrap in a Reader in order to get it line-by-line.)
>
> You can write a COPY statement to send standard CSV format - take a look at the postgres docs for the COPY statement to see the full syntax. I then have a subclass of BufferedReader which parses each line of CSV and does something interesting with it. I've had it working very reliably for many months now, processing about 500 million rows per day (I'm actually COPYing out, rather than in, but the concept is the same, regardless - my outputstream is wrapped in a writer, which reformats data on the fly).
I should mention that I found basically no documentation of the copy api in the jdbc driver in 8.4. I have no idea if that has changed with 9.x. I had to figure it out by reading the source code. Fortunately, it is very simple:
return ((PGConnection) con).getCopyAPI().copyIn(sql, this.fis);
Where this.fis is an InputStream. There's an alternative copyIn implementation that takes a Reader instead. I'm sure the copyOut methods are the same.
Note: my earlier email was confusing. copyIn copies into the db and receives an InputStream that will deliver data when it is read. copyOut copies data from the db and receives an OutputStream which will receive the data. I inverted those in my earlier email.
You can look at the source code to the CopyAPI to learn more about the mechanism.
On Thu, Nov 4, 2010 at 8:07 AM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
> 04.11.10 16:31, Nick Matheson написав(ла):
>> Heikki-
>>
>>> Try COPY, ie. "COPY bulk_performance.counts TO STDOUT BINARY".
>>
>> Thanks for the suggestion. A preliminary test shows an improvement closer to our expected 35 MB/s.
>>
>> Are you familiar with any Java libraries for decoding the COPY format? The spec is clear and we could clearly write our own, but figured I would ask. ;)
>
> JDBC driver has some COPY support, but I don't remember details. You'd better ask in JDBC list.
The JDBC driver support works fine. You can pass a Reader or InputStream (if I recall correctly, the InputStream path is more efficient. Or maybe the Reader path was buggy. Regardless, I wound up using an InputStream in the driver which I then wrap in a Reader in order to get it line-by-line.)
You can write a COPY statement to send standard CSV format - take a look at the postgres docs for the COPY statement to see the full syntax. I then have a subclass of BufferedReader which parses each line of CSV and does something interesting with it. I've had it working very reliably for many months now, processing about 500 million rows per day (I'm actually COPYing out, rather than in, but the concept is the same, regardless - my outputstream is wrapped in a writer, which reformats data on the fly).
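To make the push-based CSV route concrete, here is a rough sketch of the pattern Samuel describes (not his actual code; the class names and connection details are made up): the driver writes the COPY text into a Writer, which slices it into lines and hands each one to the application:

import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CsvCopyOut {

    // Push-based sink: the driver writes COPY text into this Writer and we
    // cut it into lines as the characters arrive.
    static class LineSink extends Writer {
        private final StringBuilder buf = new StringBuilder();
        long rows = 0;

        @Override
        public void write(char[] cbuf, int off, int len) throws IOException {
            for (int i = off; i < off + len; i++) {
                char c = cbuf[i];
                if (c == '\n') {
                    handleLine(buf.toString());
                    buf.setLength(0);
                } else {
                    buf.append(c);
                }
            }
        }

        private void handleLine(String line) {
            rows++;                      // parse the CSV fields here
        }

        @Override public void flush() {}
        @Override public void close() {}
    }

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        LineSink sink = new LineSink();
        CopyManager copy = ((PGConnection) conn).getCopyAPI();
        copy.copyOut("COPY bulk_performance.counts TO STDOUT CSV", sink);
        System.out.println("rows: " + sink.rows);
        conn.close();
    }
}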