Thread: Help with bulk read performance
Hello

We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres. In the example below, it takes 1 GB to store 350 MB of nominal data. However, that suggests we would expect to get 35 MB/s bulk read rates.

Observations using iostat and top during these bulk reads suggest that the queries are CPU bound, not I/O bound. In fact, repeating the queries yields similar response times. Presumably if it were an I/O issue the response times would be much shorter the second time through, with the benefit of caching.

We have tried these simple queries using psql, JDBC, pl/java stored procedures, and libpq. In all cases the client code ran on the same box as the server. We have experimented with Postgres 8.1, 8.3 and 9.0.

We also tried playing around with some of the server tuning parameters, such as shared_buffers, to no avail.

Here is uname -a for a machine we have tested on:

Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

A sample dataset that reproduces these results looks like the following (there are no indexes):

Table "bulk_performance.counts"
 Column |  Type   | Modifiers
--------+---------+-----------
 i1     | integer |
 i2     | integer |
 i3     | integer |
 i4     | integer |

There are 22 million rows in this case.

We HAVE observed that summation queries run considerably faster. In this case,

select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts

runs at 35 MB/s.

Our business logic does operations on the resulting data such that the output is several orders of magnitude smaller than the input. So we had hoped that by putting our business logic into stored procedures (and thus drastically reducing the amount of data flowing to the client) our throughput would go way up. This did not happen.

So our questions are as follows:

Is there any way, using stored procedures (maybe C code that calls SPI directly) or some other approach, to get close to the expected 35 MB/s doing these bulk reads? Or is this the price we have to pay for using SQL instead of some NoSQL solution? (We actually tried Tokyo Cabinet and found it to perform quite well. However, it does not measure up to Postgres in terms of replication, data interrogation, community support, acceptance, etc.)

Thanks

Dan Schaffer
Paul Hamer
Nick Matheson
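A minimal sketch of the kind of plain-JDBC bulk read being benchmarked above; the connection URL, credentials and fetch size are assumptions, and the checksum is only there to force the driver to decode every column:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BulkReadBaseline {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        // With autocommit off and a fetch size set, pgjdbc streams rows through a
        // cursor instead of buffering all 22 million rows in client memory.
        conn.setAutoCommit(false);

        long rows = 0, checksum = 0;
        long start = System.nanoTime();
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(10000);
            try (ResultSet rs = st.executeQuery(
                    "SELECT i1, i2, i3, i4 FROM bulk_performance.counts")) {
                while (rs.next()) {
                    checksum += rs.getInt(1) + rs.getInt(2) + rs.getInt(3) + rs.getInt(4);
                    rows++;
                }
            }
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d rows (checksum %d) in %.1f s -> %.1f MB/s of nominal data%n",
                rows, checksum, secs, rows * 16 / 1e6 / secs);
        conn.close();
    }
}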
On Nov 1, 2010, at 9:15 AM, Dan Schaffer wrote:

> We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres. In the example below, it takes 1 GB to store 350 MB of nominal data. However, that suggests we would expect to get 35 MB/s bulk read rates.
>
> Observations using iostat and top during these bulk reads suggest that the queries are CPU bound, not I/O bound. In fact, repeating the queries yields similar response times. Presumably if it were an I/O issue the response times would be much shorter the second time through, with the benefit of caching.
>
> We have tried these simple queries using psql, JDBC, pl/java stored procedures, and libpq. In all cases the client code ran on the same box as the server.
> We have experimented with Postgres 8.1, 8.3 and 9.0.
>
> We also tried playing around with some of the server tuning parameters, such as shared_buffers, to no avail.
>
> Here is uname -a for a machine we have tested on:
>
> Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>
> A sample dataset that reproduces these results looks like the following (there are no indexes):
>
> Table "bulk_performance.counts"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  i1     | integer |
>  i2     | integer |
>  i3     | integer |
>  i4     | integer |
>
> There are 22 million rows in this case.
>
> We HAVE observed that summation queries run considerably faster. In this case,
>
> select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts
>
> runs at 35 MB/s.
>
> Our business logic does operations on the resulting data such that the output is several orders of magnitude smaller than the input. So we had hoped that by putting our business logic into stored procedures (and thus drastically reducing the amount of data flowing to the client) our throughput would go way up. This did not happen.
>
> So our questions are as follows:
>
> Is there any way, using stored procedures (maybe C code that calls SPI directly) or some other approach, to get close to the expected 35 MB/s doing these bulk reads? Or is this the price we have to pay for using SQL instead of some NoSQL solution? (We actually tried Tokyo Cabinet and found it to perform quite well. However, it does not measure up to Postgres in terms of replication, data interrogation, community support, acceptance, etc.)

Have you by chance tried EXPLAIN ANALYZE SELECT * FROM bulk_performance.counts? That will throw away the query results, which removes client-server considerations.

Also, when you tested raw disk I/O, did you do it with an 8k block size? That's the default size of a Postgres block, so all of its I/O is done that way.

What does iostat show you? Are you getting a decent number of read requests/second?

--
Jim C. Nasby, Database Architect   jim@nasby.net
512.569.9461 (cell)   http://jim.nasby.net
On 11/1/2010 9:15 AM, Dan Schaffer wrote:
> Hello
>
> We have an application that needs to do bulk reads of ENTIRE Postgres
> tables very quickly (i.e. select * from table). We have observed that
> such sequential scans run two orders of magnitude slower than observed
> raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the
> storage overhead we have observed in Postgres. In the example below, it
> takes 1 GB to store 350 MB of nominal data. However that suggests we
> would expect to get 35 MB/s bulk read rates.
>
> Observations using iostat and top during these bulk reads suggest that
> the queries are CPU bound, not I/O bound. In fact, repeating the queries
> yields similar response times. Presumably if it were an I/O issue the
> response times would be much shorter the second time through with the
> benefit of caching.
>
> We have tried these simple queries using psql, JDBC, pl/java stored
> procedures, and libpq. In all cases the client code ran on the same box
> as the server.
> We have experimented with Postgres 8.1, 8.3 and 9.0.
>
> We also tried playing around with some of the server tuning parameters
> such as shared_buffers to no avail.
>
> Here is uname -a for a machine we have tested on:
>
> Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20
> 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>
> A sample dataset that reproduces these results looks like the following
> (there are no indexes):
>
> Table "bulk_performance.counts"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  i1     | integer |
>  i2     | integer |
>  i3     | integer |
>  i4     | integer |
>
> There are 22 million rows in this case.
>
> We HAVE observed that summation queries run considerably faster. In this
> case,
>
> select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts
>
> runs at 35 MB/s.
>
> Our business logic does operations on the resulting data such that the
> output is several orders of magnitude smaller than the input. So we had
> hoped that by putting our business logic into stored procedures (and
> thus drastically reducing the amount of data flowing to the client) our
> throughput would go way up. This did not happen.
>
> So our questions are as follows:
>
> Is there any way using stored procedures (maybe C code that calls SPI
> directly) or some other approach to get close to the expected 35 MB/s
> doing these bulk reads? Or is this the price we have to pay for using
> SQL instead of some NoSQL solution. (We actually tried Tokyo Cabinet and
> found it to perform quite well. However it does not measure up to
> Postgres in terms of replication, data interrogation, community support,
> acceptance, etc).
>
> Thanks
>
> Dan Schaffer
> Paul Hamer
> Nick Matheson

Whoa... Deja Vu

Is this the same thing Nick is working on? How'd he get along?

http://archives.postgresql.org/message-id/4CD1853F.2010800@noaa.gov

-Andy
On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
> Is this the same thing Nick is working on? How'd he get along?
>
> http://archives.postgresql.org/message-id/4CD1853F.2010800@noaa.gov

So it is. The one I replied to stood out because no one had replied to it; I didn't see the earlier email.
--
Jim C. Nasby, Database Architect   jim@nasby.net
512.569.9461 (cell)   http://jim.nasby.net
On 12/14/2010 9:41 AM, Jim Nasby wrote:
> On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
>> Is this the same thing Nick is working on? How'd he get along?
>>
>> http://archives.postgresql.org/message-id/4CD1853F.2010800@noaa.gov
>
> So it is. The one I replied to stood out because no one had replied to it; I didn't see the earlier email.
> --
> Jim C. Nasby, Database Architect   jim@nasby.net
> 512.569.9461 (cell)   http://jim.nasby.net

Oh.. I didn't even notice the date... I thought it was a new post.

But still... (and I'll cc Nick on this) I'd love to hear an update on how this worked out.

Did you get it to go fast? What'd you use? Did the project go over budget and did you all get fired? COME ON MAN! We need to know! :-)

-Andy
BTW, have you tried prepared statements? bytea is most likely faster (in part) due to less parsing in the backend. Prepared statements would eliminate that parsing step.

On Dec 14, 2010, at 10:07 AM, Nick Matheson wrote:
> Hey all-
>
> Glad to know you are still interested... ;)
>
> Didn't mean to leave you hanging, the holiday and all have put some bumps in the road.
>
> Dan, my co-worker, might be able to post some more detailed information here, but here is a brief summary of what I am aware of:
>
> 1. We have not tested any stored procedure/SPI based solutions to date.
> 2. The COPY API has been the best of the possible solutions explored to date.
> 3. We were able to get rates on the order of 35 MB/s with the original problem this way.
> 4. Another variant of the problem we were working on included some metadata fields and 300 float values (for this we tried three variants)
>    a. 300 float values as columns
>    b. 300 floats in a float array column
>    c. 300 floats packed into a bytea column
>
> Long story short, on these three variants a and b largely performed the same; c was the winner and seems to have improved the throughput on multiple counts: 1. it reduces the data transmitted over the wire by a factor of two (float columns and float arrays have a 2x overhead over the raw data requirement), and 2. this reduction seems to have reduced the CPU burden on the server side, producing better than the expected 2x speedup. I think the final numbers left us somewhere in the 80-90 MB/s range.
>
> Thanks again for all the input. If you have any other questions let us know. Also, if we get results for the stored procedure/SPI route we will try and post, but the improvements via standard JDBC are such that we aren't really pressed at this point in time to get more throughput, so it may not happen.
>
> Cheers,
>
> Nick
>
>> On 12/14/2010 9:41 AM, Jim Nasby wrote:
>>> On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
>>>> Is this the same thing Nick is working on? How'd he get along?
>>>>
>>>> http://archives.postgresql.org/message-id/4CD1853F.2010800@noaa.gov
>>>
>>> So it is. The one I replied to stood out because no one had replied to it; I didn't see the earlier email.
>>> --
>>> Jim C. Nasby, Database Architect   jim@nasby.net
>>> 512.569.9461 (cell)   http://jim.nasby.net
>>
>> Oh.. I didn't even notice the date... I thought it was a new post.
>>
>> But still... (and I'll cc Nick on this) I'd love to hear an update on how this worked out.
>>
>> Did you get it to go fast? What'd you use? Did the project go over budget and did you all get fired? COME ON MAN! We need to know! :-)
>>
>> -Andy

--
Jim C. Nasby, Database Architect   jim@nasby.net
512.569.9461 (cell)   http://jim.nasby.net
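A minimal sketch of the prepared-statement idea, with assumed connection details: reusing one PreparedStatement lets the backend parse and plan the statement once, and pgjdbc promotes it to a named server-side prepared statement (the cutover is controlled by the prepareThreshold connection parameter).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PreparedReadSketch {
    public static void main(String[] args) throws Exception {
        // prepareThreshold=1 asks pgjdbc to use a server-side prepared statement
        // from the first execution onward (the default is after 5 executions).
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb?prepareThreshold=1", "user", "password");

        PreparedStatement ps = conn.prepareStatement(
                "SELECT sum(i1), sum(i2), sum(i3), sum(i4) FROM bulk_performance.counts");

        for (int run = 0; run < 3; run++) {   // repeated runs reuse the parsed statement
            long start = System.nanoTime();
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                System.out.printf("run %d: %d %d %d %d in %.2f s%n", run,
                        rs.getLong(1), rs.getLong(2), rs.getLong(3), rs.getLong(4),
                        (System.nanoTime() - start) / 1e9);
            }
        }
        ps.close();
        conn.close();
    }
}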
Hey all-

Glad to know you are still interested... ;)

Didn't mean to leave you hanging, the holiday and all have put some bumps in the road.

Dan, my co-worker, might be able to post some more detailed information here, but here is a brief summary of what I am aware of:

1. We have not tested any stored procedure/SPI based solutions to date.
2. The COPY API has been the best of the possible solutions explored to date.
3. We were able to get rates on the order of 35 MB/s with the original problem this way.
4. Another variant of the problem we were working on included some metadata fields and 300 float values (for this we tried three variants)
   a. 300 float values as columns
   b. 300 floats in a float array column
   c. 300 floats packed into a bytea column

Long story short, on these three variants a and b largely performed the same; c was the winner and seems to have improved the throughput on multiple counts: 1. it reduces the data transmitted over the wire by a factor of two (float columns and float arrays have a 2x overhead over the raw data requirement), and 2. this reduction seems to have reduced the CPU burden on the server side, producing better than the expected 2x speedup. I think the final numbers left us somewhere in the 80-90 MB/s range.

Thanks again for all the input. If you have any other questions let us know. Also, if we get results for the stored procedure/SPI route we will try and post, but the improvements via standard JDBC are such that we aren't really pressed at this point in time to get more throughput, so it may not happen.

Cheers,

Nick

> On 12/14/2010 9:41 AM, Jim Nasby wrote:
>> On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
>>> Is this the same thing Nick is working on? How'd he get along?
>>>
>>> http://archives.postgresql.org/message-id/4CD1853F.2010800@noaa.gov
>>
>> So it is. The one I replied to stood out because no one had replied
>> to it; I didn't see the earlier email.
>> --
>> Jim C. Nasby, Database Architect   jim@nasby.net
>> 512.569.9461 (cell)   http://jim.nasby.net
>
> Oh.. I didn't even notice the date... I thought it was a new post.
>
> But still... (and I'll cc Nick on this) I'd love to hear an update on
> how this worked out.
>
> Did you get it to go fast? What'd you use? Did the project go over
> budget and did you all get fired? COME ON MAN! We need to know! :-)
>
> -Andy
Hi,

My name is Dan and I'm a co-worker of Nick Matheson, who initially submitted this question (because the mail group had me blacklisted for a while for some reason).

Thank you for all of the suggestions. We were able to improve our bulk read performance from 3 MB/s to 60 MB/s (assuming the data are NOT in cache in both cases) by doing the following:

1. Storing the data in a "bytea" column instead of an "array" column.
2. Retrieving the data via the Postgres 9 CopyManager#copyOut(String sql, OutputStream stream) method.

The key to the dramatic improvement appears to be the reduction in packing and unpacking time on the server and client, respectively. The server packing occurs when the retrieved data are packed into a bytestream for sending across the network. Storing the data as a simple byte array reduces this time substantially. The client-side unpacking time is spent generating a ResultSet object. By unpacking the bytestream into the desired arrays of floats by hand instead, this time became close to negligible.

The only downside of storing the data in byte arrays is the loss of transparency. That is, a simple "select *" of a few rows shows bytes instead of floats. We hope to mitigate this by writing a simple stored procedure that unpacks the bytes into floats.

A couple of other results:

If the data are stored as a byte array but retrieved into a ResultSet, the unpacking time goes up by an order of magnitude and the observed total throughput is 25 MB/s. If the data are stored in a Postgres float array and unpacked into a byte stream, the observed throughput is 20 MB/s.

Dan (and Nick)

Andy Colson wrote:
> On 12/14/2010 9:41 AM, Jim Nasby wrote:
>> On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
>>> Is this the same thing Nick is working on? How'd he get along?
>>>
>>> http://archives.postgresql.org/message-id/4CD1853F.2010800@noaa.gov
>>
>> So it is. The one I replied to stood out because no one had replied to
>> it; I didn't see the earlier email.
>> --
>> Jim C. Nasby, Database Architect   jim@nasby.net
>> 512.569.9461 (cell)   http://jim.nasby.net
>
> Oh.. I didn't even notice the date... I thought it was a new post.
>
> But still... (and I'll cc Nick on this) I'd love to hear an update on
> how this worked out.
>
> Did you get it to go fast? What'd you use? Did the project go over
> budget and did you all get fired? COME ON MAN! We need to know! :-)
>
> -Andy
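A minimal sketch of the CopyManager#copyOut path described above, with assumed connection details and run against the counts table from the original post (the bytea table isn't named in the thread); it only counts bytes to measure raw throughput, since a binary COPY stream still has to be unpacked by hand (header, per-tuple field counts, per-field lengths).

import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyOutSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        // Works when this is a plain pgjdbc connection (no pooling wrapper).
        CopyManager copier = ((PGConnection) conn).getCopyAPI();

        // Count bytes instead of parsing them, just to measure raw throughput.
        final long[] bytes = {0};
        OutputStream counter = new OutputStream() {
            @Override public void write(int b) { bytes[0]++; }
            @Override public void write(byte[] b, int off, int len) { bytes[0] += len; }
        };

        long start = System.nanoTime();
        // BINARY avoids text conversion; a real client would unpack the stream
        // into the desired arrays by hand, as described above.
        copier.copyOut("COPY bulk_performance.counts TO STDOUT WITH BINARY", counter);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.1f MB in %.1f s (%.1f MB/s)%n",
                bytes[0] / 1e6, secs, bytes[0] / 1e6 / secs);
        conn.close();
    }
}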
Hello Daniel,

We have the same scenario for the native Java arrays, so we are storing bytea and doing the conversion on the client side, but for the server-side SQL, plJava comes in very handy.

Not sure how you want to create stored procedures to convert internally, but this is how we do it. One has to define conversion routines in Java and then deploy them to plJava. Scanning through this field would still be CPU bound, around 2x slower than with native arrays and 6x slower than with blobs, but at least one has this ability. It's even possible to pass them to plR to do some statistical processing directly, so depending on the operations you do it may still be cheaper than streaming out over the wire to the regular JDBC client.

1. Deploy a class like this within plJava (null handling left out for brevity):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class myArrayConversion {

    public myArrayConversion() {}

    /** Serialize double array to blob */
    public static byte[] convertDoubleArrayToBytea(double[] obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(obj);
        oos.flush();   // make sure buffered object data reaches baos before copying it
        return baos.toByteArray();
    }

    /** Serialize int array to blob */
    public static byte[] convertIntToBytea(int[] obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(baos);
        oos.writeObject(obj);
        oos.flush();
        return baos.toByteArray();
    }

    /** Deserialize blob to double array */
    public static double[] convertToDoubleArray(byte[] obj)
            throws IOException, ClassNotFoundException {
        // Deserialize from a byte array
        ObjectInputStream ios = new ObjectInputStream(new ByteArrayInputStream(obj));
        return (double[]) ios.readObject();
    }

    /** Deserialize blob to int array */
    public static int[] convertIntToArray(byte[] obj)
            throws IOException, ClassNotFoundException {
        // Deserialize from a byte array
        ObjectInputStream ios = new ObjectInputStream(new ByteArrayInputStream(obj));
        return (int[]) ios.readObject();
    }

    // other array types are streamed the same way...
}

2. Then create mapping functions as the db owner:

<sql>
CREATE OR REPLACE FUNCTION public.convertDoubleArrayToBytea(double precision[])
  RETURNS bytea AS
  'mappingPkg.convertDoubleArrayToBytea(double[])'
  LANGUAGE 'javau' IMMUTABLE
  COST 50;
GRANT EXECUTE ON FUNCTION public.convertDoubleArrayToBytea(double precision[]) TO public;

CREATE OR REPLACE FUNCTION public.convertToDoubleArray(bytea)
  RETURNS double precision[] AS
  'mappingPkg.convertToDoubleArray(byte[])'
  LANGUAGE 'javau' IMMUTABLE
  COST 50;
GRANT EXECUTE ON FUNCTION public.convertToDoubleArray(bytea) TO public;
</sql>

Then you can have the conversion either way:

select convertToDoubleArray(convertDoubleArrayToBytea(array[i::float8,1.1,100.1,i*0.1]::float8[]))
from generate_series(1,100) i;

so you'd also be able to create bytea objects from native SQL arrays within SQL.

PLJava seems to be enjoying a revival these days thanks to Johann 'Myrkraverk' Oskarsson, who fixed several long-standing bugs. Check out the plJava list for details.
Krzysztof

On Dec 16, 2010, at 10:22 AM, pgsql-performance-owner@postgresql.org wrote:

> From: Dan Schaffer <Daniel.S.Schaffer@noaa.gov>
> Date: December 15, 2010 9:15:14 PM GMT+01:00
> To: Andy Colson <andy@squeakycode.net>
> Cc: Jim Nasby <jim@nasby.net>, pgsql-performance@postgresql.org, Nick Matheson <Nick.D.Matheson@noaa.gov>
> Subject: Re: Help with bulk read performance
> Reply-To: Daniel.S.Schaffer@noaa.gov
>
> Hi,
> My name is Dan and I'm a co-worker of Nick Matheson, who initially submitted this question (because the mail group had me blacklisted for a while for some reason).
>
> Thank you for all of the suggestions. We were able to improve our bulk read performance from 3 MB/s to 60 MB/s (assuming the data are NOT in cache in both cases) by doing the following:
>
> 1. Storing the data in a "bytea" column instead of an "array" column.
> 2. Retrieving the data via the Postgres 9 CopyManager#copyOut(String sql, OutputStream stream) method.
>
> The key to the dramatic improvement appears to be the reduction in packing and unpacking time on the server and client, respectively. The server packing occurs when the retrieved data are packed into a bytestream for sending across the network. Storing the data as a simple byte array reduces this time substantially. The client-side unpacking time is spent generating a ResultSet object. By unpacking the bytestream into the desired arrays of floats by hand instead, this time became close to negligible.
>
> The only downside of storing the data in byte arrays is the loss of transparency. That is, a simple "select *" of a few rows shows bytes instead of floats. We hope to mitigate this by writing a simple stored procedure that unpacks the bytes into floats.
>
> A couple of other results:
>
> If the data are stored as a byte array but retrieved into a ResultSet, the unpacking time goes up by an order of magnitude and the observed total throughput is 25 MB/s. If the data are stored in a Postgres float array and unpacked into a byte stream, the observed throughput is 20 MB/s.
>
> Dan (and Nick)
> If the data are stored as a byte array but retrieved into a ResultSet,
> the unpacking time goes up by an order of magnitude and the
> observed total throughput is 25 MB/s. If the data are stored in a
> Postgres float array and unpacked into a byte stream, the
> observed throughput is 20 MB/s.

float <-> text conversions are very slow; this is in fact due to the mismatch between base-2 (IEEE 754) and base-10 (text) floating point representations, which requires quite complex calculations.
Pierre-

I agree with your observation of the float <-> text conversion costs, but in this case Dan is talking about storing the raw float data (i.e. 4 bytes per float) in a bytea array, so there is only the conversion from java float[n] to java byte[4*n], which is not nearly as costly as a float <-> text conversion (especially if you leave it in architecture byte order).

Nick

>> If the data are stored as a byte array but retrieved into a ResultSet,
>> the unpacking time goes up by an order of magnitude and the
>> observed total throughput is 25 MB/s. If the data are stored in a
>> Postgres float array and unpacked into a byte stream, the
>> observed throughput is 20 MB/s.
>
> float <-> text conversions are very slow; this is in fact due to the
> mismatch between base-2 (IEEE 754) and base-10 (text) floating point
> representations, which requires quite complex calculations.
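For reference, a minimal sketch of the float[] <-> byte[4*n] conversion Nick is describing, using java.nio.ByteBuffer; the class and method names are illustrative, and the byte order is an explicit choice (little-endian here). Unlike the ObjectOutputStream approach earlier in the thread, this stores only the raw 4 bytes per float, with no Java serialization header.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FloatPacking {

    /** Pack floats into 4*n raw bytes (e.g. for a bytea column). */
    public static byte[] pack(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            buf.putFloat(v);   // raw IEEE 754 bits, no text conversion
        }
        return buf.array();
    }

    /** Unpack 4*n raw bytes back into floats. */
    public static float[] unpack(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] values = new float[bytes.length / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getFloat();
        }
        return values;
    }

    public static void main(String[] args) {
        float[] row = new float[300];            // e.g. the 300-float rows above
        for (int i = 0; i < row.length; i++) row[i] = i * 0.1f;
        byte[] packed = pack(row);               // 1200 bytes, 4 bytes per float
        float[] roundTrip = unpack(packed);
        System.out.println(packed.length + " bytes, first value = " + roundTrip[0]);
    }
}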