Thread: Producer/Consumer Issues in the COPY across network
I'm looking at ways to reduce the number of network calls and/or the
waiting time while we perform network COPY.

The COPY calls in libpq allow asynchronous actions, yet are coded in a
synchronous manner in pg_dump, Slony and psql \copy.

Does anybody have any experience with running COPY in asynchronous mode?

When we're running a COPY over a high latency link then network time is
going to become dominant, so potentially, running COPY asynchronously
might help performance for loads or initial Slony configuration. This is
potentially more important on Slony where we do both a PQgetCopyData()
and PQputCopyData() in a tight loop.

I also note that PQgetCopyData always returns just one row. Is there an
underlying buffering between the protocol (which always sends one
message per row) and libpq (which is one call per row)? It seems
possible for us to request a number of rows from the server up to a
preferred total transfer size.

PQputCopyData seems to be more efficient with smaller rows.

Ideas? Experience?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

On Tue, Feb 26, 2008 at 11:00:33AM +0000, Simon Riggs wrote:
> I'm looking at ways to reduce the number of network calls and/or the
> waiting time while we perform network COPY.
>
> The COPY calls in libpq allow asynchronous actions, yet are coded in a
> synchronous manner in pg_dump, Slony and psql \copy.

I don't think it's the synchronous/asynchronous mode that's making the
difference. Rather, usually the network stack will coalesce packets into
larger chunks to improve performance. I wonder whether it's COPY
interacting badly with the TCP_NODELAY option (which disables the
coalescing).

> When we're running a COPY over a high latency link then network time is
> going to become dominant, so potentially, running COPY asynchronously
> might help performance for loads or initial Slony configuration. This is
> potentially more important on Slony where we do both a PQgetCopyData()
> and PQputCopyData() in a tight loop.

When you check the packets being sent, are you showing only one record
being sent per packet? If so, there's your problem.

> I also note that PQgetCopyData always returns just one row. Is there an
> underlying buffering between the protocol (which always sends one
> message per row) and libpq (which is one call per row)? It seems
> possible for us to request a number of rows from the server up to a
> preferred total transfer size.

AIUI the server merely streams the rows to you, the client doesn't get
to say how many :)

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy
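
For anyone who wants to experiment with the coalescing point above, here is a minimal sketch of reading and setting TCP_NODELAY on a plain TCP socket. It only uses the standard socket API; the helper name nagle_disabled is made up for illustration, and nothing here says whether libpq or the server actually sets the option on a COPY connection, which is the open question in this thread.

```python
import socket


def nagle_disabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is set, i.e. small writes are not coalesced."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


# An unconnected socket is enough to inspect and change the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
before = nagle_disabled(sock)

# Setting TCP_NODELAY turns off the coalescing of small writes, so each
# small write may go out as its own packet.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
after = nagle_disabled(sock)
sock.close()

print("TCP_NODELAY before:", before, "after:", after)
```

With the option set, an application that writes one small row at a time is more likely to put one row in each packet, which is the pattern a packet capture would need to confirm or rule out.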

On Tue, 2008-02-26 at 12:29 +0100, Martijn van Oosterhout wrote:
> > When we're running a COPY over a high latency link then network time is
> > going to become dominant, so potentially, running COPY asynchronously
> > might help performance for loads or initial Slony configuration. This is
> > potentially more important on Slony where we do both a PQgetCopyData()
> > and PQputCopyData() in a tight loop.
>
> When you check the packets being sent, are you showing only one record
> being sent per packet? If so, there's your problem.

I've not inspected the packet flow. It seemed easier to ask.

> > I also note that PQgetCopyData always returns just one row. Is there an
> > underlying buffering between the protocol (which always sends one
> > message per row) and libpq (which is one call per row)? It seems
> > possible for us to request a number of rows from the server up to a
> > preferred total transfer size.
>
> AIUI the server merely streams the rows to you, the client doesn't get
> to say how many :)

Right, but presumably we generate a new message per PQgetCopyData()
request? So my presumption is we need to wait for that to be generated
each time?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

On Thu, Feb 28, 2008 at 01:57:49AM +0000, Simon Riggs wrote:
> > AIUI the server merely streams the rows to you, the client doesn't get
> > to say how many :)
>
> Right, but presumably we generate a new message per PQgetCopyData()
> request? So my presumption is we need to wait for that to be generated
> each time?

No, PQgetCopyData() doesn't send anything. It merely reads what's in the
kernel socket buffer to a local buffer and when it has a complete line
it mallocs a string and returns it to you.

Similarly, PQputCopyData() doesn't expect anything from the server
during transmission.

That's why I was wondering about the rows per packet. Sending bigger
packets reduces overall overhead.

(The malloc/free per row doesn't seem too efficient.)

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy
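
To make the read path described above concrete, here is a rough sketch of a client-side buffer that takes whatever bytes have arrived and hands back one complete row per call without ever sending a request. This is an illustration only, not libpq's code: the names CopyRowBuffer and drain are invented, and newline-terminated rows are a simplifying assumption standing in for the real message framing.

```python
class CopyRowBuffer:
    """Hypothetical local buffer: filled in chunks, drained one row at a time."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> None:
        """Append whatever the socket delivered; a chunk may hold many rows."""
        self._buf += chunk

    def get_row(self):
        """Return the next complete row, or None if more data is needed.

        Nothing is sent to the peer here; the call only consumes bytes
        that have already arrived.
        """
        end = self._buf.find(b"\n")
        if end < 0:
            return None
        row = bytes(self._buf[:end])
        del self._buf[:end + 1]
        return row


def drain(buf: CopyRowBuffer) -> list:
    """Collect every complete row currently sitting in the buffer."""
    rows = []
    row = buf.get_row()
    while row is not None:
        rows.append(row)
        row = buf.get_row()
    return rows


buf = CopyRowBuffer()
# One network read that happened to carry two full rows and part of a third.
buf.feed(b"1\tann\n2\tbob\n3\tc")
first = drain(buf)
# The rest of the third row arrives in a later read.
buf.feed(b"id\n")
second = drain(buf)
print(first, second)
```

In this model the number of rows per get_row() call is always one, but the number of rows per network read depends entirely on how the sender's data was packed into packets, which is why the rows-per-packet question matters more than the one-call-per-row API.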

On Thu, 2008-02-28 at 15:39 +0100, Martijn van Oosterhout wrote:
> That's why I was wondering about the rows per packet. Sending bigger
> packets reduces overall overhead.
>
> (The malloc/free per row doesn't seem too efficient.)

I guess neither of us knows then. Oh well.

That's good 'cos it sounds like something worth looking into if anybody
has a protocol sniffer and some time. I'll skip on that test 'cos it's
not really my area.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk