Thread: Is it possible to make streaming replication faster using COPY instead of lots of INSERTs?

Hi,

An application server does a lot of small inserts into several tables
(a kind of user activity logging) in my database. These inserts generate
a lot of (if not most of) the traffic on the server. The server has a
hot standby replica connected to it over a very slow channel.

When the volume of these inserts increases, the master does not manage
to send such a big stream of changes to the replica in time (because of
the low bandwidth), and the lag between them grows dramatically.

Would it be more compact, from the point of view of streaming
replication, if we made the application accumulate changes and do one
COPY instead of lots of INSERTs, say once a minute? And if so, how can
we estimate the effect approximately?

The PostgreSQL version is 9.0.4 on both servers. If you need more
specifics about the configuration, just let me know exactly what.

Thank you very much in advance.

--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray.ru@gmail.com Skype: gray-hemp

On 11/30/2011 10:32 PM, Sergey Konoplev wrote:
> Would it be more compact, from the point of view of streaming
> replication, if we made the application accumulate changes and do one
> COPY instead of lots of INSERTs, say once a minute? And if so, how can
> we estimate the effect approximately?
Streaming replication works on a rather lower level than that. It
records information about transaction starts, rollbacks and commits, and
records disk block changes. It does not record SQL statements. It's not
using INSERT, so you can't switch to COPY. Streaming replication
basically just copies the WAL data, and WAL data is not all that compact.
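
If you want a rough idea of how much WAL each approach generates, you
can note the master's WAL position before and after a test batch and
compare the two values. A minimal sketch (the activity_log table and
its columns are only placeholders):

-- Note the current WAL write position on the master.
SELECT pg_current_xlog_location();

-- Run the batch you want to measure: the individual INSERTs, one
-- multi-row INSERT, or a COPY of the same rows.
BEGIN;
INSERT INTO activity_log (user_id, action) VALUES (1, 'login');
INSERT INTO activity_log (user_id, action) VALUES (2, 'logout');
-- ... the rest of the batch ...
COMMIT;

-- Note the position again; assuming nothing else is writing at the
-- same time, the difference between the two values is roughly the
-- amount of WAL that has to be streamed to the standby.
SELECT pg_current_xlog_location();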

Try to run streaming replication over a compressed channel. PostgreSQL
might gain the ability to do this natively - if someone cares enough to
implement it and if it doesn't already do it without my noticing - but
in the mean time you can use a compressed SSH tunnel, compressed VPN, etc.

Alternately, investigate 3rd party replication options like Slony and
Bucardo that might be better suited to your use case.

--
Craig Ringer

On Nov 30, 2011, at 18:44, Craig Ringer <ringerc@ringerc.id.au> wrote:

> On 11/30/2011 10:32 PM, Sergey Konoplev wrote:
>> Would it be more compact, from the point of view of streaming
>> replication, if we made the application accumulate changes and do one
>> COPY instead of lots of INSERTs, say once a minute? And if so, how can
>> we estimate the effect approximately?
> Streaming replication works on a rather lower level than that. It records information about transaction starts,
> rollbacks and commits, and records disk block changes. It does not record SQL statements. It's not using INSERT, so
> you can't switch to COPY. Streaming replication basically just copies the WAL data, and WAL data is not all that
> compact.

I think a better way to phrase the question is whether these three types of constructs produce different results on
the replication side:

INSERT INTO tbl VALUES (...); [ times 50 ]
INSERT INTO tbl VALUES (...), (...), (...), ...; [ once with 50 values ]
COPY [ with 50 input rows provided ]

I would presume the first one performs badly, but I have no idea whether the multi-value version of INSERT would be
outperformed by an equivalent COPY command (both on the main query and during replication).

Though, does auto-commit affect the results in the first case? I.e., without auto-commit, do the first two constructs
replicate equivalently?
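
To make the auto-commit point concrete, here is the first construct with
and without an explicit transaction (tbl and the values are only
placeholders):

-- Auto-commit: each statement is its own transaction and writes its
-- own commit record.
INSERT INTO tbl VALUES (1, 'a');
INSERT INTO tbl VALUES (2, 'b');
-- ... 48 more ...

-- Explicit transaction: the same 50 inserts, but a single commit.
BEGIN;
INSERT INTO tbl VALUES (1, 'a');
INSERT INTO tbl VALUES (2, 'b');
-- ... 48 more ...
COMMIT;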

>
David J

On 1 December 2011 03:44, Craig Ringer <ringerc@ringerc.id.au> wrote:
> Streaming replication works on a rather lower level than that. It records
> information about transaction starts, rollbacks and commits, and records
> disk block changes. It does not record SQL statements. It's not using
> INSERT, so you can't switch to COPY. Streaming replication basically just
> copies the WAL data, and WAL data is not all that compact.

My thought was about saving bytes on the information about transaction
starts, rollbacks and commits. In the case of lots of small inserts,
each in a different transaction, I suppose there will be more data like
this.

> Try to run streaming replication over a compressed channel. PostgreSQL might
> gain the ability to do this natively - if someone cares enough to implement
> it and if it doesn't already do it without my noticing - but in the mean
> time you can use a compressed SSH tunnel, compressed VPN, etc.

Thank you for the advice.

> Alternately, investigate 3rd party replication options like Slony and
> Bucardo that might be better suited to your use case.
>
> --
> Craig Ringer



--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray.ru@gmail.com Skype: gray-hemp

On 1 December 2011 04:00, David Johnston <polobo@yahoo.com> wrote:
> On Nov 30, 2011, at 18:44, Craig Ringer <ringerc@ringerc.id.au> wrote:
>
>> On 11/30/2011 10:32 PM, Sergey Konoplev wrote:
> INSERT INTO tbl VALUES (...); [ times 50 ]
> INSERT INTO tbl VALUES (...), (...), (...), ...; [ once with 50 values ]
> COPY [ with 50 input rows provided ]
>
> I would presume the first one performs badly, but I have no idea whether the multi-value version of INSERT would be
> outperformed by an equivalent COPY command (both on the main query and during replication).
>
> Though, does auto-commit affect the results in the first case? I.e., without auto-commit, do the first two constructs
> replicate equivalently?

So the guaranteed solutions are either

BEGIN;
INSERT INTO table1 VALUES (...), (...), ...;
COMMIT;

or

COPY table1 FROM ...;

correct?
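
To be clear, by the COPY variant I mean something like the following,
once a minute (the column list and rows are only an example; fields are
tab-separated in the default text format):

COPY table1 (user_id, action, logged_at) FROM STDIN;
1	login	2011-11-30 09:00:00
2	logout	2011-11-30 09:00:05
\.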

>
>>
> David J



--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray.ru@gmail.com Skype: gray-hemp