Thread: COPY and heap_sync

COPY and heap_sync

From
Jeff Janes
Date:
If you insert tuples with COPY into a table created or truncated in the same transaction, at the end of the COPY it calls heap_sync.

But there cases were people use COPY in a loop with a small amount of data in each statement.  Now it is calling heap_sync many times, and if NBuffers is large doing that gets very slow.

Could the heap_sync be safely delayed until the end of the transaction, rather than the end of the COPY?

Cheers,

Jeff

Re: COPY and heap_sync

From
Amit Kapila
Date:
On Sat, Aug 30, 2014 at 11:56 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> If you insert tuples with COPY into a table created or truncated in the same transaction, at the end of the COPY it calls heap_sync.
>
> But there cases were people use COPY in a loop with a small amount of data in each statement.  Now it is calling heap_sync many times, and if NBuffers is large doing that gets very slow.
>
> Could the heap_sync be safely delayed until the end of the transaction, rather than the end of the COPY?

Wouldn't unconditionally delaying sync until end of transaction
can lead to burst of I/O at that time especially if there are many
such copy commands in a transaction, leading to delay in some
other operation's that might be happening concurrently in the
system.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: COPY and heap_sync

From
Atri Sharma
Date:


On Saturday, August 30, 2014, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Aug 30, 2014 at 11:56 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> If you insert tuples with COPY into a table created or truncated in the same transaction, at the end of the COPY it calls heap_sync.
>
> But there cases were people use COPY in a loop with a small amount of data in each statement.  Now it is calling heap_sync many times, and if NBuffers is large doing that gets very slow.
>
> Could the heap_sync be safely delayed until the end of the transaction, rather than the end of the COPY?

Wouldn't unconditionally delaying sync until end of transaction
can lead to burst of I/O at that time especially if there are many
such copy commands in a transaction, leading to delay in some
other operation's that might be happening concurrently in the
system.




I agree with that but then, it can provide us the same benefits like group commit,especially when most of the copy commands touch pages which are nearby,hence reducing the seek time overhead.

We could look at making it optional through a GUC, since it is useful albeit for some specific usecases.

Regards,

Atri 


--
Regards,
 
Atri
l'apprenant

Re: COPY and heap_sync

From
Fabrízio de Royes Mello
Date:
<div dir="ltr"><div class="gmail_extra"><br />On Sat, Aug 30, 2014 at 5:05 AM, Atri Sharma <<a
href="mailto:atri.jiit@gmail.com">atri.jiit@gmail.com</a>>wrote:<br />><br />><br />> On Saturday, August
30,2014, Amit Kapila <<a href="mailto:amit.kapila16@gmail.com">amit.kapila16@gmail.com</a>> wrote:<br />
>><br/>>> On Sat, Aug 30, 2014 at 11:56 AM, Jeff Janes <<a
href="mailto:jeff.janes@gmail.com">jeff.janes@gmail.com</a>>wrote:<br />>> ><br />>> > If you
inserttuples with COPY into a table created or truncated in the same transaction, at the end of the COPY it calls
heap_sync.<br/> >> ><br />>> > But there cases were people use COPY in a loop with a small amount of
datain each statement.  Now it is calling heap_sync many times, and if NBuffers is large doing that gets very slow.<br
/>>> ><br />>> > Could the heap_sync be safely delayed until the end of the transaction, rather than
theend of the COPY?<br />>><br />>> Wouldn't unconditionally delaying sync until end of transaction<br />
>>can lead to burst of I/O at that time especially if there are many<br />>> such copy commands in a
transaction,leading to delay in some<br />>> other operation's that might be happening concurrently in the<br />
>>system.<br />>><br />>><br />>><br />><br />> I agree with that but then, it can
provideus the same benefits like group commit,especially when most of the copy commands touch pages which are
nearby,hencereducing the seek time overhead.<br /> ><br />> We could look at making it optional through a GUC,
sinceit is useful albeit for some specific usecases.<br />><br /><br /></div><div class="gmail_extra">It's
interesting...maybe something analogous to "SET CONSTRAINTS DEFERRED"... <br /><br /></div><div class="gmail_extra">SET
COPYCOMMIT { IMMEDIATE | DEFERRED }<br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">or<br
/><br/>SET COPY MODE { IMMEDIATE | DEFERRED }<br /><br /></div><div class="gmail_extra"> Just some thoughts!<br
/></div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Regards,<br /><br /></div><div
class="gmail_extra">--<br/>Fabrízio de Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>> Timbira: <a
href="http://www.timbira.com.br">http://www.timbira.com.br</a><br/> >> Blog: <a
href="http://fabriziomello.github.io">http://fabriziomello.github.io</a><br/>>> Linkedin: <a
href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a
href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a><br/> >> Github: <a
href="http://github.com/fabriziomello">http://github.com/fabriziomello</a></div></div>

Re: COPY and heap_sync

From
Peter Eisentraut
Date:
On 8/30/14 2:26 AM, Jeff Janes wrote:
> But there cases were people use COPY in a loop with a small amount of
> data in each statement.

What would be the reason for doing that?




Re: COPY and heap_sync

From
Fabrízio de Royes Mello
Date:
<div dir="ltr"><div class="gmail_extra"><br />On Sun, Aug 31, 2014 at 10:10 AM, Peter Eisentraut <<a
href="mailto:peter_e@gmx.net">peter_e@gmx.net</a>>wrote:<br />><br />> On 8/30/14 2:26 AM, Jeff Janes
wrote:<br/>> > But there cases were people use COPY in a loop with a small amount of<br /> > > data in each
statement.<br/>><br />> What would be the reason for doing that?<br />><br /><br />I used that to the same
thingmany times. In a company that I was employed we developed scripts to migrate data from one database do another.<br
/><br/>The first version we used INSERT statements and was very very slow. Then we wrote a second version changing the
INSERTby COPY statements. The performance was very better, but  we believe that could be better, so in the third
versionwe created some kind of "cache" (using arrays) to accumulate the records in memory then after N rows we build
theCOPY statement with the cache contents and run it. This was a really good performance improvement.<br /><br />It's
myuse case to we have a feature to postpone the heap_sync in COPY statements. I don't know if it's a feature that a lot
ofpeople wants, but IMHO it could be nice to improve the bulk load operations.<br /><br />Regards,<br /><br />--<br
/>Fabríziode Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>> Timbira: <a
href="http://www.timbira.com.br">http://www.timbira.com.br</a><br/>>> Blog: <a
href="http://fabriziomello.github.io">http://fabriziomello.github.io</a><br/> >> Linkedin: <a
href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a
href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a><br/>>> Github: <a
href="http://github.com/fabriziomello">http://github.com/fabriziomello</a></div></div>

Re: COPY and heap_sync

From
Jeff Janes
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Aug 31, 2014 at 6:10 AM, Peter Eisentraut
<spandir="ltr"><<a href="mailto:peter_e@gmx.net" target="_blank">peter_e@gmx.net</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On
8/30/142:26 AM, Jeff Janes wrote:<br /> > But there cases were people use COPY in a loop with a small amount of<br
/>> data in each statement.<br /><br /></div>What would be the reason for doing that?<br /><br
/></blockquote></div><br/></div><div class="gmail_extra">As far as I can tell, DRY.  They need code to do bulk inserts
anyway. So, just use that everywhere even when it is not in bulk.</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">Also,you can't interleave a copy command with other queries on the same connection.  So you code it
tostart a COPY, use it until you discover you need to run a query (because you encounter something not in you local
cache),end the COPY and do that query, then restart the query.  </div><div class="gmail_extra"><br /></div><div
class="gmail_extra">Undersome conditions, the interruption occurs very seldom, under other conditions it is pretty much
everyrow.</div><div class="gmail_extra"><br /></div><div class="gmail_extra"> Cheers,</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">Jeff</div></div> 

Re: COPY and heap_sync

From
Robert Haas
Date:
On Sat, Aug 30, 2014 at 2:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> If you insert tuples with COPY into a table created or truncated in the same
> transaction, at the end of the COPY it calls heap_sync.
>
> But there cases were people use COPY in a loop with a small amount of data
> in each statement.  Now it is calling heap_sync many times, and if NBuffers
> is large doing that gets very slow.
>
> Could the heap_sync be safely delayed until the end of the transaction,
> rather than the end of the COPY?

I don't think there's any data integrity problem with that, but if the
fsync() should fail it would be reported at commit time rather than in
response to the COPY.  That might be OK though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company