Thread: libpgtcl COPY out problem - why PQgetlineAsync() ?

libpgtcl COPY out problem - why PQgetlineAsync() ?

From: ljb

While working to get Pgaccess up with my pure-Tcl interface (pgin.tcl,
available on gborg), I found that COPY TO STDOUT is not working with
libpgtcl. Or, it is working so slowly it might as well not work. Or, it
works for small data sets but fails for large ones. This happens with both
the bundled PostgreSQL-7.3.2 libpgtcl and the beta libpgtcl on gborg.  For
example, with Pgaccess, I can import records at about 5,000 per second, but
when I export records it goes at about 20 per second.

It looks to me like the problem is in PgInputProc() which reads from the
PostgreSQL connection during COPY TO STDOUT. Why is this doing non-blocking
reads using PQgetlineAsync()? If there isn't a complete line of data
available from the backend, this will return 0, which Tcl will see as EOF
since the channel is in non-blocking mode. Am I missing something?
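
To make that concrete, the failing pattern is roughly this (a sketch only; the
connection options and table name are illustrative, not my actual test code):

    set db  [pg_connect -conninfo "dbname=test"]
    set res [pg_exec $db "COPY big_table TO STDOUT"]
    # pg_result $res -status should report PGRES_COPY_OUT at this point
    set count 0
    while {[gets $db line] >= 0} {
        incr count
    }
    # When PQgetlineAsync() has no complete line ready, PgInputProc hands Tcl
    # zero bytes, gets reports EOF, and this loop ends before all rows arrive.
    puts "read $count rows"
    pg_disconnect $db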



Re: libpgtcl COPY out problem - why PQgetlineAsync() ?

From: Tom Lane

ljb <lbayuk@mindspring.com> writes:
> It looks to me like the problem is in PgInputProc() which reads from the
> PostgreSQL connection during COPY TO STDOUT. Why is this doing non-blocking
> reads using PQgetlineAsync()?

So that it won't block, of course ;-)

> If there isn't a complete line of data
> available from the backend, this will return 0, which Tcl will see as EOF
> since the channel is in non-blocking mode. Am I missing something?

It won't see it as EOF, *because* the channel is in non-blocking mode.
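
For the record, on a channel that really is in non-blocking mode, a failed gets
is not automatically end-of-data; the two cases can be told apart (a sketch,
where $chan stands for any Tcl channel):

    if {[gets $chan line] < 0} {
        if {[fblocked $chan]} {
            # no complete line available yet -- try again when readable
        } elseif {[eof $chan]} {
            # genuine end of data
        }
    }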

I don't see a reason for a performance issue here.  At worst, this code
should execute one extra traversal of PgInputProc per received packet.
Can you dig into it and find out what's really happening?
        regards, tom lane



Re: libpgtcl COPY out problem - why PQgetlineAsync() ?

From: ljb

tgl@sss.pgh.pa.us wrote:
> ljb <lbayuk@mindspring.com> writes:
> > It looks to me like the problem is in PgInputProc() which reads from the
> > PostgreSQL connection during COPY TO STDOUT. Why is this doing non-blocking
> > reads using PQgetlineAsync()?
> 
> So that it won't block, of course ;-)
> 
> > If there isn't a complete line of data
> > available from the backend, this will return 0, which Tcl will see as EOF
> > since the channel is in non-blocking mode. Am I missing something?
> 
> It won't see it as EOF, *because* the channel is in non-blocking mode.

Why do you say the channel is in non-blocking mode? After COPY table TO STDOUT,
I run [fconfigure $db] and I see:

    -blocking 1 -buffering line -buffersize 4096 -encoding iso8859-1
    -eofchar {{} {}} -translation {auto lf}

So the channel is in blocking mode, isn't it? (The channel implementation
in pgtclId.c doesn't have a BlockModeProc, so I don't think it implements
non-blocking mode at all.) But it looks to me like libpgtcl is talking to
the PostgreSQL back-end in non-blocking mode during COPY TO STDOUT, while
the Tcl channel is in blocking mode on the other side.

> I don't see a reason for a performance issue here.  At worst, this code
> should execute one extra traversal of PgInputProc per received packet.
> Can you dig into it and find out what's really happening?

OK, I'm not entirely convinced the blocking/non-blocking thing is what
is killing performance. When exporting 10,000 records by waiting for
the pg_result -status to change, my test ignored 1,279 "gets" calls which
returned 0. That doesn't seem to be enough to make it so slow.
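
The test was roughly of this shape (a sketch, not the actual test code; the
table name and counters are illustrative):

    set res [pg_exec $db "COPY big_table TO STDOUT"]
    set rows 0
    set zero_reads 0
    # keep reading until pg_result -status is no longer PGRES_COPY_OUT
    while {[pg_result $res -status] eq "PGRES_COPY_OUT"} {
        if {[gets $db line] > 0} {
            incr rows
        } else {
            incr zero_reads   ;# the ~1,279 "gets" calls that returned 0
        }
    }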

But besides being slow, it is still wrong. The (old) README in libpgtcl
says:
|  I/O routines are now defined for the connection handle.  I/O to/from
|  the connection is only valid under certain circumstances:
|  ...
|  For copy outs, read until the standard EOF indication is encountered.
                                 ^^^^^^^^^^^^^^^^^^^^^^^
This does not work. You cannot read until EOF, because gets can report EOF
before you have read all the records (and then you lose sync with the backend).
This seems to depend on the size of the data and on how fast your client and
server are.  (If I put "after 10" inside the loop, for a 10 ms delay, it reads
all 10,000 records, but slowly. Without the delay, it reads 138 records and
gives up.)
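
Here is the read-until-EOF loop from the README as I am exercising it, with the
spot where I inserted the delay (a sketch; the table name is illustrative):

    set res [pg_exec $db "COPY big_table TO STDOUT"]
    set count 0
    while {[gets $db line] >= 0} {
        incr count
        # after 10   ;# uncommenting this 10 ms delay lets all 10,000 rows arrive
    }
    puts "read $count lines before apparent EOF"   ;# only 138 without the delay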