Thread: I want to change libpq and libpgtcl for better handling of large query results


From: Constantin Teodorescu
Hello all,

WARNING: It's a long mail, but please have patience and read it *all*.

I have reached a point in developing PgAccess where I discovered that
some functions in libpgtcl are badly implemented, and even libpq does not
help me much!

What's the problem? It is about working with large query results: big
tables with thousands of records that have to be processed in order to
produce a full report, for example.

Getting a query result from Tcl/Tk (the pg_select function) uses PQexec.
But PQexec FETCHES ALL THE RECORDS INTO MEMORY, and only after that can the
user handle the query results.
But what if the table has thousands of records? I would probably need more
than 512 MB of RAM to get a report finished.

Viorel Mitache from RENEL Braila (mviorel@flex.ro; please cc him and me
because we aren't on the hackers list) proposed another solution.

With some small changes in libpq-fe.h

 (  void (* callback)(PGresult *,void *ptr,int stat);
    void *usr_ptr;)

and also in libpq, to allow a newly defined function in libpgtcl
(pg_loop) to initiate a query and then call back a user-defined function
after every record fetched from the connection.

In order to do this, the connection is 'cloned' and the query is issued
on this new connection. For every record fetched, the C callback function
is called; there the Tcl interpreter is invoked for the source inside the
loop, then the memory used by the record is released and the next record
is ready to come.
More than that, after processing some records, the user can choose to
break the loop (using the break command in Tcl), which actually breaks
the connection.
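
(For illustration only, a rough C sketch of how such a per-record callback
might look to a caller. Nothing below exists in libpq today: the
callback/usr_ptr connection fields and the PQloopExec entry point are
hypothetical names for the proposed additions, and error handling is
omitted.)

    #include <stdio.h>
    #include "libpq-fe.h"

    /* called once for every tuple fetched; 'res' holds just that one tuple */
    static void per_tuple(PGresult *res, void *usr_ptr, int stat)
    {
        long *count = (long *) usr_ptr;          /* user data threaded through */

        printf("%s\n", PQgetvalue(res, 0, 0));   /* first field of this row */
        (*count)++;
    }

    /* usage sketch (PQloopExec is a hypothetical name):
     *     long n = 0;
     *     PQloopExec(conn, "SELECT * FROM sometable", per_tuple, &n);
     */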

What do we achieve by making these patches?

First of all, the ability to process large tables sequentially.
Then, increased performance due to parallel execution: receiving data
over the network and processing it locally. The backend process on the
server fills the communication channel with data while the local task
processes it as it comes.
In the old version, the local task has to wait until *all* data has
arrived (buffered in memory, if there was enough room) and only then
process it.

What I would ask from you:
1) First of all, whether my needs could be satisfied in some other way
with the current functions in libpq or libpgtcl. I can assure you that
with the current libpgtcl it is rather impossible. I am not sure whether
there is another mechanism using some subtle functions that I don't know
about.
2) Then, if you agree with the idea, to whom should we send a more
precise description of the changes we would like to make, so they can be
analysed and checked for further development of Pg?
3) Is there any other normal way to tell the backend not to send any more
tuples, instead of breaking the connection?
4) Even working in C, using PQexec, it's impossible to handle large query
results, am I right?

Please cc to : mviorel@flex.ro      and also     teo@flex.ro

--
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA

> Getting a query result from Tcl/Tk (the pg_select function) uses PQexec.
> But PQexec FETCHES ALL THE RECORDS INTO MEMORY, and only after that can the
> user handle the query results.
> But what if the table has thousands of records? I would probably need more
> than 512 MB of RAM to get a report finished.

This issue has come up before.  The accepted solution is to open a
cursor, and fetch whatever records you need.  The backend still
generates the full result, but the front end requests the records it
wants.

Does that not work in your case?
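
(As a rough illustration only, here is how the cursor approach might look
from C with libpq; the table name "bigtable", the cursor name "c" and the
chunk size of 100 are made up, and error checking is omitted.)

    #include "libpq-fe.h"

    void report(PGconn *conn)
    {
        PGresult   *res;
        int         i, n;

        res = PQexec(conn, "BEGIN");
        PQclear(res);
        res = PQexec(conn, "DECLARE c CURSOR FOR SELECT * FROM bigtable");
        PQclear(res);

        for (;;)
        {
            res = PQexec(conn, "FETCH 100 IN c");   /* next chunk of rows */
            n = PQntuples(res);
            for (i = 0; i < n; i++)
            {
                /* process one row, e.g. its first column:
                 * printf("%s\n", PQgetvalue(res, i, 0)); */
            }
            PQclear(res);
            if (n == 0)
                break;                              /* cursor exhausted */
        }

        res = PQexec(conn, "END");
        PQclear(res);
    }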


--
Bruce Momjian
maillist@candle.pha.pa.us

On Mon, 5 Jan 1998, Constantin Teodorescu wrote:

> In order to do this, the connection is 'cloned' and the query is issued
> on this new connection. For every record fetched, the C callback function
> is called; there the Tcl interpreter is invoked for the source inside the
> loop, then the memory used by the record is released and the next record
> is ready to come.
> More than that, after processing some records, the user can choose to
> break the loop (using the break command in Tcl), which actually breaks
> the connection.
>
> What do we achieve by making these patches?
>
> First of all, the ability to process large tables sequentially.
> Then, increased performance due to parallel execution: receiving data
> over the network and processing it locally. The backend process on the
> server fills the communication channel with data while the local task
> processes it as it comes.
> In the old version, the local task has to wait until *all* data has
> arrived (buffered in memory, if there was enough room) and only then
> process it.
>
> What I would ask from you:
> 1) First of all, whether my needs could be satisfied in some other way
> with the current functions in libpq or libpgtcl. I can assure you that
> with the current libpgtcl it is rather impossible. I am not sure whether
> there is another mechanism using some subtle functions that I don't know
> about.

    Bruce answered this one by asking about cursors...

> 2) Then, if you agree with the idea, to whom should we send a more
> precise description of the changes we would like to make, so they can be
> analysed and checked for further development of Pg?

    Here, on this mailing list...

    Now, let's see if I understand what you are thinking of...

    Basically, by "cloning", you are effectively looking at implementing ftp's
way of dealing with a connection, having one "control" channel, and one "data"
channel, is this right?  So that the "frontend" has a means of sending a STOP
command to the backend even while the backend is still sending the frontend
the data?

    Now, from reading Bruce's email before reading this, this doesn't get
around the fact that the backend is still going to have to finish generating
a response to the query before it can send *any* data back, so, as Bruce has
asked, don't cursors already provide what you are looking for?  With cursors,
as I understand it, you basically tell the backend to send forward X tuples at
a time and after that, if you want to break the connection, you just break
the connection.

    With what you are proposing (again, if I'm understanding correctly), the
frontend would effectively accept X bytes of data (or X tuples) and then it
would have an opportunity to send back a STOP over a control channel...

    Oversimplified, I know, but I'm a simple man *grin*

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: [HACKERS] I want to change libpq and libpgtcl for better handling of large query results

From: Constantin Teodorescu
The Hermit Hacker wrote:
>
> > What I would ask from you:
> > 1) First of all, whether my needs could be satisfied in some other way
> > with the current functions in libpq or libpgtcl. I can assure you that
> > with the current libpgtcl it is rather impossible. I am not sure whether
> > there is another mechanism using some subtle functions that I don't know
> > about.
>
>         Bruce answered this one by asking about cursors...

Yes, it's true. I have used cursors for speeding up the opening of tables
in PgAccess, fetching only the first 200 records from the table.
But for a 10 thousand record table I would send 10 thousand "FETCH 1 IN
CURSOR" commands over the network, because in a report I am processing
records one by one.
The time for this kind of retrieval would be more than twice that of the
'callback' mechanism.

If you think it is better to keep libpq and libpgtcl as they are, then I
will use cursors.
But using the 'callback' method would increase performance.

I am waiting for the final resolution :-)

>         Basically, by "cloning", you are effectively looking at implementing ftp's
> way of dealing with a connection, having one "control" channel, and one "data"
> channel, is this right?  So that the "frontend" has a means of sending a STOP
> command to the backend even while the backend is still sending the frontend
> the data?

Not exactly. Looking at it from the Tcl/Tk point of view, the mechanism is
transparent. I am using this structure:

pg_loop $database "select * from sometable" record {
    set something $record(somefield)
}

But the new libpgtcl opens a 'cloned' connection in order to:
- send the query through it
- receive the data from it
I am not able to break the connection using commands sent through the
'original' one. The query is 'stopped' by breaking the connection.
That's why we needed another connection: there isn't (yet) a mechanism to
tell the backend to abort transmission of the rest of the query. I
understand that the backend no longer reads the socket while it is sending
results, so it cannot receive a CANCEL signal from the frontend. So,
dropping the rest of the query results isn't possible without a hard break
of the connection.


--
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA

On Mon, 5 Jan 1998, Constantin Teodorescu wrote:

> I have reached a point in developing PgAccess where I discovered that
> some functions in libpgtcl are badly implemented, and even libpq does not
> help me much!
>
> What's the problem? It is about working with large query results: big
> tables with thousands of records that have to be processed in order to
> produce a full report, for example.

In the past, I've had a lot of people complaining about the performance
(or lack of it) when handling large results in JDBC.

> Getting a query result from Tcl/Tk (the pg_select function) uses PQexec.
> But PQexec FETCHES ALL THE RECORDS INTO MEMORY, and only after that can the
> user handle the query results.
> But what if the table has thousands of records? I would probably need more
> than 512 MB of RAM to get a report finished.

The only solution I was able to give was for them to use cursors, and
fetch the result in chunks.

> With some small changes in libpq-fe.h
>
>  (  void (* callback)(PGresult *,void *ptr,int stat);
>     void *usr_ptr;)
>
> and also in libpq, to allow a newly defined function in libpgtcl
> (pg_loop) to initiate a query and then call back a user-defined function
> after every record fetched from the connection.
>
> In order to do this, the connection is 'cloned' and the query is issued
> on this new connection. For every record fetched, the C callback function
> is called; there the Tcl interpreter is invoked for the source inside the
> loop, then the memory used by the record is released and the next record
> is ready to come.

I understand the idea here, as I've used this trick before with Tcl, but
this could cause a problem with the other languages that we support. I
don't know how this would be done for Perl, but with Java, the JDBC spec
doesn't have this type of callback.

Some time back (around v6.0), I did look at having a separate thread on
the client that read the results in the background, so the foreground
thread would get the results almost immediately. It would only wait if it
had read everything transferred so far, and (as JDBC cannot go back a row
in a ResultSet) the read rows are freed once used.

Although the idea was sound, in practice, it didn't work well. Not every
JVM implemented threading in the same way, so it locked up a lot. In the
end, the idea was dropped.

> More than that, after processing some records, the user can choose to
> break the loop (using the break command in Tcl), which actually breaks
> the connection.

What side effects could this have on the backend if the second connection
is broken? I think the existing code would simply terminate.

> What do we achieve by making these patches?
>
> First of all, the ability to process large tables sequentially.
> Then, increased performance due to parallel execution: receiving data
> over the network and processing it locally. The backend process on the
> server fills the communication channel with data while the local task
> processes it as it comes.
> In the old version, the local task has to wait until *all* data has
> arrived (buffered in memory, if there was enough room) and only then
> process it.

> What I would ask from you:
> 1) First of all, whether my needs could be satisfied in some other way
> with the current functions in libpq or libpgtcl. I can assure you that
> with the current libpgtcl it is rather impossible. I am not sure whether
> there is another mechanism using some subtle functions that I don't know
> about.

We were talking about some changes to the protocol. Perhaps, we could do
something like changing it so it sends the result in blocks of tuples,
rather than everything in one block. Then, in between each packet, an ACK
or CAN style packet could be sent to the backend, either asking for the
next, or canceling the results.

Another alternative is (as an option definable by the client at run time)
to have results open another connection on a per-result basis (as FTP
does). However, I can see a performance hit with the overhead involved in
opening a new connection every time.

Also, I can see a possible problem:

Say a client has executed a query which returns a large number of rows.
We have read in the first 100 rows. The backend still has the majority of
the result queued up behind it.

Now in JDBC, we have getAsciiStream/getBinaryStream/getUnicodeStream,
which are the standard way of getting at BLOBs.

If one of the columns is a blob, and the client tries to read from the
blob, it will fail, because we are not in the main loop in the backend
(we're still transferring a result, and BLOBs use fastpath).

There are ways around this, but things could get messy if we're not
careful.

> 3) Is there any other normal way to tell the backend not to send any more
> tuples, instead of breaking the connection?

Apart from using cursors, not that I know of.

> 4) Even working in C, using PQexec, it's impossible to handle large query
> results, am I right?

Memory is the only limitation to this.

> Please cc to : mviorel@flex.ro      and also     teo@flex.ro

Done..

It would be interesting to see what the others think. Both TCL & Java are
close relatives, and Sun are working on a TCL extension to Java, so any
changes could (in the future) help both of us.

--
Peter T Mount  petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk


>
> The Hermit Hacker wrote:
> >
> > > What I would ask from you:
> > > 1) First of all, whether my needs could be satisfied in some other way
> > > with the current functions in libpq or libpgtcl. I can assure you that
> > > with the current libpgtcl it is rather impossible. I am not sure whether
> > > there is another mechanism using some subtle functions that I don't know
> > > about.
> >
> >         Bruce answered this one by asking about cursors...
>
> Yes, it's true. I have used cursors for speeding up the opening of tables
> in PgAccess, fetching only the first 200 records from the table.
> But for a 10 thousand record table I would send 10 thousand "FETCH 1 IN
> CURSOR" commands over the network, because in a report I am processing
> records one by one.
> The time for this kind of retrieval would be more than twice that of the
> 'callback' mechanism.

You can tell FETCH to give you as many records as you want, so you can
read them in 100-tuple blocks.

>
> If you think it is better to keep libpq and libpgtcl as they are, then I
> will use cursors.
> But using the 'callback' method would increase performance.
>
> I am waiting for the final resolution :-)
>
> >         Basically, by "cloning", you are effectively looking at implementing ftp's
> > way of dealing with a connection, having one "control" channel, and one "data"
> > channel, is this right?  So that the "frontend" has a means of sending a STOP
> > command to the backend even while the backend is still sending the frontend
> > the data?
>
> Not exactly. Looking at it from the Tcl/Tk point of view, the mechanism is
> transparent. I am using this structure:
>
> pg_loop $database "select * from sometable" record {
>     set something $record(somefield)
> }
>
> But the new libpgtcl opens a 'cloned' connection in order to:
> - send the query through it
> - receive the data from it
> I am not able to break the connection using commands sent through the
> 'original' one. The query is 'stopped' by breaking the connection.
> That's why we needed another connection: there isn't (yet) a mechanism to
> tell the backend to abort transmission of the rest of the query. I
> understand that the backend no longer reads the socket while it is sending
> results, so it cannot receive a CANCEL signal from the frontend. So,
> dropping the rest of the query results isn't possible without a hard break
> of the connection.

We have this on the TODO list.  We could use the TCP/IP out-of-band
connection option to inform the backend to stop things, but no one has
implemented it yet.  (For the new Unix domain sockets, we could use
signals.)  Anyone want to tackle it?

man send shows:

     The flags parameter may include one or more of the following:

     #define MSG_OOB        0x1  /* process out-of-band data */
     #define MSG_DONTROUTE  0x4  /* bypass routing, use direct interface */

     The flag MSG_OOB is used to send ``out-of-band'' data on sockets that
     support this notion (e.g.  SOCK_STREAM); the underlying protocol must
     also support ``out-of-band'' data.  MSG_DONTROUTE is usually used only
     by diagnostic or routing programs.
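
(Purely as an illustration of the idea, not existing Postgres code: the
frontend could push one urgent byte on the already-open socket; the
backend would get SIGURG, read the byte with recv(..., MSG_OOB), and stop
sending tuples. A minimal sketch of the frontend side:)

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Ask the backend to stop, over the existing TCP connection. */
    int
    send_cancel_oob(int sock)
    {
        char    c = 'X';                    /* the byte value is arbitrary */

        return send(sock, &c, 1, MSG_OOB);  /* 1 on success, -1 on error */
    }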


--
Bruce Momjian
maillist@candle.pha.pa.us

Sorry, just to clear things up:

> We were talking about some changes to the protocol. Perhaps, we could do
> something like changing it so it sends the result in blocks of tuples,
> rather than everything in one block. Then, in between each packet, an ACK
              ^^^^^^^^^^^^^^^^^^^^^^^

The backend sends tuples one by one - just after the executor gets the
next tuple from the upper plan, the backend sends this tuple to the
client side.

> or CAN style packet could be sent to the backend, either asking for the
> next, or canceling the results.

Vadim

On Tue, 6 Jan 1998, Bruce Momjian wrote:

> We have this on the TODO list.  We could use the TCP/IP out-of-band
> connection option to inform the backend to stop things, but no one has
> implemented it yet.  (For the new Unix domain sockets, we could use
> signals.)  Anyone want to tackle it?

I'll have to check, but I'm not sure if OOB is possible with Java.

--
Peter T Mount  petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk


On Wed, 7 Jan 1998, Vadim B. Mikheev wrote:

> Sorry, just to clear things:
>
> > We were talking about some changes to the protocol. Perhaps, we could do
> > something like changing it so it sends the result in blocks of tuples,
> > rather than everything in one block. Then, in between each packet, an ACK
>               ^^^^^^^^^^^^^^^^^^^^^^^
>
> The backend sends tuples one by one - just after the executor gets the
> next tuple from the upper plan, the backend sends this tuple to the
> client side.

Oops, of course it does, sorry ;-)

> > or CAN style packet could be sent to the backend, either asking for the
> > next, or canceling the results.

--
Peter T Mount  petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk


Re: [HACKERS] I want to change libpq and libpgtcl for better handling of large query results

From: Constantin Teodorescu
Peter T Mount wrote:
>
> The only solution I was able to give was for them to use cursors, and
> fetch the result in chunks.

Got it!!!

Seems everyone has 'voted' for using cursors.

As a matter of fact, I have tested both a
BEGIN ; DECLARE CURSOR ; FETCH N; END;
and a plain
SELECT FROM

Both of them lock the tables they use against writes until the end of
processing.

Fetching records in chunks (100 at a time) would speed up the processing a
little.

But I am still convinced that if the frontend were able to process tuples
as soon as they arrive, the overall time for processing a big table would
be smaller.
Fetching in chunks, the frontend waits for the 100 records to arrive (time
A) and then processes them (time B); A and B cannot be overlapped. (With
10,000 records in 100-row chunks, that is roughly 100 x (A + B) in total,
whereas overlapping transfer with processing could approach
100 x max(A, B).)

Thanks a lot for helping me to decide. Reports in PgAccess will use
cursors.

--
Constantin Teodorescu
FLEX Consulting Braila, ROMANIA

As far as I understood, this seems to be another solution to the older
problem of speeding up the browser display of large results. The first one
consisted of a nonblocking exec / blocking fetchtuple in libpq (the patch
is very simple). But the main point is that I learned at that time that the
backend sends tuples as soon as it computes them.

Can someone give an authoritative answer?

On Tue, 6 Jan 1998, The Hermit Hacker wrote:

> On Mon, 5 Jan 1998, Constantin Teodorescu wrote:
...
>
>     Now, from reading Bruce's email before reading this, this doesn't get
> around the fact that the backend is still going to have to finish generating
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> a response to the query before it can send *any* data back, so, as Bruce has
...
>
> Marc G. Fournier
> Systems Administrator @ hub.org
> primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
>

PS. On the other hand, if someone is working on the backend/frontend
protocol, could he think about how difficult it would be to have an async
full-duplex connection?

Costin Oproiu