Re: [HACKERS] I want to change libpq and libpgtcl for better handling of large query results - Mailing list pgsql-hackers
From | Peter T Mount
---|---
Subject | Re: [HACKERS] I want to change libpq and libpgtcl for better handling of large query results
Msg-id | Pine.LNX.3.95.980106113350.16836A-100000@maidast
In response to | I want to change libpq and libpgtcl for better handling of large query results (Constantin Teodorescu <teo@flex.ro>)
List | pgsql-hackers
On Mon, 5 Jan 1998, Constantin Teodorescu wrote:

> I have reached a point in developing PgAccess where I have discovered
> that some functions in libpgtcl are badly implemented, and that even
> libpq does not help me a lot!
>
> What's the problem? It is about working with large query results:
> big tables with thousands of records that have to be processed in
> order to produce a full report, for example.

In the past, I've had a lot of people complaining about the performance
(or lack of it) when handling large results in JDBC.

> Getting a query result from Tcl/Tk (the pg_select function) uses
> PQexec. But PQexec IS GETTING ALL THE RECORDS INTO MEMORY, and only
> after that can the user handle the query results.
>
> But what if the table has thousands of records? I would probably need
> more than 512 Mb of RAM to get a report finished.

The only solution I was able to give was for them to use cursors, and
fetch the result in chunks.
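For illustration, here is a minimal sketch of that cursor approach in plain libpq C. The connection string, the table and column names (bigtable, name) and the chunk size of 100 are made up for the example, and error handling is kept to a bare minimum:

```c
/*
 * Minimal sketch: fetch a large result in chunks with a cursor, so
 * that only one chunk is held in client memory at a time.  "bigtable"
 * and "name" are made-up names for the example.
 */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=test");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Cursors must live inside a transaction block. */
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE c CURSOR FOR SELECT name FROM bigtable"));

    for (;;)
    {
        int i, n;

        res = PQexec(conn, "FETCH 100 FROM c");
        if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
        {
            PQclear(res);       /* error, or no rows left */
            break;
        }
        n = PQntuples(res);
        for (i = 0; i < n; i++)
            printf("%s\n", PQgetvalue(res, i, 0));
        PQclear(res);           /* free this chunk before fetching the next */
    }

    PQclear(PQexec(conn, "CLOSE c"));
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}
```

Only one FETCH-sized chunk is ever held in client memory, which is the behaviour Constantin is after, at the cost of an extra round trip per chunk.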
> With some small changes in libpq-fe.h
>
>     ( void (* callback)(PGresult *, void *ptr, int stat);
>       void *usr_ptr; )
>
> and also in libpq, we could allow a newly defined function in libpgtcl
> (pg_loop) to initiate a query and then call back a user-defined
> function after every record fetched from the connection.
>
> In order to do this, the connection is 'cloned', and the query is
> issued on this new connection. For every record fetched, the C
> callback function is called; there the Tcl interpreter is invoked on
> the source inside the loop, then the memory used by the record is
> released and the next record is ready to come.

I understand the idea here, as I've used this trick before with Tcl,
but it could cause a problem with the other languages that we support.
I don't know how this would be done for Perl, but with Java, the JDBC
spec doesn't have this type of callback.

Some time back (around v6.0), I did look at having a separate thread on
the client that read the results in the background, so that the
foreground thread would get the results almost immediately. It would
only wait if it had read everything transferred so far, and (as JDBC
cannot go back a row in a ResultSet) the read rows were freed once used.

Although the idea was sound, in practice it didn't work well. Not every
JVM implemented threading in the same way, so it locked up a lot. In
the end, the idea was dropped.
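To make the shape of the proposal concrete, here is a rough sketch of how such a hook might look on the C side. Everything named here (PGrowCallback, PQexecCallback, the stat convention) is invented for illustration on top of the two fields quoted above; none of it is existing libpq API:

```c
/*
 * HYPOTHETICAL sketch of the proposed per-row callback -- not real
 * libpq API.  The callback type mirrors the fields quoted above; a
 * function such as PQexecCallback() would run the query on the cloned
 * connection, invoke the callback once per row as it arrives, then
 * free that row, so only one tuple is held in memory at a time.
 */
#include <stdio.h>
#include <libpq-fe.h>

/* Invented name for the callback type from the proposal. */
typedef void (*PGrowCallback)(PGresult *row, void *usr_ptr, int stat);

/* Imagined additions to the connection struct in libpq-fe.h:
 *     PGrowCallback callback;
 *     void         *usr_ptr;
 */

/* Imagined entry point: stream rows from 'query' to 'cb'. */
extern int PQexecCallback(PGconn *conn, const char *query,
                          PGrowCallback cb, void *usr_ptr);

/* Example callback: print the first column of each row; 'stat' is
 * assumed to be 0 while rows are arriving, non-zero at end/error. */
static void print_row(PGresult *row, void *usr_ptr, int stat)
{
    (void) usr_ptr;
    if (stat == 0)
        printf("%s\n", PQgetvalue(row, 0, 0));
}

/* Usage would then look something like:
 *     PQexecCallback(conn, "SELECT name FROM bigtable", print_row, NULL);
 */
```

pg_loop in libpgtcl would then be a thin wrapper around something like PQexecCallback(), with a callback that evaluates the Tcl loop body for each row.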
> More than that, after processing some records, the user can choose to
> break the loop (using the break command in Tcl), which actually breaks
> the connection.

What side effects could this have on the backend if the second
connection is broken? I think the existing code would simply terminate.

> What do we achieve by making these patches?
>
> First of all, the ability to process large tables sequentially. Then,
> increased performance due to the parallel execution of receiving data
> over the network and processing it locally. The backend process on the
> server fills the communication channel with data, and the local task
> processes it as it comes. In the old version, the local task has to
> wait until *all* the data has come (buffered in memory, if there was
> room enough) and only then process it.
>
> What would I ask from you?
>
> 1) First of all, whether my needs could be satisfied in some other way
> with the current functions in libpq or libpgtcl. I can assure you that
> with the current libpgtcl it is rather impossible. I am not sure
> whether there is another mechanism using some subtle functions that I
> don't know about.

We were talking about some changes to the protocol.

Perhaps we could do something like changing it so that it sends the
result in blocks of tuples, rather than everything in one block. Then,
in between each packet, an ACK or CAN style packet could be sent to the
backend, either asking for the next block or canceling the results.

Another alternative is (as an option definable by the client at run
time) to have results open another connection on a per-result basis
(aka FTP). However, I can see a performance hit with the overhead
involved in opening a new connection every time.

Also, I can see a possible problem: say a client has executed a query
which returns a large number of rows, and we have read in the first 100
rows. The backend still has the majority of the result queued up behind
it. Now, in JDBC we have getAsciiStream/getBinaryStream/getUnicodeStream,
which are the standard way of getting at BLOBs. If one of the columns is
a blob, and the client tries to read from the blob, it will fail,
because we are not in the main loop in the backend (we're still
transferring a result, and BLOBs use fastpath). There are ways around
this, but things could get messy if we're not careful.

> 3) Is there any other normal way to tell the backend not to send any
> more tuples, instead of breaking the connection?

Apart from using cursors, not that I know of.

> 4) Even working in C, using PQexec, it's impossible to handle large
> query results, am I right?

Memory is the only limitation to this.

> Please cc to: mviorel@flex.ro and also teo@flex.ro

Done. It would be interesting to see what the others think. Both Tcl
and Java are close relatives, and Sun is working on a Tcl extension to
Java, so any changes could (in the future) help both of us.

--
Peter T Mount  petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk
Work EMail: peter@maidstone.gov.uk