Thread: Practical Cursors
Hello,

We are just about to wrap up our book, Practical PostgreSQL (the one that used to be "The Elephant Never Forgets"), and we have a question.

There is an obvious benefit to the use of cursors within a persistent environment. In other words, if my connection to the database is live, I can increase my query and display efficiency through the use of a cursor.

However, this seems to be useless within a web-based environment. If we have a live connection through a C++ application, we can perform a transaction and interact with the results. However, using something like PHP will not allow this, because HTTP is stateless and PostgreSQL will not know from one transaction to the next that the results of the connection are related.

Is this truly the case, or is there a way for PostgreSQL to remember the connection identifier so that the next time a PHP connection is made with the same identifier a transaction can be completed?

Sincerely,

Joshua Drake

--
Practical PostgreSQL: http://stage.linuxports.com/projects/postgres/book1.htm
by way of: pgsql-general@commandprompt.com
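For reference, this is the pattern that works when the connection stays open for the whole interaction. A minimal sketch using the PHP pgsql functions follows; the connection string, table, columns and page size are illustrative, and the same DECLARE/FETCH sequence applies from a long-lived C++ or psql client.

    <?php
    // Minimal cursor usage over a connection that stays open for the whole
    // interaction. The connection string, table and page size are illustrative.
    $conn = pg_connect("dbname=books");

    // A cursor only lives inside a transaction.
    pg_query($conn, "BEGIN");
    pg_query($conn, "DECLARE titles_cur CURSOR FOR SELECT title, author FROM titles ORDER BY title");

    // Fetch one screenful at a time; repeat the FETCH as the user pages forward.
    $res = pg_query($conn, "FETCH 20 FROM titles_cur");
    while ($row = pg_fetch_assoc($res)) {
        echo $row['title'] . " by " . $row['author'] . "\n";
    }

    // When the user is done, release the cursor and end the transaction.
    pg_query($conn, "CLOSE titles_cur");
    pg_query($conn, "COMMIT");
    pg_close($conn);
    ?>

The web problem described above is precisely that the script (and usually the connection state) goes away between the DECLARE and the next FETCH.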
> However, using something like PHP will not allow this because HTTP is
> stateless and PostgreSQL will not know from one transaction to the next
> that the results of the connection are related.
>
> Is this truly the case, or is there a way for PostgreSQL to remember the
> connection identifier so that the next time a PHP connection is made with
> the same identifier a transaction can be completed?
>
> Sincerely,
>
> Joshua Drake

I have actually been thinking recently about just this question. I looked through the source for the PHP pgsql extension, and it is clear that no cursor is used. There is, however, the ability to open a persistent connection to PostgreSQL from PHP. I think it would be possible to create a persistent resource identifier for a cursor in a similar manner and "reuse" the cursor across multiple HTTP requests.

The downside I can see is that PHP would have no way to know when it could garbage-collect the allocated resource. So while it could be done, what would happen when the user closes their browser with your cursor still open?

-- Joe
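As a rough illustration of the idea only, not something that works as written: pg_pconnect() does keep a backend connection alive between requests within one web-server child process, so in principle a cursor declared on it could be fetched from again later. The cursor name, table and the assumption that a later request lands on the same persistent connection with its transaction still open are the speculative parts, and nothing cleans the cursor up, which is exactly the garbage-collection problem just described.

    <?php
    // Request 1: reuse a persistent connection and declare a cursor named after
    // the user's session. Purely illustrative; see the caveats above.
    session_start();
    $conn = pg_pconnect("dbname=books");    // persistent: the backend outlives this request
    $cur  = "cur_" . preg_replace('/[^a-z0-9]/i', '', session_id());

    pg_query($conn, "BEGIN");
    pg_query($conn, "DECLARE $cur CURSOR FOR SELECT title FROM titles ORDER BY title");

    // A later request could, in principle, continue fetching, but only if it is
    // served by the same web-server child, gets the same pooled connection and
    // the transaction is still open; nothing guarantees any of that.
    $res = pg_query($conn, "FETCH 20 FROM $cur");
    while ($row = pg_fetch_row($res)) {
        echo $row[0] . "\n";
    }
    ?>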
> The downside I can see is that PHP would have no way to know when it could
> garbage-collect the allocated resource. So while it could be done, what
> would happen when the user closes their browser with your cursor still open?

Well, I thought of two things. First, PostgreSQL could have some identifier that was set by PHP, similar to a session ID, with a timeout, like a cookie timeout. If PostgreSQL did not receive a CLOSE or COMMIT within that time limit, it would drop the transaction. On the other hand, you could have it track the transaction so that the next time I come back, if my identifier is matched to my session, you could code an application that will inform you of the transactions that are not completed. Alas, that would be cool but would take a lot of code on the PHP end.

The second thought I had was a small Java applet that kept a live connection to PostgreSQL and proxied certain things, but that seems extreme.

J

--
by way of pgsql-general@commandprompt.com
On Mon, 17 Sep 2001, Command Prompt, Inc. wrote:

> However, using something like PHP will not allow this because HTTP is
> stateless and PostgreSQL will not know from one transaction to the next
> that the results of the connection are related.
>
> Is this truly the case, or is there a way for PostgreSQL to remember the
> connection identifier so that the next time a PHP connection is made with
> the same identifier a transaction can be completed?

Not currently. A cursor right now is a per-backend thing. A lot of changes are needed to make cursors global (particularly transaction awareness).

-alex
On Mon, 17 Sep 2001, Command Prompt, Inc. wrote:

> Is this truly the case, or is there a way for PostgreSQL to remember the
> connection identifier so that the next time a PHP connection is made with
> the same identifier a transaction can be completed?

Anything is possible - you could write a server process that associates cursors with session IDs and then have your PHP app make requests through the server. Would it be a good idea? Probably not. Would it qualify as "practical postgres" usage? Definitely not! I'd call it an impractical idea that might be possible through a lot of hard work.

-sam
(Sorry to reply to a week-old message. I need to keep up with this list more!)

On Monday 17 September 2001 17:04, you wrote:

> There is an obvious benefit to the use of cursors within a persistent
> environment. In other words, if my connection to the database is live, I
> can increase my query and display efficiency through the use of a cursor.
>
> However, this seems to be useless within a web-based environment. If we
> have a live connection through a C++ application, we can perform a
> transaction and interact with the results.

Yep. That seems to be a disadvantage with ALL database systems & HTTP-based apps.

I once wrote a MySQL app (I know, but that's what the company used) to do a fairly complicated search on a huge database full of domain names. The query was time consuming (10-30 seconds), so it obviously could not be performed for every prev/next page request.

My first approach was to have the PHP script write the entire result set to a fixed-length file, which could be easily accessed for each request at the point where the user was in the file. The only problem there was that when the result set was large, the initial query time was significantly longer. And that happened a lot.

I then wrote a daemon in C to do the work and store the results in RAM. The PHP script connected to the daemon via a socket and passed a request ID and the numbers of the records it wanted. Sure, it was convoluted, but I actually got the speed up to where I was fairly happy with it.

If there's a better solution than that, I'm not aware of it.

But like someone else mentioned, it's not quite "practical" database usage.

--
Like to travel?  http://TravTalk.org
Micah Yoder Internet Development  http://yoderdev.com
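The point of the first (fixed-length file) approach is that any record can be located by arithmetic alone. A sketch of that idea in PHP, written against the PostgreSQL functions even though the original app was MySQL; the record length, page size and field delimiter are made-up values:

    <?php
    // Sketch of paging through a fixed-length-record cache file.
    define('REC_LEN', 80);      // every record padded (or truncated) to this many bytes
    define('PER_PAGE', 25);

    // Writing the cache: one padded record per result row, so any record can be
    // located later by simple arithmetic.
    function write_cache($file, $result) {
        $fp = fopen($file, "w");
        while ($row = pg_fetch_row($result)) {
            $rec = substr(implode("|", $row), 0, REC_LEN);
            fwrite($fp, str_pad($rec, REC_LEN));
        }
        fclose($fp);
    }

    // Reading one page: seek straight to the first record of that page.
    function read_page($file, $page) {
        $fp = fopen($file, "r");
        fseek($fp, ($page - 1) * PER_PAGE * REC_LEN);
        $rows = array();
        for ($i = 0; $i < PER_PAGE; $i++) {
            $rec = fread($fp, REC_LEN);
            if ($rec === false || strlen($rec) == 0) {
                break;          // end of the cache file
            }
            $rows[] = explode("|", rtrim($rec));
        }
        fclose($fp);
        return $rows;
    }
    ?>

As noted above, the weakness is that the whole result set still has to be written out before the first page can be shown.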
>>>>> "Micah" == Micah Yoder <yodermk@home.com> writes: Micah> I then wrote a daemon in C to do the work and store the results Micah> in RAM. The PHP script connected to the daemon via a socket, Micah> and passed a request ID and the numbers of the records it Micah> wanted. Sure, it was convoluted, but I actually got the speed Micah> up to where I was fairly happy with it. Micah> If there's a better solution than that, I'm not aware of it. What you need is a "middleware" (heh) solution... I've done a number of magazine columns using a Perl-based mini-web-server (search google for "site:stonehenge.com HTTP::Daemon"). You could set up your frontware begin-search CGI script to fire off a daemon, and tell it to do the hard work, using a session ID to keep track of subsequent hits, and let the front CGI request proxy through using HTTP to talk to the daemon (very easy). Or, you could redirect the client to the mini-daemon directly. I've done the latter a couple of times, and it's pretty practical up to light-commercial-volume hits. And before you scream "Perl is slow", I have a working Perl webserver based on HTTP::Daemon that pre-forks just like Apache (taking current load into consideration), handles streaming proxy connections (including SSL proxy) and local file delivery, and benchmarks at *half* the speed of Apache in both proxy and local file mode. That's much better than I initially expected, and completely satisfactory for most applications. And it's only 300 lines of code. :) -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Micah Yoder wrote:

> (Sorry to reply to a week-old message. I need to keep up with this list more!)
>
> On Monday 17 September 2001 17:04, you wrote:
>
> > There is an obvious benefit to the use of cursors within a persistent
> > environment. In other words, if my connection to the database is live, I
> > can increase my query and display efficiency through the use of a cursor.
> >
> > However, this seems to be useless within a web-based environment. If we
> > have a live connection through a C++ application, we can perform a
> > transaction and interact with the results.
>
> Yep. That seems to be a disadvantage with ALL database systems & HTTP-based
> apps.
>
> I once wrote a MySQL app (I know, but that's what the company used) to do a
> fairly complicated search on a huge database full of domain names. The query
> was time consuming (10-30 seconds), so it obviously could not be performed
> for every prev/next page request.
>
> My first approach was to have the PHP script write the entire result set
> to a fixed-length file, which could be easily accessed for each request at
> the point where the user was in the file. The only problem there was that
> when the result set was large, the initial query time was significantly
> longer. And that happened a lot.
>
> I then wrote a daemon in C to do the work and store the results in RAM. The
> PHP script connected to the daemon via a socket and passed a request ID and
> the numbers of the records it wanted. Sure, it was convoluted, but I
> actually got the speed up to where I was fairly happy with it.
>
> If there's a better solution than that, I'm not aware of it.
>
> But like someone else mentioned, it's not quite "practical" database usage.

Since search engines and data warehousing tend to have huge databases with sometimes complicated, long-running queries that produce anywhere from empty to huge result sets, this is a quite common problem. Thus, I would consider any solution that leads to success "practical" in the first place.

PHP together with cursors might be an alternative solution. You open a cursor for the entire result set. You have a function that fetches the next n rows from the cursor and generates the resulting HTML output in a file; it returns true if more rows have been found. You call it once, and if it returns false you display "No match found" or so. If it returns true, you call it again to create a cache file for the second result page, and you then know whether there will be one (telling you whether to provide a NEXT button). You register a shutdown function that will call the cache file generator another m times. Now you display the first cache file, leave the DB connection with the open transaction and cursor where they are, and exit. The user will already see the first result page while your server is still working. After calling the cache file generator function m times, the shutdown function closes the cursor, terminates the transaction and closes the DB connection.

I think 95% of users will not hit NEXT more than 10 times before refining their search, so that should be enough. If one really does, well, then you'd have to run the entire query again and this time create either more cache files or all of them. Now you need some sort of vacuum cleaner for the cache files and you are done.

The drawback of this solution is that you don't know how many pages there will be in total when you display the first one. But the benefits are that it fits into the connectionless nature of HTTP, has a small resource footprint, provides first results early and does not require transactions held open across user interaction.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
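To make that recipe concrete, here is a minimal sketch in PHP. The query, the page size of 50 rows (n), the 10 pre-generated pages (m) and the cache file paths are made-up values, and current pg_* function names are assumed:

    <?php
    // Sketch of the cursor + cache-file approach described above.
    $conn    = pg_connect("dbname=search");
    $session = uniqid();                    // identifies this particular search

    pg_query($conn, "BEGIN");
    pg_query($conn, "DECLARE result_cur CURSOR FOR
                     SELECT name FROM domains WHERE name LIKE '%example%' ORDER BY name");

    // Fetch the next n rows and write one HTML cache file for that page.
    // Returns true if any rows were found.
    function write_cache_page($conn, $session, $pageno, $n) {
        $res = pg_query($conn, "FETCH $n FROM result_cur");
        if (pg_num_rows($res) == 0) {
            return false;
        }
        $fp = fopen("/tmp/cache-$session-$pageno.html", "w");
        while ($row = pg_fetch_assoc($res)) {
            fwrite($fp, "<li>" . htmlspecialchars($row['name']) . "</li>\n");
        }
        fclose($fp);
        return true;
    }

    if (!write_cache_page($conn, $session, 1, 50)) {
        echo "No match found.";
        pg_query($conn, "CLOSE result_cur");
        pg_query($conn, "COMMIT");
        exit;
    }
    $have_next = write_cache_page($conn, $session, 2, 50);   // whether to show a NEXT button

    // After the script has produced its output, generate a few more cache
    // files, then close the cursor, end the transaction and disconnect.
    register_shutdown_function(function () use ($conn, $session) {
        for ($p = 3; $p <= 10; $p++) {
            if (!write_cache_page($conn, $session, $p, 50)) {
                break;
            }
        }
        pg_query($conn, "CLOSE result_cur");
        pg_query($conn, "COMMIT");
        pg_close($conn);
    });

    readfile("/tmp/cache-$session-1.html");   // show the first result page
    ?>

A separate cleanup job would play the role of the "vacuum cleaner" for stale /tmp/cache-* files.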
On Tue, Sep 25, 2001 at 12:06:48AM -0400, Micah Yoder wrote:

> (Sorry to reply to a week-old message. I need to keep up with this list more!)

Ditto, but more so.

> I then wrote a daemon in C to do the work and store the results in RAM. The
> PHP script connected to the daemon via a socket and passed a request ID and
> the numbers of the records it wanted. Sure, it was convoluted, but I
> actually got the speed up to where I was fairly happy with it.
>
> If there's a better solution than that, I'm not aware of it.

A technique I've used with some success is to select just the primary keys of the rows you're interested in, so you only have to memorize a list of integers. Then for each page, select the rows "WHERE pkey IN (...)". It's sort of a middle ground as far as tradeoffs go: you don't have to store a huge amount of data in RAM or temporary files, but you still have to do the work up front.

The problem I have with persistent per-session connections is that you end up with basically the same (per-transaction overhead * simultaneous transactions), and you add (per-connection overhead * simultaneous open sessions) on top. There are certainly situations where you can do better one way or the other... figuring out how best to tune the per-session case scares me.

--
Christopher Masto
CB461C61  8AFC E3A8 7CE5 9023 B35D  C26A D849 1F6E CB46 1C61
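A sketch of that primary-key approach, with the PHP session used as the place to memorize the integer list; the table, column names and page size are illustrative:

    <?php
    session_start();
    $conn = pg_connect("dbname=search");

    // Run the expensive query once per search and remember just the keys.
    if (!isset($_SESSION['result_ids'])) {
        $res = pg_query($conn, "SELECT id FROM domains WHERE name LIKE '%example%' ORDER BY name");
        $ids = array();
        while ($row = pg_fetch_row($res)) {
            $ids[] = (int) $row[0];
        }
        $_SESSION['result_ids'] = $ids;
    }

    // For each page, pull only the rows whose keys fall on that page.
    $per_page = 20;
    $page     = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
    $slice    = array_slice($_SESSION['result_ids'], ($page - 1) * $per_page, $per_page);

    if (count($slice) > 0) {
        $in  = implode(",", $slice);    // integers only, so safe to interpolate
        $res = pg_query($conn, "SELECT id, name FROM domains WHERE id IN ($in) ORDER BY name");
        while ($row = pg_fetch_assoc($res)) {
            echo htmlspecialchars($row['name']) . "\n";
        }
    } else {
        echo "No more results.";
    }
    ?>

The key list still costs memory in the session store, but far less than caching the full rows, which is the middle-ground tradeoff described above.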