Thread: Large selects handled inefficiently?
Hiya,

I am running a very large SELECT - it selects every row from a ~10,000,000 row table. I'm running this in Perl using DBD::Pg, with the general sequence:

    $sth = $dbh->prepare("SELECT $fields FROM $from") || return 0;
    $fh  = new FileHandle(">$file") || die "Can't open $file : $!";
    $sth->execute() || return 0;
    $sth->bind_columns(undef, \(@fields));
    while ($sth->fetch) {
        print $fh join("\t", @fields), "\n";
    }
    if ($sth->err) { return 0; }
    close $fh;

...which works fine in most circumstances, but on this really big select:

* the client process's memory gradually climbs to around 1 gigabyte (ouch!)
* eventually the stdout of the process shows, over and over again:
  Backend sent D message without prior T

It seems as if the client is fetching the /whole/ query result, or trying to, all at once. I expected it to fetch result rows only as I called fetch(). Is this:

* a deficiency in DBD::Pg?
* a deficiency in the PostgreSQL client libraries?
* a silly bug in my code?

I believe I can work around this problem using cursors (although I don't know how well DBD::Pg copes with cursors). However, that doesn't seem right -- cursors shouldn't be needed to fetch a large query without having it all in memory at once...

Jules
> I believe I can work around this problem using cursors (although I
> don't know how well DBD::Pg copes with cursors). However, that
> doesn't seem right -- cursors shouldn't be needed to fetch a large
> query without having it all in memory at once...

Actually, I think that's why cursors were invented in the first place ;-) A cursor is what you are using if you're not fetching all the results of a query.

- Andrew
On Thu, Aug 31, 2000 at 12:22:36AM +1000, Andrew Snow wrote:
> > I believe I can work around this problem using cursors (although I
> > don't know how well DBD::Pg copes with cursors). However, that
> > doesn't seem right -- cursors shouldn't be needed to fetch a large
> > query without having it all in memory at once...
>
> Actually, I think that's why cursors were invented in the first place ;-)
> A cursor is what you are using if you're not fetching all the results of
> a query.

I really can't agree with you there.

A cursor is another slightly foolish SQL hack.

A query language specifies the syntax of queries ('SELECT ...'). It doesn't specify the manner in which the results are actually returned. It seems entirely within the remit of a decent client-side library (and a decent back-end) to realise that in practice a client will want some control over the speed with which rows are returned.

Whilst explicit cursors are needed for some (IMO ugly) procedural SQL code, explicit cursors should not be necessary for the simple (and common) task of carrying out a SELECT whose result takes up more memory than you wish to have available at any one time.

Jules
On Wed, 30 Aug 2000, Jules Bean wrote:
> * the client process's memory gradually climbs to around 1 gigabyte
>   (ouch!)

Have you tried playing with RowCacheSize in DBI? Not sure if it works with DBD::Pg.

Andrew
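[For reference, RowCacheSize is a standard DBI database-handle attribute, but the DBI docs describe it only as a hint that drivers are free to ignore; whether DBD::Pg honours it is exactly the open question here. A minimal sketch of trying it, with database, table and column names made up for illustration:

    use DBI;

    my $dbh = DBI->connect("dbi:Pg:dbname=mydb", "", "",
                           { RaiseError => 1, AutoCommit => 1 });

    # Hint to the driver: keep roughly 1000 rows in the client-side cache.
    # Drivers that don't implement the hint silently ignore it.
    $dbh->{RowCacheSize} = 1000;

    my $sth = $dbh->prepare("SELECT id, name FROM big_table");
    $sth->execute;
    while (my @row = $sth->fetchrow_array) {
        # process one row at a time
    }
    $sth->finish;
    $dbh->disconnect;

If memory still climbs during execute(), the driver is ignoring the hint and the problem lies below DBI.]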
> A cursor is another slightly foolish SQL hack.
>
> A query language specifies the syntax of queries ('SELECT ...'). It
> doesn't specify the manner in which the results are actually returned.
> It seems entirely within the remit of a decent client-side library (and
> a decent back-end) to realise that in practice a client will want some
> control over the speed with which rows are returned.
>
> Whilst explicit cursors are needed for some (IMO ugly) procedural SQL
> code, explicit cursors should not be necessary for the simple (and
> common) task of carrying out a SELECT whose result takes up more memory
> than you wish to have available at any one time.

Hmm, I agree. So, does the PostgreSQL protocol support some form of non-SQL cursor?

- Andrew
Jules Bean wrote:
>
> On Thu, Aug 31, 2000 at 12:22:36AM +1000, Andrew Snow wrote:
>
> > > I believe I can work around this problem using cursors (although I
> > > don't know how well DBD::Pg copes with cursors). However, that
> > > doesn't seem right -- cursors shouldn't be needed to fetch a large
> > > query without having it all in memory at once...
> >
> > Actually, I think that's why cursors were invented in the first place ;-)
> > A cursor is what you are using if you're not fetching all the results of
> > a query.
>
> I really can't agree with you there.
>
> A cursor is another slightly foolish SQL hack.

Not quite, but it is true that this is a flaw in Postgres. Implementing a "streaming" interface has been discussed on the hackers list from time to time. This means the client doesn't absorb all the results before allowing access to them: you can start processing results as and when they become available, blocking in the client when none are ready yet. The main changes would be to the libpq client library, but there would also be other issues to address, like what happens if an error occurs halfway through. In short, I'm sure this will be fixed at some stage, but for now cursors are the only real answer.
On Thu, Aug 31, 2000 at 03:28:14PM +1100, Chris wrote:
> Jules Bean wrote:
> >
> > On Thu, Aug 31, 2000 at 12:22:36AM +1000, Andrew Snow wrote:
> >
> > > > I believe I can work around this problem using cursors (although I
> > > > don't know how well DBD::Pg copes with cursors). However, that
> > > > doesn't seem right -- cursors shouldn't be needed to fetch a large
> > > > query without having it all in memory at once...
> > >
> > > Actually, I think that's why cursors were invented in the first place ;-)
> > > A cursor is what you are using if you're not fetching all the results
> > > of a query.
> >
> > I really can't agree with you there.
> >
> > A cursor is another slightly foolish SQL hack.
>
> Not quite,

All right ;) Can we say that, like some other SQL features of which I'm not fond (e.g. NULLs), cursors do have a use, but many of the ways they're used in practice are bad practice...

> but it is true that this is a flaw in Postgres. Implementing a
> "streaming" interface has been discussed on the hackers list from time
> to time. This means the client doesn't absorb all the results before
> allowing access to them: you can start processing results as and when
> they become available, blocking in the client when none are ready yet.
> The main changes would be to the libpq client library, but there would
> also be other issues to address, like what happens if an error occurs
> halfway through. In short, I'm sure this will be fixed at some stage,
> but for now cursors are the only real answer.

Or ...LIMIT...OFFSET, I guess. [As long as I remember to set the transaction isolation to serializable. *sigh* Why isn't that the default?]

I shall investigate whether LIMIT...OFFSET or cursors seems better for my application.

Jules
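[A minimal sketch of the LIMIT...OFFSET idea through DBD::Pg, assuming a hypothetical big_table with an id column to order by (without an ORDER BY the batches aren't guaranteed to be stable). Note that each OFFSET forces the backend to skip over all the earlier rows again, so later batches get progressively slower:

    use DBI;

    # AutoCommit off so the whole scan happens in one transaction/snapshot.
    my $dbh = DBI->connect("dbi:Pg:dbname=mydb", "", "",
                           { RaiseError => 1, AutoCommit => 0 });
    $dbh->do("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");

    my $batch  = 10_000;
    my $offset = 0;
    while (1) {
        my $rows = $dbh->selectall_arrayref(
            "SELECT id, name FROM big_table ORDER BY id " .
            "LIMIT $batch OFFSET $offset");
        last unless @$rows;
        print join("\t", @$_), "\n" for @$rows;
        $offset += $batch;
    }
    $dbh->commit;
]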
On Thu, Aug 31, 2000 at 09:58:34AM +0100, Jules Bean wrote:
> On Thu, Aug 31, 2000 at 03:28:14PM +1100, Chris wrote:
>
> > but it is true that this is a flaw in Postgres. Implementing a
> > "streaming" interface has been discussed on the hackers list from time
> > to time. This means the client doesn't absorb all the results before
> > allowing access to them: you can start processing results as and when
> > they become available, blocking in the client when none are ready yet.
> > The main changes would be to the libpq client library, but there would
> > also be other issues to address, like what happens if an error occurs
> > halfway through. In short, I'm sure this will be fixed at some stage,
> > but for now cursors are the only real answer.
>
> Or ...LIMIT...OFFSET, I guess. [As long as I remember to set the
> transaction isolation to serializable. *sigh* Why isn't that the
> default?]
>
> I shall investigate whether LIMIT...OFFSET or cursors seems better
> for my application.

OK, I'm using cursors (after having checked that they work with DBD::Pg!). I'm a little confused about transaction isolation levels, though. I'm setting the level to 'serializable' --- this seems important, since other INSERTs might occur during my SELECT. However, the documentation for DECLARE CURSOR suggests that the 'INSENSITIVE' keyword is useless, which seems to me equivalent to saying that the transaction level is always SERIALIZABLE?

Jules
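[For the record, the cursor version looks roughly like this with DBD::Pg (a sketch only; big_table, the column list and the cursor name are made up). DECLARE, FETCH and CLOSE are ordinary statements issued through $dbh->do() and selectall_arrayref(), and they must all run inside a single transaction:

    use DBI;

    my $dbh = DBI->connect("dbi:Pg:dbname=mydb", "", "",
                           { RaiseError => 1, AutoCommit => 0 });

    # Open the cursor inside the transaction, then pull rows in batches.
    $dbh->do("DECLARE bigcur CURSOR FOR SELECT id, name FROM big_table");
    while (1) {
        my $rows = $dbh->selectall_arrayref("FETCH 1000 FROM bigcur");
        last unless @$rows;
        print join("\t", @$_), "\n" for @$rows;
    }
    $dbh->do("CLOSE bigcur");
    $dbh->commit;

Client memory stays bounded by the batch size rather than the full result set.]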
Hi!

I'm installing the ODBC driver, and everything goes right on all the clients (NT workstations) except one, where I get this error at 85% of the setup process:

"Unable to create ODBC Core Subkey"

It's probably not the ODBC driver's fault (I guess), but if someone knows how to work around it, that would be a great help. I'm migrating from MS-SQL Server to PostgreSQL, and that computer is an important one, so this is delaying the process.

Thanks in advance!

Pablo.
> Hi!
>
> I'm installing the ODBC driver, and everything goes right on all the
> clients (NT workstations) except one, where I get this error at 85% of
> the setup process:
>
> "Unable to create ODBC Core Subkey"
>
> It's probably not the ODBC driver's fault (I guess), but if someone
> knows how to work around it, that would be a great help. I'm migrating
> from MS-SQL Server to PostgreSQL, and that computer is an important
> one, so this is delaying the process.

Under Win98, I had similar problems installing ODBC drivers (at some point, the ODBC manager seemed to become corrupted, and I couldn't add new drivers). Are you able to add another new ODBC driver? (You could download MySQL's, for example, and see if that works.)

I solved my problem by going into the registry, deleting the ODBC manager's keys, and reinstalling PostgreSQL w/ driver manager. I lost my other ODBC data sources, but reinstalled those. It's hardly elegant, and I'm sure there are better ways, but, hey, it worked for me. Your mileage may *definitely* vary, you've been warned, etc., etc.

--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)
> I solved my problem by going into the registry, deleting the ODBC
> manager's keys, and reinstalling PostgreSQL w/ driver manager. I lost
> my other ODBC data sources, but reinstalled those. It's hardly elegant,
> and I'm sure there are better ways, but, hey, it worked for me. Your
> mileage may *definitely* vary, you've been warned, etc., etc.

Hi. Thanks for the reply. I had a vision and solved the problem, though my solution is perhaps more drastic than yours: I deleted the user and created it again. Quick for me, but the user looked angrrrry :). The driver then installed perfectly, as usual. Maybe it was a corruption of the registry. Bloody Windows :(

Well, that's another emergency solution.

Pablo.
Can any of you running PostgreSQL on Alphas comment on the performance vs. Intel-based servers? steve
> -----Original Message-----
> From: Jules Bean
>
> On Thu, Aug 31, 2000 at 09:58:34AM +0100, Jules Bean wrote:
> > On Thu, Aug 31, 2000 at 03:28:14PM +1100, Chris wrote:
> >
> > > but it is true that this is a flaw in Postgres. Implementing a
> > > "streaming" interface has been discussed on the hackers list from
> > > time to time. This means the client doesn't absorb all the results
> > > before allowing access to them: you can start processing results as
> > > and when they become available, blocking in the client when none are
> > > ready yet. The main changes would be to the libpq client library,
> > > but there would also be other issues to address, like what happens
> > > if an error occurs halfway through. In short, I'm sure this will be
> > > fixed at some stage, but for now cursors are the only real answer.
> >
> > Or ...LIMIT...OFFSET, I guess. [As long as I remember to set the
> > transaction isolation to serializable. *sigh* Why isn't that the
> > default?]
> >
> > I shall investigate whether LIMIT...OFFSET or cursors seems better
> > for my application.
>
> OK, I'm using cursors (after having checked that they work with
> DBD::Pg!). I'm a little confused about transaction isolation levels,
> though. I'm setting the level to 'serializable' --- this seems
> important, since other INSERTs might occur during my SELECT. However,
> the documentation for DECLARE CURSOR suggests that the 'INSENSITIVE'
> keyword is useless, which seems to me equivalent to saying that the
> transaction level is always SERIALIZABLE?

The default transaction isolation level in PostgreSQL is READ COMMITTED. However, transaction isolation levels seem to be irrelevant to your problem: PostgreSQL cursors don't see any changes made by other backends after the cursor is declared (opened), regardless of the transaction isolation level.

INSENSITIVE cursors aren't implemented yet. Cursors aren't INSENSITIVE because they do see the changes made by the backend itself; that, too, is independent of the transaction isolation level.

Regards,
Hiroshi Inoue
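[To make that concrete, here is a sketch only: table t, the cursor name and the second connection (standing in for "another backend") are all made up. A row committed by someone else after the cursor is opened never shows up through the cursor, even at READ COMMITTED:

    use DBI;

    my $dbh1 = DBI->connect("dbi:Pg:dbname=mydb", "", "",
                            { RaiseError => 1, AutoCommit => 0 });
    my $dbh2 = DBI->connect("dbi:Pg:dbname=mydb", "", "",
                            { RaiseError => 1, AutoCommit => 1 });

    # Backend 1 opens a cursor on t ...
    $dbh1->do("DECLARE c CURSOR FOR SELECT i FROM t");

    # ... backend 2 then inserts and commits a new row ...
    $dbh2->do("INSERT INTO t VALUES (42)");

    # ... but the cursor still returns only the rows that existed when it
    # was declared, whatever the isolation level.
    my $rows = $dbh1->selectall_arrayref("FETCH ALL FROM c");
    print scalar(@$rows), " rows seen through the cursor\n";

    $dbh1->do("CLOSE c");
    $dbh1->commit;
]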
"Andrew Snow" <als@fl.net.au> writes: > > I believe I can work around this problem using cursors (although I > > don't know how well DBD::Pg copes with cursors). However, that > > doesn't seem right -- cursors should be needed to fetch a large query > > without having it all in memory at once... > Yes, I have noticed that particular bad behaviour, too. With DBD::Pg and DBD::mysql. At the same time, DBD::Oracle, DBD::InterBase and DBD::Sybase work as expected. Rows are fetched with fetchrow...() functions instead of all being sucked up into memory at the time execute() is called. Anybody know why is that happening? > Actually, I think thats why cursors were invented in the first place ;-) A > cursor is what you are using if you're not fetching all the results of a > query. > What bothers me is different behaviour of different DBD drivers. But, yes, I have just subscribed to dbi-users list which is the right place to ask that question. -- Zlatko
On Thu, Aug 31, 2000 at 12:55:35AM +0200, Zlatko Calusic wrote:
> "Andrew Snow" <als@fl.net.au> writes:
>
> > > I believe I can work around this problem using cursors (although I
> > > don't know how well DBD::Pg copes with cursors). However, that
> > > doesn't seem right -- cursors shouldn't be needed to fetch a large
> > > query without having it all in memory at once...
>
> Yes, I have noticed that particular bad behaviour too, with DBD::Pg and
> DBD::mysql.
>
> At the same time, DBD::Oracle, DBD::InterBase and DBD::Sybase work as
> expected: rows are fetched by the fetchrow...() functions instead of
> all being sucked into memory at the time execute() is called.
>
> Anybody know why that is happening?

Yes. It's a defect in 'libpq', the underlying PostgreSQL client library. That library is not capable of reading partial results - it always reads whole ones.

> > Actually, I think that's why cursors were invented in the first place ;-)
> > A cursor is what you are using if you're not fetching all the results of
> > a query.
>
> What bothers me is the different behaviour of the different DBD
> drivers. But, yes, I have just subscribed to the dbi-users list, which
> is the right place to ask that question.

No, it's not really the DBD driver's fault. There is no (easy) way around it, since the flaw lies in the C library it uses. If the DBD driver wished to change its behaviour, it could 'secretly' use cursors, but that would involve parsing queries to detect SELECTs, which might be fragile.

Jules
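[Just to make the 'secret cursor' idea concrete, a rough sketch of what such a wrapper might look like at the application level. The helper name, cursor name, regex and batch size are all made up, and the naive regex is precisely the fragile query-parsing mentioned above:

    use DBI;

    # Run a plain SELECT through a cursor, handing rows to a callback in
    # batches, so the whole result set never sits in client memory at once.
    sub fetch_in_batches {
        my ($dbh, $sql, $callback, $batch) = @_;
        $batch ||= 1000;

        # Fragile: only the simplest SELECTs are recognised.
        die "not a plain SELECT" unless $sql =~ /^\s*SELECT\b/i;

        $dbh->do("DECLARE _wrap_cur CURSOR FOR $sql");
        while (1) {
            my $rows = $dbh->selectall_arrayref("FETCH $batch FROM _wrap_cur");
            last unless @$rows;
            $callback->($_) for @$rows;
        }
        $dbh->do("CLOSE _wrap_cur");
    }

    # Usage, inside a transaction (AutoCommit off):
    # fetch_in_batches($dbh, "SELECT id, name FROM big_table",
    #                  sub { print join("\t", @{$_[0]}), "\n" });
]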
I would like to update a tuple based on its oid:

    update table set name='text' where oid=234

...does not seem to update anything...

franck@sopac.org
On Wed, Sep 06, 2000 at 09:33:07AM +0100, Jules Bean wrote:
>
> No, it's not really the DBD driver's fault. There is no (easy) way
> around it, since the flaw lies in the C library it uses. If the DBD
> driver wished to change its behaviour, it could 'secretly' use cursors,
> but that would involve parsing queries to detect SELECTs, which might
> be fragile.

Recent experience points out the fragility: the ODBC driver has an option to try exactly this, and there was just a bug report involving the driver's attempts to use cursors with SELECT FOR UPDATE.

Ross

--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005