Thread: Large selects handled inefficiently?

Large selects handled inefficiently?

From
Jules Bean
Date:
Hiya,

I am running a very large SELECT - it selects every row from a ~10 000
000 row table. I'm running this in Perl using DBD::Pg, with the general
sequence:

# $dbh is an already-connected DBD::Pg handle; $fields, $from, $file
# and @fields are set up elsewhere (@fields has one element per
# selected column).
$sth = $dbh->prepare("SELECT $fields FROM $from") || return 0;
$fh = new FileHandle(">$file") || die "Can't open $file : $!";
$sth->execute()  || return 0;
$sth->bind_columns(undef, \(@fields));  # bind each column to an element of @fields
while ($sth->fetch){
      print $fh join("\t",@fields),"\n";
}
if ($sth->err) { return 0; }
close $fh;

...which works fine in most circumstances, but on this really big
select:

* the client process's memory gradually climbs to around 1 gigabyte
(ouch!)
* eventually the stdout of the process shows over and over again:

Backend sent D message without prior T

It seems as if the client is fetching the /whole/ query result, or
trying to, all at once.  I expected it to only actually fetch result
rows as I called fetch().

Is this:

* A deficiency in DBD::Pg?
* A deficiency in the postgresql client libraries?
* A silly bug in my code?

I believe I can work around this problem using cursors (although I
don't know how well DBD::Pg copes with cursors).  However, that
doesn't seem right -- cursors shouldn't be needed to fetch a large query
without having it all in memory at once...

Jules

RE: Large selects handled inefficiently?

From
"Andrew Snow"
Date:
> I believe I can work around this problem using cursors (although I
> don't know how well DBD::Pg copes with cursors).  However, that
> doesn't seem right -- cursors shouldn't be needed to fetch a large query
> without having it all in memory at once...

Actually, I think that's why cursors were invented in the first place ;-)  A
cursor is what you are using if you're not fetching all the results of a
query.


- Andrew

Re: Large selects handled inefficiently?

From
Jules Bean
Date:
On Thu, Aug 31, 2000 at 12:22:36AM +1000, Andrew Snow wrote:
>
> > I believe I can work around this problem using cursors (although I
> > don't know how well DBD::Pg copes with cursors).  However, that
> > doesn't seem right -- cursors shouldn't be needed to fetch a large query
> > without having it all in memory at once...
>
> Actually, I think that's why cursors were invented in the first place ;-)  A
> cursor is what you are using if you're not fetching all the results of a
> query.

I really can't agree with you there.

A cursor is another slightly foolish SQL hack.

A query language specifies the syntax of queries ('SELECT ...').  It
doesn't specify the manner in which the results are actually returned.
It seems totally within the bounds of the remit of a decent client-side
library (and a decent back-end) to realise that in practice a client
will want some control over the rate at which rows are returned.

Whilst explicit cursors are needed for some (IMO ugly) procedural SQL
code, explicit cursors should not be necessary for the simple (and
common) task of carrying out a SELECT whose result takes up more
memory than you wish to have available at any one time.

Jules

Re: Large selects handled inefficiently?

From
andrew@ugh.net.au
Date:

On Wed, 30 Aug 2000, Jules Bean wrote:

> * the client process's memory gradually climbs to around 1 gigabyte
> (ouch!)

Have you tried playing with RowCacheSize in DBI? Not sure if it works with
DBD::Pg.
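
A minimal sketch of what that would look like (the DSN, table and
column names here are made up, and RowCacheSize is only a hint which a
driver is free to ignore):

use DBI;

# RowCacheSize is a standard DBI hint for how many rows to buffer;
# whether DBD::Pg honours it is exactly the open question.
my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'secret',
                       { RaiseError => 1, RowCacheSize => 1000 });

my $sth = $dbh->prepare('SELECT id, name FROM bigtable');
$sth->execute;
while (my @row = $sth->fetchrow_array) {
    print join("\t", @row), "\n";
}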

Andrew


RE: Large selects handled inefficiently?

From
"Andrew Snow"
Date:
> A cursor is another slightly foolish SQL hack.
>
> A query language specifies the syntax of queries ('SELECT ...').  It
> doesn't specify the manner in which these are actually returned.  It
> seems totally within the bounds of the remit of a decent client-side
> library (and a decent back-end) to realise that in practice a client
> will want some control over the speed with which rows are returned.
>
> Whilst explicit cursors are needed for some (IMO ugly) procedural SQL
> code, explicit cursors should not be necessary for the simple (and
> common) task of carrying out a SELECT which takes up more memory than
> you wish to have available at any single time.


Hmm, I agree.  So, does the PostgreSQL protocol support some form of non-SQL
cursor?


- Andrew



Re: Large selects handled inefficiently?

From
Chris
Date:
Jules Bean wrote:
>
> On Thu, Aug 31, 2000 at 12:22:36AM +1000, Andrew Snow wrote:
> >
> > > I believe I can work around this problem using cursors (although I
> > > don't know how well DBD::Pg copes with cursors).  However, that
> > > doesn't seem right -- cursors shouldn't be needed to fetch a large query
> > > without having it all in memory at once...
> >
> > Actually, I think that's why cursors were invented in the first place ;-)  A
> > cursor is what you are using if you're not fetching all the results of a
> > query.
>
> I really can't agree with you there.
>
> A cursor is another slightly foolish SQL hack.

Not quite, but it is true that this is a flaw in postgres. Implementing
a "streaming" interface has been discussed on hackers from time to
time. This means that the client doesn't absorb all the results before
allowing access to them; you can start processing results as and when
they become available, by blocking in the client. The main changes
would be to the libpq client library, but there would also be other
issues to address, like what happens if an error occurs half way
through. In short, I'm sure this will be fixed at some stage, but for
now cursors are the only real answer.

Re: Large selects handled inefficiently?

From
Jules Bean
Date:
On Thu, Aug 31, 2000 at 03:28:14PM +1100, Chris wrote:
> Jules Bean wrote:
> >
> > On Thu, Aug 31, 2000 at 12:22:36AM +1000, Andrew Snow wrote:
> > >
> > > > I believe I can work around this problem using cursors (although I
> > > > don't know how well DBD::Pg copes with cursors).  However, that
> > > > doesn't seem right -- cursors shouldn't be needed to fetch a large query
> > > > without having it all in memory at once...
> > >
> > > Actually, I think that's why cursors were invented in the first place ;-)  A
> > > cursor is what you are using if you're not fetching all the results of a
> > > query.
> >
> > I really can't agree with you there.
> >
> > A cursor is another slightly foolish SQL hack.
>
> Not quite,

All right ;)  Can we say that, like some other SQL features of which
I'm not fond (e.g. NULLs), cursors do have a use, but many of the ways
they're used in practice are bad practice...

> but it is true that this is a flaw in postgres. Implementing a
> "streaming" interface has been discussed on hackers from time to
> time. This means that the client doesn't absorb all the results
> before allowing access to them; you can start processing results as
> and when they become available, by blocking in the client. The main
> changes would be to the libpq client library, but there would also be
> other issues to address, like what happens if an error occurs half
> way through. In short, I'm sure this will be fixed at some stage, but
> for now cursors are the only real answer.

Or ...LIMIT...OFFSET, I guess. [As long as I remember to set the
transaction isolation to serializable.  *sigh*  Why isn't that the
default?]

I shall investigate whether LIMIT...OFFSET or cursors seems to be
better for my application.
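
For the record, the LIMIT/OFFSET version would look something like the
sketch below (table, columns and DSN invented; the whole scan runs in
one serializable transaction so that every page sees the same snapshot,
and note that OFFSET gets progressively more expensive as it grows):

use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'secret',
                       { RaiseError => 1, AutoCommit => 0 });

# First statement of the transaction: pin the isolation level.
$dbh->do('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE');

my $chunk  = 10_000;
my $offset = 0;
while (1) {
    my $rows = $dbh->selectall_arrayref(
        "SELECT id, name FROM bigtable ORDER BY id " .
        "LIMIT $chunk OFFSET $offset");
    last unless @$rows;
    print join("\t", @$_), "\n" for @$rows;
    $offset += $chunk;
}
$dbh->commit;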

Jules



Re: Large selects handled inefficiently?

From
Jules Bean
Date:
On Thu, Aug 31, 2000 at 09:58:34AM +0100, Jules Bean wrote:
> On Thu, Aug 31, 2000 at 03:28:14PM +1100, Chris wrote:
>
> > but it is true that this is a flaw in postgres. Implementing a
> > "streaming" interface has been discussed on hackers from time to
> > time. This means that the client doesn't absorb all the results
> > before allowing access to them; you can start processing results
> > as and when they become available, by blocking in the client. The
> > main changes would be to the libpq client library, but there would
> > also be other issues to address, like what happens if an error
> > occurs half way through. In short, I'm sure this will be fixed at
> > some stage, but for now cursors are the only real answer.
>
> Or ...LIMIT...OFFSET, I guess. [As long as I remember to set the
> transaction isolation to serializable.  *sigh*  Why isn't that the
> default?]
>
> I shall investigate whether LIMIT...OFFSET or cursors seems to be
> better for my application.

OK, I'm using cursors (after having checked that they work with
DBD::Pg!). I'm a little confused about transaction isolation levels,
though.  I'm setting the level to 'serializable'  --- this seems
important, since other INSERTs might occur during my SELECT.  However,
the documentation for DECLARE CURSOR suggests that the 'INSENSITIVE'
keyword is useless, which seems to me to be equivalent to saying that
the transaction isolation level is always SERIALIZABLE?
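
In case it's useful to anyone, a rough sketch of the sort of loop I
mean (table and column names invented; the cursor has to live inside a
transaction, hence AutoCommit => 0):

use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'secret',
                       { RaiseError => 1, AutoCommit => 0 });

# Declare the cursor, then pull the result down in modest chunks.
$dbh->do('DECLARE bigcur CURSOR FOR SELECT id, name FROM bigtable');
while (1) {
    my $rows = $dbh->selectall_arrayref('FETCH 1000 FROM bigcur');
    last unless @$rows;
    print join("\t", @$_), "\n" for @$rows;
}
$dbh->do('CLOSE bigcur');
$dbh->commit;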

Jules

Error installing ODBC in NT

From
"Pablo Prieto"
Date:
Hi!,

I'm installing the ODBC driver, and everything goes right on all the
clients (NT Workst.) except one of them, where I get this error at 85%
of the setup process:

"Unable to create ODBC Core Subkey"

I'm sure it's not the ODBC driver's fault (I guess), but if someone
knows how to work around it, that would be a great help, since I'm
migrating from MS-SQL Server to PostgreSQL, and that computer is an
important one, so this is delaying the process.

Thanks in advance!

Pablo.


Re: Error installing ODBC in NT

From
"Joel Burton"
Date:
> Hi!,
>
> I'm installing the ODBC driver, and everything goes right on all the
> clients (NT Workst.) except one of them, where I get this error at
> 85% of the setup process:
>
> "Unable to create ODBC Core Subkey"
>
> I'm sure it's not the ODBC driver's fault (I guess), but if someone
> knows how to work around it, that would be a great help, since I'm
> migrating from MS-SQL Server to PostgreSQL, and that computer is an
> important one, so this is delaying the process.

Under Win98, I had similar problems installing ODBC drivers (at some
point, the ODBC manager seemed to become corrupted, and I couldn't
add new drivers). Are you able to add any other new ODBC driver?
(You could download MySQL's, for example, and see if that works.)

I solved my problem by going into the registry, deleting the ODBC
manager's keys, and reinstalling PostgreSQL with its driver manager.
I lost my other ODBC data sources, but reinstalled those. It's hardly
elegant, and I'm sure there are better ways, but, hey, it worked for
me. Your mileage may *definitely* vary, you've been warned, etc.,
etc.

--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)

Re: Error installing ODBC in NT

From
"Pablo Prieto"
Date:
> I solved my problem by going into the registry, deleting the ODBC
> manager's keys, and reinstalling PostgreSQL with its driver manager.
> I lost my other ODBC data sources, but reinstalled those. It's hardly
> elegant, and I'm sure there are better ways, but, hey, it worked for
> me. Your mileage may *definitely* vary, you've been warned, etc.,
> etc.

Hi. Thanks for the reply. I had a vision and solved the problem, but my
solution is perhaps harder than yours: I deleted the user and created
it again. Quick for me, but the user looked angrrrry :).

The driver then installed perfectly, as usual. Maybe it was a
corruption of the registry. Bloody Windows :(

Well, that's another emergency solution.

Pablo.



PostgreSQL on Alpha

From
"Steve Wolfe"
Date:
  Can any of you running PostgreSQL on Alphas comment on the performance vs.
Intel-based servers?

steve



RE: Large selects handled inefficiently?

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Jules Bean
>
> On Thu, Aug 31, 2000 at 09:58:34AM +0100, Jules Bean wrote:
> > On Thu, Aug 31, 2000 at 03:28:14PM +1100, Chris wrote:
> >
> > > but it is true that this is a flaw in postgres. Implementing a
> > > "streaming" interface has been discussed on hackers from time to
> > > time. This means that the client doesn't absorb all the results
> > > before allowing access to them; you can start processing results
> > > as and when they become available, by blocking in the client. The
> > > main changes would be to the libpq client library, but there would
> > > also be other issues to address, like what happens if an error
> > > occurs half way through. In short, I'm sure this will be fixed at
> > > some stage, but for now cursors are the only real answer.
> >
> > Or ...LIMIT...OFFSET, I guess. [As long as I remember to set the
> > transaction isolation to serializable.  *sigh*  Why isn't that the
> > default?]
> >
> > I shall investigate whether LIMIT...OFFSET or cursors seems to be
> > better for my application.
>
> OK, I'm using cursors (after having checked that they work with
> DBD::Pg!). I'm a little confused about transaction isolation levels,
> though.  I'm setting the level to 'serializable'  --- this seems
> important, since other INSERTS might occur during my SELECT.  However,
> the documentation for DECLARE cursor suggests that the 'INSENSITIVE'
> keyword is useless, which seems to me to be equivalent to saying that
> the transaction level is always SERIALIZABLE?
>

The default transaction isolation level of PostgreSQL is READ COMMITTED.
However, transaction isolation levels seem to be irrelevant to your
problem.  PostgreSQL cursors don't see any changes made by other
backends after they are declared (opened), regardless of the transaction
isolation level.  INSENSITIVE cursors aren't implemented yet; cursors
aren't truly INSENSITIVE because they would still see changes made by
the backend itself, again regardless of the transaction isolation level.
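
A quick way to see this, assuming a scratch database you can create a
throwaway table in (connection details are placeholders):

use DBI;

my $conn_a = DBI->connect('dbi:Pg:dbname=test', 'user', 'secret',
                          { RaiseError => 1, AutoCommit => 0 });
my $conn_b = DBI->connect('dbi:Pg:dbname=test', 'user', 'secret',
                          { RaiseError => 1, AutoCommit => 1 });

$conn_b->do('CREATE TABLE cursor_demo (i int)');
$conn_b->do('INSERT INTO cursor_demo VALUES (1)');

# Open the cursor in connection A, *then* insert from connection B.
$conn_a->do('DECLARE c CURSOR FOR SELECT i FROM cursor_demo');
$conn_b->do('INSERT INTO cursor_demo VALUES (2)');

# The cursor still returns only the row that existed when it was
# opened, whatever isolation level A's transaction uses.
my $rows = $conn_a->selectall_arrayref('FETCH ALL FROM c');
print scalar(@$rows), " row(s) visible through the cursor\n";  # prints 1

$conn_a->commit;
$conn_b->do('DROP TABLE cursor_demo');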

Regards.

Hiroshi Inoue




Re: Large selects handled inefficiently?

From
Zlatko Calusic
Date:
"Andrew Snow" <als@fl.net.au> writes:

> > I believe I can work around this problem using cursors (although I
> > don't know how well DBD::Pg copes with cursors).  However, that
> > doesn't seem right -- cursors shouldn't be needed to fetch a large query
> > without having it all in memory at once...
>

Yes, I have noticed that particular bad behaviour too, with DBD::Pg
and DBD::mysql.

At the same time, DBD::Oracle, DBD::InterBase and DBD::Sybase work as
expected: rows are fetched with the fetchrow...() functions instead of
all being sucked up into memory at the time execute() is called.

Anybody know why that is happening?

> Actually, I think that's why cursors were invented in the first place ;-)  A
> cursor is what you are using if you're not fetching all the results of a
> query.
>

What bothers me is the different behaviour of the different DBD
drivers. But yes, I have just subscribed to the dbi-users list, which
is the right place to ask that question.
--
Zlatko

Re: Large selects handled inefficiently?

From
Jules Bean
Date:
On Thu, Aug 31, 2000 at 12:55:35AM +0200, Zlatko Calusic wrote:
> "Andrew Snow" <als@fl.net.au> writes:
>
> > > I believe I can work around this problem using cursors (although I
> > > don't know how well DBD::Pg copes with cursors).  However, that
> > > doesn't seem right -- cursors shouldn't be needed to fetch a large query
> > > without having it all in memory at once...
> >
>
> Yes, I have noticed that particular bad behaviour, too.
> With DBD::Pg and DBD::mysql.
>
> At the same time, DBD::Oracle, DBD::InterBase and DBD::Sybase work as
> expected. Rows are fetched with fetchrow...() functions instead of all
> being sucked up into memory at the time execute() is called.
>
> Anybody know why is that happening?

Yes.  It's a defect in 'libpq', the underlying PostgreSQL client
library: it is not capable of reading partial results, so it always
reads the whole result set into memory before handing it back.

>
> > Actually, I think that's why cursors were invented in the first place ;-)  A
> > cursor is what you are using if you're not fetching all the results of a
> > query.
> >
>
> What bothers me is different behaviour of different DBD drivers. But,
> yes, I have just subscribed to dbi-users list which is the right place
> to ask that question.

No, it's not really the DBD driver's fault.  There is no (easy) way
around it, since the flaw lies in the C library it uses.  If the DBD
driver wished to change behaviour, it could 'secretly' use cursors,
but that would involve parsing queries to detect selects, which might
be fragile.
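
Purely by way of illustration, the sort of thing such a driver (or an
application-level wrapper) would have to do -- everything here is
hypothetical, and the query-sniffing regex is exactly the fragile part:

# Assumes $dbh is a DBD::Pg handle with AutoCommit off, since the
# cursor must live inside a transaction.  Not something DBD::Pg
# actually does.
sub fetch_via_cursor {
    my ($dbh, $query, $callback) = @_;

    # The fragile bit: guessing whether this is a plain SELECT that is
    # safe to wrap.  SELECT ... FOR UPDATE, SELECT INTO, multiple
    # statements, etc. would all need special-casing.
    unless ($query =~ /^\s*SELECT\b/i && $query !~ /\bFOR\s+UPDATE\b/i) {
        die "refusing to wrap non-trivial query in a cursor";
    }

    $dbh->do("DECLARE wrap_cur CURSOR FOR $query");
    while (1) {
        my $rows = $dbh->selectall_arrayref('FETCH 1000 FROM wrap_cur');
        last unless @$rows;
        $callback->($_) for @$rows;
    }
    $dbh->do('CLOSE wrap_cur');
}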

Jules

Update table based on OID

From
Franck Martin
Date:
I would like to update a tuple based on its oid

update table set name='text' where oid=234

does not seem to update anything...

franck@sopac.org


Re: Large selects handled inefficiently?

From
"Ross J. Reedstrom"
Date:
On Wed, Sep 06, 2000 at 09:33:07AM +0100, Jules Bean wrote:
>
> No, it's not really the DBD driver's fault.  There is no (easy) way
> around it, since the flaw lies in the C library it uses.  If the DBD
> driver wished to change behaviour, it could 'secretly' use cursors,
> but that would involve parsing queries to detect selects, which might
> be fragile.
>

Recent experience points out the fragility: the ODBC driver has an
option to try exactly this, and there was just a bug report involving
the driver's attempt to use cursors with SELECT FOR UPDATE.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005