Thread: libpq questions

libpq questions

From: "James Harper"
Date:
For the libpq interface:

I need to be able to know if a column in a result from a query is
nullable or not. From reading the documentation it seems that I can
obtain the following information:
. scan all the rows in the result and see if there exists a null value
for each column...
. backtrack the column to the source table (assuming a non-calculated
field) and check the nullable status there

Neither of the above is particularly cheap to do...

Which leads me to my next question... If I executed a select against a
table with a million rows, and the query returned all of the rows, what
happens? Are all the rows read into memory on the client before
returning the result? Or are rows only fetched from the server as
required?

Thanks

James

Re: libpq questions

From: Michael Fuhr
Date:
On Tue, Jan 31, 2006 at 10:23:54PM +1100, James Harper wrote:
> For the libpq interface:
>
> I need to be able to know if a column in a result from a query is
> nullable or not. From reading the documentation it seems that I can
> obtain the following information:
> . scan all the rows in the result and see if there exists a null value
> for each column...

Be careful what you infer from such a scan: not finding any NULLs
doesn't necessarily mean a column isn't nullable, it just means the
result set didn't contain any NULLs.

> . backtrack the column to the source table (assuming a non-calculated
> field) and check the nullable status there
>
> Neither of the above is particularly cheap to do...

If you know the table and column names then checking which columns
have a NOT NULL constraint is a simple query against pg_attribute.
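
For example, something along these lines (an untested sketch; 'tablex'
is a placeholder and schema qualification is ignored):

SELECT a.attname, a.attnotnull
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
WHERE c.relname = 'tablex'
  AND a.attnum > 0
  AND NOT a.attisdropped;

attnotnull is true for columns that have a NOT NULL constraint.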

> Which leads me to my next question... If I executed a select against a
> table with a million rows, and the query returned all of the rows, what
> happens? Are all the rows read into memory on the client before
> returning the result? Or are rows only fetched from the server as
> required?

libpq fetches all rows before returning any to the client; if you
want to fetch rows in smaller chunks then use a cursor.  The
developers' TODO list has an item to address that problem:

* Allow statement results to be automatically batched to the client

http://www.postgresql.org/docs/faqs.TODO.html
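
In the meantime a cursor works fine; roughly (a sketch, with arbitrary
names and fetch size):

BEGIN;
DECLARE bigcur CURSOR FOR SELECT * FROM bigtable;
FETCH 1000 FROM bigcur;   -- repeat until no rows come back
CLOSE bigcur;
COMMIT;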

--
Michael Fuhr

Re: libpq questions

From: Michael Fuhr
Date:
On Wed, Feb 01, 2006 at 09:47:12AM +1100, James Harper wrote:
> > Be careful what you infer from such a scan: not finding any NULLs
> > doesn't necessarily mean a column isn't nullable, it just means the
> > result set didn't contain any NULLs.
>
> I understand that limitation, but haven't figured out if it matters in
> my situation. The only time it might is if the client wants to infer a
> schema as a result of a query, eg 'SELECT * FROM TableX WHERE 0 = 1'.

Even if such a query did return a "nullable" flag, plenty of other
metadata would be absent that might be just as interesting from a
schema-viewing standpoint (CHECK, PRIMARY KEY, etc.).  A better way
to view the schema is to query the system catalogs or the Information
Schema.
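
For instance, nullability is exposed directly there (again a sketch;
'tablex' is a placeholder):

SELECT column_name, is_nullable, data_type
FROM information_schema.columns
WHERE table_name = 'tablex';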

> In the above example, does the database engine assign internally a
> 'nullability' flag? I guess it must do... because how would the
> following be evaluated:
>
> SELECT f1 + f2 AS f INTO TableY FROM TableX WHERE f1 < 30
>
> Would the column f in the created table be nullable or not?
>
> I guess I need to do some testing unless you know off the top of your
> head?

I'm not familiar enough with PostgreSQL internals to comment on
what's happening underneath, but I could tell you from experience
what the above query would do.  But with a quick test you could
figure it out for yourself :-)

> Hmmm... so a select statement with a result set of a million rows is going
> to stall for a while before the results are usefully available to the
> client, and is then going to use a significant amount of memory on the
> client...

Correct.

> Is this a limitation of libpq or of the underlying database engine?

The "Incremental results from libpq" thread from a few months ago
might answer your questions:

http://archives.postgresql.org/pgsql-interfaces/2005-11/msg00010.php

> Are there any alternative (but native - eg not ODBC) interfaces to
> postgresql?

What problem do you want the alternative to solve?  If it's just
the libpq-fetches-all-rows problem then you could use a cursor.

--
Michael Fuhr

Re: libpq questions

From: "James Harper"
Date:
>
> Even if such a query did return a "nullable" flag, plenty of other
> metadata would be absent that might be just as interesting from a
> schema-viewing standpoint (CHECK, PRIMARY KEY, etc.).  A better way
> to view the schema is to query the system catalogs or the Information
> Schema.

I now know enough about it to agree with you :)

> > In the above example, does the database engine assign internally a
> > 'nullability' flag? I guess it must do... because how would the
> > following be evaluated:
> >
> > SELECT f1 + f2 AS f INTO TableY FROM TableX WHERE f1 < 30
> >
> > Would the column f in the created table be nullable or not?
> >
> > I guess I need to do some testing unless you know off the top of your
> > head?
>
> I'm not familiar enough with PostgreSQL internals to comment on
> what's happening underneath, but I could tell you from experience
> what the above query would do.  But with a quick test you could
> figure it out for yourself :-)

I did that once I got access to my postgres server. The column f in the
created table is nullable in the above example, even if f1 and f2 are
not nullable themselves, so it looks like as soon as a field is based
on an expression, the 'not null' constraint goes out the window.
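
For reference, the flag can be checked with something like this (a
sketch; it assumes the TableY from the example, whose unquoted name
folds to lowercase):

SELECT attname, attnotnull
FROM pg_attribute
WHERE attrelid = 'tabley'::regclass AND attnum > 0;

attnotnull comes back false for f, consistent with the column being
nullable.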

> The "Incremental results from libpq" thread from a few months ago
> might answer your questions:
>
> http://archives.postgresql.org/pgsql-interfaces/2005-11/msg00010.php

Found that eventually. I'd seen the thread earlier but assumed that the
word 'incremental' in the subject was to do with auto-incrementing
fields. An interesting read.

Thanks

James


Re: libpq questions

From: "James Harper"
Date:
> On Tue, Jan 31, 2006 at 10:23:54PM +1100, James Harper wrote:
> > For the libpq interface:
> >
> > I need to be able to know if a column in a result from a query is
> > nullable or not. From reading the documentation it seems that I can
> > obtain the following information:
> > . scan all the rows in the result and see if there exists a null value
> > for each column...
>
> Be careful what you infer from such a scan: not finding any NULLs
> doesn't necessarily mean a column isn't nullable, it just means the
> result set didn't contain any NULLs.

I understand that limitation, but haven't figured out if it matters in
my situation. The only time it might is if the client wants to infer a
schema as a result of a query, eg 'SELECT * FROM TableX WHERE 0 = 1'.

> > . backtrack the column to the source table (assuming a non-calculated
> > field) and check the nullable status there
> >
> > Neither of the above is particularly cheap to do...
>
> If you know the table and column names then checking which columns
> have a NOT NULL constraint is a simple query against pg_attribute.

Yep, but it does involve another round trip to the server to find out... I
guess it's the only way to do it though. Obviously it's only going to
work for fields which are returned directly from a table. What about
this situation:

CREATE TABLE TableX (f1 int NOT NULL, f2 int NULL)
INSERT INTO TableX VALUES (10, 10)
INSERT INTO TableX VALUES (20, 11)
INSERT INTO TableX VALUES (30, NULL)
INSERT INTO TableX VALUES (40, 12)

SELECT f1 + f2 AS f FROM TableX WHERE f1 < 30

Now, given the current where clause, the client isn't going to see any
NULL values, but a change in the where clause might suddenly make the
calculated f field nullable. Maybe this doesn't matter and the client
application should itself be aware of the database schema anyway...
maybe I'm just worrying too much :)

In the above example, does the database engine assign internally a
'nullability' flag? I guess it must do... because how would the
following be evaluated:

SELECT f1 + f2 AS f INTO TableY FROM TableX WHERE f1 < 30

Would the column f in the created table be nullable or not?

I guess I need to do some testing unless you know off the top of your
head?

>
> > Which leads me to my next question... If I executed a select against a
> > table with a million rows, and the query returned all of the rows, what
> > happens? Are all the rows read into memory on the client before
> > returning the result? Or are rows only fetched from the server as
> > required?
>
> libpq fetches all rows before returning any to the client; if you
> want to fetch rows in smaller chunks then use a cursor.  The
> developers' TODO list has an item to address that problem:
>
> * Allow statement results to be automatically batched to the client
>
> http://www.postgresql.org/docs/faqs.TODO.html
>

Hmmm... so a select statement with a result set of a million rows is going
to stall for a while before the results are usefully available to the
client, and is then going to use a significant amount of memory on the
client...

Is this a limitation of libpq or of the underlying database engine?

Are there any alternative (but native - eg not ODBC) interfaces to
postgresql?

Thanks for the response!

James


Re: libpq questions

From: Martijn van Oosterhout
Date:
On Wed, Feb 01, 2006 at 09:47:12AM +1100, James Harper wrote:
> Now, given the current where clause, the client isn't going to see any
> NULL values, but a change in the where clause might suddenly make the
> calculated f field nullable. Maybe this doesn't matter and the client
> application should itself be aware of the database schema anyway...
> maybe I'm just worrying too much :)
>
> In the above example, does the database engine assign internally a
> 'nullability' flag? I guess it must do... because how would the
> following be evaluated:
>
> SELECT f1 + f2 AS f INTO TableY FROM TableX WHERE f1 < 30
>
> Would the column f in the created table be nullable or not?

Umm, the database doesn't try to track whether a computed column like
that is nullable or not, nor would I expect it to. For example, in the
above case, just because the column doesn't contain nulls doesn't mean
it shouldn't be able to. The "NOT NULL" declaration is a constraint and
PostgreSQL won't add it for you.

> I guess I need to do some testing unless you know off the top of your
> head?

I suppose this would be the difference between SELECT * INTO and CREATE
TABLE ... LIKE.
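
Roughly (a sketch; the names are placeholders):

CREATE TABLE y1 (LIKE TableX);      -- copies column definitions, including NOT NULL
SELECT f1, f2 INTO y2 FROM TableX;  -- copies names and types only, no constraints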

> Hmmm... so a select statement with a result set of a million rows is going
> to stall for a while before the results are usefully available to the
> client, and is then going to use a significant amount of memory on the
> client...
>
> Is this a limitation of libpq or of the underlying database engine?

If you use cursors you can fetch rows whenever you want. By default,
if you ask for all the rows you get all the rows...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
