Thread: libpq questions
For the libpq interface: I need to be able to know if a column in a
result from a query is nullable or not. From reading the documentation
it seems that I can obtain the following information:

. scan all the rows in the result and see if there exists a null value
  for each column...
. backtrack the column to the source table (assuming a non-calculated
  field) and check the nullable status there

Neither of the above is particularly cheap to do...

Which leads me to my next question... If I executed a select against a
table with a million rows, and the query returned all of the rows, what
happens? Are all the rows read into memory on the client before
returning the result? Or are rows only fetched from the server as
required?

Thanks

James
On Tue, Jan 31, 2006 at 10:23:54PM +1100, James Harper wrote:
> For the libpq interface:
>
> I need to be able to know if a column in a result from a query is
> nullable or not. From reading the documentation it seems that I can
> obtain the following information:
> . scan all the rows in the result and see if there exists a null value
> for each column...

Be careful what you infer from such a scan: not finding any NULLs
doesn't necessarily mean a column isn't nullable, it just means the
result set didn't contain any NULLs.

> . backtrack the column to the source table (assuming a non-calculated
> field) and check the nullable status there
>
> Neither of the above is particularly cheap to do...

If you know the table and column names then checking which columns
have a NOT NULL constraint is a simple query against pg_attribute.

> Which leads me to my next question... If I executed a select against a
> table with a million rows, and the query returned all of the rows, what
> happens? Are all the rows read into memory on the client before
> returning the result? Or are rows only fetched from the server as
> required?

libpq fetches all rows before returning any to the client; if you
want to fetch rows in smaller chunks then use a cursor. The
developers' TODO list has an item to address that problem:

* Allow statement results to be automatically batched to the client

http://www.postgresql.org/docs/faqs.TODO.html

-- 
Michael Fuhr
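As a sketch, the pg_attribute query described above might look like the
following (the table name 'tablex' and schema 'public' are illustrative;
attnotnull is the catalog's NOT NULL flag — this needs a live server to
run):

```sql
-- List user columns of tablex and whether each carries NOT NULL.
-- attnum > 0 skips system columns; attisdropped skips dropped columns.
SELECT a.attname, a.attnotnull
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relname = 'tablex'
  AND a.attnum > 0
  AND NOT a.attisdropped;
```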
On Wed, Feb 01, 2006 at 09:47:12AM +1100, James Harper wrote:
> > Be careful what you infer from such a scan: not finding any NULLs
> > doesn't necessarily mean a column isn't nullable, it just means the
> > result set didn't contain any NULLs.
>
> I understand that limitation, but haven't figured out if it matters in
> my situation. The only time it might is if the client wants to infer a
> schema as a result of a query, eg 'SELECT * FROM TableX WHERE 0 = 1'.

Even if such a query did return a "nullable" flag, plenty of other
metadata would be absent that might be just as interesting from a
schema-viewing standpoint (CHECK, PRIMARY KEY, etc.). A better way
to view the schema is to query the system catalogs or the Information
Schema.

> In the above example, does the database engine assign internally a
> 'nullability' flag? I guess it must do... because how would the
> following be evaluated:
>
> SELECT f1 + f2 AS f INTO TableY FROM TableX WHERE f1 < 30
>
> Would the column f in the created table be nullable or not?
>
> I guess I need to do some testing unless you know off the top of your
> head?

I'm not familiar enough with PostgreSQL internals to comment on
what's happening underneath, but I could tell you from experience
what the above query would do. But with a quick test you could
figure it out for yourself :-)

> Hmmm... so a select statement with result set of a million rows is going
> to stall for a while before the results are usefully available to the
> client, and is then going to use a significant amount of memory on the
> client...

Correct.

> Is this a limitation of libpq or of the underlying database engine?

The "Incremental results from libpq" thread from a few months ago
might answer your questions:

http://archives.postgresql.org/pgsql-interfaces/2005-11/msg00010.php

> Are there any alternative (but native - eg not ODBC) interfaces to
> postgresql?

What problem do you want the alternative to solve?
If it's just the libpq-fetches-all-rows problem then you could use
a cursor.

-- 
Michael Fuhr
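A minimal sketch of the cursor approach suggested here (the table and
cursor names are illustrative, and cursors must be used inside a
transaction):

```sql
BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM tablex;
FETCH 1000 FROM big_cur;   -- repeat until FETCH returns no rows
CLOSE big_cur;
COMMIT;
```

This way the client holds at most one batch of rows in memory at a time,
at the cost of an extra round trip per FETCH.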
>
> Even if such a query did return a "nullable" flag, plenty of other
> metadata would be absent that might be just as interesting from a
> schema-viewing standpoint (CHECK, PRIMARY KEY, etc.). A better way
> to view the schema is to query the system catalogs or the Information
> Schema.

I now know enough about it to agree with you :)

> > In the above example, does the database engine assign internally a
> > 'nullability' flag? I guess it must do... because how would the
> > following be evaluated:
> >
> > SELECT f1 + f2 AS f INTO TableY FROM TableX WHERE f1 < 30
> >
> > Would the column f in the created table be nullable or not?
> >
> > I guess I need to do some testing unless you know off the top of your
> > head?
>
> I'm not familiar enough with PostgreSQL internals to comment on
> what's happening underneath, but I could tell you from experience
> what the above query would do. But with a quick test you could
> figure it out for yourself :-)

Did that once I got access to my postgres server. The f in the created
table is nullable in the above example, even if f1 and f2 are not
nullable themselves, so it looks like as soon as you make a field based
on an expression, the 'not null' constraint goes out the window.

> The "Incremental results from libpq" thread from a few months ago
> might answer your questions:
>
> http://archives.postgresql.org/pgsql-interfaces/2005-11/msg00010.php

Found that eventually. I'd seen the thread earlier but assumed that the
word 'incremental' in the subject was to do with auto-incrementing
fields. An interesting read.

Thanks

James
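The test described above can be reproduced with something like the
following (names taken from the earlier example; the catalog check at
the end reports f as nullable even though f1 is NOT NULL):

```sql
CREATE TABLE tablex (f1 int NOT NULL, f2 int NULL);
SELECT f1 + f2 AS f INTO tabley FROM tablex WHERE f1 < 30;

-- Check the nullability flag of the new column:
SELECT attnotnull FROM pg_attribute
WHERE attrelid = 'tabley'::regclass AND attname = 'f';
-- attnotnull is false: the NOT NULL constraint was not carried over
```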
> On Tue, Jan 31, 2006 at 10:23:54PM +1100, James Harper wrote:
> > For the libpq interface:
> >
> > I need to be able to know if a column in a result from a query is
> > nullable or not. From reading the documentation it seems that I can
> > obtain the following information:
> > . scan all the rows in the result and see if there exists a null value
> > for each column...
>
> Be careful what you infer from such a scan: not finding any NULLs
> doesn't necessarily mean a column isn't nullable, it just means the
> result set didn't contain any NULLs.

I understand that limitation, but haven't figured out if it matters in
my situation. The only time it might is if the client wants to infer a
schema as a result of a query, eg 'SELECT * FROM TableX WHERE 0 = 1'.

> > . backtrack the column to the source table (assuming a non-calculated
> > field) and check the nullable status there
> >
> > Neither of the above is particularly cheap to do...
>
> If you know the table and column names then checking which columns
> have a NOT NULL constraint is a simple query against pg_attribute.

Yep, but it does involve another round trip to the server to
determine... I guess it's the only way to do it though. Obviously it's
only going to work for fields which are returned directly from a table.
What about this situation:

CREATE TABLE TableX (f1 int NOT NULL, f2 int NULL)
INSERT INTO TableX VALUES (10, 10)
INSERT INTO TableX VALUES (20, 11)
INSERT INTO TableX VALUES (30, NULL)
INSERT INTO TableX VALUES (40, 12)

SELECT f1 + f2 AS f FROM TableX WHERE f1 < 30

Now, given the current where clause, the client isn't going to see any
NULL values, but a change in the where clause might suddenly make the
calculated f field nullable. Maybe this doesn't matter and the client
application should itself be aware of the database schema anyway...
maybe I'm just worrying too much :)

In the above example, does the database engine assign internally a
'nullability' flag? I guess it must do... because how would the
following be evaluated:

SELECT f1 + f2 AS f INTO TableY FROM TableX WHERE f1 < 30

Would the column f in the created table be nullable or not?

I guess I need to do some testing unless you know off the top of your
head?

> > Which leads me to my next question... If I executed a select against a
> > table with a million rows, and the query returned all of the rows, what
> > happens? Are all the rows read into memory on the client before
> > returning the result? Or are rows only fetched from the server as
> > required?
>
> libpq fetches all rows before returning any to the client; if you
> want to fetch rows in smaller chunks then use a cursor. The
> developers' TODO list has an item to address that problem:
>
> * Allow statement results to be automatically batched to the client
>
> http://www.postgresql.org/docs/faqs.TODO.html

Hmmm... so a select statement with result set of a million rows is
going to stall for a while before the results are usefully available to
the client, and is then going to use a significant amount of memory on
the client...

Is this a limitation of libpq or of the underlying database engine?

Are there any alternative (but native - eg not ODBC) interfaces to
postgresql?

Thanks for the response!

James
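For columns that do come straight from a table, the Information Schema
mentioned earlier in the thread offers a more portable variant of the
pg_attribute check (again assuming the illustrative TableX; note that
identifiers are folded to lower case in the catalogs):

```sql
SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 'tablex';
```

It won't help for calculated fields like f1 + f2, for exactly the
reasons discussed above.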
On Wed, Feb 01, 2006 at 09:47:12AM +1100, James Harper wrote:
> Now, given the current where clause, the client isn't going to see any
> NULL values, but a change in the where clause might suddenly make the
> calculated f field nullable. Maybe this doesn't matter and the client
> application should itself be aware of the database schema anyway...
> maybe I'm just worrying too much :)
>
> In the above example, does the database engine assign internally a
> 'nullability' flag? I guess it must do... because how would the
> following be evaluated:
>
> SELECT f1 + f2 AS f INTO TableY FROM TableX WHERE f1 < 30
>
> Would the column f in the created table be nullable or not?

Umm, the database doesn't keep track of whether a column is nullable
or not. Nor would I expect it to. For example in the above case, just
because the column doesn't contain nulls doesn't mean it shouldn't be
able to. The "NOT NULL" declaration is a constraint and PostgreSQL
won't add it for you.

> I guess I need to do some testing unless you know off the top of your
> head?

I suppose this would be the difference between SELECT * INTO and
CREATE TABLE ... LIKE.

> Hmmm... so a select statement with result set of a million rows is going
> to stall for a while before the results are usefully available to the
> client, and is then going to use a significant amount of memory on the
> client...
>
> Is this a limitation of libpq or of the underlying database engine?

If you use cursors you can get rows whenever you want. By default, if
you ask for all the rows you get all the rows...

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
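The SELECT INTO vs. CREATE TABLE ... LIKE distinction mentioned above
can be sketched like this (assuming the tablex definition from earlier
in the thread; target names are illustrative):

```sql
-- Copies column names and types only; NOT NULL is not carried over:
SELECT f1, f2 INTO tabley_a FROM tablex;

-- Copies column definitions including their NOT NULL constraints:
CREATE TABLE tabley_b (LIKE tablex);
```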