Thread: Nulls get converted to 0 problem

Nulls get converted to 0 problem

From

Avi Schwartz

Date:

04 June 2003, 18:16:17

Hi,

I have this very strange problem that I cannot figure out.

I am in the process of converting a Coldfusion MX application from SQL
Server to PostgreSQL 7.3.2 running on SuSE Linux 8.2.
Since Coldfusion has no support for real null, when a select returns a
null value, the value of the Coldfusion variable is set to "", even if
it a numeric value like int.

For example, the following query should display nothing in the
item_category column:

select * from item_catalog where item_category is null

With SQl Server 7, this works as expected, but with PostgreSQL, the
value Coldfusion is setting the integer variables to is 0 (zero) and
not "" as it should.  This of course is a big problem.  I checked with
other people using the combination of PostgreSQL and Coldfusion and
they cannot reproduce it.  This makes me think that there may be a
problem with my database as created.

Now the interesting thing is that if I use another tool to examine the
results, they do display null in this column.

At this point I pretty much lost.  Any idea what may the problem be?

Thanks,
Avi
--
Avi Schwartz
avi@CFFtechnologies.com

Re: Nulls get converted to 0 problem

From

Andrew Sullivan

Date:

04 June 2003, 18:55:32

On Wed, Jun 04, 2003 at 10:26:22AM -0500, Avi Schwartz wrote:
> value Coldfusion is setting the integer variables to is 0 (zero) and
> not "" as it should.  This of course is a big problem.  I checked with
> other people using the combination of PostgreSQL and Coldfusion and
> they cannot reproduce it.  This makes me think that there may be a
> problem with my database as created.

Shooting in the dark here, but what is the TRANSFORM_NULL_EQUALS
setting on your system and on that of the others?

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

Re: Nulls get converted to 0 problem

From

"scott.marlowe"

Date:

04 June 2003, 19:09:07

FYI, "" does NOT equal NULL does not equal 0.

If an application wishes to store a null value, it should store a null
value, not a blank value.

Since blanks are not allowed in integers, the only legal value for CF to
store is either NULL or 0.

If you attempt to store '' or "" into an integer column, you should get an
error.

CF's lack of support for nulls is their issue, and one of the dozens of
reasons I switched development from that to PHP in my shop.

I would guess this might be a known and fixed bug in cold fusion.  Have
you checked for updates on their site yet?

A database application language that doesn't understand NULLs is broken,
and needs to be fixed.

On Wed, 4 Jun 2003, Avi Schwartz wrote:

> Hi,
>
> I have this very strange problem that I cannot figure out.
>
> I am in the process of converting a Coldfusion MX application from SQL
> Server to PostgreSQL 7.3.2 running on SuSE Linux 8.2.
> Since Coldfusion has no support for real null, when a select returns a
> null value, the value of the Coldfusion variable is set to "", even if
> it a numeric value like int.
>
> For example, the following query should display nothing in the
> item_category column:
>
> select * from item_catalog where item_category is null
>
> With SQl Server 7, this works as expected, but with PostgreSQL, the
> value Coldfusion is setting the integer variables to is 0 (zero) and
> not "" as it should.  This of course is a big problem.  I checked with
> other people using the combination of PostgreSQL and Coldfusion and
> they cannot reproduce it.  This makes me think that there may be a
> problem with my database as created.
>
> Now the interesting thing is that if I use another tool to examine the
> results, they do display null in this column.
>
> At this point I pretty much lost.  Any idea what may the problem be?
>
> Thanks,
> Avi
>

Re: Nulls get converted to 0 problem

From

Avi Schwartz

Date:

04 June 2003, 21:46:05

On Wednesday, Jun 4, 2003, at 13:48 America/Chicago, scott.marlowe
wrote:

> FYI, "" does NOT equal NULL does not equal 0.

I know this very well, thank you.

> If an application wishes to store a null value, it should store a null
> value, not a blank value.

No disagreement here.

> Since blanks are not allowed in integers, the only legal value for CF
> to
> store is either NULL or 0.
>
> If you attempt to store '' or "" into an integer column, you should
> get an
> error.

I do not store a '' or "" into an integer column.  This is how CF does
it when it gets a null value from the database.

> CF's lack of support for nulls is their issue, and one of the dozens of
> reasons I switched development from that to PHP in my shop.

Unfortunately this is not reasonable.  This is an extremely big
application that will require a huge effort to move to a different
platform, a thought that we will entertain if we decide to rewrite the
application in the future.

> I would guess this might be a known and fixed bug in cold fusion.  Have
> you checked for updates on their site yet?

Yes, I am running the latest SP.  However, I am not sure yet where is
the problem since CF works as expected with other databases.

> A database application language that doesn't understand NULLs is
> broken,
> and needs to be fixed.

Avi
--
Avi Schwartz
avi@CFFtechnologies.com

Re: Nulls get converted to 0 problem

From

"Vincent Hikida"

Date:

04 June 2003, 22:13:43

> FYI, "" does NOT equal NULL does not equal 0.


Interesting, I noticed that a few days ago. I noticed that in pg sql, if I
used RTRIM on a column with all blanks. the RTRIM'ed result is not null. In
Oracle I think it would be a null. That is in Oracle "" and NULL are
equivalent.

Vincent Hikida,
Member of Technical Staff - Urbana Software, Inc.
"A Personalized Learning Experience"

www.UrbanaSoft.com

Re: Nulls get converted to 0 problem

From

Avi Schwartz

Date:

04 June 2003, 22:26:10

On Wednesday, Jun 4, 2003, at 16:53 America/Chicago, scott.marlowe
wrote:

> On Wed, 4 Jun 2003, Avi Schwartz wrote:
>
>
>> I have a feeling this has something to do with the JDBC driver, again,
>> just a feeling.
>
> You may be on to something there. Have you tried setting up an ODBC
> connection to see how that works?

Unlike older versions, Coldfusion MX is Java based and does not support
ODBC connections anymore.

>>> Just because it works with other databases doesn't mean it's not
>>> broken,
>>> just that the other databases will let you do something you shouldn't
>>> be
>>> allowed to do.
>>
>> But that does not help me or any other company that wants to use
>> PostgreSQL with CF and I am sure Micromedia will tell me that
>> PostgreSQL is broken since they have no problem with other databases.
>
> That's not always true.  Where I work we went round and round with the
> folks at Crystal Reports.  Their box said it worked with "LDAP V3
> compliant" servers.  It didn't work with OpenLDAP, and we figured it
> out
> together, and they're now working on making it work with OpenLDAP.

I just posted a general request for MM to improve their support of
PostgreSQL.  We'll see what happens.

Avi
--
Avi Schwartz
avi@CFFtechnologies.com

Re: Nulls get converted to 0 problem

From

Date:

04 June 2003, 22:51:49

Oracle *incorrectly* interprets blank (empty) strings as NULL.  They are NOT
the same.  A string of zero characters is a string nonetheless.  A NULL is
"the absence of value", which equals nothing (theoretically not even another
NULL).

Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com
Fax: (416) 441-9085


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Vincent Hikida
> Sent: Wednesday, June 04, 2003 3:27 PM
> To: scott.marlowe; Avi Schwartz
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Nulls get converted to 0 problem
>
>
> > FYI, "" does NOT equal NULL does not equal 0.
>
>
> Interesting, I noticed that a few days ago. I noticed that in
> pg sql, if I
> used RTRIM on a column with all blanks. the RTRIM'ed result
> is not null. In
> Oracle I think it would be a null. That is in Oracle "" and NULL are
> equivalent.
>
> Vincent Hikida,
> Member of Technical Staff - Urbana Software, Inc.
> "A Personalized Learning Experience"
>
> www.UrbanaSoft.com
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org
>

Re: Nulls get converted to 0 problem

From

"scott.marlowe"

Date:

04 June 2003, 22:53:16

On Wed, 4 Jun 2003, Avi Schwartz wrote:

> I have a feeling this has something to do with the JDBC driver, again,
> just a feeling.

You may be on to something there. Have you tried setting up an ODBC
connection to see how that works?

> I posted a question also on the Macromedia forums to see if I can
> verify that the problem exists only on my installation and if it works
> for them to find out what JDBC driver version they are using.

> >> Yes, I am running the latest SP.  However, I am not sure yet where is
> >> the problem since CF works as expected with other databases.
> >
> > I repeat...
> >
> > A database application language that doesn't understand NULLs is
> > broken, and needs to be fixed.
> >
> > Just because it works with other databases doesn't mean it's not
> > broken,
> > just that the other databases will let you do something you shouldn't
> > be
> > allowed to do.
>
> But that does not help me or any other company that wants to use
> PostgreSQL with CF and I am sure Micromedia will tell me that
> PostgreSQL is broken since they have no problem with other databases.

That's not always true.  Where I work we went round and round with the
folks at Crystal Reports.  Their box said it worked with "LDAP V3
compliant" servers.  It didn't work with OpenLDAP, and we figured it out
together, and they're now working on making it work with OpenLDAP.

Sure, they complained about "only supporting brand X" LDAP servers, but
then I pointed out that for their box to be correct it should say that,
not claim full LDAP V3 which they obviously didn't really have quite right
yet.

It couldn't be that hard to add a few is_null type checks to cold fusion,
and it certainly wouldn't hurt their market penetration to be able to
handle NULLs properly.  It's usually harder to find someone to "pitch" to
in a closed source shop, but they're often very receptive of ideas once
they get in.

Re: Nulls get converted to 0 problem

From

Avi Schwartz

Date:

05 June 2003, 01:04:54

On Wednesday, Jun 4, 2003, at 15:55 America/Chicago, scott.marlowe
wrote:

> On Wed, 4 Jun 2003, Avi Schwartz wrote:
>
>>
>> On Wednesday, Jun 4, 2003, at 13:48 America/Chicago, scott.marlowe
>> wrote:
>>
>> I do not store a '' or "" into an integer column.  This is how CF does
>> it when it gets a null value from the database.
>
> So, CF is letting you store NULLS?  Or is it coercing them to 0 before
> storage?

No, the value in the database IS null.  When I do the select, it seems
that CF turns the null integers into the value 0.

> I'm gonna guess that the real problem is that cold fusion knows JUST
> enough about postgresql to be dangerous, and it is storing 0 in those
> fields when you tell it to store NULL.  Since it stores it as 0, it
> returns it as 0.  If you were to run an update statement on that table
> to
> set the values that are 0 to NULL, it would probably work as you want.
>
> Again, this is a guess.

I have a feeling this has something to do with the JDBC driver, again,
just a feeling.

I posted a question also on the Macromedia forums to see if I can
verify that the problem exists only on my installation and if it works
for them to find out what JDBC driver version they are using.
>> Yes, I am running the latest SP.  However, I am not sure yet where is
>> the problem since CF works as expected with other databases.
>
> I repeat...
>
> A database application language that doesn't understand NULLs is
> broken, and needs to be fixed.
>
> Just because it works with other databases doesn't mean it's not
> broken,
> just that the other databases will let you do something you shouldn't
> be
> allowed to do.

But that does not help me or any other company that wants to use
PostgreSQL with CF and I am sure Micromedia will tell me that
PostgreSQL is broken since they have no problem with other databases.

Avi
--
Avi Schwartz
avi@CFFtechnologies.com

Re: Nulls get converted to 0 problem

From

"scott.marlowe"

Date:

05 June 2003, 01:30:02

On Wed, 4 Jun 2003, Avi Schwartz wrote:

>
> On Wednesday, Jun 4, 2003, at 13:48 America/Chicago, scott.marlowe
> wrote:
>
> > FYI, "" does NOT equal NULL does not equal 0.
>
> I know this very well, thank you.

Great, now if we could just teach Cold Fusion to know the difference your
problem would be solved.

> > If an application wishes to store a null value, it should store a null
> > value, not a blank value.
>
> No disagreement here.
>
> > Since blanks are not allowed in integers, the only legal value for CF
> > to
> > store is either NULL or 0.
> >
> > If you attempt to store '' or "" into an integer column, you should
> > get an
> > error.
>
> I do not store a '' or "" into an integer column.  This is how CF does
> it when it gets a null value from the database.

So, CF is letting you store NULLS?  Or is it coercing them to 0 before
storage?

> > CF's lack of support for nulls is their issue, and one of the dozens of
> > reasons I switched development from that to PHP in my shop.
>
> Unfortunately this is not reasonable.  This is an extremely big
> application that will require a huge effort to move to a different
> platform, a thought that we will entertain if we decide to rewrite the
> application in the future.

I'm not suggesting moving now.  I am saying that this is a bug that
needs to be fixed.

I'm gonna guess that the real problem is that cold fusion knows JUST
enough about postgresql to be dangerous, and it is storing 0 in those
fields when you tell it to store NULL.  Since it stores it as 0, it
returns it as 0.  If you were to run an update statement on that table to
set the values that are 0 to NULL, it would probably work as you want.

Again, this is a guess.

> > I would guess this might be a known and fixed bug in cold fusion.  Have
> > you checked for updates on their site yet?
>
> Yes, I am running the latest SP.  However, I am not sure yet where is
> the problem since CF works as expected with other databases.

I repeat...

A database application language that doesn't understand NULLs is
broken, and needs to be fixed.

Just because it works with other databases doesn't mean it's not broken,
just that the other databases will let you do something you shouldn't be
allowed to do.

Re: Nulls get converted to 0 problem

From

"Vincent Hikida"

Date:

05 June 2003, 01:37:24

Thanks for the detail. :)

Vincent Hikida,
Member of Technical Staff - Urbana Software, Inc.
"A Personalized Learning Experience"

www.UrbanaSoft.com

----- Original Message -----
From: "scott.marlowe" <scott.marlowe@ihs.com>
To: "Vincent Hikida" <vhikida@inreach.com>
Cc: "Avi Schwartz" <avi@CFFtechnologies.com>; <pgsql-general@postgresql.org>
Sent: Wednesday, June 04, 2003 1:28 PM
Subject: Re: [GENERAL] Nulls get converted to 0 problem


> On Wed, 4 Jun 2003, Vincent Hikida wrote:
>
> > > FYI, "" does NOT equal NULL does not equal 0.
> >
> >
> > Interesting, I noticed that a few days ago. I noticed that in pg sql, if
I
> > used RTRIM on a column with all blanks. the RTRIM'ed result is not null.
In
> > Oracle I think it would be a null. That is in Oracle "" and NULL are
> > equivalent.
>
> Yeah, this is an issue we run into a lot on the lists.  Basically, Oracle
> got this one wrong, people wrote a lot of code expecting it, and now
> Oracle can't change it to the proper behaviour without breaking folks'
> applications.
>
> There is a setting in postgresql that turns where field=NULL into where
> field IS NULL, but I don't think there's much more support for "" being
> equivalent to NULL.
>
>

Re: Nulls get converted to 0 problem

From

"scott.marlowe"

Date:

05 June 2003, 02:20:07

On Wed, 4 Jun 2003, Vincent Hikida wrote:

> > FYI, "" does NOT equal NULL does not equal 0.
>
>
> Interesting, I noticed that a few days ago. I noticed that in pg sql, if I
> used RTRIM on a column with all blanks. the RTRIM'ed result is not null. In
> Oracle I think it would be a null. That is in Oracle "" and NULL are
> equivalent.

Yeah, this is an issue we run into a lot on the lists.  Basically, Oracle
got this one wrong, people wrote a lot of code expecting it, and now
Oracle can't change it to the proper behaviour without breaking folks'
applications.

There is a setting in postgresql that turns where field=NULL into where
field IS NULL, but I don't think there's much more support for "" being
equivalent to NULL.

Re: Nulls get converted to 0 problem

From

Ron Johnson

Date:

05 June 2003, 03:51:36

On Wed, 2003-06-04 at 15:55, scott.marlowe wrote:
> On Wed, 4 Jun 2003, Avi Schwartz wrote:
>
> >
> > On Wednesday, Jun 4, 2003, at 13:48 America/Chicago, scott.marlowe
> > wrote:
[snip]
> I repeat...
>
> A database application language that doesn't understand NULLs is
> broken, and needs to be fixed.
>
> Just because it works with other databases doesn't mean it's not broken,
> just that the other databases will let you do something you shouldn't be
> allowed to do.

I guess that's how de facto standards get created.  <Sigh>

--
+-----------------------------------------------------------+
| Ron Johnson, Jr.     Home: ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson |
|                                                           |
| Regarding war zones: "There's nothing sacrosanct about a  |
| hotel with a bunch of journalists in it."                 |
|     Marine Lt. Gen. Bernard E. Trainor (Retired)          |
+-----------------------------------------------------------+

Re: Nulls get converted to 0 problem

From

Bruno Wolff III

Date:

05 June 2003, 05:54:51

On Wed, Jun 04, 2003 at 14:39:05 -0500,
  Avi Schwartz <avi@CFFtechnologies.com> wrote:
>
> I do not store a '' or "" into an integer column.  This is how CF does
> it when it gets a null value from the database.

Can you use coalesce to work around this problem?

Re: Nulls get converted to 0 problem

From

Alvaro Herrera

Date:

05 June 2003, 09:59:14

On Wed, Jun 04, 2003 at 10:26:22AM -0500, Avi Schwartz wrote:

> select * from item_catalog where item_category is null
>
> With SQl Server 7, this works as expected, but with PostgreSQL, the
> value Coldfusion is setting the integer variables to is 0 (zero) and
> not "" as it should.

You probably can do

select coalesce(column1::text, ''), coalesce(column2::text, ''), ...
from item_catalog where item_category is null;

(Note that everything is going to come back as TEXT rather than numbers,
though)

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Las cosas son buenas o malas segun las hace nuestra opinion" (Lisias)

Re: Nulls get converted to 0 problem

From

Avi Schwartz

Date:

06 June 2003, 04:13:31

Just wanted to let everyone know that I just downloaded the latest JDBC
driver and all my problems are solved now.  It was a driver issue after
all.

Avi

On Wednesday, Jun 4, 2003, at 17:22 America/Chicago, Avi Schwartz wrote:

>
> On Wednesday, Jun 4, 2003, at 16:53 America/Chicago, scott.marlowe
> wrote:
>
>> On Wed, 4 Jun 2003, Avi Schwartz wrote:
>>
>>
>>> I have a feeling this has something to do with the JDBC driver,
>>> again,
>>> just a feeling.
>>
>> You may be on to something there. Have you tried setting up an ODBC
>> connection to see how that works?
>
> Unlike older versions, Coldfusion MX is Java based and does not
> support ODBC connections anymore.
--
Avi Schwartz
avi@CFFtechnologies.com

Re: Nulls get converted to 0 problem

From

"scott.marlowe"

Date:

06 June 2003, 06:02:44

On Thu, 5 Jun 2003, Alvaro Herrera wrote:

> On Wed, Jun 04, 2003 at 10:26:22AM -0500, Avi Schwartz wrote:
>
> > select * from item_catalog where item_category is null
> >
> > With SQl Server 7, this works as expected, but with PostgreSQL, the
> > value Coldfusion is setting the integer variables to is 0 (zero) and
> > not "" as it should.
>
> You probably can do
>
> select coalesce(column1::text, ''), coalesce(column2::text, ''), ...
> from item_catalog where item_category is null;
>
> (Note that everything is going to come back as TEXT rather than numbers,
> though)

Adding to this, if they DON'T want to have pgsql specific code in their
application, they could create views with update triggers to handle the
tables underneath.

Re: Nulls get converted to 0 problem

From

Jon Earle

Date:

06 June 2003, 10:51:51

On Wed, 4 Jun 2003 terry@ashtonwoodshomes.com wrote:

> Oracle *incorrectly* interprets blank (empty) strings as NULL.  They are NOT
> the same.  A string of zero characters is a string nonetheless.  A NULL is
> "the absence of value", which equals nothing (theoretically not even another
> NULL).

If you're testing a value, you're testing to see if there's something in
there or not - what difference does it make if the variable contains 0, ""
or NULL?

Why not adhere to the practices inherent (and thus anticipated by
developers) in other languages (C comes to mind) where 0, NULL and "" are
equivalent?

Cheers!

--
Jon Earle

SAVE FARSCAPE http://www.savefarscape.com/

Re: Nulls get converted to 0 problem

From

kdebisschop@alert.infoplease.com

Date:

06 June 2003, 15:04:05

On Thu, 2003-06-05 at 09:39, Jon Earle wrote:
> On Wed, 4 Jun 2003 terry@ashtonwoodshomes.com wrote:
>
> > Oracle *incorrectly* interprets blank (empty) strings as NULL.  They are NOT
> > the same.  A string of zero characters is a string nonetheless.  A NULL is
> > "the absence of value", which equals nothing (theoretically not even another
> > NULL).
>
> If you're testing a value, you're testing to see if there's something in
> there or not - what difference does it make if the variable contains 0, ""
> or NULL?
>
> Why not adhere to the practices inherent (and thus anticipated by
> developers) in other languages (C comes to mind) where 0, NULL and "" are
> equivalent?

Perhaps because the SQL Spec says they are different?

For that matter, a zero length string in C is not the same as NULL.
Believing otherwise may be convenient, but leads to segfaults

i.e., this code will cause a segfault

main(...) {
  char *str;

  if (str == NULL)
    printf ("This test is safe\n");

  if (str == "")
    printf ("This comparison above can segfault on some systems\n");

  printf ("printing a NULL string like %s can also segfault\n", str);

}

I believe in C the following is true as well:

main() {
  char *str="";

  if (str)
    printf ("An empty string evaluates as true");

}

--
Karl DeBisschop <kdebisschop@alert.infoplease.com>

Re: Nulls get converted to 0 problem

From

Jon Earle

Date:

06 June 2003, 15:46:22

On Fri, 6 Jun 2003, Mattias Kregert wrote:

> 1. Imagine a table with dates and weather conditions. How would you
>    differ between "zero degrees" and "no data entered yet"? You would
>    have to add a column (degrees_valid boolean). Having the NULL value
>    makes things much easier.
>
> 2. In OO languages (c++, java, and so on), a null pointer is very
>    different from a pointer to an object - even if that object contains
>    the value 0 (zero) or an empty string (""). The "practices" you refer
>    to does not apply/is not inherent in all languages. NULL means that
>    the variable points to nothing at all. "" means that there is some
>    address/space allocated.
>   Example:
>     String mystring = NULL;  // pointer mystring points to nothing (no
>     allocation)
>     String mystring = new String("");  // pointer mystring points to a
>     memory location, which currently holds the empty data "".

<click!>

Thank you Mattias, the differences (and similarities) are now clear.

Cheers!
Jon

--
Jon Earle

SAVE FARSCAPE http://www.savefarscape.com/

Re: Nulls get converted to 0 problem

From

Jon Earle

Date:

06 June 2003, 21:34:37

On Fri, 6 Jun 2003, Karl DeBisschop wrote:

> Perhaps because the SQL Spec says they are different?
>
> For that matter, a zero length string in C is not the same as NULL.
> Believing otherwise may be convenient, but leads to segfaults

Only if you mistreat them, as in your first example.  Testing strings is a
must:

main() {
  char *str1;
  char *str2 = "";
  char *str3 = "test";

  // Check if str1 is valid
  // Since str1 was not set to anything, it _could_ be valid.  It _should_
  // be set to NULL if not initted with data, as in: char *str1 = NULL;
  if (!str1) printf("This is not a valid string.\n");

  // More correct check for str1.
  if (str1 && !*str1)
    printf("This is a better check for str1's validity\n");

  // Check if str2 contains data.
  if (str2 && !*str2) printf("This is also not a valid string.\n");

  // If str3 is valid and contains data
  if (str3 && *str3) printf("This is a valid string\n");

  // Set the start of str3 to null.
  *str3 = 0;

  // If str3 is valid and doesn't contain data
  if (str3 && !*str3) printf("This is not a valid string\n");
}

But that doesn't answer the question that, what is the difference between
no data and null?  They both indicate zero value.  There's some esoteric
difference that I'm missing, probably because my programming background
has not involved database work until very recently.

Cheers!
Jon

>
> i.e., this code will cause a segfault
>
> main(...) {
>   char *str;
>
>   if (str == NULL)
>     printf ("This test is safe\n");
>
>   if (str == "")
>     printf ("This comparison above can segfault on some systems\n");
>
>   printf ("printing a NULL string like %s can also segfault\n", str);
>
> }
>
> I believe in C the following is true as well:

No. See above.

>
> main() {
>   char *str="";
>
>   if (str)
>     printf ("An empty string evaluates as true");
>
> }
>
> --
> Karl DeBisschop <kdebisschop@alert.infoplease.com>
>

--
Jon Earle

SAVE FARSCAPE http://www.savefarscape.com/

Re: Nulls get converted to 0 problem

From

Jonathan Bartlett

Date:

06 June 2003, 23:16:49

I think the basic idea is that NULL can be used for purposes outside the
normal range of numbers.

For example, I can use NULL to indicate that I don't know what the value
is, as apposed to the value actually being 0.

This would be better accomplished by fuller support for value domains, but
no database (or few unknown ones) implement that, and NULL is the next
best thing.

Jon

On 6 Jun 2003 kdebisschop@alert.infoplease.com wrote:

> On Thu, 2003-06-05 at 09:39, Jon Earle wrote:
> > On Wed, 4 Jun 2003 terry@ashtonwoodshomes.com wrote:
> >
> > > Oracle *incorrectly* interprets blank (empty) strings as NULL.  They are NOT
> > > the same.  A string of zero characters is a string nonetheless.  A NULL is
> > > "the absence of value", which equals nothing (theoretically not even another
> > > NULL).
> >
> > If you're testing a value, you're testing to see if there's something in
> > there or not - what difference does it make if the variable contains 0, ""
> > or NULL?
> >
> > Why not adhere to the practices inherent (and thus anticipated by
> > developers) in other languages (C comes to mind) where 0, NULL and "" are
> > equivalent?
>
> Perhaps because the SQL Spec says they are different?
>
> For that matter, a zero length string in C is not the same as NULL.
> Believing otherwise may be convenient, but leads to segfaults
>
> i.e., this code will cause a segfault
>
> main(...) {
>   char *str;
>
>   if (str == NULL)
>     printf ("This test is safe\n");
>
>   if (str == "")
>     printf ("This comparison above can segfault on some systems\n");
>
>   printf ("printing a NULL string like %s can also segfault\n", str);
>
> }
>
> I believe in C the following is true as well:
>
> main() {
>   char *str="";
>
>   if (str)
>     printf ("An empty string evaluates as true");
>
> }
>
> --
> Karl DeBisschop <kdebisschop@alert.infoplease.com>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>

Re: Nulls get converted to 0 problem

From

"Jan Weerts"

Date:

07 June 2003, 00:37:15

Hi Jon!

>If you're testing a value, you're testing to see if there's
>something in there or not - what difference does it make if
>the variable contains 0, "" or NULL?

sorry, I cannot resist: well, it makes a huge difference.
Apples and pears are uncomparable, so are numbers and strings.
In strongly typed languages, these values are of different
types and therefore not comparable by simple means. And IIRC also
the SQL standards mentions these to be handled differently. IMO,
only because some vendor once upon a time decided to break with
the standard, other developers are not obliged to follow them
like lemmings. Existing code written in the manner of sytem X
might require some work if you port to system Y, when X and Y
do not agree on standard compliance and the coders were not
aware of that fact.

>Why not adhere to the practices inherent (and thus anticipated by
>developers) in other languages (C comes to mind) where 0, NULL
>and "" are equivalent?

Even in C an empty string is not equivalent to 0 or
NULL, AFAIR. And 0 and NULL being equivalent is more of a
convention, than a definition (at least nowadays). Any good
compiler should give you a warning on assigning with these mixed
types, unless you switch those warnings off.

And to use a counterexample, think of Java.

Bad practices tend to result in bad code.
<sarcasm>if your code was well written, you should have no trouble
to find and convert all the places where these assumptions about
equality was made</sarcasm>. I don't know if such lazily comparing
systems claim to be standard compliant. If you want to be, and I
understood postgres wants to, you have to do it the standards way.

Btw, I like postgres' style of return false on a comparison
of two NULLs. If something is undefined, how could I compare it
to anything else. I can't even figure out its features, so I should
not be able to match it successfully to anything else.

If you really need/want to have this behaviour, I suggest
writing special comparison function(s), which handles the input
as you like it. This way you only have to port the function
between two database systems and you are all done. I would still
recommend to fix the calling code. Try to imagine yourself
revisiting a non-trivial piece of code in half a year...

Sidenote: In principle I dislike all such types of implicit
conversions done by computer systems. IMO, as a programmer you
get used to imprecise thinking, when computer systems handle
your wrong input silently. In the long run, mental concepts
of the language elements will become all blurry, which in turn
leads to programming errors.

Ciao
  Jan
P.S.: as you certainly figured out already, i have a background
in programming languages and compilers :-)

Re: Nulls get converted to 0 problem

From

Erik Ronström

Date:

07 June 2003, 05:52:50

 --- Jon Earle <je_pgsql@kronos.honk.org> wrote:
> If you're testing a value, you're testing to see if there's something
> in there or not - what difference does it make if the variable
> contains 0, "" or NULL?

Depends. Say you have a table of persons and want to store the number
of cars they own in a column. 0 would mean that a person doesn't own
any car, but a NULL value rather indicates that the number of cars is
unknown.

It's not very hard to come up with examples where you want to separate
empty strings and NULL values. The point is that "" could mean "there's
something in there - an empty string", compared to "there is nothing
there".

> Why not adhere to the practices inherent (and thus anticipated by
> developers) in other languages (C comes to mind) where 0, NULL and ""
> are equivalent?

Perl is an excellent example of a programming language that uses NULL
values ("undef") in clever ways.

"The C style" can be used with Perl, since both the undefined value,
the empty string and 0 evaluates to false. "The Perl style" cannot
easily be used in C. IMHO, that's a strong argument for the support of
undefined (NULL) values: you don't have to use them if you don't want
to (in SQL, just use NOT NULL to prohibit NULL values), but it's very
hard to simulate them if you really want to separate 0 and NULL.

Erik

__________________________________________________
Yahoo! Plus - For a better Internet experience
http://uk.promotions.yahoo.com/yplus/yoffer.html

Re: Nulls get converted to 0 problem

From

DeJuan Jackson

Date:

07 June 2003, 05:58:00

Jon Earle wrote:

>But that doesn't answer the question that, what is the difference between
>no data and null?  They both indicate zero value.  There's some esoteric
>difference that I'm missing, probably because my programming background
>has not involved database work until very recently.
>
>Cheers!
>Jon
>
>
>

I believe the point that was being made is that if you treat NULL the
same as "" then even in C /C++, you get at best unpredictable behavior.
I think what you are missing is that in C/C++ NULL == 0 (I have yet to
see a C/C++ implementation where this is not true) from a pointer
perspective.  So,  the pointer can be tested for trueness in and of
itself, which just happens to be the same case for the C/C++ end of
string character '\0' == 0.  The thing to remember when programming
C/C++ is that a NULL pointer is not equal to "", they represent two
completely different states for the variable in question.  The same
situation is true in a database context NULL stands for unknown value
(NOTE: This is not what it means in C/C++), an empty string is a known
value so is 0.

It's like in Monty Python's Holy Grain, when they come to the come to
the bridge the bridge keeper had a database of each ones name and quest
but he had a NULL value for whether or not the swallow was African or
European.

Another possible image for a database NULL, would be the state of
Shrodinger's cat before you look in the box (completely unknown).  After
looking in the box (know the values for the cats state) the only valid
states are ALIVE and DEAD, but before you look the only possible
condition is unknown (NULL), but you still need other information in the
tuple/row about the cat (such as it's favorite chew toy, if it's house
broken, if it's had it's shots, is it male or female, has it been spade
or neutered, ...), which are known before you look into the box.

Re: Nulls get converted to 0 problem

From

terry@greatgulfhomes.com

Date:

07 June 2003, 06:49:09

One good reason not interpret the empty string as NULL is because the empty
string could violate a foreign key constraint, whereas a NULL says "don't
evaluate the fkey constraint, there is no value here for this row"

Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com
Fax: (416) 441-9085


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Jon Earle
> Sent: Thursday, June 05, 2003 9:39 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Nulls get converted to 0 problem
>
>
> On Wed, 4 Jun 2003 terry@ashtonwoodshomes.com wrote:
>
> > Oracle *incorrectly* interprets blank (empty) strings as
> NULL.  They are NOT
> > the same.  A string of zero characters is a string
> nonetheless.  A NULL is
> > "the absence of value", which equals nothing (theoretically
> not even another
> > NULL).
>
> If you're testing a value, you're testing to see if there's
> something in
> there or not - what difference does it make if the variable
> contains 0, ""
> or NULL?
>
> Why not adhere to the practices inherent (and thus anticipated by
> developers) in other languages (C comes to mind) where 0,
> NULL and "" are
> equivalent?
>
> Cheers!
>
> --
> Jon Earle
>
> SAVE FARSCAPE http://www.savefarscape.com/
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

Re: Nulls get converted to 0 problem

From

Samuel Tardieu

Date:

07 June 2003, 08:37:19

>>>>> "kdebisschop" == kdebisschop  <kdebisschop@alert.infoplease.com> writes:

kdebisschop>   if (str == "")
kdebisschop>     printf ("This comparison above can segfault on some
kdebisschop>     systems\n");

No, it can't. You are comparing pointers (str and a pointer to a NULL
string). This is exactly the same as comparing integers. If any
compiler generates code which does a segmentation fault, it is far
from being (even remotely) a C compiler (and I'm not being pedantic
here).

If you meant strcmp(str, "") that's a completely different story.

  Sam
--
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/sam

Re: Nulls get converted to 0 problem

From

kdebisschop@alert.infoplease.com

Date:

07 June 2003, 18:07:23

On Fri, 2003-06-06 at 11:31, Samuel Tardieu wrote:
> >>>>> "kdebisschop" == kdebisschop  <kdebisschop@alert.infoplease.com> writes:
>
> kdebisschop>   if (str == "")
> kdebisschop>     printf ("This comparison above can segfault on some
> kdebisschop>     systems\n");
>
> No, it can't. You are comparing pointers (str and a pointer to a NULL
> string). This is exactly the same as comparing integers. If any
> compiler generates code which does a segmentation fault, it is far
> from being (even remotely) a C compiler (and I'm not being pedantic
> here).
>
> If you meant strcmp(str, "") that's a completely different story.

I think you're right -- I was merely trying to illustrate that NULL and
"" are not the same in C. My example should probably have been labelled
as being more akin to psuedocode

Thanks for the clarification.


--
Karl DeBisschop <kdebisschop@alert.infoplease.com>

Re: Nulls get converted to 0 problem

From

Mike Mascari

Date:

07 June 2003, 19:38:03

Jon Earle wrote:
> On Fri, 6 Jun 2003, Mattias Kregert wrote:
>
>
>>1. Imagine a table with dates and weather conditions. How would you
>>   differ between "zero degrees" and "no data entered yet"? You would
>>   have to add a column (degrees_valid boolean). Having the NULL value
>>   makes things much easier.
>
> <click!>
>
> Thank you Mattias, the differences (and similarities) are now clear.

Some people think NULLs have no business in databases:

http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf

And that they cause unforeseen logical problems:

http://www.firstsql.com/iexist2.htm

There also isn't any notion of typed NULLs. At least the C++ example
has a zero-initialized pointer to a type.

The meaning is overloaded. Does NULL mean:

"I don't know" or
"I can't know" or
"Not applicable"  <- This usually implies a non-normalized database

They also require a full implementation of RI (MATCH PARTIAL) if the
attributes are involved in PK/FK relationships.

They cause problems in representations when moving data into and out
of the database between non-SQL systems, as evidenced by this thread.

I'm not sure of the current state, but in older versions of
PostgreSQL, NULLs weren't indexable.

I'd avoid them. But that's just my humble opinion...

Mike Mascari
mascarm@mascari.com

Re: Nulls get converted to 0 problem

From

Andre Truter

Date:

08 June 2003, 00:01:46

On Thu, 2003-06-05 at 15:39, Jon Earle wrote:
> On Wed, 4 Jun 2003 terry@ashtonwoodshomes.com wrote:
>
> > Oracle *incorrectly* interprets blank (empty) strings as NULL.  They are NOT
> > the same.  A string of zero characters is a string nonetheless.  A NULL is
> > "the absence of value", which equals nothing (theoretically not even another
> > NULL).
>
> If you're testing a value, you're testing to see if there's something in
> there or not - what difference does it make if the variable contains 0, ""
> or NULL?
>
> Why not adhere to the practices inherent (and thus anticipated by
> developers) in other languages (C comes to mind) where 0, NULL and "" are
> equivalent?
>
> Cheers!

No! For me it is very important that NULL does not equal 0 or "",
because we have a database where a value of 0 means 0 and a NULL means
that the data normally stored there is not applicable for that record.
We calculate averages on the fields, and we don't want 'not applicable'
values to affect the average.

In any case, this is actually in line with languages like C, because
NULL means that no memory has been assigned to a variable, it does not
mean that the value stored in that variable is 0, or even "".

NULL, 0 and "" are definitely not the same thing in C.  If you see them
as the same, you are just looking for segmentation faults. :-)

In any case, NULL should not equal 0, as 0 is a value (between -1 and
1), while NULL is nothing, you cannot compare it to another value.

Is NULL smaller or larger than 1?

--
Andre Truter
Software Engineer
Registered Linux user #185282
ICQ #40935899  AIM: trusoftzaf
http://www.trusoft.za.net

<------------------------------------------------->
< The box said: Requires Windows 95 or better...  >
< So I installed Linux                            >
<------------------------------------------------->

Disclaimer and Confidentiality Warning

This message is intended for the addressee only. If you are not the intended recipient of this message, you are
notifiedthat any distribution, use of or copying of this communication is strictly prohibited. If you have received the
communicationin error, please notify the sender immediately. The views and opinions expressed in this message are those
ofthe individual sender of this message and do not necessarily represent the views and opinions of ATIO. Consequently,
ATIOdoes not accept responsibility for such views and opinions and this message should not be read as representing the
viewsand opinions of ATIO without subsequent written confirmation. Each page attached hereto must also be read in
conjunctionwith this disclaimer.

Re: Nulls get converted to 0 problem

From

"scott.marlowe"

Date:

08 June 2003, 01:33:17

On Thu, 5 Jun 2003, Jon Earle wrote:

> On Wed, 4 Jun 2003 terry@ashtonwoodshomes.com wrote:
>
> > Oracle *incorrectly* interprets blank (empty) strings as NULL.  They are NOT
> > the same.  A string of zero characters is a string nonetheless.  A NULL is
> > "the absence of value", which equals nothing (theoretically not even another
> > NULL).
>
> If you're testing a value, you're testing to see if there's something in
> there or not - what difference does it make if the variable contains 0, ""
> or NULL?

Every interface I know of in every language (except cold fusion) has a
test for null.  There IS a difference, and it's not a difference of just
semantics, it has real world meaning.

Enter a record for me.  Enter my cell phone number.  It's a text type.  If
you enter a NULL you are saying I may or may not have a cell phone, you
don't know.  If you enter '' you are saying that I do NOT have a cell
phone.

Hey, who has a cell phone we don't have numbers for?

select * from table where cell_phone IS NULL;

I don't have to make up a boolean to say what I mean when I put in a '' or
a NULL.

For numbers, a NULL should never be coerced to 0, which is what was
happening to Ari due to the older jdbc driver.  Since blank numeric and
date types aren't allowed there's no confusion issue.  But for text there
certainly is a difference in meaning.

Re: Nulls get converted to 0 problem

From

Dmitry Tkach

Date:

08 June 2003, 03:03:16

>
>
>If you're testing a value, you're testing to see if there's something in
>there or not - what difference does it make if the variable contains 0, ""
>or NULL?
>
>
World of difference :-)
An intereger value of 0 is *certainly* "somethign", as well as an empty
string.

>Why not adhere to the practices inherent (and thus anticipated by
>developers) in other languages (C comes to mind) where 0, NULL and "" are
>equivalent?
>
Not at all. An empty string in C is a valid pointer to a memory
location, that contains a 0, and NULL is a pointer to a memory location
that does not exist.
They are not equivalent at all.

Dima

Re: Nulls get converted to 0 problem

From

Avi Schwartz

Date:

08 June 2003, 15:32:45

On Friday, Jun 6, 2003, at 09:45 America/Chicago, scott.marlowe wrote:

> On Thu, 5 Jun 2003, Jon Earle wrote:
>
>> On Wed, 4 Jun 2003 terry@ashtonwoodshomes.com wrote:
>>
>>> Oracle *incorrectly* interprets blank (empty) strings as NULL.  They
>>> are NOT
>>> the same.  A string of zero characters is a string nonetheless.  A
>>> NULL is
>>> "the absence of value", which equals nothing (theoretically not even
>>> another
>>> NULL).
>>
>> If you're testing a value, you're testing to see if there's something
>> in
>> there or not - what difference does it make if the variable contains
>> 0, ""
>> or NULL?

If you even used a statistical package like SPSS, you will find that
null is a very important value.  Most statistical calculations
eliminate the null value since it implies that the value was not known
and therefore should be be used.

> Every interface I know of in every language (except cold fusion) has a
> test for null.  There IS a difference, and it's not a difference of
> just
> semantics, it has real world meaning.
>
> Enter a record for me.  Enter my cell phone number.  It's a text type.
>  If
> you enter a NULL you are saying I may or may not have a cell phone, you
> don't know.  If you enter '' you are saying that I do NOT have a cell
> phone.
>
> Hey, who has a cell phone we don't have numbers for?
>
> select * from table where cell_phone IS NULL;
>
> I don't have to make up a boolean to say what I mean when I put in a
> '' or
> a NULL.

We have a field in our tables which contains the date on which the
record was deleted (i.e. soft delete).  If there was no null value, we
would either have to add a deleted flag (not too bad) or have to
reserve a specific date to designate a deleted record (terrible).

> For numbers, a NULL should never be coerced to 0, which is what was
> happening to Ari due to the older jdbc driver.  Since blank numeric and
> date types aren't allowed there's no confusion issue.  But for text
> there
> certainly is a difference in meaning.

Actually it was happening to Avi :-)

The lack of testing for null is indeed a real issue with ColdFusion.
There is really no way to know if you received the empty string because
it was empty or because it was null.

Avi
-
Avi Schwartz
avi@CFFtechnologies.com

Re: Nulls get converted to 0 problem

From

Stephan Szabo

Date:

08 June 2003, 17:24:04

On Thu, 5 Jun 2003, Jon Earle wrote:

> On Wed, 4 Jun 2003 terry@ashtonwoodshomes.com wrote:
>
> > Oracle *incorrectly* interprets blank (empty) strings as NULL.  They are NOT
> > the same.  A string of zero characters is a string nonetheless.  A NULL is
> > "the absence of value", which equals nothing (theoretically not even another
> > NULL).
>
> If you're testing a value, you're testing to see if there's something in
> there or not - what difference does it make if the variable contains 0, ""
> or NULL?
>
> Why not adhere to the practices inherent (and thus anticipated by
> developers) in other languages (C comes to mind) where 0, NULL and "" are
> equivalent?

Because SQL already defines what NULL means to be something else, it's an
unknown value. Also, in C, NULL and "" are different and not very
equivalent (try passing strcmp a NULL rather than empty string on many
systems ;) )

Performance of query

From

Edmund Dengler

Date:

09 June 2003, 20:33:28

Ok, now I am really confused. I have a query that runs really slow, yet
when I code it up as a function it runs very fast! The optimizer seems to
not evaluate a good method for performing the query.

Background: This is a database based off of Snort. There is a base table
called <event> (with <sid>=sensor ID, <cid>=event ID, <timestamp>=when
event collected, <signature>=which event was recognized), and a number of
depending tables for various bits and pieces of network packet data
(<iphdr>=IP header info, <tcphdr>=TCP header info, <data>=packet payload)
and <signature> (what event).

Notes: Running 7.3.3
Database is around 20GB on disk.

=> select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 7.3.3 on i386-unknown-openbsd3.0, compiled by GCC 2.95.3
(1 row)

=> \d event
               Table "public.event"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 sid       | integer                  | not null
 cid       | bigint                   | not null
 signature | integer                  | not null
 timestamp | timestamp with time zone | not null
Indexes: event_pkey primary key btree (sid, cid),
         event_pull_idx unique btree ("timestamp", sid, cid),
         signature_idx btree (signature),
         timestamp_idx btree ("timestamp")
Triggers: event_summary

=> \d signature
                             Table "public.signature"
    Column    |  Type   |                        Modifiers
--------------+---------+----------------------------------------------------------
 sig_id       | integer | not null default nextval('"signature_sig_id_seq"'::text)
<<...more columns...>>
Indexes: signature_pkey primary key btree (sig_id),
         sig_class_id btree (sig_class_id),
         sig_name_idx btree (sig_name)

Rest of the tables are primary keyed with (sid, cid). For example,

=> \d iphdr
      Table "public.iphdr"
  Column  |   Type   | Modifiers
----------+----------+-----------
 sid      | integer  | not null
 cid      | bigint   | not null
<<...more columns...>>
Indexes: iphdr_pkey primary key btree (sid, cid),
         ip_dst_idx btree (ip_dst),
         ip_src_idx btree (ip_src)
Triggers: iphdr_summary

If I run the query:

explain analyze
select
  event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst,
  icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq,
  udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum,
  tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off,
  tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp,
  sensor.hostname, sensor.interface, data.data_payload
from (
  select * from event
  where timestamp > (select now() - '2 hours'::interval)
    and exists (select 1 from hack_pull_sid where sid = event.sid)
) as event
  left join signature on signature.sig_id = event.signature
  left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid
  left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid
  left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid
  left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid
  left join sensor on sensor.sid = event.sid
  left join data on data.sid = event.sid and data.cid = event.cid
order by
  timestamp desc, event.sid desc, event.cid desc
;

(note: <hack_pull_sid> is a table of SIDs I am interested in so that I
avoid the issues with IN)

I get the following output:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1712181.78..1712776.52 rows=237893 width=853) (actual time=163484.81..163484.97 rows=129 loops=1)
   Sort Key: public.event."timestamp", public.event.sid, public.event.cid
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.04..0.04 rows=1 loops=1)
   ->  Merge Join  (cost=1471148.73..1551631.57 rows=237893 width=853) (actual time=152693.86..163484.26 rows=129
loops=1)
         Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
         ->  Merge Join  (cost=728456.82..802284.20 rows=237893 width=258) (actual time=33652.33..39050.11 rows=129
loops=1)
               Merge Cond: ("outer".sid = "inner".sid)
               ->  Merge Join  (cost=728454.78..798713.57 rows=237893 width=226) (actual time=33635.79..39031.06
rows=129loops=1) 
                     Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                     ->  Merge Join  (cost=728454.78..736750.59 rows=237893 width=172) (actual
time=22767.67..24071.84rows=129loops=1) 
                           Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                           ->  Merge Join  (cost=728454.78..731219.48 rows=237893 width=144) (actual
time=22164.44..22681.23rows=129 loops=1) 
                                 Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                                 ->  Sort  (cost=701134.84..701729.57 rows=237893 width=116) (actual
time=15343.87..15344.05rows=129 loops=1) 
                                       Sort Key: public.event.sid, public.event.cid
                                       ->  Merge Join  (cost=590896.58..671684.37 rows=237893 width=116) (actual
time=10857.65..15343.57rows=129 loops=1) 
                                             Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                                             ->  Index Scan using iphdr_pkey on iphdr  (cost=0.00..73422.89
rows=1432042width=28) (actual time=23.19..13810.58 rows=687918 loops=1) 
                                             ->  Sort  (cost=590896.58..591491.31 rows=237893 width=88) (actual
time=101.79..101.95rows=129 loops=1) 
                                                   Sort Key: public.event.sid, public.event.cid
                                                   ->  Hash Join  (cost=185.07..563108.55 rows=237893 width=88) (actual
time=97.42..101.39rows=129 loops=1) 
                                                         Hash Cond: ("outer".signature = "inner".sig_id)
                                                         ->  Index Scan using timestamp_idx on event
(cost=0.00..558165.62rows=237893 width=24) (actual time=0.18..3.05 rows=129 loops=1) 
                                                               Index Cond: ("timestamp" > $0)
                                                               Filter: (subplan)
                                                               SubPlan
                                                                 ->  Seq Scan on hack_pull_sid  (cost=0.00..1.15 rows=1
width=0)(actual time=0.01..0.01 rows=1 loops=171) 
                                                                       Filter: (sid = $1)
                                                         ->  Hash  (cost=167.26..167.26 rows=7126 width=64) (actual
time=97.12..97.12rows=0 loops=1) 
                                                               ->  Seq Scan on signature  (cost=0.00..167.26
rows=7126width=64)(actual time=0.01..80.86 rows=7126 loops=1) 
                                 ->  Sort  (cost=27319.94..27897.87 rows=231171 width=28) (actual time=6606.52..7073.38
rows=110717loops=1) 
                                       Sort Key: icmphdr.sid, icmphdr.cid
                                       ->  Seq Scan on icmphdr  (cost=0.00..3784.71 rows=231171 width=28) (actual
time=19.81..2082.01rows=231303 loops=1) 
                           ->  Index Scan using udphdr_pkey on udphdr  (cost=0.00..4411.44 rows=115228 width=28)
(actualtime=60.23..1269.61 rows=54669 loops=1) 
                     ->  Index Scan using tcphdr_pkey on tcphdr  (cost=0.00..56586.06 rows=1139141 width=54) (actual
time=88.72..13878.52rows=484419 loops=1) 
               ->  Sort  (cost=2.04..2.11 rows=30 width=32) (actual time=16.46..16.64 rows=136 loops=1)
                     Sort Key: sensor.sid
                     ->  Seq Scan on sensor  (cost=0.00..1.30 rows=30 width=32) (actual time=16.26..16.34 rows=30
loops=1)
         ->  Sort  (cost=742691.92..745059.80 rows=947154 width=595) (actual time=108643.60..123322.67 rows=417145
loops=1)
               Sort Key: data.sid, data.cid
               ->  Seq Scan on data  (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.67..54494.61
rows=947324loops=1) 
 Total runtime: 164147.78 msec
(43 rows)

For some runs, I was getting over 800,000 msec!!

Now, this seems very strange for only 129 rows. The inner select returns
exactly the 129 rows I am interested in, and now we only need to do a few
left joins to get the corresponding rows (which may or may not exist,
which is why the LEFT JOIN's). Note that the (<sid>,<cid>) is the primary
key for most of these joins, and so the LEFT JOIN should be able to
recognize that there will only be one or zero rows matching. In addition,
(<sid>,<cid>)  is the primary key for the <event> table. Similar
discussion with <signature> as <sig_id> is it's primary key.

Notes:
- The table <event> has 1,427,411 rows
- <event>.<timestamp> ranges from 2002-01-19 12:28:29-05 to
  2003-06-08 23:40:01-04
- The last 2 hours is a very small range assuming an even distribution
- All tables are using 100 bins for statistics
- A VACUUM ANALYZE was performed on the entire database before this test

As can be seem, the estimator seems to think that 237,893 rows will be
returned from the INNER SELECT out of 1,427,411 rows. Can anyone explain
how this number could be calculated? Even assuming a straight linear
assumption, this should fall into 2hours/(24hours/day*400days) rough
estimate which is (very roughly) 0.02% of the table.

So, I wrote a function. (Please, no remarks on style, it was a quick hack
to compare performance. I am sure it could be made better and more
optimized). It simply performs the inner SELECT, and then for each row,
does the outer select for the match.

CREATE TYPE hack_pull_type AS (
  sid int, cid bigint, timestamp timestamp with time zone,
  sig_name text,
  ip_src bigint, ip_dst bigint,
  icmp_type smallint, icmp_code smallint, icmp_csum int, icmp_id int, icmp_seq int,
  udp_sport int, udp_dport int, udp_len int, udp_csum int,
  tcp_sport int, tcp_dport int, tcp_seq bigint, tcp_ack bigint, tcp_off smallint,
  tcp_res smallint, tcp_flags smallint, tcp_win int, tcp_csum int, tcp_urp int,
  hostname text, interface text, data_payload text
);

CREATE OR REPLACE FUNCTION hack_pull_func() RETURNS SETOF hack_pull_type
AS '
DECLARE
  event_rec event%ROWTYPE;
  -- pull_rec hack_pull_type;
  pull_rec RECORD;
BEGIN
  FOR event_rec IN
    SELECT *
    FROM event
    WHERE timestamp > (SELECT now() - ''2 hours''::interval)
      AND EXISTS (SELECT 1 FROM hack_pull_sid WHERE sid = event.sid)
    ORDER BY timestamp desc, sid desc, cid desc
  LOOP
    SELECT
      event.sid, event.cid, event.timestamp,
      sig_name, ip_src, ip_dst,
      icmp_type, icmp_code, icmp_csum, icmp_id, icmp_seq,
      udp_sport, udp_dport, udp_len, udp_csum,
      tcp_sport, tcp_dport, tcp_seq, tcp_ack, tcp_off, tcp_res,
      tcp_flags, tcp_win, tcp_csum, tcp_urp,
      sensor.hostname, sensor.interface, data.data_payload
    INTO pull_rec
    FROM event
      left join signature on signature.sig_id = event.signature
      left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid
      left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid
      left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid
      left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid
      left join sensor on sensor.sid = event.sid
      left join data on data.sid = event.sid and data.cid = event.cid
    WHERE
      event.cid = event_rec.cid and event.sid = event_rec.sid
    LIMIT 1;
    RETURN NEXT pull_rec;
  END LOOP;

  RETURN;
END;
' LANGUAGE 'plpgsql';

=> explain analyze select * from hack_pull_func();
                                                     QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Function Scan on hack_pull_func  (cost=0.00..12.50 rows=1000 width=238) (actual time=5259.46..5259.62 rows=23 loops=1)
 Total runtime: 5259.70 msec
(2 rows)

=> explain analyze select * from hack_pull_func();
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Function Scan on hack_pull_func  (cost=0.00..12.50 rows=1000 width=238) (actual time=38.02..38.18 rows=23 loops=1)
 Total runtime: 38.26 msec
(2 rows)

(Sorry, the data has changed as time moves forward and I type this email.
However, the numbers are reflective of the performance difference I
observed during testing).

So, what is happening? How come my function (ugly as it may be), is orders
of magnitude faster than what should be a clean query in SQL? Why is the
estimator so far off the real issue?

Is there a way of writing the query to get the good performance? Note that
there are a number of other queries I run which exhibit similar
performance/estimation issues, and I wonder if I will need to rwrite all
of them into custom functions (thanks for the SETOF piece as this makes it
possible to write this stuff now).

Thanks!
Ed

Re: Nulls get converted to 0 problem

From

Martijn van Oosterhout

Date:

10 June 2003, 01:51:08

On Fri, Jun 06, 2003 at 01:34:16PM -0400, Mike Mascari wrote:
> There also isn't any notion of typed NULLs. At least the C++ example
> has a zero-initialized pointer to a type.

Dunno about the rest but this is false. NULLs can be typed:

# select null::int4 as f into temp a;
SELECT
# \d a
     Table "pg_temp_1.a"
 Column |  Type   | Modifiers
--------+---------+-----------
 f      | integer |

Not that it usually matters.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Attachment

msg-32091-60907.dat