Thread: SET within a function?

SET within a function?

From

Edmund Dengler

Date:

13 October 2003, 22:17:01

Hi all!

I am doing some trigger functions that need to find a tuple in another
table. The problem is that this second table is doing some summarization
work, and I need nulls to equal each other.

Basically, in the trigger I do a:

  SELECT INTO ... x
  FROM table1
  WHERE ...(some straightforward x = old.x)...
    AND (x1 = old.x1 OR (x1 is null and old.x1 is null))
    AND (x2 = old.x2 OR (x2 is null and old.x2 is null))
    AND (x3 = old.x3 OR (x3 is null and old.x3 is null));

The problem is that an index is used to perform the straightforward stuff,
and then the x1,x2,x3 is done via an index scan, rather than directly.
Unfortunately for the data set I have, it can be clustered pretty badly
around the straightforward stuff, and so the scan can take multiple
seconds per call.

I think if I could do a 'SET TRANSFORM_NULL_EQUALS TO ON' then this might
fix the issue (don't know, haven't tried it yet). My question is: can this
be done within a function such that at the end of the function, the value
is reset back to value upon entering (kind of like 'SET LOCAL' except for
just the length of the function call). Is this possible?

Thanks!
Ed

Re: SET within a function?

From

Bruno Wolff III

Date:

13 October 2003, 23:10:40

On Mon, Oct 13, 2003 at 21:16:33 -0400,
  Edmund Dengler <edmundd@eSentire.com> wrote:
>
> I think if I could do a 'SET TRANSFORM_NULL_EQUALS TO ON' then this might
> fix the issue (don't know, haven't tried it yet). My question is: can this
> be done within a function such that at the end of the function, the value
> is reset back to value upon entering (kind of like 'SET LOCAL' except for
> just the length of the function call). Is this possible?

I don't think that will do what you want. That setting is used to
rewrite = null as is null, not to change things so that nulls match each
other.

Re: SET within a function?

From

Edmund Dengler

Date:

13 October 2003, 23:50:11

Is the rewrite only for the literal 'X = NULL' or will it do a test
against a value such as 'X = OLD.X' (and rewrite is OLD.X is NULL)?

Is there any way to match NULLS to each other (as I am looking for a
literal row, not using NULL as the UNKNOWN). I suppose I could put in a
dummy value for the 'Not a valid value', but it seems to be quite awkward
when I really do want the NULL.

Regards!
Ed

On Mon, 13 Oct 2003, Bruno Wolff III wrote:

> On Mon, Oct 13, 2003 at 21:16:33 -0400,
>   Edmund Dengler <edmundd@eSentire.com> wrote:
> >
> > I think if I could do a 'SET TRANSFORM_NULL_EQUALS TO ON' then this might
> > fix the issue (don't know, haven't tried it yet). My question is: can this
> > be done within a function such that at the end of the function, the value
> > is reset back to value upon entering (kind of like 'SET LOCAL' except for
> > just the length of the function call). Is this possible?
>
> I don't think that will do what you want. That setting is used to
> rewrite = null as is null, not to change things so that nulls match each
> other.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

Re: SET within a function?

From

Mike Mascari

Date:

14 October 2003, 02:25:44

Edmund Dengler wrote:

> Is the rewrite only for the literal 'X = NULL' or will it do a test
> against a value such as 'X = OLD.X' (and rewrite is OLD.X is NULL)?

It is a parse time transformation:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=26ef31219ae11442&seekm=3DF52206.5060507%40mascari.com#link6

> Is there any way to match NULLS to each other (as I am looking for a
> literal row, not using NULL as the UNKNOWN). I suppose I could put in a
> dummy value for the 'Not a valid value', but it seems to be quite awkward
> when I really do want the NULL.

Normalization would have you eliminate the NULL by having another
relation whose candidate key is the same as your original table, but
those records whose attribute is NULL would simply not be present in
the child table.

Another possible solution is to define your own type with an internal
status for 'Not a valid value'...

HTH,

Mike Mascari
mascarm@mascari.com

Re: SET within a function?

From

"Arthur Ward"

Date:

14 October 2003, 13:46:58

> Is the rewrite only for the literal 'X = NULL' or will it do a test
> against a value such as 'X = OLD.X' (and rewrite is OLD.X is NULL)?
>
> Is there any way to match NULLS to each other (as I am looking for a
> literal row, not using NULL as the UNKNOWN). I suppose I could put in a
> dummy value for the 'Not a valid value', but it seems to be quite awkward
> when I really do want the NULL.

I ended up writing an "equivalent" function for the project I'm working
on. It goes like this in plpgsql:

    IF $1 IS NULL THEN
        RETURN $2 IS NULL;
    ELSIF $2 IS NULL THEN
        -- We already know $1 is not null.
        RETURN FALSE;
    ELSE
        -- Both args are not null.
        RETURN $1 = $2;
    END IF;

That's the basic idea. I put a wrapper around this to generate a copy of
it for all the data types used in my database.

Re: SET within a function?

From

Edmund Dengler

Date:

14 October 2003, 13:53:33

The problem I would face is that this still needs to be a sequential scan
in the table rather than an index lookup.

Regards,
Ed

On Tue, 14 Oct 2003, Arthur Ward wrote:

> > Is the rewrite only for the literal 'X = NULL' or will it do a test
> > against a value such as 'X = OLD.X' (and rewrite is OLD.X is NULL)?
> >
> > Is there any way to match NULLS to each other (as I am looking for a
> > literal row, not using NULL as the UNKNOWN). I suppose I could put in a
> > dummy value for the 'Not a valid value', but it seems to be quite awkward
> > when I really do want the NULL.
>
> I ended up writing an "equivalent" function for the project I'm working
> on. It goes like this in plpgsql:
>
>     IF $1 IS NULL THEN
>         RETURN $2 IS NULL;
>     ELSIF $2 IS NULL THEN
>         -- We already know $1 is not null.
>         RETURN FALSE;
>     ELSE
>         -- Both args are not null.
>         RETURN $1 = $2;
>     END IF;
>
> That's the basic idea. I put a wrapper around this to generate a copy of
> it for all the data types used in my database.
>

Re: SET within a function?

From

Mike Mascari

Date:

14 October 2003, 14:06:30

Edmund Dengler wrote:

> The problem I would face is that this still needs to be a sequential scan
> in the table rather than an index lookup.

IIRC, NULL values aren't indexed, only actual values, which is an
implementation detail but yet-another reason why NULL-elimination
through normalization is a good idea:

http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf


Mike Mascari
mascarm@mascari.com

Re: SET within a function?

From

Edmund Dengler

Date:

14 October 2003, 19:12:09

An issue is that I am trying to avoid having another table (to stop the
need for a join (performance reasons)). The NULLs are relatively rare, but
since they can appear, and in certain pathological cases the sequential
scan can take seconds to run, I was hoping for a work-around. But it looks
like I have no real choice in this as there is no way to specify that
NULL == NULL.

Another question: if I have a multi-column index, and one of the values of
a tuple is NULL, is that row not indexed? If it is, how does this jibe
with the "NULLs are not indexed" statements?

Thanks!
Ed

On Tue, 14 Oct 2003, Mike Mascari wrote:

> Edmund Dengler wrote:
>
> > Is the rewrite only for the literal 'X = NULL' or will it do a test
> > against a value such as 'X = OLD.X' (and rewrite is OLD.X is NULL)?
>
> It is a parse time transformation:
>
>
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=26ef31219ae11442&seekm=3DF52206.5060507%40mascari.com#link6
>
> > Is there any way to match NULLS to each other (as I am looking for a
> > literal row, not using NULL as the UNKNOWN). I suppose I could put in a
> > dummy value for the 'Not a valid value', but it seems to be quite awkward
> > when I really do want the NULL.
>
> Normalization would have you eliminate the NULL by having another
> relation whose candidate key is the same as your original table, but
> those records whose attribute is NULL would simply not be present in
> the child table.
>
> Another possible solution is to define your own type with an internal
> status for 'Not a valid value'...
>
> HTH,
>
> Mike Mascari
> mascarm@mascari.com
>
>
>

Re: SET within a function?

From

darren@crystalballinc.com

Date:

14 October 2003, 19:18:54

You can try COALESCE function where by if you have a null you can make it
have a default value and that default value will be a value the field
never reaches

i.e. COALESCE(NULL,'') = COALESCE(NULL,'')

HTH
Darren

On Tue, 14 Oct 2003, Edmund Dengler wrote:

> An issue is that I am trying to avoid having another table (to stop the
> need for a join (performance reasons)). The NULLs are relatively rare, but
> since they can appear, and in certain pathological cases the sequential
> scan can take seconds to run, I was hoping for a work-around. But it looks
> like I have no real choice in this as there is no way to specify that
> NULL == NULL.
>
> Another question: if I have a multi-column index, and one of the values of
> a tuple is NULL, is that row not indexed? If it is, how does this jibe
> with the "NULLs are not indexed" statements?
>
> Thanks!
> Ed
>
> On Tue, 14 Oct 2003, Mike Mascari wrote:
>
> > Edmund Dengler wrote:
> >
> > > Is the rewrite only for the literal 'X = NULL' or will it do a test
> > > against a value such as 'X = OLD.X' (and rewrite is OLD.X is NULL)?
> >
> > It is a parse time transformation:
> >
> >
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=26ef31219ae11442&seekm=3DF52206.5060507%40mascari.com#link6
> >
> > > Is there any way to match NULLS to each other (as I am looking for a
> > > literal row, not using NULL as the UNKNOWN). I suppose I could put in a
> > > dummy value for the 'Not a valid value', but it seems to be quite awkward
> > > when I really do want the NULL.
> >
> > Normalization would have you eliminate the NULL by having another
> > relation whose candidate key is the same as your original table, but
> > those records whose attribute is NULL would simply not be present in
> > the child table.
> >
> > Another possible solution is to define your own type with an internal
> > status for 'Not a valid value'...
> >
> > HTH,
> >
> > Mike Mascari
> > mascarm@mascari.com
> >
> >
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
Darren Ferguson

Re: SET within a function?

From

Edmund Dengler

Date:

14 October 2003, 19:24:22

This just returns us to the problem that the use of a function causes a
sequential scan (as the select has to essentially apply the function to
each row). I would need to store a dummy value into the field (it is an
int, so I could store -1, but it breaks my sense of aesthetics to do this
simply to get around the sequential scan).

Could I use a functional index, maybe?

Regards,
Ed

On Tue, 14 Oct 2003 darren@crystalballinc.com wrote:

> You can try COALESCE function where by if you have a null you can make it
> have a default value and that default value will be a value the field
> never reaches
>
> i.e. COALESCE(NULL,'') = COALESCE(NULL,'')
>
> HTH
> Darren
>
> On Tue, 14 Oct 2003, Edmund Dengler wrote:
>
> > An issue is that I am trying to avoid having another table (to stop the
> > need for a join (performance reasons)). The NULLs are relatively rare, but
> > since they can appear, and in certain pathological cases the sequential
> > scan can take seconds to run, I was hoping for a work-around. But it looks
> > like I have no real choice in this as there is no way to specify that
> > NULL == NULL.
> >
> > Another question: if I have a multi-column index, and one of the values of
> > a tuple is NULL, is that row not indexed? If it is, how does this jibe
> > with the "NULLs are not indexed" statements?
> >
> > Thanks!
> > Ed
> >
> > On Tue, 14 Oct 2003, Mike Mascari wrote:
> >
> > > Edmund Dengler wrote:
> > >
> > > > Is the rewrite only for the literal 'X = NULL' or will it do a test
> > > > against a value such as 'X = OLD.X' (and rewrite is OLD.X is NULL)?
> > >
> > > It is a parse time transformation:
> > >
> > >
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=26ef31219ae11442&seekm=3DF52206.5060507%40mascari.com#link6
> > >
> > > > Is there any way to match NULLS to each other (as I am looking for a
> > > > literal row, not using NULL as the UNKNOWN). I suppose I could put in a
> > > > dummy value for the 'Not a valid value', but it seems to be quite awkward
> > > > when I really do want the NULL.
> > >
> > > Normalization would have you eliminate the NULL by having another
> > > relation whose candidate key is the same as your original table, but
> > > those records whose attribute is NULL would simply not be present in
> > > the child table.
> > >
> > > Another possible solution is to define your own type with an internal
> > > status for 'Not a valid value'...
> > >
> > > HTH,
> > >
> > > Mike Mascari
> > > mascarm@mascari.com
> > >
> > >
> > >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> >
> >                http://archives.postgresql.org
> >
>
> --
> Darren Ferguson
>
>

Re: SET within a function?

From

Mike Mascari

Date:

14 October 2003, 19:33:01

Edmund Dengler wrote:

> This just returns us to the problem that the use of a function causes a
> sequential scan (as the select has to essentially apply the function to
> each row). I would need to store a dummy value into the field (it is an
> int, so I could store -1, but it breaks my sense of aesthetics to do this
> simply to get around the sequential scan).
>
> Could I use a functional index, maybe?

Yes, but I think you have to write a little wrapper:

CREATE TABLE foo (
key integer not null,
value text);

CREATE FUNCTION toValue(text) RETURNS text AS '

 SELECT COALESCE($1, '''');

' LANGUAGE 'SQL' IMMUTABLE;

CREATE INDEX i_foo1 ON foo(toValue(value));

And always be sure to use the function in the query:

SELECT *
FROM foo
WHERE toValue(value) = '';

For fun:

SET enable_seqscan to off;

EXPLAIN SELECT * FROM foo WHERE toValue(value) = 'Mike';

should produce an Index Scan....

HTH,

Mike Mascari
mascarm@mascari.com

Re: SET within a function?

From

Tom Lane

Date:

15 October 2003, 01:03:37

Edmund Dengler <edmundd@eSentire.com> writes:
> ... I have no real choice in this as there is no way to specify that
> NULL == NULL.

The conventional wisdom on this is that if you think you need NULL ==
NULL to yield true, then you are misusing NULL, and you'd better
reconsider your data representation.  The standard semantics for NULL
really do not support any other interpretation of NULL than "I don't
know what this value is".  If you are trying to use NULL to mean
something else, you will face nothing but misery.  Choose another
representation for whatever you do mean.

            regards, tom lane

Re: SET within a function?

From

Edmund Dengler

Date:

15 October 2003, 12:28:35

I guess it comes back to the semantics of NULL. As has been pointed out in
many a database course, what we mean by NULL changes, and how we want to
use NULL changes on circumstances.

Normally, when I am comparing rows, I do want NULL <> NULL. In
this specific instance, no value has been assigned to the specific
column for this row, so NULL is appropriate. However, there are cases
where I am trying to explicitely test for existence of a specific row
in the table, and in this case, I _do_ want a NULL == NULL type of
comparison. I could try and specify a dummy value (in this case, I could
put in -1), but then I am trying to create a second class of NULLs, and
this is usually not considered good design.

Note that as a prime example of how postgresql itself is not "consistent"
(in the strictest sense) is GROUP BY which treats NULL == NULL
(interesting side bar, is there a way to cause GROUP BY to treat NULLs as
not equal to each other?). In a theoretical question, how is this
justified if NULL should not equal to NULL (other than "it is in the
spec")?

Also, is there a particular reason for not having a strict equality
operator (or is it simply because it is not in the specification)?
Performance? No support from the back-end? Something else?

Regards,
Ed

On Wed, 15 Oct 2003, Tom Lane wrote:

> Edmund Dengler <edmundd@eSentire.com> writes:
> > ... I have no real choice in this as there is no way to specify that
> > NULL == NULL.
>
> The conventional wisdom on this is that if you think you need NULL ==
> NULL to yield true, then you are misusing NULL, and you'd better
> reconsider your data representation.  The standard semantics for NULL
> really do not support any other interpretation of NULL than "I don't
> know what this value is".  If you are trying to use NULL to mean
> something else, you will face nothing but misery.  Choose another
> representation for whatever you do mean.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

Re: SET within a function?

From

Tom Lane

Date:

15 October 2003, 13:11:39

Edmund Dengler <edmundd@eSentire.com> writes:
> Normally, when I am comparing rows, I do want NULL <> NULL.

No, you still haven't got the point.  NULL is not equal to NULL, and
it is not not-equal-to NULL either.  The result of the comparison is
NULL, not true or false.  This is consistent with the interpretation
of NULL as "I don't know the value".  If you don't know what the value
is, you also don't know whether it is equal to some other value.

> Note that as a prime example of how postgresql itself is not "consistent"
> (in the strictest sense) is GROUP BY which treats NULL == NULL

Shrug ... the standard tells us to do that.  SQL has never been held up
as a model of consistency.

> Also, is there a particular reason for not having a strict equality
> operator (or is it simply because it is not in the specification)?

The existing operators *are* strict (which is defined as NULL in yields
NULL out).  You could build a set of non-strict comparison operators if
you had a mind to.  IIRC you would lose some potential hashtable
optimizations, but in the main it would work.

            regards, tom lane

Re: SET within a function?

From

Dennis Gearon

Date:

15 October 2003, 13:46:25

Tom Lane wrote:

>Edmund Dengler <edmundd@eSentire.com> writes:
>
>
>>Normally, when I am comparing rows, I do want NULL <> NULL.
>>
>>
>
>No, you still haven't got the point.  NULL is not equal to NULL, and
>it is not not-equal-to NULL either.  The result of the comparison is
>NULL, not true or false.  This is consistent with the interpretation
>of NULL as "I don't know the value".  If you don't know what the value
>is, you also don't know whether it is equal to some other value.
>
>

In these cases, it is recommended to either find a value which is out of
range, normally, and use that in place of NULL. For examples:

-1
10^32-1
"."
the_oldest_possible_date BC
the_furthest_away_date AD

Another way is to put an additional column in, but I think this still
has problems if you are trying to get a query to return values in a
column that has NULLs and you are querying against the column that has
the NULLs.

Re: SET within a function?

From

Stephan Szabo

Date:

16 October 2003, 00:03:28

On Wed, 15 Oct 2003, Edmund Dengler wrote:

> Note that as a prime example of how postgresql itself is not "consistent"
> (in the strictest sense) is GROUP BY which treats NULL == NULL
> (interesting side bar, is there a way to cause GROUP BY to treat NULLs as
> not equal to each other?). In a theoretical question, how is this
> justified if NULL should not equal to NULL (other than "it is in the
> spec")?

Because it's not defined in terms of equality. ;)  GROUP BY is defined by
value "distinct"ness, where distinct has a very specific definition in the
spec (which treats two NULL values as not distinct).

You might actually be able to find some way to use that to your advantage,
but I'm not sure how.