Thread: Indexing null dates

Indexing null dates

From
"John Sidney-Woollett"
Date:
Hi

I'm building a web app where changes to customer orders are logged in the
following table, and I worried about the time that it will take to locate
records that need further processing/actioning. Here's the table:

create table CUSTOMER.WCCustOrderStatusLog (
  WCCustOrderID      integer,
  WCOrderStatusID    integer,
  -- date/time at which some process acknowledged the status
  -- change, and took the appropriate action, like raising
  -- an e-mail confirmation
  Acknowledged       timestamp,
  -- stamped with now() when the e-mailer process begins to process
  -- the order/email message. When complete Processing goes back
  -- to null and Acknowledged is stamped with now()
  Processing         timestamp,
  -- date status change occured
  LastUpdated        timestamp,
  primary key (WCCustOrderID, WCOrderStatusID)
) without oids;

I need a separate e-mailing process to locate orders (using the above
table) that have WCOrderStatusID = (4,5 or 99) AND a null value for
"Acknowledged" and "Processing"

I have a function which provides the next order number for processing:

CREATE OR REPLACE FUNCTION CUSTOMER.GetNextCustEmailAck(integer) RETURNS
integer AS '
  -- locates the next order number that requires an e-mail confirmation
  -- to be sent - this generally occurs as the order passes from
  -- one status to the next. These status charges are recorded
  -- within the WCCustOrderStatusLog table
DECLARE
  pStatusID     ALIAS FOR $1;

  vCustOrderID  integer := null;
BEGIN
  -- locate the next order
  select WCCustOrderID into vCustOrderID
  from CUSTOMER.WCCustOrderStatusLog
  where WCOrderStatusID = pStatusID
  and Acknowledged is null
  and Processing is null
  for update
  limit 1;

  if (vCustOrderID is not null) then
    -- mark the record as being processed
    update CUSTOMER.WCCustOrderStatusLog
      set Processing = now()
      where WCCustOrderID = vCustOrderID
      and WCOrderStatusID = pStatusID;
  end if;

  -- return -1 to indicate failure to locate order
  if (vCustOrderID is null) then
    return -1;
  end if;

  -- return the order number
  return vCustOrderID;
END;
' LANGUAGE 'plpgsql';

My question is whether postgres can index null values, and if not, do I
have to accept a full table scan when locating records. Or can I mitigate
this by the use either of date values that signify null, but are non-null
(eg 1-Jan-1970). Or am I better off adding extra flag fields (integer)
which always have a Y/N (1,0) value corresponding to whether the
appropriate date field is null or not, and then use these to locate the
records.

eg

create table CUSTOMER.WCCustOrderStatusLog (
  WCCustOrderID      integer,
  WCOrderStatusID    integer,
  -- New flag field
  AcknowledgedIsNull integer,
  Acknowledged       timestamp,
  -- New flag field
  ProcessingIsNull   integer,
  Processing         timestamp,
  -- date status change occured
  LastUpdated        timestamp,
  primary key (WCCustOrderID, WCOrderStatusID)
) without oids;

Also will adding an index to WCOrderStatusID reduce the cost of the
sequential scan?

Or is there some other strategy that would be better?

Thanks.

John Sidney-Woollett

Re: Indexing null dates

From
Tom Lane
Date:
"John Sidney-Woollett" <johnsw@wardbrook.com> writes:
> [ needs to make this fast: ]
>   -- locate the next order
>   select WCCustOrderID into vCustOrderID
>   from CUSTOMER.WCCustOrderStatusLog
>   where WCOrderStatusID = pStatusID
>   and Acknowledged is null
>   and Processing is null
>   for update
>   limit 1;

> My question is whether postgres can index null values, and if not, do I
> have to accept a full table scan when locating records.

It indexes them, but "is null" is not an indexable operator, so you
can't directly solve the above with a 3-column index.  What you can do
instead is use a partial index, for instance

create index i on CUSTOMER.WCCustOrderStatusLog (WCOrderStatusID)
where Acknowledged is null and Processing is null;

            regards, tom lane

Re: Indexing null dates

From
"John Sidney-Woollett"
Date:
Tom Lane said:
> "John Sidney-Woollett" <johnsw@wardbrook.com> writes:
>> [ needs to make this fast: ]
>>   -- locate the next order
>>   select WCCustOrderID into vCustOrderID
>>   from CUSTOMER.WCCustOrderStatusLog
>>   where WCOrderStatusID = pStatusID
>>   and Acknowledged is null
>>   and Processing is null
>>   for update
>>   limit 1;
>
>> My question is whether postgres can index null values, and if not, do I
>> have to accept a full table scan when locating records.
>
> It indexes them, but "is null" is not an indexable operator, so you
> can't directly solve the above with a 3-column index.  What you can do
> instead is use a partial index, for instance
>
> create index i on CUSTOMER.WCCustOrderStatusLog (WCOrderStatusID)
> where Acknowledged is null and Processing is null;

That's a very nifty trick and exactly the sort of answer I was after!

Many thanks

John Sidney-Woollett

Re: Indexing null dates

From
Bruce Momjian
Date:
John Sidney-Woollett wrote:
> Tom Lane said:
> > "John Sidney-Woollett" <johnsw@wardbrook.com> writes:
> >> [ needs to make this fast: ]
> >>   -- locate the next order
> >>   select WCCustOrderID into vCustOrderID
> >>   from CUSTOMER.WCCustOrderStatusLog
> >>   where WCOrderStatusID = pStatusID
> >>   and Acknowledged is null
> >>   and Processing is null
> >>   for update
> >>   limit 1;
> >
> >> My question is whether postgres can index null values, and if not, do I
> >> have to accept a full table scan when locating records.
> >
> > It indexes them, but "is null" is not an indexable operator, so you
> > can't directly solve the above with a 3-column index.  What you can do
> > instead is use a partial index, for instance
> >
> > create index i on CUSTOMER.WCCustOrderStatusLog (WCOrderStatusID)
> > where Acknowledged is null and Processing is null;
>
> That's a very nifty trick and exactly the sort of answer I was after!

Yes, nifty.  CREATE INDEX docs updated with:

+    <literal>NULL</> values are not indexed by default.  The best way
+    to index <literal>NULL</> values is to create a partial index using
+    an <literal>IS NULL</> comparison.  <literal>IS NULL</> is more
+    of a function call than a value comparison, and this is why a partial
+    index works.

Full patch attached.  Thanks.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/create_index.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/create_index.sgml,v
retrieving revision 1.44
diff -c -c -r1.44 create_index.sgml
*** doc/src/sgml/ref/create_index.sgml    29 Nov 2003 19:51:38 -0000    1.44
--- doc/src/sgml/ref/create_index.sgml    20 Apr 2004 00:54:29 -0000
***************
*** 66,72 ****
      When the <literal>WHERE</literal> clause is present, a
      <firstterm>partial index</firstterm> is created.
      A partial index is an index that contains entries for only a portion of
!     a table, usually a portion that is somehow more interesting than the
      rest of the table. For example, if you have a table that contains both
      billed and unbilled orders where the unbilled orders take up a small
      fraction of the total table and yet that is an often used section, you
--- 66,72 ----
      When the <literal>WHERE</literal> clause is present, a
      <firstterm>partial index</firstterm> is created.
      A partial index is an index that contains entries for only a portion of
!     a table, usually a portion that is more useful for indexing than the
      rest of the table. For example, if you have a table that contains both
      billed and unbilled orders where the unbilled orders take up a small
      fraction of the total table and yet that is an often used section, you
***************
*** 77,85 ****
    </para>

    <para>
      The expression used in the <literal>WHERE</literal> clause may refer
!     only to columns of the underlying table (but it can use all columns,
!     not only the one(s) being indexed).  Presently, subqueries and
      aggregate expressions are also forbidden in <literal>WHERE</literal>.
      The same restrictions apply to index fields that are expressions.
    </para>
--- 77,93 ----
    </para>

    <para>
+    <literal>NULL</> values are not indexed by default.  The best way
+    to index <literal>NULL</> values is to create a partial index using
+    an <literal>IS NULL</> comparison.  <literal>IS NULL</> is more
+    of a function call than a value comparison, and this is why a partial
+    index works.
+   </para>
+
+   <para>
      The expression used in the <literal>WHERE</literal> clause may refer
!     only to columns of the underlying table, but it can use all columns,
!     not just the ones being indexed.  Presently, subqueries and
      aggregate expressions are also forbidden in <literal>WHERE</literal>.
      The same restrictions apply to index fields that are expressions.
    </para>

Re: Indexing null dates

From
Greg Stark
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Yes, nifty.  CREATE INDEX docs updated with:
>
> +    <literal>NULL</> values are not indexed by default.  The best way
> +    to index <literal>NULL</> values is to create a partial index using
> +    an <literal>IS NULL</> comparison.  <literal>IS NULL</> is more
> +    of a function call than a value comparison, and this is why a partial
> +    index works.

Uh, this is wrong.

NULLs are indexed. It's just that IS NULL cannot take advantage of it due to
technical details. These are NOT the same thing.

Saying "NULLs are not indexed" will confuse people because it will make them
think that they're not present in the index at all which is what Oracle does.

That has real consequences on queries. The most obvious being that

 SELECT * FROM foo ORDER BY bar

cannot take advantage of an index on bar. Oracle programmers are accustomed to
having to had a "WHERE bar IS NOT NULL" or else live with the full table scan
and sort.

--
greg

Re: Indexing null dates

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> +    <literal>NULL</> values are not indexed by default.

This is quite incorrect.  The nulls *are* indexed (at least in btree
indexes); the issue is whether there is any way to use the index to
search for them.  I do not think it helps anyone for the documentation
to get this basic point wrong, even if the distinction is subtle.

            regards, tom lane

Re: Indexing null dates

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > +    <literal>NULL</> values are not indexed by default.
>
> This is quite incorrect.  The nulls *are* indexed (at least in btree
> indexes); the issue is whether there is any way to use the index to
> search for them.  I do not think it helps anyone for the documentation
> to get this basic point wrong, even if the distinction is subtle.

OK, docs updated with:

   Indexes can not be used with <literal>IS NULL</> clauses by default.
   The best way to use indexes in such cases is to create a partial index
   using an <literal>IS NULL</> comparison.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073