Re: Why is this SELECT evaluated? - Mailing list pgsql-general

From Erik Wienhold
Subject Re: Why is this SELECT evaluated?
Date
Msg-id 1572287198.422215.1674938041031@office.mailbox.org
Whole thread Raw
List pgsql-general
> On 28/01/2023 20:29 CET Miles Elam <miles.elam@productops.com> wrote:
>
> Postgres v15
>
> Given this example of a conversion from a byte array to an int8
> masquerading as an "unsigned" int4
> 
>       SELECT (get_byte(bytes, byte_offset)::int8 << 24)
>              | (get_byte(bytes, byte_offset + 1) << 16)
>              | (get_byte(bytes, byte_offset + 2) << 8)
>              | (get_byte(bytes, byte_offset + 3))
>         FROM ( VALUES ('\x010000'::bytea, 0) ) b(bytes, byte_offset)
>        WHERE length(bytes) >= (4 + byte_offset)
>       ;
> 
> Why does this error result?
>
>     ERROR: index 3 out of valid range, 0..2
>     SQL state: 2202E
>
> I was under the impression that if the WHERE clause evaluated to
> false, the SELECT clause would not be evaluated.

Yes, according to https://www.postgresql.org/docs/15/sql-select.html#id-1.9.3.172.7
the WHERE clause is evaluated before the SELECT list.

> Why is get_byte(...) ever run in the first place even though length(bytes)
> is 3?

Postgres also applies constant folding which can be observed in the execution
plan (I removed the 4th get_byte call).  The WHERE clause is always false:

                    QUERY PLAN
    ------------------------------------------
     Result  (cost=0.00..0.01 rows=1 width=8)
       One-Time Filter: false
    (2 rows)

And with those constants the SELECT list is evaluated before the statement is
processed in the documented order.

Does the SQL standard say anything about constant folding and when or if it can
be applied?  I assume it's just an implementation detail of Postgres.  Without
the constant folding I would also expect that query to just return the empty set.

get_byte checks the index at runtime.  Adding a fourth byte (index is 0-based)
works as expected with index 3:

    test=# select get_byte('\x010203', 3);
    ERROR:  index 3 out of valid range, 0..2
    
    test=# select get_byte('\x01020304', 3);
     get_byte
    ----------
        4
    (1 row)
    
    test=# select get_byte('\x01020304', 4);
    ERROR:  index 4 out of valid range, 0..3

With a random bytea length the query is processed in the expected order and
returns one or zero rows but never raises an error:

    SELECT
      b,
      (get_byte(bytes, byte_offset)::int8 << 24)
        | (get_byte(bytes, byte_offset + 1) << 16)
        | (get_byte(bytes, byte_offset + 2) << 8)
        | (get_byte(bytes, byte_offset + 3))
    FROM (
      VALUES (substring('\x01020304'::bytea from 1 for (random() * 4)::int), 0)
    ) b(bytes, byte_offset)
    WHERE
      length(bytes) >= (4 + byte_offset);

--
Erik



pgsql-general by date:

Previous
From: Miles Elam
Date:
Subject: Why is this SELECT evaluated?
Next
From: Erik Wienhold
Date:
Subject: Re: Sequence vs UUID