Thread: SERIAL datatype column skipping values.

SERIAL datatype column skipping values.

From: Prabhat Sahu

Hi all,
Please check the below behavior for the "SERIAL" datatype.

postgres=# CREATE TABLE t1(c1 int, c2 serial);
CREATE TABLE
postgres=# insert into t1 values (generate_series(1,3));
INSERT 0 3
postgres=# insert into t1 values (generate_series(4,6));
INSERT 0 3
postgres=# select * from t1;
 c1 | c2
----+----
  1 |  1
  2 |  2
  3 |  3
  4 |  5
  5 |  6
  6 |  7
(6 rows)

In the above case, the serial column "c2" skips the value 4 in the SELECT output.
Is this expected behavior?

--

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com

Re: SERIAL datatype column skipping values.

From: Andreas Karlsson

On 3/11/20 11:15 AM, Prabhat Sahu wrote:
> Hi all,
> Please check the below behavior for the "SERIAL" datatype.
> 
> [...]
> 
> In this above case, the serial column "c2" is skipping the value "4" in 
> select output.
> Is this an expected behavior?

Curious. It seems that a table's DEFAULT expressions are executed one
extra time when a set-returning function is used in the VALUES list, as
in your example. And the SERIAL type is implemented using DEFAULT.

On the other hand, if you use "INSERT ... SELECT", the DEFAULT expression
is executed only once per inserted row.

# CREATE FUNCTION test_default() RETURNS int LANGUAGE plpgsql AS $$
BEGIN
     RAISE NOTICE 'Ran test_default()';
     RETURN 42;
END
$$;
CREATE FUNCTION

# CREATE TABLE t2 (c1 int, c2 int DEFAULT test_default());
CREATE TABLE

# INSERT INTO t2 VALUES (generate_series(1,2));
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
INSERT 0 2

# INSERT INTO t2 SELECT generate_series(1,2);
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
INSERT 0 2

Andreas



Re: SERIAL datatype column skipping values.

From: Tom Lane

Andreas Karlsson <andreas@proxel.se> writes:
> On 3/11/20 11:15 AM, Prabhat Sahu wrote:
>> Is this an expected behavior?

> Curious, it seems like DEFAULT expressions of a table are executed an 
> extra time if a set returning function is used like in your example. And 
> the SERIAL type is implemented using DEFAULT.

Yeah, it's the same as if you do

regression=# select generate_series(1,2), test_default();
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
NOTICE:  Ran test_default()
 generate_series | test_default 
-----------------+--------------
               1 |           42
               2 |           42
(2 rows)

The generated plan is

regression=# explain verbose select generate_series(1,2), test_default();
                   QUERY PLAN                    
-------------------------------------------------
 ProjectSet  (cost=0.00..0.28 rows=2 width=8)
   Output: generate_series(1, 2), test_default()
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
(3 rows)

and if you read nodeProjectSet.c you'll see that it needs to evaluate
the target list three times.  On the third iteration, generate_series()
returns isdone == ExprEndResult indicating that it has no more results,
so we don't emit an output tuple --- but we still run test_default()
while scanning the tlist.
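
For comparison, here is a sketch of the same query with the
set-returning function moved into FROM. It assumes the test_default()
function defined earlier in the thread; the alias g is only
illustrative. In this form the rows should come from a function scan
rather than a ProjectSet node, so the volatile function is expected to
run once per output row, without the extra call on the final iteration.

```sql
-- Assumes test_default() from earlier in the thread already exists.
-- generate_series() sits in FROM, so it is no longer part of the
-- target list that is scanned on the final, row-less iteration.
SELECT g, test_default()
FROM generate_series(1, 2) AS g;

-- To check, look at the plan: no ProjectSet node is expected here.
-- EXPLAIN without ANALYZE does not execute test_default().
EXPLAIN VERBOSE
SELECT g, test_default()
FROM generate_series(1, 2) AS g;
```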

Possibly the planner should try to avoid putting volatile expressions
into ProjectSet's tlist.  On the other hand, it's worked this way for
an awfully long time, so I wonder if anyone is relying on the behavior.
Even in versions before we used ProjectSet nodes, you still see three
calls to the volatile function.

Anyway, to get back to the OP's implied question: no, you should never
assume that a SERIAL column's values won't have holes in the sequence.
Rolled-back transactions will have that effect in any case, since a
sequence value consumed by nextval() is not given back on rollback.
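
A minimal sketch of the rollback case, using a hypothetical scratch
table t3 that is not part of the original report:

```sql
-- t3 is a made-up table name used only for this illustration.
CREATE TABLE t3 (c1 int, c2 serial);

BEGIN;
INSERT INTO t3 (c1) VALUES (1);   -- takes the first sequence value for c2
ROLLBACK;                         -- the row is discarded, the sequence is not rewound

INSERT INTO t3 (c1) VALUES (2);   -- takes the next sequence value
SELECT * FROM t3;                 -- one row: c1 = 2, c2 = 2
```

The value 1 never appears in c2, even though no set-returning function
is involved.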

            regards, tom lane