Re: WIP: System Versioned Temporal Table - Mailing list pgsql-hackers

From Surafel Temesgen
Subject Re: WIP: System Versioned Temporal Table
Date
Msg-id CALAY4q_ZZFTqDQ_G-i5FHM7Df2t8eDTZZuD0uLrv3WqNKDu+7g@mail.gmail.com
Whole thread Raw
In response to Re: WIP: System Versioned Temporal Table  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Hi,
Thank you very much looking at it
On Tue, Jan 7, 2020 at 1:33 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
Hello.

Isn't this patch somehow broken?


First, I tried to create a temporal table.

When I used timestamptz as the type of versioning columns, ALTER,
CREATE commands ended with server crash.

 "CREATE TABLE t (a int, s timestamptz GENERATED ALWAYS AS ROW START, e timestamptz GENERATED ALWAYS AS ROW END);"
 (CREATE TABLE t (a int);)
 "ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START"
 "ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamptz GENERATED ALWAYS AS ROW END"

If I added the start/end timestamp columns to an existing table, it
returns uncertain error.

 ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START;
 ERROR:  column "s" contains null values
 ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamp(6) GENERATED ALWAYS AS ROW END;
 ERROR:  column "s" contains null values


When I defined only start column, SELECT on the table crashed.

 "CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START);"
 "SELECT * from t;"
 (crashed)

The following command ended with ERROR which I cannot understand the
cause, but I expected the command to be accepted.


Fixed

  ALTER TABLE t ADD COLUMN start timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN end timestamp(6) GENERATED ALWAYS AS ROW END;
  ERROR:  syntax error at or near "end"


end is a keyword
 
I didin't examined further but the syntax part doesn't seem designed
well, and the body part seems vulnerable to unexpected input.


I ran a few queries:

SELECT * shows the timestamp columns, don't we need to hide the period
timestamp columns from this query?


The sql standard didn't dictate hiding the column but i agree hiding it by
default is good thing because this columns are used by the system
to classified the data and not  needed in user side frequently. I can
change to that if we have consensus
 
I think UPDATE needs to update the start timestamp, but it doesn't. As
the result the timestamps doesn't represent the correct lifetime of
the row version and we wouldn't be able to pick up correct versions of
a row that exprerienced updates. (I didn't confirmed that because I
couldn't do "FOR SYSTEM_TIME AS OF" query due to syntax error..)


Right. It have to set both system time for inserted row and set row end time for
deleted row. I fix it
 
(Sorry in advance for possible pointless comments due to my lack of
access to the SQL11 standard.)  If we have the period-timestamp
columns before the last two columns, INSERT in a common way on the
table fails, which doesn't seem to me to be expected behavior:

 CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START, e timestamp(6) GENERATED ALWAYS AS ROW END, a int) WITH SYSTEM VERSIONING;
 INSERT INTO t (SELECT a FROM generate_series(0, 99) a);
 ERROR:  column "s" is of type timestamp without time zone but expression is of type integer


Its the same without the patch too
CREATE TABLE t (s timestamptz , e timestamptz, a int);
INSERT INTO t (SELECT a FROM generate_series(0, 99) a);
ERROR:  column "s" is of type timestamp with time zone but expression is of type integer
LINE 1: INSERT INTO t (SELECT a FROM generate_series(0, 99) a);
 
Some queries using SYSTEM_TIME which I think should be accepted ends
with error. Is the grammar part missing something?

 SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55';
 ERROR:  syntax error at or near "system_time"
 LINE 1: SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55';

 SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00' AND '2020-01-08 0:00:00';
 ERROR:  syntax error at or near "system_time"
 LINE 1: SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00'...


fixed


Other random comments (sorry for it not being comprehensive):

The patch at worst loops over all columns at every parse time. It is
quite ineffecient if there are many columns. We can store the column
indexes in relcache.

 
but its only for system versioned table.
 
If I'm not missing anything, alter table doesn't properly modify
existing data in the target table. AddSystemVersioning should fill in
start/end_timestamp with proper values and DropSystemVersioning shuold
remove rows no longer useful.


fixed


+makeAndExpr(Node *lexpr, Node *rexpr, int location)

 I believe that planner flattenes nested AND/ORs in
 eval_const_expressions(). Shouldn't we leave the work to the planner?


 
filter clause is added using makeAndExpr and planner can flat that if it sees fit
 
+makeConstraint(ConstrType type)

We didn't use such a function to make that kind of nodes. Maybe we
should use makeNode directly, or we need to change similar coding into
that using the function. Addition to that, setting .location to 1 is
wrong. "Unknown" location is -1.

done


Separately from that, temporal clauses is not restriction of a
table. So it seems wrong to me to use constraint mechamism for this
purpose.


we use constraint mechanism for similar thing like default value and
generated column
 
+makeSystemColumnDef(char *name)

"system column (or attribute)" is a column specially treated outside
of tuple descriptor. The temporal-period columns are not system
columns in that sense.

changed to makeTemporalColumnDef and use timestamptz for all
versioning purpose.

Attach is the patch that fix the above and uses Vik's regression test

regards
Surafel
Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: shared-memory based stats collector
Next
From: Surafel Temesgen
Date:
Subject: Re: WIP: System Versioned Temporal Table