Re: SQL:2011 PERIODS vs Postgres Ranges? - Mailing list pgsql-hackers
From | Paul A Jungwirth |
---|---|
Subject | Re: SQL:2011 PERIODS vs Postgres Ranges? |
Date | |
Msg-id | CA+renyW6SYCc_vmN2HavjOx74m7KgPU4gzHimMq-F4hPnsNtTA@mail.gmail.com Whole thread Raw |
In response to | Re: SQL:2011 PERIODS vs Postgres Ranges? (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: SQL:2011 PERIODS vs Postgres Ranges?
(Michael Paquier <michael@paquier.xyz>)
|
List | pgsql-hackers |
Here is a patch rebasing on master (meant to be applied on top of my other multirange patch) and newly including UPDATE/DELETE FOR PORTION OF. FOR PORTION OF works on any table with a temporal primary key. It restricts the UPDATE/DELETE to the given time frame, and then if the affected row(s) had any "leftovers" above or below the targeted range, it INSERTs new rows to preserve the untouched intervals. I put the implementation into execModifyTable.c (mostly), which I think is the preferred approach. (There was a great message on -hackers a year or two ago lamenting how new contributors want to do all the work in the parse/analysis phase, so I tried to avoid that. I wish I could find the thread. I want to say Robert Haas wrote it, but it could have been anyone.) The executor is new territory for me, so even though this is WIP I'd love to have someone look at it now. I'm sure I'm doing all kinds of bad things re locking, transaction isolation, snapshots, etc. (There are some comments in the .patch for specific worries.) Also, since I have to do range calculations to handle the leftovers, this adds knowledge about a specific type (well a specific type of types) to the executor. I felt like that was mixing abstraction layers a bit, so perhaps someone will have an opinion/suggestion there. I could probably build an Expr earlier in the pipeline if I had a way to feed it the pre-UPDATE values of the row (suggestions anyone?), and then the executor could just evaluate it without knowing anything about ranges. I'm also not yet handling FDW tables, updatable views, or partitioned tables. Perhaps we don't support FDWs at all here, or leave it up to the FDW implementation to decide. I haven't thought much about updatable views yet.... For partitioned tables, I think I can add support without too much trouble; I'll give it a try soon. Possibly the temporal PK requirement rules out using this for all three (FDWs, updatable views, partitioned tables), at least for now. This is mostly "happy path" so there is probably some error handling to add. Previously a DELETE never updated indexes, and now I *do* update indexes if a DELETE has a FOR PORTION OF clause, so that they see the potential INSERTs. I don't know if that adds any risks around deadlock etc. I have a test verifying that we do fire triggers on the implicit INSERTs (which I think is what the spec requires and is what MariaDB and IBM DB2 do). Right now my triggers are firing in this order: BEFORE UPDATE/DELETE BEFORE INSERT BEFORE INSERT AFTER INSERT AFTER INSERT AFTER UPDATE/DELETE In MariaDB they fire in this order: BEFORE UPDATE/DELETE BEFORE INSERT AFTER INSERT BEFORE INSERT AFTER INSERT AFTER UPDATE/DELETE I haven't yet tested DB2 to see which order it uses. (It does fire the INSERT triggers though.) I don't know if the spec has an opinion (I've never found anything explicit, but it talks about "primary" vs "secondary" operations), and I'm not actually sure how to get MariaDB's order if we wanted to. Instead of implementing so much in the executor, I could *almost* have built FOR PORTION OF based on hidden triggers (sort of like how we implement FKs). Probably AFTER ROW triggers. Then I'd hardly have to touch the executor at all, and I wouldn't need to worry as much about locking/isolation/snapshots. I would just need to add something to the TriggerData struct giving the FOR PORTION OF bounds. (For an UPDATE I could pull this out of NEW.valid_at (or whatever you call your column), but for a DELETE that is NULL.) Basically a trigger needs to know (1) if the query had a FOR PORTION OF clause (2) what the bounds were. My triggers would be in C, so I think adding a new field to the struct is sufficient. Of course we could still expose the values to user-defined triggers if we wanted with e.g. TG_TARGET_INTERVAL. With a trigger-based implementation, I'd add the update/delete triggers whenever someone adds a temporal primary key. That means you can't use FOR PORTION OF on arbitrary ranges, which is a little sad, but no worse than the spec. And for now I was only supporting it on PK columns anyway. Also with a trigger-based implementation I don't think there is a way to force our hidden trigger to fire before user-defined triggers, which might be nice. But if that's not a problem with FKs then it's probably not a problem here. A trigger-based implementation also is less flexible in how it interacts with GENERATED columns, but I think an AFTER trigger would still do the right thing there. A trigger-based implementation should give us the same firing order as MariaDB (IIUC), which might be nice. So let me know if anyone thinks this would be better implemented as triggers instead. I'm kind of leaning that way myself to be honest. One weird issue I noticed concerns dealing with unbounded endpoints. According to the spec a PERIOD's endpoints must be not-NULL, so you have to use sentinel values like 3000-JAN-01. Oracle ignores this though, and since our own ranges already interpret a null bound to mean infinite, I think we should accept a non-sentinel way of saying "from now on" or "since the beginning". Right now I accept 'Infinity' and '-Infinity' in FOR PORTION OF, because those are familiar and valid values for timestamps/dates/floats. But I translate them to `NULL` to get proper range behavior. That's because to ranges, 'Infinity' is "right before" an upper null, and '-Infinity' is "right after" a lower null. For example: =# select tsrange('2020-01-01', null) - tsrange('2020-01-01', 'Infinity'); ?column? ------------- [infinity,) That will leave a lot of ugly records in your table if you don't take care to avoid it. I guess alternately I could let people say FOR PORTION OF FROM '2020-01-01' TO NULL. But that is less obvious, and it leaves the footgun still there. So right now I'm translating +/-Infinity into NULL to prevent the problem. Anyway, now that FOR PORTION OF works, I want to add CASCADE support to temporal FKs. I don't think that will take long. Then I'd like to bring in Vik Fearing's old patch adding PERIODs, and start supporting those too. Vik, do you have any objection or advice about that? Yours,
Attachment
pgsql-hackers by date: