Thread: New feature "... ALTER CONSTRAINT ... VERIFY USING INDEX"
Hello hackers, I want to implement a new feature that allows to decrease time when table is under ExclusiveLock on ALTERing new constraints NOT NULL or CHECK. In Postgres9.1 a new feature was implemented [1] for adding PK and UNIQUE constraints using indexes created concurrently, but constraints NOT NULL and CHECK still require full seqscan of a table. New CHECK constraint allows "NOT VALID" option but VALIDATE CONSTRAINT still does seqscan (with RowExclusiveLock, but for big and constantly updatable table it is still awful). It is possible to find wrong rows in a table without seqscan if there is an index with a predicate allows to find such rows. There is no sense what columns it has since it is enough to check whether index_getnext for it returns NULL (table is OK) or any tuple (table has wrong rows). Index must be BTREE (since it is not supposed to hold any data), valid and has a predicate depending on a constraint type: * for NOT NULL constraint predicate must be "(col) IS NULL"; * for CHECK constraint predicate must be "(expr) IS DISTINCT FROM TRUE" (to cover both "(expr) IS NULL" and "NOT(expr)" cases). I propose the next syntax ("action" in "ALTER TABLE"): ALTER [ COLUMN ] column_name SET NOT NULL [ VERIFY USING INDEX index_name ] and (all rows except the last one was got from a documentation[2]) ADD CONSTRAINT constraint_name CHECK ( expression ) [ NO INHERIT ] [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED| INITIALLY IMMEDIATE ] [ VERIFY USING INDEX index_name ] The new nonreserved keyword "VERIFY" is necessary to avoid people be confused what "USING INDEX" really does, and to show the index is not needed after DDL finishes (verifies table) correctly. I wasn't succeed in my search for similar feature (and such syntax) in an SQL standard or in any existing DBMS. I've done some research to be familiar with the source code (and be ready to get advice) and have a working version for CHECK constraint. My patch has a WIP state since I don't know how the community will meet proposal feature/syntax, and some work is necessary for NOT NULL, documentation and psql. So I'm ready for a discussion. P.S.: for NOT NULL it'll allow to do something like: ALTER TABLE tablename ADD COLUMN newcol data_type; ALTER TABLE tablename ALTER COLUMN newcol SET DEFAULT default_expr; ALTER TABLE tablename ADD CONSTRAINT tablename_nonnull_chk CHECK (newcol IS NOT NULL) NOT VALID; --optional CREATE INDEX CONCURRENTLY tablename_chk ON tablename (id DESC) -- PK col(s), "DESC" is to begin from the oldest rows WHERE newcol IS NULL; -- query for repeat UPDATE tablename SET newcol=DEFAULT WHERE id IN ( SELECT id FROM tablename WHERE newcol IS NULL ORDER BY id DESC LIMIT 100000 FOR UPDATE SKIP LOCKED ); -- repeat above command until 0 rows is affected. ALTER TABLE tablename ALTER COLUMN newcol SET NOT NULL VERIFY USING INDEX tablename_chk; DROP INDEX CONCURRENTLY tablename_chk; -- if the command above succeed ALTER TABLE tablename DROP CONSTRAINT tablename_nonnull_chk; --optional [1] http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN110279 [2] http://www.postgresql.org/docs/9.5/static/sql-createtable.html -- Best regards, Vitaly Burovoy
On 8 January 2016 at 12:49, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
--
In Postgres9.1 a new feature was implemented [1] for adding PK and
UNIQUE constraints using indexes created concurrently, but constraints
NOT NULL and CHECK still require full seqscan of a table. New CHECK
constraint allows "NOT VALID" option but VALIDATE CONSTRAINT still
does seqscan (with RowExclusiveLock, but for big and constantly
updatable table it is still awful).
It is possible to find wrong rows in a table without seqscan if there
is an index with a predicate allows to find such rows. There is no
sense what columns it has since it is enough to check whether
index_getnext for it returns NULL (table is OK) or any tuple (table
has wrong rows).
You avoid a full seqscan by creating an index which also does a full seq scan.
How does this help? The lock and scan times are the same.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/8/16, Simon Riggs <simon@2ndquadrant.com> wrote: > On 8 January 2016 at 12:49, Vitaly Burovoy <vitaly.burovoy@gmail.com> > wrote: > > >> In Postgres9.1 a new feature was implemented [1] for adding PK and >> UNIQUE constraints using indexes created concurrently, but constraints >> NOT NULL and CHECK still require full seqscan of a table. New CHECK >> constraint allows "NOT VALID" option but VALIDATE CONSTRAINT still >> does seqscan (with RowExclusiveLock, but for big and constantly >> updatable table it is still awful). >> >> It is possible to find wrong rows in a table without seqscan if there >> is an index with a predicate allows to find such rows. There is no >> sense what columns it has since it is enough to check whether >> index_getnext for it returns NULL (table is OK) or any tuple (table >> has wrong rows). >> > > You avoid a full seqscan by creating an index which also does a full seq > scan. > > How does this help? The lock and scan times are the same. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > <http://www.2ndquadrant.com/> > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > I avoid not a full seqscan, but a time when table is under ExclusiveLock: index can be build concurrently without locking table. -- Best regards, Vitaly Burovoy
On 8 January 2016 at 13:13, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
--
On 1/8/16, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 January 2016 at 12:49, Vitaly Burovoy <vitaly.burovoy@gmail.com>
> wrote:
>
>
>> In Postgres9.1 a new feature was implemented [1] for adding PK and
>> UNIQUE constraints using indexes created concurrently, but constraints
>> NOT NULL and CHECK still require full seqscan of a table. New CHECK
>> constraint allows "NOT VALID" option but VALIDATE CONSTRAINT still
>> does seqscan (with RowExclusiveLock, but for big and constantly
>> updatable table it is still awful).
>>
>> It is possible to find wrong rows in a table without seqscan if there
>> is an index with a predicate allows to find such rows. There is no
>> sense what columns it has since it is enough to check whether
>> index_getnext for it returns NULL (table is OK) or any tuple (table
>> has wrong rows).
>>
>
> You avoid a full seqscan by creating an index which also does a full seq
> scan.
>
> How does this help? The lock and scan times are the same.
I avoid not a full seqscan, but a time when table is under
ExclusiveLock: index can be build concurrently without locking table.
That is exactly what ADD ...NOT VALID and VALIDATE already does, as of 9.4.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/8/16, Simon Riggs <simon@2ndquadrant.com> wrote: > On 8 January 2016 at 13:13, Vitaly Burovoy <vitaly.burovoy@gmail.com> > wrote: > >> On 1/8/16, Simon Riggs <simon@2ndquadrant.com> wrote: >> > On 8 January 2016 at 12:49, Vitaly Burovoy <vitaly.burovoy@gmail.com> >> > wrote: >> > >> > >> >> In Postgres9.1 a new feature was implemented [1] for adding PK and >> >> UNIQUE constraints using indexes created concurrently, but constraints >> >> NOT NULL and CHECK still require full seqscan of a table. New CHECK >> >> constraint allows "NOT VALID" option but VALIDATE CONSTRAINT still >> >> does seqscan (with RowExclusiveLock, but for big and constantly >> >> updatable table it is still awful). >> >> >> >> It is possible to find wrong rows in a table without seqscan if there >> >> is an index with a predicate allows to find such rows. There is no >> >> sense what columns it has since it is enough to check whether >> >> index_getnext for it returns NULL (table is OK) or any tuple (table >> >> has wrong rows). >> >> >> > >> > You avoid a full seqscan by creating an index which also does a full >> > seq >> > scan. >> > >> > How does this help? The lock and scan times are the same. >> >> I avoid not a full seqscan, but a time when table is under >> ExclusiveLock: index can be build concurrently without locking table. > > > That is exactly what ADD ...NOT VALID and VALIDATE already does, as of > 9.4. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > <http://www.2ndquadrant.com/> > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > Hmm... It really does. I was confused by a line in ATExecValidateConstraint conrel = heap_open(ConstraintRelationId, RowExclusiveLock); but validateCheckConstraint doesn't do anything for applying the lock to a row. What about SET NOT NULL constraints? There is no VALIDATE CONSTRAINT for it. -- Best regards, Vitaly Burovoy
On 8 January 2016 at 14:14, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
--
What about SET NOT NULL constraints? There is no VALIDATE CONSTRAINT for it.
Sounds like a useful addition.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs wrote: > On 8 January 2016 at 14:14, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > > > What about SET NOT NULL constraints? There is no VALIDATE CONSTRAINT for > > it. > > Sounds like a useful addition. Yes. In order to make it a reality you need to make the NOT NULL constraints appear in pg_constraint. Years ago I wrote a patch to do that, which was very close to done. It would be really cool if Vitaly or someone else could look into that patch, update it and get it ready for commit. If someone is interested, I can send the patch along. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/8/16, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Simon Riggs wrote: >> On 8 January 2016 at 14:14, Vitaly Burovoy <vitaly.burovoy@gmail.com> >> wrote: >> >> > What about SET NOT NULL constraints? There is no VALIDATE CONSTRAINT >> > for >> > it. >> >> Sounds like a useful addition. > > Yes. In order to make it a reality you need to make the NOT NULL > constraints appear in pg_constraint. Years ago I wrote a patch to do > that, which was very close to done. It would be really cool if Vitaly > or someone else could look into that patch, update it and get it ready > for commit. > > If someone is interested, I can send the patch along. > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > I guess you are talking about the other thread[1]. I'm not sure I have enough experience in Postgres hacking to start working on it right now, but I'll have a look. IMO the best way is to raise that topic by a letter with summary what troubles are left there (in addition to a rebasing one). [1] http://www.postgresql.org/message-id/flat/20110707213401.GA27098@alvh.no-ip.org -- Best regards, Vitaly Burovoy
Vitaly Burovoy wrote: > I guess you are talking about the other thread[1]. > I'm not sure I have enough experience in Postgres hacking to start > working on it right now, but I'll have a look. > IMO the best way is to raise that topic by a letter with summary what > troubles are left there (in addition to a rebasing one). > > > [1] http://www.postgresql.org/message-id/flat/20110707213401.GA27098@alvh.no-ip.org Here's a newer thread: https://www.postgresql.org/message-id/1343682669-sup-2532%40alvh.no-ip.org which includes some points needing additional work. In my local tree I have a branch that maybe is not the same as the last patch I posted in that thread, because the last commit has a newer date. Here I attach what that branch has. It applies cleanly on top of 03bda4535ee119d3. I don't remember if it was in compilable state at the time I abandoned it. It needs some work to rebase, but from a quick experimental "git merge" I just ran, it's not too bad. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services