Thread: [HACKERS] SET NOT NULL [NOT VALID / CONCURRENTLY]?
Hi hackers, I would be good if it would be possible to quickly set NOT NULL for an existing column in a table that have no rows where the column IS NULL and where there is a full index on the column allowing the logics to quickly understand there are no NULL values, and just have to take a quick lock on the table to prevent any modifications during the short time when the NOT NULL is set for the column. Currently if you want to set NOT NULL for a column in a huge table that's not doable without blocking all writes to the table for quite some time. Setting NOT NULL for a 100 million row table took 28 seconds locally on my machine. Is anyone working on fixing this for PostgreSQL 10? Joel Jacobson Trustly
On 21 December 2016 at 16:48, Joel Jacobson <joel@trustly.com> wrote: > Hi hackers, > > I would be good if it would be possible to quickly set NOT NULL for an > existing column in a table > that have no rows where the column IS NULL and where there is a full > index on the column > allowing the logics to quickly understand there are no NULL values, > and just have to take a quick > lock on the table to prevent any modifications during the short time > when the NOT NULL > is set for the column. > > Currently if you want to set NOT NULL for a column in a huge table > that's not doable without blocking all writes to the table for quite > some time. > Setting NOT NULL for a 100 million row table took 28 seconds locally > on my machine. > > Is anyone working on fixing this for PostgreSQL 10? Not as far as I know. IMO this and other similar cases should all be handled the same way: create the constraint NOT VALID, then VALIDATE it while holding a weak lock that only blocks concurrent schema changes. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Dec 21, 2016 at 4:24 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> Is anyone working on fixing this for PostgreSQL 10? > > Not as far as I know. > > IMO this and other similar cases should all be handled the same way: > create the constraint NOT VALID, then VALIDATE it while holding a weak > lock that only blocks concurrent schema changes. Sounds like a good approach. Similar to what we (Trustly) did when we sponsored the FOR KEY LOCK feature to improve concurrency, we would be very interested in also sponsoring this feature, as it would mean a great lot to us. I don't know if this is the right forum trying to find someone/some company to sign up for the task, please let me know if I should mail to some other list. Thanks. Joel Jacobson Trustly
On 21 December 2016 at 19:01, Joel Jacobson <joel@trustly.com> wrote: > Similar to what we (Trustly) did when we sponsored the FOR KEY LOCK > feature to improve concurrency, > we would be very interested in also sponsoring this feature, as it > would mean a great lot to us. > I don't know if this is the right forum trying to find someone/some > company to sign up for the task, > please let me know if I should mail to some other list. Thanks. You'll probably get mail off list. For what it's worth, there's a bit of a complexity here. PostgreSQL doesn't model NOT NULL as a true CONSTRAINT. Instead it's a column attribute. I suspect we would need to change that in order to allow a NOT VALID NOT NULL constraint to be created. That's at least partly why the docs say that "option NOT VALID [...] is currently only allowed for foreign key and CHECK constraints". Note that "[VALIDATE] acquires only a SHARE UPDATE EXCLUSIVE lock on the table being altered" so it's already suitable for what you need. The challenge is making it possible to create a NOT VALID constraint for NOT NULL. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
If you are fully confident you have no NULL values, e.g. if you have all your logics in db functions and you validate all INSERTs to a table won't pass any NULL values, and you have checked all the rows in a table are NOT NULL for the column, would it be completely crazy to just set pg_attribute.attnotnull to TRUE for the column? Is anything else happening "under the hood" than just locking all rows and verifying there are no NULL rows, and then setting attnotnull to TRUE? On Wed, Dec 21, 2016 at 6:37 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 21 December 2016 at 19:01, Joel Jacobson <joel@trustly.com> wrote: > >> Similar to what we (Trustly) did when we sponsored the FOR KEY LOCK >> feature to improve concurrency, >> we would be very interested in also sponsoring this feature, as it >> would mean a great lot to us. >> I don't know if this is the right forum trying to find someone/some >> company to sign up for the task, >> please let me know if I should mail to some other list. Thanks. > > You'll probably get mail off list. > > For what it's worth, there's a bit of a complexity here. PostgreSQL > doesn't model NOT NULL as a true CONSTRAINT. Instead it's a column > attribute. I suspect we would need to change that in order to allow a > NOT VALID NOT NULL constraint to be created. > > That's at least partly why the docs say that "option NOT VALID [...] > is currently only allowed for foreign key and CHECK constraints". > > Note that "[VALIDATE] acquires only a SHARE UPDATE EXCLUSIVE lock on > the table being altered" so it's already suitable for what you need. > The challenge is making it possible to create a NOT VALID constraint > for NOT NULL. > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services -- Joel Jacobson Mobile: +46703603801 Trustly.com | Newsroom | LinkedIn | Twitter
Attached is the function SET_NOT_NULL(_Schema name, _Table name, _Column name) which does the following:
1. LOCK TABLE %I.%I IN ACCESS EXCLUSIVE MODE
just like the normal DDL commands would do
2. SELECT EXISTS (SELECT 1 FROM %I.%I WHERE %I IS NULL)
which is fast if there is an index on the column
3. UPDATE pg_catalog.pg_attribute SET attnotnull = TRUE
WHERE attrelid = %L::oid
AND attname = %L
Pragmatically, would this be a safe approach?
On Wed, Dec 21, 2016 at 6:53 PM, Joel Jacobson <joel@trustly.com> wrote:
If you are fully confident you have no NULL values,
e.g. if you have all your logics in db functions and you validate all
INSERTs to a table won't pass any NULL values,
and you have checked all the rows in a table are NOT NULL for the column,
would it be completely crazy to just set pg_attribute.attnotnull to
TRUE for the column?
Is anything else happening "under the hood" than just locking all rows
and verifying there are no NULL rows, and then setting attnotnull to
TRUE?
On Wed, Dec 21, 2016 at 6:37 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 21 December 2016 at 19:01, Joel Jacobson <joel@trustly.com> wrote:
>
>> Similar to what we (Trustly) did when we sponsored the FOR KEY LOCK
>> feature to improve concurrency,
>> we would be very interested in also sponsoring this feature, as it
>> would mean a great lot to us.
>> I don't know if this is the right forum trying to find someone/some
>> company to sign up for the task,
>> please let me know if I should mail to some other list. Thanks.
>
> You'll probably get mail off list.
>
> For what it's worth, there's a bit of a complexity here. PostgreSQL
> doesn't model NOT NULL as a true CONSTRAINT. Instead it's a column
> attribute. I suspect we would need to change that in order to allow a
> NOT VALID NOT NULL constraint to be created.
>
> That's at least partly why the docs say that "option NOT VALID [...]
> is currently only allowed for foreign key and CHECK constraints".
>
> Note that "[VALIDATE] acquires only a SHARE UPDATE EXCLUSIVE lock on
> the table being altered" so it's already suitable for what you need.
> The challenge is making it possible to create a NOT VALID constraint
> for NOT NULL.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services--
Joel Jacobson
Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter
Attachment
On Wed, Dec 21, 2016 at 7:55 PM, Joel Jacobson <joel@trustly.com> wrote:
--
Attached is the function SET_NOT_NULL(_Schema name, _Table name, _Column name) which does the following:1. LOCK TABLE %I.%I IN ACCESS EXCLUSIVE MODEjust like the normal DDL commands would do2. SELECT EXISTS (SELECT 1 FROM %I.%I WHERE %I IS NULL)which is fast if there is an index on the column3. UPDATE pg_catalog.pg_attribute SET attnotnull = TRUEWHERE attrelid = %L::oidAND attname = %LPragmatically, would this be a safe approach?
Hmm, I don't see a problem with it.
--