Thread: Is it possible to speed up addition of "not null"?
I have 8.3 database with non-trivial table (~ 80million rows, but the rows are just two integers). I need to add not null on one of the columns, but it seems to require full table scan. I tried with some indexes, but I can't get the time to something reasonable, so here is my question: is there any way I could make the "not null" constraint *fast*? i need it to be able to run pg_reorg on this table, which requires pkey, or unique index on not-null column. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
On Thu, Feb 02, 2012 at 12:48:04PM +0100, hubert depesz lubaczewski wrote > I need to add not null on one of the columns, but it seems to require > full table scan. Of course it does. If you want a constraint added to the table, the first thing it ought to do is check that all your data actually matches the constraint. If not, your constraint doesn't work. > I tried with some indexes, but I can't get the time to something > reasonable, so here is my question: is there any way I could make the > "not null" constraint *fast*? Not faster than a table scan, no. How fast do you want, though? It doesn't sound like an unreasonably large table. Have you done any tuning? Do you have adequate hardware? Maybe faster would be to create a new table with the schema you want, and then use COPY to pull the data out of the old table and into the new table. (It sounds like what you really want is a primary key, however, and that's going to be faster if you build the unique index after the data's all loaded. A -- Andrew Sullivan ajs@crankycanuck.ca
On Thu, Feb 02, 2012 at 07:26:15AM -0500, Andrew Sullivan wrote: > > I need to add not null on one of the columns, but it seems to require > > full table scan. > Of course it does. If you want a constraint added to the table, the > first thing it ought to do is check that all your data actually > matches the constraint. If not, your constraint doesn't work. Sure. But at least theoretically, it could use index - for example, if I had index "where column is null". > Not faster than a table scan, no. How fast do you want, though? It > doesn't sound like an unreasonably large table. Have you done any > tuning? Do you have adequate hardware? oh yes. very much so. But this should be nearly instantenous. This machine is very busy. In the low-traffic moments we have ~ 5k transactions per second. > Maybe faster would be to create a new table with the schema you want, > and then use COPY to pull the data out of the old table and into the > new table. (It sounds like what you really want is a primary key, > however, and that's going to be faster if you build the unique index > after the data's all loaded. This table is concurrently used. Taking it offline is not an option. Of course, I could: 1. add triggers to log changes 2. create side table with proper schema 3. copy data to side table 4. apply changes 5. swap tables but this seems like overly complex thing, while simple index theoretically could solve the problem. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
* hubert depesz lubaczewski: > I tried with some indexes, but I can't get the time to something > reasonable, so here is my question: is there any way I could make the > "not null" constraint *fast*? You coul patch pg_attribute directly. I'm not sure if that's still safe in current versions, though. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
On Thu, Feb 02, 2012 at 02:08:51PM +0000, Florian Weimer wrote: > * hubert depesz lubaczewski: > > > I tried with some indexes, but I can't get the time to something > > reasonable, so here is my question: is there any way I could make the > > "not null" constraint *fast*? > > You coul patch pg_attribute directly. I'm not sure if that's still safe > in current versions, though. it is interesting option. especially since pg_reorg will recreate the table anyway. does anyone of you see any problems with it? procedure would look like: 1. update pg_attribute set attnotnull = true where attrelid = 'my_table'::regclass and attname = 'not-null-column'; 2. delete from my_table where not-null-column is null; -- this shouldn't do anything, as I know that there are no null values, but just in case 3. pg_reorg of the table. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
* hubert depesz lubaczewski: > procedure would look like: > 1. update pg_attribute set attnotnull = true where attrelid = 'my_table'::regclass and attname = 'not-null-column'; > 2. delete from my_table where not-null-column is null; -- this shouldn't > do anything, as I know that there are no null values, but just in > case > 3. pg_reorg of the table. You could install a trigger before step 1 which prevents INSERTs and UPDATEs which would add even more rows violating the constraint. I'm not sure if the DELETE will actually do anything, given that pg_attribute says that the column cannot be NULL. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
On Thu, Feb 02, 2012 at 02:20:59PM +0100, hubert depesz lubaczewski wrote> > Sure. But at least theoretically, it could use index - for example, if > I had index "where column is null". To build that index, you had to visit every row too. But I see what your problem is. > But this should be nearly instantenous. This machine is very busy. In > the low-traffic moments we have ~ 5k transactions per second. [. . .] > This table is concurrently used. Taking it offline is not an option. It's this. You don't have low enough traffic to get the lock you need on the table. You're changing the schema of the table, and you need to lock it while you do that. Probably you're not getting the lock you need granted and therefore it seems like it's taking a long time. A -- Andrew Sullivan ajs@crankycanuck.ca