Thread: Best Approach for Swapping a Table with its Copy
Hi folks,

I have a scenario where I need to swap an original table with a copy of that
table.

The copy has an exclusion constraint that the original does not have. The main
challenge is to ensure that the swap does not break any existing foreign keys
to the original table and handles the associated TOAST tables correctly.

Both tables are fairly large and exist in a system where there are no
maintenance windows in which the application is shut down.

My key questions are:

- What is the best process for swapping the original table with the copy in
  such a way that the foreign key relations are preserved?
- Are there any special considerations for managing the TOAST tables during
  this swap?
- Should I perform this operation in multiple steps, or is there a
  straightforward way to achieve this atomically?
- Are there any risks or potential issues I should be aware of when doing this
  swap, specifically related to foreign key integrity and TOAST data?

Thank you!
- Marcelo
On 2/12/25 12:57 PM, Marcelo Fernandes wrote:
> I have a scenario where I need to swap an original table with a copy of that
> table.
>
> The copy has an exclusion constraint that the original does not have. The
> main challenge is to ensure that the swap does not break any existing
> foreign keys to the original table and handles the associated TOAST tables
> correctly.
>
> Both tables are fairly large and exist in a system where there are no
> maintenance windows in which the application is shut down.

This needs more information:

1) Postgres version.
2) The table definition.
3) The exclusion constraint definition.
4) Definition of what 'fairly large' is.
5) How is the application interfacing with the database?

--
Adrian Klaver
adrian.klaver@aklaver.com
On 2/12/25 12:57 PM, Marcelo Fernandes wrote:
> I have a scenario where I need to swap an original table with a copy of that
> table.

Should have added to the previous post:

Why can't you just add the exclusion constraint to the original table?

--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Feb 13, 2025 at 10:02 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> This needs more information:
> 1) Postgres version.

That would be Postgres 15.

> 2) The table definition.

This is a simplified version of the original table:

    CREATE TABLE bookings (
        id SERIAL PRIMARY KEY,
        resource_id INT NOT NULL,
        start_date DATE NOT NULL,
        end_date DATE NOT NULL
    );

> 3) The exclusion constraint definition.

The copy table would have an exclusion constraint such as:

    ALTER TABLE bookings
    ADD CONSTRAINT no_date_overlap_for_resource_id
    EXCLUDE USING gist (
        resource_id WITH =,
        daterange(start_date, end_date, '[]') WITH &&
    );

> 4) Definition of what 'fairly large' is.

This table is over 400GB.

> 5) How is the application interfacing with the database?

This is a web application that interfaces with the database using psycopg.

Also pulling in your question from the other reply:

> Why can't you just add the exclusion constraint to the original table?

With unique constraints, one can use a unique index to create the constraint
concurrently.

With check constraints, one can create the constraint as invalid and then
validate it while only requiring a share update exclusive lock.

But with exclusion constraints, neither of those techniques is available. In
that sense, there is no way to create this type of constraint on a large
table without copying the original table, adding the constraint, and
performing a table swap. This is done to avoid holding an exclusive lock for
a long time, thus creating application outages.

Hope that clarifies the situation a bit better
- Marcelo
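P.S. For concreteness, the two online techniques above look roughly like this
(a sketch only; the constraint and index names here are made up):

    -- Unique: build the index without blocking writes, then attach it
    -- as a constraint (only a brief lock for the ALTER).
    CREATE UNIQUE INDEX CONCURRENTLY bookings_uq_idx
        ON bookings (resource_id, start_date);
    ALTER TABLE bookings
        ADD CONSTRAINT bookings_uq UNIQUE USING INDEX bookings_uq_idx;

    -- Check: add as NOT VALID (brief lock), then validate under a
    -- SHARE UPDATE EXCLUSIVE lock, which does not block reads or writes.
    ALTER TABLE bookings
        ADD CONSTRAINT dates_ordered CHECK (start_date <= end_date) NOT VALID;
    ALTER TABLE bookings VALIDATE CONSTRAINT dates_ordered;

Neither variant exists for EXCLUDE constraints.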
On 2/12/25 1:27 PM, Marcelo Fernandes wrote:
> This is a simplified version of the original table:
>
>     CREATE TABLE bookings (
>         id SERIAL PRIMARY KEY,
>         resource_id INT NOT NULL,
>         start_date DATE NOT NULL,
>         end_date DATE NOT NULL
>     );
>
>> 3) The exclusion constraint definition.
>
> The copy table would have an exclusion constraint such as:
>
>     ALTER TABLE bookings
>     ADD CONSTRAINT no_date_overlap_for_resource_id
>     EXCLUDE USING gist (
>         resource_id WITH =,
>         daterange(start_date, end_date, '[]') WITH &&
>     );

Do you know this will not fail on the existing data?

>> 4) Definition of what 'fairly large' is.
>
> This table is over 400GB.

Do you have room for a complete copy of the table?

> This is done to avoid holding an exclusive lock for a long time, thus
> creating application outages.

I am not seeing how this can be done without some outage for that table.
What sort of time frame is acceptable?

--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Feb 13, 2025 at 10:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Do you know this will not fail on the existing data?

Yes, all the current data in the original table respects the constraint.

> Do you have room for a complete copy of the table?

Yes, in this scenario the copy is already created, and triggers keep the copy
in sync with the original table.

> I am not seeing how this can be done without some outage for that table.

Swapping tables without an outage is possible. I believe this is part of what
pg_repack does when repacking a table, as it relies on copying the whole
table and then swapping it for the original.

My main question is "How?". I know that pg_repack involves swapping the
relfilenode values and something about TOAST tables, but I am not well
acquainted with the pg_repack code or the debugging tools needed to verify
precisely what it does.

> What sort of time frame is acceptable?

The scan phase on this table is very slow, and on top of that the exclusion
constraint needs to build the underlying index that services the constraint.

Anything that takes more than 10s in this system is prohibitive, so creating
the constraint without a table copy is not viable for a table of this size.

Regards,
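P.S. The sync mechanism is along these lines; a simplified sketch, with the
copy table, function, and trigger names made up:

    CREATE OR REPLACE FUNCTION bookings_sync() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            INSERT INTO bookings_copy (id, resource_id, start_date, end_date)
            VALUES (NEW.id, NEW.resource_id, NEW.start_date, NEW.end_date);
        ELSIF TG_OP = 'UPDATE' THEN
            UPDATE bookings_copy
            SET resource_id = NEW.resource_id,
                start_date  = NEW.start_date,
                end_date    = NEW.end_date
            WHERE id = NEW.id;
        ELSIF TG_OP = 'DELETE' THEN
            DELETE FROM bookings_copy WHERE id = OLD.id;
        END IF;
        RETURN NULL;  -- AFTER row trigger; the return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER bookings_sync_trg
    AFTER INSERT OR UPDATE OR DELETE ON bookings
    FOR EACH ROW EXECUTE FUNCTION bookings_sync();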
On 2/12/25 14:04, Marcelo Fernandes wrote:
> Yes, in this scenario the copy is already created, and triggers keep the
> copy in sync with the original table.

To confirm, this copy has the exclusion constraint defined?

> Swapping tables without an outage is possible. I believe this is part of
> what pg_repack does when repacking a table, as it relies on copying the
> whole table and then swapping it for the original.

Not seeing it:

https://reorg.github.io/pg_repack/

"Details
Full Table Repacks

[...]

pg_repack will only hold an ACCESS EXCLUSIVE lock for a short period during
initial setup (steps 1 and 2 above) and during the final swap-and-drop phase
(steps 6 and 7). For the rest of its time, pg_repack only needs to hold an
ACCESS SHARE lock on the original table, meaning INSERTs, UPDATEs, and
DELETEs may proceed as usual."

During the ACCESS EXCLUSIVE stages you will not have access. Not only that,
with pg_repack you are not changing the table definition, whereas in your
case you are introducing a new constraint and an associated index.

> Anything that takes more than 10s in this system is prohibitive, so
> creating the constraint without a table copy is not viable for a table of
> this size.

Do you have a dev setup where you can test alternatives with a test sample
of data?

--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Feb 13, 2025 at 1:33 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> During the ACCESS EXCLUSIVE stages you will not have access.

The operations under ACCESS EXCLUSIVE are short-lived. In my benchmarks, the
"repack_swap" function finishes on the order of milliseconds. The operations
seem to be catalogue-only.

I'd love someone to confirm this though, because I am not a specialist in C
and Postgres extensions. Here is the code if you want to have a look:

https://github.com/reorg/pg_repack/blob/306b0d4f7f86e807498ac00baec89ecd33411398/lib/repack.c#L843

What I am after is the same, but I seek a deeper understanding of what it
does, and why it does it. For example, it swaps relfilenode. Why?

> Not only that, with pg_repack you are not changing the table definition,
> whereas in your case you are introducing a new constraint and an associated
> index.

Correct, but I am not using pg_repack; I have cloned the table using my own
tool. I'm citing pg_repack because it does perform a table swap (a bloated
table is swapped for a new, non-bloated table). Given that pg_repack works
well on large databases, its approach to swapping tables must be robust.

> Do you have a dev setup where you can test alternatives with a test sample
> of data?

Do you mean alternatives to table-cloning-and-swapping?

Regards,
- Marcelo
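P.S. For what it's worth, the catalogue-only nature of the swap can be
observed from plain SQL by comparing file nodes before and after a repack
run; a sketch against the bookings table:

    -- Run before and after pg_repack: the oid stays the same while the
    -- filenode changes (the old filenode shows up on the dropped relation).
    SELECT c.oid, c.relname,
           pg_relation_filenode(c.oid) AS filenode,
           c.reltoastrelid
    FROM pg_class c
    WHERE c.relname = 'bookings';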
On Thu, 2025-02-13 at 11:04 +1300, Marcelo Fernandes wrote: > > I am not seeing how this can be done without some outage for that table. > > Swapping tables without an outage is possible. Yes, but only if you are willing to write C code that runs inside the database server. That way, you can do anything (and cause arbitrary damage). The big challenge here would be to do the swap in a safe way. How do you intend to guarantee that the foreign keys are valid without a table scan? How do you handle concurrent data modifications? Yours, Laurenz Albe -- *E-Mail Disclaimer* Der Inhalt dieser E-Mail ist ausschliesslich fuer den bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte, dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen. *CONFIDENTIALITY NOTICE & DISCLAIMER *This message and any attachment are confidential and may be privileged or otherwise protected from disclosure and solely for the use of the person(s) or entity to whom it is intended. If you have received this message in error and are not the intended recipient, please notify the sender immediately and delete this message and any attachment from your system. If you are not the intended recipient, be advised that any use of this message is prohibited and may be unlawful, and you must not copy this message or attachment or disclose the contents to any other person.
On Wed, Feb 12, 2025 at 9:02 PM Marcelo Fernandes <marcefern7@gmail.com> wrote:
> What I am after is the same, but I seek a deeper understanding of what it
> does, and why it does it. For example, it swaps relfilenode. Why?
It is surgically replacing all pointers to the old data with pointers to the new data. Yes, with lots of system catalog shenanigans.
pg_repack is meant to do what vacuum full does, but in a faster way. Imagine your table is an 18-wheeler truck, with a cab (system catalog stuff) and a trailer (full of data). We don't want a whole new truck, we want to change out the trailer.
With VACUUM FULL, you stop all traffic while you pull the truck to the side of the road and turn it off. A new truck is pulled alongside it, and everything from the old trailer is unloaded and placed in the new one. The new trailer is hooked to the cab, and pulls away into the now-moving traffic.
With pg_repack, you keep driving full speed. A new truck pulls up alongside your truck, and the new trailer is filled based on the old one. At the last moment, all the wires are pulled from the old trailer and hooked to the new trailer. The old trailer is detached and left to crash into the mutant bikers who have been pursuing you. It's the same cab, but the trailer (e.g. relfilenodes) has been changed.
It's technically possible to do something similar for your use case, but it's not trivial. All the cab to trailer wires must be precisely changed. Everything directly related to the data must be swapped: heap, indexes, toast.
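To make "all the wires" concrete, a catalog query along these lines lists
every storage-bearing piece for a single table; a sketch, using the bookings
table from earlier in the thread:

    -- Heap, its indexes, the TOAST table, and the TOAST index: all of
    -- these carry relfilenodes that would need to travel together.
    SELECT c.oid::regclass AS rel, c.relkind, c.relfilenode
    FROM pg_class c
    WHERE c.oid = 'bookings'::regclass
       OR c.oid IN (SELECT indexrelid FROM pg_index
                    WHERE indrelid = 'bookings'::regclass)
       OR c.oid = (SELECT reltoastrelid FROM pg_class
                   WHERE oid = 'bookings'::regclass)
       OR c.oid IN (SELECT indexrelid FROM pg_index
                    WHERE indrelid = (SELECT reltoastrelid FROM pg_class
                                      WHERE oid = 'bookings'::regclass));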
Cheers,
Greg
On Thu, Feb 13, 2025 at 4:09 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> It is surgically replacing all pointers to the old data with pointers to
> the new data. Yes, with lots of system catalog shenanigans.
> [...]
> It's technically possible to do something similar for your use case, but
> it's not trivial. All the cab to trailer wires must be precisely changed.
> Everything directly related to the data must be swapped: heap, indexes,
> toast.

Thanks for the colorful analogy, Greg :).

Maybe the better option is to support ALTER TABLE to ADD an exclusion
constraint, no?

I get that it's not supported now. But is it more difficult than the above?
And why then? --DD
On 2/13/25 07:25, Dominique Devienne wrote:
> Thanks for the colorful analogy, Greg :).
>
> Maybe the better option is to support ALTER TABLE to ADD an exclusion
> constraint, no?

That exists:

    select version();
    PostgreSQL 14.15

    create table exclusion_test(id integer primary key, dt1 timestamptz, dt2 timestamptz);

    ALTER TABLE exclusion_test ADD CONSTRAINT dt_overlap EXCLUDE USING gist (
        id WITH =,
        tstzrange(dt1, dt2, '[]') WITH &&
    );

    \d exclusion_test
                 Table "public.exclusion_test"
     Column |           Type           | Collation | Nullable | Default
    --------+--------------------------+-----------+----------+---------
     id     | integer                  |           | not null |
     dt1    | timestamp with time zone |           |          |
     dt2    | timestamp with time zone |           |          |
    Indexes:
        "exclusion_test_pkey" PRIMARY KEY, btree (id)
        "dt_overlap" EXCLUDE USING gist (id WITH =, tstzrange(dt1, dt2, '[]'::text) WITH &&)

> I get that it's not supported now. But is it more difficult than the
> above? And why then? --DD

From here:

https://www.postgresql.org/message-id/CAM2F1VOOn1izCrtcrDx4YUtCY-H64Vj5yvN5H1Lhk8kCY%2B_W6Q%40mail.gmail.com

"> Why can't you just add the exclusion constraint to the original table?

With unique constraints, one can use a unique index to create the constraint
concurrently. With check constraints, one can create the constraint as
invalid and then validate it while only requiring a share update exclusive
lock.

But with exclusion constraints, neither of those techniques is available. In
that sense, there is no way to create this type of constraint on a large
table without copying the original table, adding the constraint, and
performing a table swap. This is done to avoid holding an exclusive lock for
a long time, thus creating application outages."

--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Feb 13, 2025 at 7:37 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Yes, but only if you are willing to write C code that runs inside the
> database server. That way, you can do anything (and cause arbitrary
> damage).
>
> The big challenge here would be to do the swap in a safe way. How do
> you intend to guarantee that the foreign keys are valid without a table
> scan? How do you handle concurrent data modifications?

Exactly! This is part of what I am trying to figure out (-:

The plan I have in mind so far to swap the tables safely is:

1. Run the following query to grab information about the TOAST table and the
   TOAST index of the tables involved in the swap:

       SELECT
           X.oid,
           X.reltoastrelid,
           X.relowner,
           TOAST_X.indexrelid
       FROM pg_catalog.pg_class X
       LEFT JOIN pg_catalog.pg_index TOAST_X
           ON X.reltoastrelid = TOAST_X.indrelid AND TOAST_X.indisvalid
       WHERE X.oid = ('my_sweet_table')::regclass;

2. Open a transaction and acquire an access exclusive lock.

3. Ensure that the cloned table has the same owner as the original table:

       ALTER TABLE copy_table OWNER TO owner_of_the_original_table;

4. Swap all the data in the catalogue tables that points to the old table and
   its TOAST so that it points to the new one, and vice-versa (in case I need
   to roll back).

5. Commit.

Step 4 is what I am stuck on. What bits of the catalogue do I need to change,
and for what reasons?

It may be a risky operation, as you say, and I might decide not to pursue it
in the end, but first I must understand (-:

Regards,
- Marcelo
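P.S. In skeleton form, steps 2, 3, and 5 would be something like this (a
sketch; the lock_timeout is my own addition so the ACCESS EXCLUSIVE attempt
fails fast instead of queueing behind long-running queries):

    BEGIN;
    SET LOCAL lock_timeout = '5s';
    -- Step 2: lock both relations in one statement.
    LOCK TABLE my_sweet_table, copy_table IN ACCESS EXCLUSIVE MODE;
    -- Step 3: match ownership.
    ALTER TABLE copy_table OWNER TO owner_of_the_original_table;
    -- Step 4: catalogue swap goes here.
    COMMIT;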
On Fri, Feb 14, 2025 at 4:09 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> It is surgically replacing all pointers to the old data with pointers to
> the new data. Yes, with lots of system catalog shenanigans.

Love your analogy Greg, thanks for that.

> It's technically possible to do something similar for your use case, but
> it's not trivial. All the cab to trailer wires must be precisely changed.
> Everything directly related to the data must be swapped: heap, indexes,
> toast.

I'd really appreciate knowing more about how to do this, as I think this is
the crux of what I am trying to solve. If you have pointers, thoughts, or
resources where I can better understand what's involved, that would be much
appreciated.

In terms of where I am at currently, I summarised it in my previous reply:

On Fri, Feb 14, 2025 at 11:59 AM Marcelo Fernandes <marcefern7@gmail.com> wrote:
> The plan I have in mind so far to swap the tables safely is:
>
> 1. Run the following query to grab information about the TOAST table and
>    the TOAST index of the tables involved in the swap:
>
>        SELECT
>            X.oid,
>            X.reltoastrelid,
>            X.relowner,
>            TOAST_X.indexrelid
>        FROM pg_catalog.pg_class X
>        LEFT JOIN pg_catalog.pg_index TOAST_X
>            ON X.reltoastrelid = TOAST_X.indrelid AND TOAST_X.indisvalid
>        WHERE X.oid = ('my_sweet_table')::regclass;
>
> 2. Open a transaction and acquire an access exclusive lock.
>
> 3. Ensure that the cloned table has the same owner as the original table:
>
>        ALTER TABLE copy_table OWNER TO owner_of_the_original_table;
>
> 4. Swap all the data in the catalogue tables that points to the old table
>    and its TOAST so that it points to the new one, and vice-versa (in case
>    I need to roll back).
>
> 5. Commit.
>
> Step 4 is what I am stuck on. What bits of the catalogue do I need to
> change, and for what reasons?
>
> It may be a risky operation, as you say, and I might decide not to pursue
> it in the end, but first I must understand (-:

Regards,
- Marcelo
On 2/13/25 14:59, Marcelo Fernandes wrote:
> The plan I have in mind so far to swap the tables safely is:
> [...]
> Step 4 is what I am stuck on. What bits of the catalogue do I need to
> change, and for what reasons?

1) In a previous post you said:

"Yes, in this scenario the copy is already created, and triggers keep the
copy in sync with the original table."

In that case the copy will already have TOAST tables associated with it.

2) What are the FK relationships and how many?

Also, could you not just add the FKs as NOT VALID?

--
Adrian Klaver
adrian.klaver@aklaver.com
On Fri, Feb 14, 2025 at 12:35 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 1) In a previous post you said:
>
> "Yes, in this scenario the copy is already created, and triggers keep
> the copy in sync with the original table."
>
> In that case the copy will already have TOAST tables associated with it.

If we follow the idea behind repack_swap, we would have swapped the OIDs of
the two tables. This means you have to swap the TOAST tables in the catalogue
as well. Otherwise the new table will be linked to the old TOAST and the old
table will be linked to the new TOAST. We want the opposite.

> 2) What are the FK relationships and how many?

I think that for theoretical purposes we can just say there are "N" FKs,
because no matter how many there are, they need to be updated to point
towards the new table.

> Also, could you not just add the FKs as NOT VALID?

That's an interesting compromise I hadn't thought of, thanks. However,
ideally I'd like to swap the catalogue entries instead, as that would be a
cleaner approach: it wouldn't require dropping the old constraints, creating
NOT VALID ones, and then optionally validating them later.

Regards,
- Marcelo
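P.S. For reference, the compromise Adrian mentions would look roughly like
this per referencing table; a sketch, borrowing names from a hypothetical
setup:

    -- Brief ACCESS EXCLUSIVE on both tables, no scan:
    ALTER TABLE table_with_fk DROP CONSTRAINT fk_original;
    ALTER TABLE table_with_fk
        ADD CONSTRAINT fk_original FOREIGN KEY (original_id)
        REFERENCES copy_table (id) NOT VALID;

    -- Later: a full scan, but only a SHARE UPDATE EXCLUSIVE lock on
    -- table_with_fk (plus ROW SHARE on copy_table), so DML proceeds.
    ALTER TABLE table_with_fk VALIDATE CONSTRAINT fk_original;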
On Fri, 2025-02-14 at 11:59 +1300, Marcelo Fernandes wrote:
> The plan I have in mind so far to swap the tables safely is:
> [...]
> Step 4 is what I am stuck on. What bits of the catalogue do I need to
> change, and for what reasons?
>
> It may be a risky operation, as you say, and I might decide not to pursue
> it in the end, but first I must understand (-:

In your steps, you carefully avoid the question of whether the foreign key is
valid or not. So you are trusting the user to have made sure that everything
is fine with the foreign key...

I may be forgetting something, but I'd say that swapping out the files from
under a table's butt is just a question of updating the
"pg_class.relfilenode" of the tables and the TOAST tables.

But if you just UPDATE the catalogs, concurrent statements that are using the
tables will be in trouble, so you need to take an ACCESS EXCLUSIVE lock.
Moreover, you have to make sure to send out invalidation messages so that
every session that caches statistics or execution plans for the tables
discards them.

Yours,
Laurenz Albe
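A minimal sketch of that relfilenode exchange for the two heaps, assuming
tables named "original" and "copy" (superuser required, completely
unsupported, and the same treatment would be needed for the TOAST tables and
all indexes):

    -- Swap the two relfilenodes in one statement; the subquery reads the
    -- pre-update values, so this is a true exchange.
    UPDATE pg_class c
    SET relfilenode = o.relfilenode
    FROM (SELECT oid, relfilenode FROM pg_class
          WHERE oid IN ('original'::regclass, 'copy'::regclass)) o
    WHERE c.oid IN ('original'::regclass, 'copy'::regclass)
      AND c.oid <> o.oid;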
> On 12 Feb 2025, at 22:27, Marcelo Fernandes <marcefern7@gmail.com> wrote:
>
> With unique constraints, one can use a unique index to create the
> constraint concurrently.
>
> With check constraints, one can create the constraint as invalid and then
> validate it while only requiring a share update exclusive lock.
>
> But with exclusion constraints, neither of those techniques is available.
> In that sense, there is no way to create this type of constraint on a
> large table without copying the original table, adding the constraint,
> and performing a table swap.

Just a wild idea (not sure if anyone suggested it in this thread, and not
sure if it is doable): create the index concurrently and then fiddle with the
catalog tables to define the constraint using this index?

--
Michal
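The index half of that idea is ordinary SQL; a sketch, with a made-up index
name and assuming btree_gist is available for the integer column:

    CREATE EXTENSION IF NOT EXISTS btree_gist;
    CREATE INDEX CONCURRENTLY bookings_excl_idx ON bookings
        USING gist (resource_id, daterange(start_date, end_date, '[]'));

The hard (and unsupported) half is manufacturing the pg_constraint row that
marks it as an exclusion constraint.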
On Thu, Feb 13, 2025 at 6:06 PM Marcelo Fernandes <marcefern7@gmail.com> wrote:
> It's technically possible to do something similar for your use case, but it's not trivial. All the cab to trailer wires must be precisely changed. Everything directly related to the data must be swapped: heap, indexes, toast.
> I'd really appreciate knowing more about how to do this, as I think this is
> the crux of what I am trying to solve.
The pg_repack link posted earlier has the details on how it is done. But messing with system catalogs like this is highly discouraged, for good reasons. Still, if you need to go that route, test heavily and post the solutions here for feedback.
Cheers,
Greg
On Fri, Feb 14, 2025 at 12:41 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Moreover, you have to make sure to send out invalidation messages so that
> every session that caches statistics or execution plans for the tables
> discards them.
Hmm...is that really necessary? Because if so, there is no direct SQL-level way to do that, as far as I am aware, so we are back to a C extension. Or just restarting the cluster :)
Cheers,
Greg
On Fri, Feb 14, 2025 at 1:02 AM Michał Kłeczek <michal@kleczek.org> wrote:
> Create the index concurrently and then fiddle with the catalog tables to define the constraint using this index?
You mean an ALTER TABLE ... ADD CONSTRAINT ... EXCLUDE without actually doing an ALTER TABLE. Nope, that's far worse than the pg_repack shenanigans, as we would be creating new catalog objects.
Ideally, someone would add support for USING INDEX for exclusion constraints.
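That would mirror what UNIQUE already allows; the EXCLUDE form below is
hypothetical syntax, not valid in any released version:

    -- Supported today (unique constraints only):
    ALTER TABLE bookings
        ADD CONSTRAINT bookings_uq UNIQUE USING INDEX bookings_uq_idx;

    -- Wished for (hypothetical, does not exist):
    -- ALTER TABLE bookings
    --     ADD CONSTRAINT no_date_overlap_for_resource_id
    --     EXCLUDE USING INDEX bookings_excl_idx;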
Cheers,
Greg
On Sat, Feb 15, 2025 at 4:12 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> The pg_repack link posted earlier has the details on how it is done. But
> messing with system catalogs like this is highly discouraged, for good
> reasons. Still, if you need to go that route, test heavily and post the
> solutions here for feedback.

I'm trying to digest what pg_repack does by reproducing the same behaviour
using SQL. I have come up with the script below to test the whole scenario,
but I have two major problems:

- The foreign keys are not being updated to point to the new table.
- pg_repack seems to update relcache entries. I am not sure how to do that
  with SQL. See:
  https://github.com/marcelofern/pg_repack/blob/9f36c65bd57ca1b228025687843758556b56df8e/lib/repack.c#L1373-L1387

And for reference, here is the script I have used so far. Keen for any
suggestions on how to swap the foreign keys so that they point towards the
new table.

```sql
-- Create the original table that will later be swapped with its copy.
DROP TABLE IF EXISTS original CASCADE;
CREATE TABLE original (
    id SERIAL PRIMARY KEY,
    name VARCHAR(5000) NOT NULL, -- necessary for the TOAST table.
    value INTEGER NOT NULL
);

-- Insert 10_000 rows into it.
INSERT INTO original (name, value)
SELECT
    'item_' || generate_series(1, 10000) AS name,
    (generate_series(1, 10000) % 10000) + 1 AS value;

-- Create the copy table; this table will be swapped for the original table
-- later.
DROP TABLE IF EXISTS copy;
CREATE TABLE copy (
    id SERIAL PRIMARY KEY,
    name VARCHAR(5000) NOT NULL,
    value INTEGER NOT NULL
);

-- Pull all the data from the original table into the copy table.
INSERT INTO copy SELECT id, name, value FROM ONLY original;

-- Create a table with a foreign key to the original table to verify whether
-- the swap addresses the foreign key.
DROP TABLE IF EXISTS table_with_fk;
CREATE TABLE table_with_fk (
    id SERIAL PRIMARY KEY,
    original_id INTEGER NOT NULL,
    CONSTRAINT fk_original
        FOREIGN KEY (original_id)
        REFERENCES original(id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);

-- Insert 10_000 rows into it.
INSERT INTO table_with_fk (original_id)
SELECT generate_series(1, 10000);

-- Analogously, create a table with a NOT VALID foreign key.
DROP TABLE IF EXISTS table_with_not_valid_fk;
CREATE TABLE table_with_not_valid_fk (
    id SERIAL PRIMARY KEY,
    original_id INTEGER NOT NULL,
    CONSTRAINT not_valid_fk_original
        FOREIGN KEY (original_id)
        REFERENCES original(id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
        NOT VALID
);

-- Insert 10_000 rows.
INSERT INTO table_with_not_valid_fk (original_id)
SELECT generate_series(1, 10000);

-- All tables must have 10_000 rows in them.
SELECT count(*) FROM original;
SELECT count(*) FROM copy;
SELECT count(*) FROM table_with_fk;
SELECT count(*) FROM table_with_not_valid_fk;

-- See relation info for the tables and their TOASTs.
SELECT
    X.relname,
    X.reltablespace,
    X.oid,
    X.reltoastrelid,
    X.relowner,
    X.relkind,
    X.relfrozenxid,
    X.relminmxid,
    X.relpages,
    X.reltuples,
    X.relallvisible,
    X.relfilenode,
    TOAST_X.indexrelid AS toast_indexrelid
FROM pg_catalog.pg_class X
LEFT JOIN pg_catalog.pg_index TOAST_X
    ON X.reltoastrelid = TOAST_X.indrelid AND TOAST_X.indisvalid
WHERE X.oid IN (('original')::regclass, ('copy')::regclass)
ORDER BY X.relname;
-- -[ RECORD 1 ]----+---------
-- relname          | copy
-- reltablespace    | 0
-- oid              | 22522
-- reltoastrelid    | 22526
-- relowner         | 10
-- relkind          | r
-- relfrozenxid     | 2068
-- relminmxid       | 1
-- relpages         | 64
-- reltuples        | 10000
-- relallvisible    | 64
-- relfilenode      | 22522
-- toast_indexrelid | 22527
-- -[ RECORD 2 ]----+---------
-- relname          | original
-- reltablespace    | 0
-- oid              | 22513
-- reltoastrelid    | 22517
-- relowner         | 10
-- relkind          | r
-- relfrozenxid     | 2065
-- relminmxid       | 1
-- relpages         | 64
-- reltuples        | 10000
-- relallvisible    | 64
-- relfilenode      | 22513
-- toast_indexrelid | 22518

-- Take note of the dependencies for the toast table to compare later.
SELECT d1.objid AS original_objid, d2.objid AS copy_objid
FROM pg_depend d1, pg_depend d2
WHERE d1.objid = ('original'::regclass)
  AND d2.objid = ('copy'::regclass);
-- -[ RECORD 1 ]--+------
-- original_objid | 22513
-- copy_objid     | 22522

-- Start the table swap inside a transaction.
BEGIN;

LOCK TABLE original, copy IN ACCESS EXCLUSIVE MODE;

SELECT * FROM pg_class WHERE relname IN ('original', 'copy') FOR UPDATE;

WITH swapped AS (
    SELECT
        c1.oid AS original_oid, c2.oid AS copy_oid,
        c1.relfilenode AS original_filenode, c2.relfilenode AS copy_filenode,
        c1.reltablespace AS original_tablespace, c2.reltablespace AS copy_tablespace,
        c1.reltoastrelid AS original_toast, c2.reltoastrelid AS copy_toast,
        c1.relfrozenxid AS original_frozenxid, c2.relfrozenxid AS copy_frozenxid,
        c1.relminmxid AS original_relminmxid, c2.relminmxid AS copy_relminmxid,
        c1.relpages AS original_pages, c2.relpages AS copy_pages,
        c1.reltuples AS original_tuples, c2.reltuples AS copy_tuples,
        c1.relallvisible AS original_allvisible, c2.relallvisible AS copy_allvisible
    FROM pg_class c1, pg_class c2
    WHERE c1.relname = 'original' AND c2.relname = 'copy'
)
UPDATE pg_class
SET relfilenode = CASE
        WHEN oid = (SELECT original_oid FROM swapped)
            THEN (SELECT copy_filenode FROM swapped)
        WHEN oid = (SELECT copy_oid FROM swapped)
            THEN (SELECT original_filenode FROM swapped)
    END,
    reltablespace = CASE
        WHEN oid = (SELECT original_oid FROM swapped)
            THEN (SELECT copy_tablespace FROM swapped)
        WHEN oid = (SELECT copy_oid FROM swapped)
            THEN (SELECT original_tablespace FROM swapped)
    END,
    reltoastrelid = CASE
        WHEN oid = (SELECT original_oid FROM swapped)
            THEN (SELECT copy_toast FROM swapped)
        WHEN oid = (SELECT copy_oid FROM swapped)
            THEN (SELECT original_toast FROM swapped)
    END,
    relfrozenxid = CASE
        WHEN oid = (SELECT original_oid FROM swapped)
            THEN (SELECT copy_frozenxid FROM swapped)
        WHEN oid = (SELECT copy_oid FROM swapped)
            THEN (SELECT original_frozenxid FROM swapped)
    END,
    relminmxid = CASE
        WHEN oid = (SELECT original_oid FROM swapped)
            THEN (SELECT copy_relminmxid FROM swapped)
        WHEN oid = (SELECT copy_oid FROM swapped)
            THEN (SELECT original_relminmxid FROM swapped)
    END,
    relpages = CASE
        WHEN oid = (SELECT original_oid FROM swapped)
            THEN (SELECT copy_pages FROM swapped)
        WHEN oid = (SELECT copy_oid FROM swapped)
            THEN (SELECT original_pages FROM swapped)
    END,
    reltuples = CASE
        WHEN oid = (SELECT original_oid FROM swapped)
            THEN (SELECT copy_tuples FROM swapped)
        WHEN oid = (SELECT copy_oid FROM swapped)
            THEN (SELECT original_tuples FROM swapped)
    END,
    relallvisible = CASE
        WHEN oid = (SELECT original_oid FROM swapped)
            THEN (SELECT copy_allvisible FROM swapped)
        WHEN oid = (SELECT copy_oid FROM swapped)
            THEN (SELECT original_allvisible FROM swapped)
    END
WHERE oid IN (SELECT original_oid FROM swapped
              UNION SELECT copy_oid FROM swapped);

-- See that the relevant fields have been swapped.
SELECT
    X.relname,
    X.reltablespace,
    X.oid,
    X.reltoastrelid,
    X.relowner,
    X.relkind,
    X.relfrozenxid,
    X.relminmxid,
    X.relpages,
    X.reltuples,
    X.relallvisible,
    X.relfilenode,
    TOAST_X.indexrelid AS toast_indexrelid
FROM pg_catalog.pg_class X
LEFT JOIN pg_catalog.pg_index TOAST_X
    ON X.reltoastrelid = TOAST_X.indrelid AND TOAST_X.indisvalid
WHERE X.oid IN (('original')::regclass, ('copy')::regclass)
ORDER BY X.relname;
-- -[ RECORD 1 ]----+---------
-- relname          | copy
-- reltablespace    | 0
-- oid              | 22522
-- reltoastrelid    | 22517
-- relowner         | 10
-- relkind          | r
-- relfrozenxid     | 2065
-- relminmxid       | 1
-- relpages         | 64
-- reltuples        | 10000
-- relallvisible    | 64
-- relfilenode      | 22513
-- toast_indexrelid | 22518
-- -[ RECORD 2 ]----+---------
-- relname          | original
-- reltablespace    | 0
-- oid              | 22513
-- reltoastrelid    | 22526
-- relowner         | 10
-- relkind          | r
-- relfrozenxid     | 2068
-- relminmxid       | 1
-- relpages         | 64
-- reltuples        | 10000
-- relallvisible    | 64
-- relfilenode      | 22522
-- toast_indexrelid | 22527

-- Lock the pg_depend rows that correspond to 'original' and 'copy'.
SELECT * FROM pg_depend
WHERE objid IN (('original')::regclass, ('copy')::regclass)
FOR UPDATE;

-- Swap the objid values for the two dependencies.
WITH swapped_dep AS (
    SELECT d1.objid AS original_objid, d2.objid AS copy_objid
    FROM pg_depend d1, pg_depend d2
    WHERE d1.objid = ('original'::regclass)
      AND d2.objid = ('copy'::regclass)
)
-- TODO: this update is not working; maybe the rows need to be deleted and
-- then inserted again? A delete followed by a create is what pg_repack
-- seems to do.
UPDATE pg_depend
SET objid = CASE
        WHEN objid = (SELECT original_objid FROM swapped_dep)
            THEN (SELECT copy_objid FROM swapped_dep)
        WHEN objid = (SELECT copy_objid FROM swapped_dep)
            THEN (SELECT original_objid FROM swapped_dep)
    END
WHERE objid IN (SELECT original_objid FROM swapped_dep
                UNION SELECT copy_objid FROM swapped_dep);

-- Verify the dependencies have been swapped.
SELECT d1.objid AS original_objid, d2.objid AS copy_objid
FROM pg_depend d1, pg_depend d2
WHERE d1.objid = ('original'::regclass)
  AND d2.objid = ('copy'::regclass);
-- -[ RECORD 1 ]--+------
-- original_objid | 22513
-- copy_objid     | 22522

-- Renames!
ALTER TABLE original RENAME TO temp_original;
ALTER TABLE copy RENAME TO original;
ALTER TABLE temp_original RENAME TO copy;
DROP TABLE copy CASCADE;

-- Insert a couple of rows into the new "original" to verify it works.
INSERT INTO original (id, name, value) VALUES (10001, 'my_new_row', 10);
SELECT * FROM original ORDER BY id DESC;

-- TODO (minor): Index names for the PKs and their sequences have not been
-- renamed.

-- TODO (major): The FKs on the related tables weren't updated to use the
-- new table:
--
-- \d table_with_fk
--                  Table "public.table_with_fk"
--    Column    |  Type   | Collation | Nullable |                  Default
-- -------------+---------+-----------+----------+-------------------------------------------
--  id          | integer |           | not null | nextval('table_with_fk_id_seq'::regclass)
--  original_id | integer |           | not null |
-- Indexes:
--     "table_with_fk_pkey" PRIMARY KEY, btree (id)
--
-- \d table_with_not_valid_fk
--             Table "public.table_with_not_valid_fk"
--    Column    |  Type   | Collation | Nullable |                       Default
-- -------------+---------+-----------+----------+-----------------------------------------------------
--  id          | integer |           | not null | nextval('table_with_not_valid_fk_id_seq'::regclass)
--  original_id | integer |           | not null |
-- Indexes:
--     "table_with_not_valid_fk_pkey" PRIMARY KEY, btree (id)

-- Roll this back so that your postgres db doesn't get potentially messed up.
ROLLBACK;
```
On Sun, Feb 16, 2025 at 5:58 PM Marcelo Fernandes <marcefern7@gmail.com> wrote:
> - The foreign keys are not being updated to point to the new table.
You started out okay with your test script, but the pg_depend bit needs work. I would recommend examining that table closely until you have a really good understanding of how it works - both on the objid and refobjid side. Create and modify tables and see how the pg_depend entries change. Hint: Try it with and without foreign keys.
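A minimal starting point for that inspection, reusing names from the test
script (a sketch, not a recipe): foreign keys pointing *at* a table live in
pg_constraint, and their dependency rows sit on the refobjid side of
pg_depend.

    -- FK constraints that reference "original":
    SELECT conname, conrelid::regclass AS referencing_table
    FROM pg_constraint
    WHERE contype = 'f' AND confrelid = 'original'::regclass;

    -- Dependency rows where "original" is the *referenced* object:
    SELECT classid::regclass AS catalog, objid, deptype
    FROM pg_depend
    WHERE refobjid = 'original'::regclass;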
> - pg_repack seems to update relcache entries. I am not sure how to do that with SQL.
Yes, as I mentioned upthread, the safest thing will probably be a restart.
Cheers,
Greg