Thread: Dump/Restore of non-default PKs
At the moment you cannot create a unique index other than a btree. (As discussed on other threads, I am pursuing unique hash indexes for PostgreSQL, one step at a time). You get "ERROR index foo_idx is not a btree" According to parse_utilcmd.c line 2310, this is because it would break pg_dump, which needs ADD CONSTRAINT to create the same kind of index again. Fair enough. This is needed because ADD CONSTRAINT just uses the defaults index type. We could simply allow a GUC for default_primary_key_access_method, but that is overkill and there seems to be an easy and more general solution: I propose that we change pg_dump so that when it creates a PK it does so in 2 commands: 1. CREATE [UNIQUE] INDEX iname ... 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname; Step (1) recreates the index, respecting its AM, even if that is not a btree (2) works and there is no problem with defaults Doing this as 2 steps instead of one doesn't add any more time because (2) is just a metadata-only change, not an index build. Any objections to a patch to implement this thought? -- Simon Riggs http://www.EnterpriseDB.com/
On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
At the moment you cannot create a unique index other than a btree. (As
discussed on other threads, I am pursuing unique hash indexes for
PostgreSQL, one step at a time).
You get "ERROR index foo_idx is not a btree"
According to parse_utilcmd.c line 2310, this is because it would break
pg_dump, which needs ADD CONSTRAINT to create the same kind of index
again. Fair enough.
This is needed because ADD CONSTRAINT just uses the defaults index
type. We could simply allow a GUC for
default_primary_key_access_method, but that is overkill and there
seems to be an easy and more general solution:
I propose that we change pg_dump so that when it creates a PK it does
so in 2 commands:
1. CREATE [UNIQUE] INDEX iname ...
2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;
Step
(1) recreates the index, respecting its AM, even if that is not a btree
(2) works and there is no problem with defaults
Doing this as 2 steps instead of one doesn't add any more time because
(2) is just a metadata-only change, not an index build.
Any objections to a patch to implement this thought?
Why not just get rid of the limitation that constraint definitions don't support non-default methods?
I.e., add syntax to index_parameters so that the underlying index can be defined directly.
index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are:
[ INCLUDE ( column_name [, ... ] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ USING INDEX TABLESPACE tablespace_name ]
[ INCLUDE ( column_name [, ... ] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ USING INDEX TABLESPACE tablespace_name ]
We should add:
[ USING INDEX METHOD index_method ]
index_method := { BTREE | GIN | GIST | HASH | SPGIST | BRIN }
David J.
Simon Riggs <simon.riggs@enterprisedb.com> writes: > I propose that we change pg_dump so that when it creates a PK it does > so in 2 commands: > 1. CREATE [UNIQUE] INDEX iname ... > 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname; > Step > (1) recreates the index, respecting its AM, even if that is not a btree > (2) works and there is no problem with defaults > Doing this as 2 steps instead of one doesn't add any more time because > (2) is just a metadata-only change, not an index build. I don't believe the claim that this adds no cost. Maybe it's negligible in context, but you've provided no evidence of that. (Parallel restore, where the ALTER would have to be a separate worker task, would probably be the worst case here.) Also, I assume your ambition would extend to supporting UNIQUE (but non-PKEY) constraints, so that would have to be done this way too. A potential advantage of doing things this way is that if we make pg_dump treat the index and the constraint as fully independent objects, that might allow some logic simplifications in pg_dump. Right now I think there are various weirdnesses in there that exist precisely because we don't want to dump them separately. One concern is that this'd create a hard compatibility break for loading dump output into servers that predate whenever we added ADD PRIMARY KEY USING INDEX. However, it looks like that syntax is accepted back to 9.1, so probably that's no issue in practice. Maybe a bigger concern for people who want to port to other RDBMSes? regards, tom lane
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com> > wrote: >> I propose that we change pg_dump so that when it creates a PK it does >> so in 2 commands: >> 1. CREATE [UNIQUE] INDEX iname ... >> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname; > Why not just get rid of the limitation that constraint definitions don't > support non-default methods? That approach would be doubling down on the assumption that we can always shoehorn more custom options into SQL-standard constraint clauses, and we'll never fall foul of shift/reduce problems or future spec additions. I think for example that USING INDEX TABLESPACE is a blot on humanity, and I'd be very glad to see pg_dump stop using it in favor of doing things as Simon suggests. regards, tom lane
On Mon, Apr 18, 2022 at 1:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com>
> wrote:
>> I propose that we change pg_dump so that when it creates a PK it does
>> so in 2 commands:
>> 1. CREATE [UNIQUE] INDEX iname ...
>> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;
> Why not just get rid of the limitation that constraint definitions don't
> support non-default methods?
That approach would be doubling down on the assumption that we can always
shoehorn more custom options into SQL-standard constraint clauses, and
we'll never fall foul of shift/reduce problems or future spec additions.
I think for example that USING INDEX TABLESPACE is a blot on humanity,
and I'd be very glad to see pg_dump stop using it in favor of doing
things as Simon suggests.
I'm convinced.
As for portability - that would be something we could explicitly define and support through a pg_dump option. In compatibility mode you get whatever the default index would be for your engine while by default we output the existing index as defined and then alter-add it to the table.
David J.
On Mon, 18 Apr 2022 at 21:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "David G. Johnston" <david.g.johnston@gmail.com> writes: > > On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com> > > wrote: > >> I propose that we change pg_dump so that when it creates a PK it does > >> so in 2 commands: > >> 1. CREATE [UNIQUE] INDEX iname ... > >> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname; > > > Why not just get rid of the limitation that constraint definitions don't > > support non-default methods? > > That approach would be doubling down on the assumption that we can always > shoehorn more custom options into SQL-standard constraint clauses, and > we'll never fall foul of shift/reduce problems or future spec additions. > I think for example that USING INDEX TABLESPACE is a blot on humanity, > and I'd be very glad to see pg_dump stop using it in favor of doing > things as Simon suggests. Sigh, agreed. It's more work, but its cleaner in the longer term to separate indexes from constraints. I'll look in more detail and come back here later. Thanks both. -- Simon Riggs http://www.EnterpriseDB.com/
On Mon, 18 Apr 2022 at 22:05, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Mon, 18 Apr 2022 at 21:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > "David G. Johnston" <david.g.johnston@gmail.com> writes: > > > On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com> > > > wrote: > > >> I propose that we change pg_dump so that when it creates a PK it does > > >> so in 2 commands: > > >> 1. CREATE [UNIQUE] INDEX iname ... > > >> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname; > > > > > Why not just get rid of the limitation that constraint definitions don't > > > support non-default methods? > > > > That approach would be doubling down on the assumption that we can always > > shoehorn more custom options into SQL-standard constraint clauses, and > > we'll never fall foul of shift/reduce problems or future spec additions. > > I think for example that USING INDEX TABLESPACE is a blot on humanity, > > and I'd be very glad to see pg_dump stop using it in favor of doing > > things as Simon suggests. > > Sigh, agreed. It's more work, but its cleaner in the longer term to > separate indexes from constraints. > > I'll look in more detail and come back here later. > > Thanks both. My original plan was to get pg_dump to generate -- -- Name: foo foo_a_idx; Type: CONSTRAINT; Schema: public; Owner: postgres -- CREATE UNIQUE INDEX foo_a_idx ON public.foo USING btree (a); ALTER TABLE ONLY public.foo ADD CONSTRAINT foo_a_idx PRIMARY KEY USING INDEX foo_a_idx; so the index definition is generated as a CONSTRAINT, not an INDEX. Separating things a bit more generates this output, which is what I think we want: -- -- Name: foo foo_a_idx; Type: CONSTRAINT; Schema: public; Owner: postgres -- ALTER TABLE ONLY public.foo ADD CONSTRAINT foo_a_idx PRIMARY KEY USING INDEX foo_a_idx; -- -- Name: foo_a_idx; Type: INDEX; Schema: public; Owner: postgres -- CREATE UNIQUE INDEX foo_a_idx ON public.foo USING btree (a); Which is better, but there is still some ugly code for REPLICA IDENTITY and CLUSTER duplicated in dumpIndex() and dumpConstraint(). The attached patch includes a change to pg_dump_sort.c which changes the priority of CONSTRAINT, but that doesn't seem to have any effect on the output. I'm hoping that's a quick fix, but I haven't seen it yet, even after losing sanity points trying to read the priority code. Anyway, the main question is how should the code be structured? -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
On Tue, Apr 19, 2022 at 9:14 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Mon, 18 Apr 2022 at 22:05, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Mon, 18 Apr 2022 at 21:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > > On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com>
> > > wrote:
> > >> I propose that we change pg_dump so that when it creates a PK it does
> > >> so in 2 commands:
> > >> 1. CREATE [UNIQUE] INDEX iname ...
> > >> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;
> >
> > > Why not just get rid of the limitation that constraint definitions don't
> > > support non-default methods?
> >
> > That approach would be doubling down on the assumption that we can always
> > shoehorn more custom options into SQL-standard constraint clauses, and
> > we'll never fall foul of shift/reduce problems or future spec additions.
> > I think for example that USING INDEX TABLESPACE is a blot on humanity,
> > and I'd be very glad to see pg_dump stop using it in favor of doing
> > things as Simon suggests.
>
> Sigh, agreed. It's more work, but its cleaner in the longer term to
> separate indexes from constraints.
>
> I'll look in more detail and come back here later.
>
> Thanks both.
Anyway, the main question is how should the code be structured?
I don't have a good answer to that question but the patch presently produces the dump below for a partitioned table with one partition.
After manually adjusting the order of operations you end up with:
psql:/vagrant/pg_dump_indexattach.v1.txt:67: ERROR: index "parent_pkey" is not valid
LINE 2: ADD CONSTRAINT parent_pkey PRIMARY KEY USING INDEX paren...
^
LINE 2: ADD CONSTRAINT parent_pkey PRIMARY KEY USING INDEX paren...
^
Because:
ADD table_constraint_using_index
...This form is not currently supported on partitioned tables.
David J.
===== pg_dump with manual re-ordering of create/alter index before alter table
CREATE TABLE public.parent (
id integer NOT NULL,
class text NOT NULL
)
PARTITION BY LIST (class);
CREATE TABLE public.parent_a (
id integer NOT NULL,
class text NOT NULL
);
ALTER TABLE public.parent_a OWNER TO vagrant;
ALTER TABLE ONLY public.parent ATTACH PARTITION public.parent_a FOR VALUES IN ('a');
id integer NOT NULL,
class text NOT NULL
)
PARTITION BY LIST (class);
CREATE TABLE public.parent_a (
id integer NOT NULL,
class text NOT NULL
);
ALTER TABLE public.parent_a OWNER TO vagrant;
ALTER TABLE ONLY public.parent ATTACH PARTITION public.parent_a FOR VALUES IN ('a');
CREATE UNIQUE INDEX parent_pkey ON ONLY public.parent USING btree (id, class);
ALTER TABLE ONLY public.parent
ADD CONSTRAINT parent_pkey PRIMARY KEY USING INDEX parent_pkey;
CREATE UNIQUE INDEX parent_a_pkey ON public.parent_a USING btree (id, class);
ALTER INDEX public.parent_pkey ATTACH PARTITION public.parent_a_pkey;
ADD CONSTRAINT parent_pkey PRIMARY KEY USING INDEX parent_pkey;
CREATE UNIQUE INDEX parent_a_pkey ON public.parent_a USING btree (id, class);
ALTER INDEX public.parent_pkey ATTACH PARTITION public.parent_a_pkey;
ALTER TABLE ONLY public.parent_a
ADD CONSTRAINT parent_a_pkey PRIMARY KEY USING INDEX parent_a_pkey;
On Wed, 20 Apr 2022 at 03:05, David G. Johnston <david.g.johnston@gmail.com> wrote: > https://www.postgresql.org/docs/current/sql-altertable.html > ADD table_constraint_using_index > ...This form is not currently supported on partitioned tables. Good to know, thanks very much for pointing it out. That needs to be fixed before we can progress further on this thread. Will look into it. -- Simon Riggs http://www.EnterpriseDB.com/
On 18.04.22 22:48, Tom Lane wrote: >> Why not just get rid of the limitation that constraint definitions don't >> support non-default methods? > That approach would be doubling down on the assumption that we can always > shoehorn more custom options into SQL-standard constraint clauses, and > we'll never fall foul of shift/reduce problems or future spec additions. When we do get the ability to create a table with a primary key with an underlying hash index, how would that be done? Would the only way be 1. create the table without primary key 2. create the index 3. attach the index as primary key constraint That doesn't sound attractive.
On Wed, 20 Apr 2022 at 21:46, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 18.04.22 22:48, Tom Lane wrote: > >> Why not just get rid of the limitation that constraint definitions don't > >> support non-default methods? > > That approach would be doubling down on the assumption that we can always > > shoehorn more custom options into SQL-standard constraint clauses, and > > we'll never fall foul of shift/reduce problems or future spec additions. > > When we do get the ability to create a table with a primary key with an > underlying hash index, how would that be done? Would the only way be > > 1. create the table without primary key > 2. create the index > 3. attach the index as primary key constraint > > That doesn't sound attractive. Can you explain what you find unattractive about it? The alternative is we have this 1. create the table without primary key 2. attach the index as primary key constraint (which must be extended to include ALL of the options available on create index) Having to extend ALTER TABLE so it exactly matches CREATE INDEX is painful and maintaining it that way seems unattractive, to me. Just so we are clear this is not about hash indexes, this is about using ANY kind of index (i.e. any index access method, extension or otherwise) to enforce a constraint. Another idea might be to allow some kind of statement embedding... so we don't need to constantly fiddle with ALTER TABLE ALTER TABLE foo ADD PRIMARY KEY USING INDEX (CREATE INDEX .... ) -- Simon Riggs http://www.EnterpriseDB.com/
On 21.04.22 13:43, Simon Riggs wrote: >> 1. create the table without primary key >> 2. create the index >> 3. attach the index as primary key constraint >> >> That doesn't sound attractive. > Can you explain what you find unattractive about it? Well, if I want to create a table with a primary key, the established way is to say "primary key", not to have to assemble it from multiple pieces. I think this case is very similar to exclusion constraints, which also have syntax to specify the index access method.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > On 21.04.22 13:43, Simon Riggs wrote: >> Can you explain what you find unattractive about it? > Well, if I want to create a table with a primary key, the established > way is to say "primary key", not to have to assemble it from multiple > pieces. > I think this case is very similar to exclusion constraints, which also > have syntax to specify the index access method. That analogy would be compelling if exclusion constraints were a SQL-standard feature; but they aren't so their clause syntax is fully under our control. The scenario that worries me is that somewhere down the pike, the SQL committee might extend the syntax of PKEY/UNIQUE constraint clauses in a way that breaks our nonstandard extensions of them. However, independently of whether we offer a syntax option or not, it may still simplify pg_dump to make it treat the constraint and the index as independent objects in all cases. regards, tom lane
On 22.04.22 16:14, Tom Lane wrote: > That analogy would be compelling if exclusion constraints were a > SQL-standard feature; but they aren't so their clause syntax is > fully under our control. The scenario that worries me is that > somewhere down the pike, the SQL committee might extend the > syntax of PKEY/UNIQUE constraint clauses in a way that breaks > our nonstandard extensions of them. Some syntax like PRIMARY KEY (x, y) USING ACCESS METHOD hash should be able to avoid any future clashes.
On Thu, 28 Apr 2022 at 15:09, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 22.04.22 16:14, Tom Lane wrote: > > That analogy would be compelling if exclusion constraints were a > > SQL-standard feature; but they aren't so their clause syntax is > > fully under our control. The scenario that worries me is that > > somewhere down the pike, the SQL committee might extend the > > syntax of PKEY/UNIQUE constraint clauses in a way that breaks > > our nonstandard extensions of them. > > Some syntax like > > PRIMARY KEY (x, y) USING ACCESS METHOD hash > > should be able to avoid any future clashes. That seems to conflict with USING INDEX TABLESPACE. I've tried a few things but have not found anything yet. Any other ideas on syntax? -- Simon Riggs http://www.EnterpriseDB.com/