Thread: Dump/Restore of non-default PKs

Dump/Restore of non-default PKs

From
Simon Riggs
Date:
At the moment you cannot create a unique index other than a btree. (As
discussed on other threads, I am pursuing unique hash indexes for
PostgreSQL, one step at a time).
You get "ERROR index foo_idx is not a btree"

According to parse_utilcmd.c line 2310, this is because it would break
pg_dump, which needs ADD CONSTRAINT to create the same kind of index
again. Fair enough.

This is needed because ADD CONSTRAINT just uses the defaults index
type. We could simply allow a GUC for
default_primary_key_access_method, but that is overkill and there
seems to be an easy and more general solution:

I propose that we change pg_dump so that when it creates a PK it does
so in 2 commands:
1. CREATE [UNIQUE] INDEX iname ...
2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;

Step
(1) recreates the index, respecting its AM, even if that is not a btree
(2) works and there is no problem with defaults

Doing this as 2 steps instead of one doesn't add any more time because
(2) is just a metadata-only change, not an index build.

Any objections to a patch to implement this thought?

--
Simon Riggs                http://www.EnterpriseDB.com/



Re: Dump/Restore of non-default PKs

From
"David G. Johnston"
Date:
On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
At the moment you cannot create a unique index other than a btree. (As
discussed on other threads, I am pursuing unique hash indexes for
PostgreSQL, one step at a time).
You get "ERROR index foo_idx is not a btree"

According to parse_utilcmd.c line 2310, this is because it would break
pg_dump, which needs ADD CONSTRAINT to create the same kind of index
again. Fair enough.

This is needed because ADD CONSTRAINT just uses the defaults index
type. We could simply allow a GUC for
default_primary_key_access_method, but that is overkill and there
seems to be an easy and more general solution:

I propose that we change pg_dump so that when it creates a PK it does
so in 2 commands:
1. CREATE [UNIQUE] INDEX iname ...
2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;

Step
(1) recreates the index, respecting its AM, even if that is not a btree
(2) works and there is no problem with defaults

Doing this as 2 steps instead of one doesn't add any more time because
(2) is just a metadata-only change, not an index build.

Any objections to a patch to implement this thought?

Why not just get rid of the limitation that constraint definitions don't support non-default methods?

I.e., add syntax to index_parameters so that the underlying index can be defined directly.

index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are:

[ INCLUDE ( column_name [, ... ] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) ]
[ USING INDEX TABLESPACE tablespace_name ]

We should add:

[ USING INDEX METHOD index_method ]

index_method := { BTREE | GIN | GIST | HASH | SPGIST | BRIN }

David J.

Re: Dump/Restore of non-default PKs

From
Tom Lane
Date:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> I propose that we change pg_dump so that when it creates a PK it does
> so in 2 commands:
> 1. CREATE [UNIQUE] INDEX iname ...
> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;

> Step
> (1) recreates the index, respecting its AM, even if that is not a btree
> (2) works and there is no problem with defaults

> Doing this as 2 steps instead of one doesn't add any more time because
> (2) is just a metadata-only change, not an index build.

I don't believe the claim that this adds no cost.  Maybe it's negligible
in context, but you've provided no evidence of that.  (Parallel restore,
where the ALTER would have to be a separate worker task, would probably
be the worst case here.)

Also, I assume your ambition would extend to supporting UNIQUE (but
non-PKEY) constraints, so that would have to be done this way too.

A potential advantage of doing things this way is that if we make
pg_dump treat the index and the constraint as fully independent
objects, that might allow some logic simplifications in pg_dump.
Right now I think there are various weirdnesses in there that
exist precisely because we don't want to dump them separately.

One concern is that this'd create a hard compatibility break for
loading dump output into servers that predate whenever we added
ADD PRIMARY KEY USING INDEX.  However, it looks like that syntax
is accepted back to 9.1, so probably that's no issue in practice.
Maybe a bigger concern for people who want to port to other
RDBMSes?

            regards, tom lane



Re: Dump/Restore of non-default PKs

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com>
> wrote:
>> I propose that we change pg_dump so that when it creates a PK it does
>> so in 2 commands:
>> 1. CREATE [UNIQUE] INDEX iname ...
>> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;

> Why not just get rid of the limitation that constraint definitions don't
> support non-default methods?

That approach would be doubling down on the assumption that we can always
shoehorn more custom options into SQL-standard constraint clauses, and
we'll never fall foul of shift/reduce problems or future spec additions.
I think for example that USING INDEX TABLESPACE is a blot on humanity,
and I'd be very glad to see pg_dump stop using it in favor of doing
things as Simon suggests.

            regards, tom lane



Re: Dump/Restore of non-default PKs

From
"David G. Johnston"
Date:
On Mon, Apr 18, 2022 at 1:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com>
> wrote:
>> I propose that we change pg_dump so that when it creates a PK it does
>> so in 2 commands:
>> 1. CREATE [UNIQUE] INDEX iname ...
>> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;

> Why not just get rid of the limitation that constraint definitions don't
> support non-default methods?

That approach would be doubling down on the assumption that we can always
shoehorn more custom options into SQL-standard constraint clauses, and
we'll never fall foul of shift/reduce problems or future spec additions.
I think for example that USING INDEX TABLESPACE is a blot on humanity,
and I'd be very glad to see pg_dump stop using it in favor of doing
things as Simon suggests.


I'm convinced.

As for portability - that would be something we could explicitly define and support through a pg_dump option.  In compatibility mode you get whatever the default index would be for your engine while by default we output the existing index as defined and then alter-add it to the table.

David J.

Re: Dump/Restore of non-default PKs

From
Simon Riggs
Date:
On Mon, 18 Apr 2022 at 21:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com>
> > wrote:
> >> I propose that we change pg_dump so that when it creates a PK it does
> >> so in 2 commands:
> >> 1. CREATE [UNIQUE] INDEX iname ...
> >> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;
>
> > Why not just get rid of the limitation that constraint definitions don't
> > support non-default methods?
>
> That approach would be doubling down on the assumption that we can always
> shoehorn more custom options into SQL-standard constraint clauses, and
> we'll never fall foul of shift/reduce problems or future spec additions.
> I think for example that USING INDEX TABLESPACE is a blot on humanity,
> and I'd be very glad to see pg_dump stop using it in favor of doing
> things as Simon suggests.

Sigh, agreed. It's more work, but its cleaner in the longer term to
separate indexes from constraints.

I'll look in more detail and come back here later.

Thanks both.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Dump/Restore of non-default PKs

From
Simon Riggs
Date:
On Mon, 18 Apr 2022 at 22:05, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Mon, 18 Apr 2022 at 21:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > > On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com>
> > > wrote:
> > >> I propose that we change pg_dump so that when it creates a PK it does
> > >> so in 2 commands:
> > >> 1. CREATE [UNIQUE] INDEX iname ...
> > >> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;
> >
> > > Why not just get rid of the limitation that constraint definitions don't
> > > support non-default methods?
> >
> > That approach would be doubling down on the assumption that we can always
> > shoehorn more custom options into SQL-standard constraint clauses, and
> > we'll never fall foul of shift/reduce problems or future spec additions.
> > I think for example that USING INDEX TABLESPACE is a blot on humanity,
> > and I'd be very glad to see pg_dump stop using it in favor of doing
> > things as Simon suggests.
>
> Sigh, agreed. It's more work, but its cleaner in the longer term to
> separate indexes from constraints.
>
> I'll look in more detail and come back here later.
>
> Thanks both.

My original plan was to get pg_dump to generate

--
-- Name: foo foo_a_idx; Type: CONSTRAINT; Schema: public; Owner: postgres
--
CREATE UNIQUE INDEX foo_a_idx ON public.foo USING btree (a);
ALTER TABLE ONLY public.foo
    ADD CONSTRAINT foo_a_idx PRIMARY KEY USING INDEX foo_a_idx;

so the index definition is generated as a CONSTRAINT, not an INDEX.

Separating things a bit more generates this output, which is what I
think we want:

--
-- Name: foo foo_a_idx; Type: CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY public.foo
    ADD CONSTRAINT foo_a_idx PRIMARY KEY USING INDEX foo_a_idx;
--
-- Name: foo_a_idx; Type: INDEX; Schema: public; Owner: postgres
--
CREATE UNIQUE INDEX foo_a_idx ON public.foo USING btree (a);

Which is better, but there is still some ugly code for REPLICA
IDENTITY and CLUSTER duplicated in dumpIndex() and dumpConstraint().

The attached patch includes a change to pg_dump_sort.c which changes
the priority of CONSTRAINT, but that doesn't seem to have any effect
on the output. I'm hoping that's a quick fix, but I haven't seen it
yet, even after losing sanity points trying to read the priority code.

Anyway, the main question is how should the code be structured?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Dump/Restore of non-default PKs

From
"David G. Johnston"
Date:
On Tue, Apr 19, 2022 at 9:14 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Mon, 18 Apr 2022 at 22:05, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Mon, 18 Apr 2022 at 21:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > > On Mon, Apr 18, 2022 at 1:00 PM Simon Riggs <simon.riggs@enterprisedb.com>
> > > wrote:
> > >> I propose that we change pg_dump so that when it creates a PK it does
> > >> so in 2 commands:
> > >> 1. CREATE [UNIQUE] INDEX iname ...
> > >> 2. ALTER TABLE .. ADD PRIMARY KEY USING INDEX iname;
> >
> > > Why not just get rid of the limitation that constraint definitions don't
> > > support non-default methods?
> >
> > That approach would be doubling down on the assumption that we can always
> > shoehorn more custom options into SQL-standard constraint clauses, and
> > we'll never fall foul of shift/reduce problems or future spec additions.
> > I think for example that USING INDEX TABLESPACE is a blot on humanity,
> > and I'd be very glad to see pg_dump stop using it in favor of doing
> > things as Simon suggests.
>
> Sigh, agreed. It's more work, but its cleaner in the longer term to
> separate indexes from constraints.
>
> I'll look in more detail and come back here later.
>
> Thanks both.

Anyway, the main question is how should the code be structured?


I don't have a good answer to that question but the patch presently produces the dump below for a partitioned table with one partition.

After manually adjusting the order of operations you end up with:

psql:/vagrant/pg_dump_indexattach.v1.txt:67: ERROR:  index "parent_pkey" is not valid
LINE 2:     ADD CONSTRAINT parent_pkey PRIMARY KEY USING INDEX paren...
                ^
Because:

ADD table_constraint_using_index
...This form is not currently supported on partitioned tables.

David J.

===== pg_dump with manual re-ordering of create/alter index before alter table

CREATE TABLE public.parent (
    id integer NOT NULL,
    class text NOT NULL
)
PARTITION BY LIST (class);

CREATE TABLE public.parent_a (
    id integer NOT NULL,
    class text NOT NULL
);

ALTER TABLE public.parent_a OWNER TO vagrant;

ALTER TABLE ONLY public.parent ATTACH PARTITION public.parent_a FOR VALUES IN ('a');

CREATE UNIQUE INDEX parent_pkey ON ONLY public.parent USING btree (id, class);

ALTER TABLE ONLY public.parent
    ADD CONSTRAINT parent_pkey PRIMARY KEY USING INDEX parent_pkey;

CREATE UNIQUE INDEX parent_a_pkey ON public.parent_a USING btree (id, class);

ALTER INDEX public.parent_pkey ATTACH PARTITION public.parent_a_pkey;

ALTER TABLE ONLY public.parent_a
    ADD CONSTRAINT parent_a_pkey PRIMARY KEY USING INDEX parent_a_pkey;


Re: Dump/Restore of non-default PKs

From
Simon Riggs
Date:
On Wed, 20 Apr 2022 at 03:05, David G. Johnston
<david.g.johnston@gmail.com> wrote:

> https://www.postgresql.org/docs/current/sql-altertable.html
> ADD table_constraint_using_index
> ...This form is not currently supported on partitioned tables.

Good to know, thanks very much for pointing it out.

That needs to be fixed before we can progress further on this thread.
Will look into it.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Dump/Restore of non-default PKs

From
Peter Eisentraut
Date:
On 18.04.22 22:48, Tom Lane wrote:
>> Why not just get rid of the limitation that constraint definitions don't
>> support non-default methods?
> That approach would be doubling down on the assumption that we can always
> shoehorn more custom options into SQL-standard constraint clauses, and
> we'll never fall foul of shift/reduce problems or future spec additions.

When we do get the ability to create a table with a primary key with an 
underlying hash index, how would that be done?  Would the only way be

1. create the table without primary key
2. create the index
3. attach the index as primary key constraint

That doesn't sound attractive.



Re: Dump/Restore of non-default PKs

From
Simon Riggs
Date:
On Wed, 20 Apr 2022 at 21:46, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 18.04.22 22:48, Tom Lane wrote:
> >> Why not just get rid of the limitation that constraint definitions don't
> >> support non-default methods?
> > That approach would be doubling down on the assumption that we can always
> > shoehorn more custom options into SQL-standard constraint clauses, and
> > we'll never fall foul of shift/reduce problems or future spec additions.
>
> When we do get the ability to create a table with a primary key with an
> underlying hash index, how would that be done?  Would the only way be
>
> 1. create the table without primary key
> 2. create the index
> 3. attach the index as primary key constraint
>
> That doesn't sound attractive.

Can you explain what you find unattractive about it?

The alternative is we have this

1. create the table without primary key
2. attach the index as primary key constraint (which must be extended
to include ALL of the options available on create index)

Having to extend ALTER TABLE so it exactly matches CREATE INDEX is
painful and maintaining it that way seems unattractive, to me.


Just so we are clear this is not about hash indexes, this is about
using ANY kind of index (i.e. any index access method, extension or
otherwise) to enforce a constraint.


Another idea might be to allow some kind of statement embedding... so
we don't need to constantly fiddle with ALTER TABLE
ALTER TABLE foo ADD PRIMARY KEY USING INDEX (CREATE INDEX .... )

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Dump/Restore of non-default PKs

From
Peter Eisentraut
Date:
On 21.04.22 13:43, Simon Riggs wrote:
>> 1. create the table without primary key
>> 2. create the index
>> 3. attach the index as primary key constraint
>>
>> That doesn't sound attractive.
> Can you explain what you find unattractive about it?

Well, if I want to create a table with a primary key, the established 
way is to say "primary key", not to have to assemble it from multiple 
pieces.

I think this case is very similar to exclusion constraints, which also 
have syntax to specify the index access method.



Re: Dump/Restore of non-default PKs

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 21.04.22 13:43, Simon Riggs wrote:
>> Can you explain what you find unattractive about it?

> Well, if I want to create a table with a primary key, the established 
> way is to say "primary key", not to have to assemble it from multiple 
> pieces.

> I think this case is very similar to exclusion constraints, which also 
> have syntax to specify the index access method.

That analogy would be compelling if exclusion constraints were a
SQL-standard feature; but they aren't so their clause syntax is
fully under our control.  The scenario that worries me is that
somewhere down the pike, the SQL committee might extend the
syntax of PKEY/UNIQUE constraint clauses in a way that breaks
our nonstandard extensions of them.

However, independently of whether we offer a syntax option or not,
it may still simplify pg_dump to make it treat the constraint and
the index as independent objects in all cases.

            regards, tom lane



Re: Dump/Restore of non-default PKs

From
Peter Eisentraut
Date:
On 22.04.22 16:14, Tom Lane wrote:
> That analogy would be compelling if exclusion constraints were a
> SQL-standard feature; but they aren't so their clause syntax is
> fully under our control.  The scenario that worries me is that
> somewhere down the pike, the SQL committee might extend the
> syntax of PKEY/UNIQUE constraint clauses in a way that breaks
> our nonstandard extensions of them.

Some syntax like

     PRIMARY KEY (x, y) USING ACCESS METHOD hash

should be able to avoid any future clashes.



Re: Dump/Restore of non-default PKs

From
Simon Riggs
Date:
On Thu, 28 Apr 2022 at 15:09, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 22.04.22 16:14, Tom Lane wrote:
> > That analogy would be compelling if exclusion constraints were a
> > SQL-standard feature; but they aren't so their clause syntax is
> > fully under our control.  The scenario that worries me is that
> > somewhere down the pike, the SQL committee might extend the
> > syntax of PKEY/UNIQUE constraint clauses in a way that breaks
> > our nonstandard extensions of them.
>
> Some syntax like
>
>      PRIMARY KEY (x, y) USING ACCESS METHOD hash
>
> should be able to avoid any future clashes.

That seems to conflict with USING INDEX TABLESPACE. I've tried a few
things but have not found anything yet.

Any other ideas on syntax?

-- 
Simon Riggs                http://www.EnterpriseDB.com/