Re: [GSoC2014] Patch ALTER TABLE ... SET LOGGED - Mailing list pgsql-hackers

From Fabrízio de Royes Mello
Subject Re: [GSoC2014] Patch ALTER TABLE ... SET LOGGED
Date
Msg-id CAFcNs+qF5fUkkp9vdJWokiaBn_rRM4+HJqXeeVpD_7-tO0L4AA@mail.gmail.com
Whole thread Raw
In response to Re: [GSoC2014] Patch ALTER TABLE ... SET LOGGED  (Christoph Berg <cb@df7cb.de>)
Responses Re: [GSoC2014] Patch ALTER TABLE ... SET LOGGED  (Christoph Berg <cb@df7cb.de>)
List pgsql-hackers

On Tue, Jul 8, 2014 at 4:53 PM, Christoph Berg <cb@df7cb.de> wrote:
>
> Hi,
>
> here's my review for this patch.
>
> Generally, the patch addresses a real need, tables often only turn
> out to be desired as unlogged if there are performance problems in
> practice, and the other way round changing an unlogged table to logged
> is way more involved manually than it could be with this patch. So
> yes, we want the feature.
>
> I've tried the patch, and it works as desired, including tab
> completion in psql. There are a few issues to be resolved, though.
>

Thank you so much for your review!


> Re: Fabrízio de Royes Mello 2014-06-26 <CAFcNs+pyV0PBjLo97OSyqV1yQOC7s+JGvWXc8UnY5jSRDwy45A@mail.gmail.com>
> > As part of GSoC2014 and with agreement of my mentor and reviewer (Stephen
> > Frost) I've send a complement of the first patch to add the capability to
> > change a regular table to unlogged.
>
> (Fwiw, I believe this direction is the more interesting one.)
>

:-)


> > With this patch we finish the main goals of the GSoC2014 and now we'll work
> > in the additional goals.
>
> ... that being the non-WAL-logging with wal_level=minimal, or more?
>

This is the first of additional goals,  but we have others. Please see [1].


> > diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
> > index 69a1e14..424f2e9 100644
> > --- a/doc/src/sgml/ref/alter_table.sgml
> > +++ b/doc/src/sgml/ref/alter_table.sgml
> > @@ -58,6 +58,7 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
> >      ENABLE REPLICA RULE <replaceable class="PARAMETER">rewrite_rule_name</replaceable>
> >      ENABLE ALWAYS RULE <replaceable class="PARAMETER">rewrite_rule_name</replaceable>
> >      CLUSTER ON <replaceable class="PARAMETER">index_name</replaceable>
> > +    SET {LOGGED | UNLOGGED}
> >      SET WITHOUT CLUSTER
> >      SET WITH OIDS
> >      SET WITHOUT OIDS
>
> This must not be between the two CLUSTER lines. I think the best spot
> would just be one line down, before SET WITH OIDS.
>

Fixed.


> (While we are at it, SET TABLESPACE should probably be moved up to the
> other SET options...)
>

Fixed.


> > @@ -432,6 +433,20 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
> >     </varlistentry>
> >
> >     <varlistentry>
> > +    <term><literal>SET {LOGGED | UNLOGGED}</literal></term>
> > +    <listitem>
> > +     <para>
> > +      This form change the table persistence type from unlogged to permanent or
>
> This grammar bug pops up consistently: This form *changes*...
>
> There's trailing whitespace.
>

Fixed.


> > +      from unlogged to permanent by rewriting the entire contents of the table
> > +      and associated indexes into new disk files.
> > +     </para>
> > +     <para>
> > +      Changing the table persistence type acquires an <literal>ACCESS EXCLUSIVE</literal> lock.
> > +     </para>
>
> I'd rewrite that and add a reference:
>
> ... from unlogged to permanent (see <xref linkend="SQL-CREATETABLE-UNLOGGED">).
> </para>
> <para>
> Changing the table persistence type acquires an <literal>ACCESS EXCLUSIVE</literal> lock
> and rewrites the table contents and associated indexes into new disk files.
> </para>
>

Fixed.


> > diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
> > index 60d387a..9dfdca2 100644
> > --- a/src/backend/commands/tablecmds.c
> > +++ b/src/backend/commands/tablecmds.c
> > @@ -125,18 +125,19 @@ static List *on_commits = NIL;
> >   * a pass determined by subcommand type.
> >   */
> >
> > -#define AT_PASS_UNSET                        -1              /* UNSET will cause ERROR */
> > -#define AT_PASS_DROP                 0               /* DROP (all flavors) */
> > -#define AT_PASS_ALTER_TYPE           1               /* ALTER COLUMN TYPE */
> > -#define AT_PASS_OLD_INDEX            2               /* re-add existing indexes */
> > -#define AT_PASS_OLD_CONSTR           3               /* re-add existing constraints */
> > -#define AT_PASS_COL_ATTRS            4               /* set other column attributes */
> > +#define AT_PASS_UNSET                                -1              /* UNSET will cause ERROR */
> > +#define AT_PASS_DROP                         0               /* DROP (all flavors) */
> > +#define AT_PASS_ALTER_TYPE                   1               /* ALTER COLUMN TYPE */
> > +#define AT_PASS_OLD_INDEX                    2               /* re-add existing indexes */
> > +#define AT_PASS_OLD_CONSTR                   3               /* re-add existing constraints */
> > +#define AT_PASS_COL_ATTRS                    4               /* set other column attributes */
> >  /* We could support a RENAME COLUMN pass here, but not currently used */
> > -#define AT_PASS_ADD_COL                      5               /* ADD COLUMN */
> > -#define AT_PASS_ADD_INDEX            6               /* ADD indexes */
> > -#define AT_PASS_ADD_CONSTR           7               /* ADD constraints, defaults */
> > -#define AT_PASS_MISC                 8               /* other stuff */
> > -#define AT_NUM_PASSES                        9
> > +#define AT_PASS_ADD_COL                              5               /* ADD COLUMN */
> > +#define AT_PASS_ADD_INDEX                    6               /* ADD indexes */
> > +#define AT_PASS_ADD_CONSTR                   7               /* ADD constraints, defaults */
> > +#define AT_PASS_MISC                         8               /* other stuff */
> > +#define AT_PASS_SET_LOGGED_UNLOGGED  9               /* SET LOGGED and UNLOGGED */
> > +#define AT_NUM_PASSES                                10
>
>
> This unnecessarily rewrites all the tabs, but see below.
>

I did that because the new constant AT_PASS_SET_LOGGED_UNLOGGED is larger than others.


> > @@ -376,6 +377,7 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMOD
> >  static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
> >                                        char *cmd, List **wqueue, LOCKMODE lockmode,
> >                                        bool rewrite);
> > +static void ATPostAlterSetLoggedUnlogged(Oid relid);
>
> (See below)
>
> >  static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
> >  static void TryReuseForeignKey(Oid oldId, Constraint *con);
> >  static void change_owner_fix_column_acls(Oid relationOid,
> > @@ -402,6 +404,13 @@ static void ATExecAddOf(Relation rel, const TypeName *ofTypename, LOCKMODE lockm
> >  static void ATExecDropOf(Relation rel, LOCKMODE lockmode);
> >  static void ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKMODE lockmode);
> >  static void ATExecGenericOptions(Relation rel, List *options);
> > +static void ATPrepSetLogged(Relation rel);
> > +static void ATPrepSetUnLogged(Relation rel);
> > +static void ATExecSetLogged(Relation rel);
> > +static void ATExecSetUnLogged(Relation rel);
> > +
> > +static void AlterTableSetLoggedCheckForeignConstraints(Relation rel);
> > +static void AlterTableSetLoggedOrUnlogged(Relation rel, bool toLogged);
>
> All that should be ordered like in the docs, i.e. after
> ATExecDropCluster. (... and the SetTableSpace stuff is already here ...)
>

Fixed.


> > +             case AT_SetLogged:
> > +             case AT_SetUnLogged:
> > +                     ATSimplePermissions(rel, ATT_TABLE);
> > +                     if (cmd->subtype == AT_SetLogged)
> > +                             ATPrepSetLogged(rel);                   /* SET LOGGED */
> > +                     else
> > +                             ATPrepSetUnLogged(rel);                 /* SET UNLOGGED */
> > +                     pass = AT_PASS_SET_LOGGED_UNLOGGED;
> > +                     break;
>
> I'm wondering if you shouldn't make a single ATPrepSetLogged function
> that takes and additional toLogged argument. Or alternatively get rid
> of the if() by putting the code also into case AT_SetLogged.
>

Actually I started that way... with just one ATPrep function we have some additional complexity to check relpersistence, define the error message and to run AlterTableSetLoggedCheckForeignConstraints(rel) function. So to simplify the code I decided split in two small functions.


> > @@ -3307,6 +3327,9 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode)
> >                               ATPostAlterTypeCleanup(wqueue, tab, lockmode);
> >
> >                       relation_close(rel, NoLock);
> > +
> > +                     if (pass == AT_PASS_SET_LOGGED_UNLOGGED)
> > +                             ATPostAlterSetLoggedUnlogged(RelationGetRelid(rel));
>
> This must be done before relation_close() which releases all locks.
>
> Moreover, I think you can get rid of that extra PASS here.
> AT_PASS_ALTER_TYPE has its own pass because you can alter several
> columns in a single ALTER TABLE statement, but you can have only one
> SET (UN)LOGGED, so you can to the cluster_rel() directly in
> AlterTableSetLoggedOrUnlogged() (unless cluster_rel() is too intrusive
> and would interfere with other ALTER TABLE operations in this command,
> no idea).
>

I had some troubles here so I decided to do in that way, but I confess I'm not comfortable with this implementation. Looking more carefully on tablecmds.c code, at the ATController we have three phases and the third is 'scan/rewrite tables as needed' so my doubt is if can I use it instead of call 'cluster_rel'?


> > @@ -3526,6 +3549,12 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
> >               case AT_GenericOptions:
> >                       ATExecGenericOptions(rel, (List *) cmd->def);
> >                       break;
> > +             case AT_SetLogged:
> > +                     ATExecSetLogged(rel);
> > +                     break;
> > +             case AT_SetUnLogged:
> > +                     ATExecSetUnLogged(rel);
> > +                     break;
>
> I'd replace ATExecSetLogged and ATExecSetUnLogged directly with
> AlterTableSetLoggedOrUnlogged(rel, true/false). The 1-line wrappers
> don't buy you anything.
>

Fixed.


> > +static void
> > +AlterTableSetLoggedCheckForeignConstraints(Relation rel)
> [...]
>
> I can't comment on the quality of this function, but it seems to be
> doing its job.
>

:-)


> > +/*
> > + * ALTER TABLE <name> SET UNLOGGED
> > + *
> > + * Change the table persistence type from permanent to unlogged by
> > + * rewriting the entire contents of the table and associated indexes
> > + * into new disk files.
> > + *
> > + * The ATPrepSetUnLogged function check all precondictions to perform
> > + * the operation:
> > + * - check if the target table is permanent
> > + */
> > +static void
> > +ATPrepSetUnLogged(Relation rel)
> > +{
> > +     /* check if is an permanent relation */
> > +     if (rel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT)
> > +             ereport(ERROR,
> > +                             (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> > +                              errmsg("table %s is not permanent",
> > +                              RelationGetRelationName(rel))));
> > +}
>
> Here's the big gotcha: Just like SET LOGGED must check for outgoing
> FKs to unlogged tables, SET UNLOGGED must check for incoming FKs from
> permanent tables. This is missing.
>

I don't think so... we can create an unlogged table with a FK referring to a regular table...

fabrizio=# create table foo(id integer primary key);
CREATE TABLE
fabrizio=# create unlogged table bar(id integer primary key, foo integer references foo);
CREATE TABLE

... but is not possible create a FK from a regular table referring to an unlogged table:

fabrizio=# create unlogged table foo(id integer primary key);
CREATE TABLE
fabrizio=# create table bar(id integer primary key, foo integer references foo);
ERROR:  constraints on permanent tables may reference only permanent tables

... and a FK from an unlogged table referring other unlogged table works:

fabrizio=# create unlogged table bar(id integer primary key, foo integer references foo);
CREATE TABLE

So we must take carefull just when changing an unlogged table to a regular table.

Am I correct or I miss something?


> > +/*
> > + * The AlterTableChangeCatalogToLoggedOrUnlogged function perform the
> > + * catalog changes, i.e. update pg_class.relpersistence to 'p' or 'u'
> > + */
> > +static void
> > +AlterTableChangeCatalogToLoggedOrUnlogged(Relation rel, Relation relrelation, bool toLogged)
> > +{
> > +     HeapTuple               tuple;
> > +     Form_pg_class   pg_class_form;
> > +
> > +     tuple = SearchSysCacheCopy1(RELOID,
> > +                                                              ObjectIdGetDatum(RelationGetRelid(rel)));
> > +     if (!HeapTupleIsValid(tuple))
> > +             elog(ERROR, "cache lookup failed for relation %u",
> > +                      RelationGetRelid(rel));
> > +
> > +     pg_class_form = (Form_pg_class) GETSTRUCT(tuple);
> > +
> > +     Assert(pg_class_form->relpersistence ==
> > +             ((toLogged) ? RELPERSISTENCE_UNLOGGED : RELPERSISTENCE_PERMANENT));
> > +
> > +     pg_class_form->relpersistence = toLogged ?
> > +                     RELPERSISTENCE_PERMANENT : RELPERSISTENCE_UNLOGGED;
> > +
> > +     simple_heap_update(relrelation, &tuple->t_self, tuple);
> > +
> > +     /* keep catalog indexes current */
> > +     CatalogUpdateIndexes(relrelation, tuple);
> > +
> > +     heap_freetuple(tuple);
> > +}
>
> Again I can't comment on the low-level catalog stuff - though I'd
> remove some of those blank lines.
>

Fixed.


> > +/*
> > + * The AlterTableSetLoggedOrUnlogged function contains the main logic
> > + * of the operation, changing the catalog for main heap, toast and indexes
> > + */
> > +static void
> > +AlterTableSetLoggedOrUnlogged(Relation rel, bool toLogged)
> > +{
> > +     Oid                     relid;
> > +     Relation        indexRelation;
> > +     ScanKeyData skey;
> > +     SysScanDesc scan;
> > +     HeapTuple       indexTuple;
> > +     Relation        relrel;
> > +
> > +     relid = RelationGetRelid(rel);
> > +
> > +     /* open pg_class to update relpersistence */
> > +     relrel = heap_open(RelationRelationId, RowExclusiveLock);
> > +
> > +     /* main heap */
> > +     AlterTableChangeCatalogToLoggedOrUnlogged(rel, relrel, toLogged);
> > +
> > +     /* toast heap, if any */
> > +     if (OidIsValid(rel->rd_rel->reltoastrelid))
> > +     {
> > +             Relation        toastrel;
> > +
> > +             toastrel = heap_open(rel->rd_rel->reltoastrelid, AccessShareLock);
> > +             AlterTableChangeCatalogToLoggedOrUnlogged(toastrel, relrel, toLogged);
> > +             heap_close(toastrel, AccessShareLock);
>
> The comment on heap_open() suggests that you could directly invoke
> relation_open() because you know this is a toast table, similarly for
> index_open(). (I can't say which is better style.)
>

I don't know which is better style too... other opinions??


> > +     }
> > +
> > +     /* Prepare to scan pg_index for entries having indrelid = this rel. */
> > +     indexRelation = heap_open(IndexRelationId, AccessShareLock);
> > +     ScanKeyInit(&skey,
> > +                             Anum_pg_index_indrelid,
> > +                             BTEqualStrategyNumber, F_OIDEQ,
> > +                             relid);
> > +
> > +     scan = systable_beginscan(indexRelation, IndexIndrelidIndexId, true,
> > +                                                       NULL, 1, &skey);
> > +
> > +     while (HeapTupleIsValid(indexTuple = systable_getnext(scan)))
> > +     {
> > +             Form_pg_index index = (Form_pg_index) GETSTRUCT(indexTuple);
> > +             Relation indxrel = index_open(index->indexrelid, AccessShareLock);
> > +
> > +             AlterTableChangeCatalogToLoggedOrUnlogged(indxrel, relrel, toLogged);
> > +
> > +             index_close(indxrel, AccessShareLock);
> > +     }
>
> You forgot the TOAST index.
>

Fixed.


> > diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> > index 605c9b4..a784d73 100644
> > --- a/src/backend/parser/gram.y
> > +++ b/src/backend/parser/gram.y
>
> > @@ -2201,6 +2201,20 @@ alter_table_cmd:
> >                                       n->def = $3;
> >                                       $ = (Node *)n;
> >                               }
> > +                     /* ALTER TABLE <name> SET LOGGED  */
> > +                     | SET LOGGED
> > +                             {
> > +                                     AlterTableCmd *n = makeNode(AlterTableCmd);
> > +                                     n->subtype = AT_SetLogged;
> > +                                     $ = (Node *)n;
> > +                             }
> > +                     /* ALTER TABLE <name> SET UNLOGGED  */
> > +                     | SET UNLOGGED
> > +                             {
> > +                                     AlterTableCmd *n = makeNode(AlterTableCmd);
> > +                                     n->subtype = AT_SetUnLogged;
> > +                                     $ = (Node *)n;
> > +                             }
>
> Also move up to match the docs/other code ordering.
>

Fixed. But other ALTER commands in gram.y aren't in the same order of documentation.


> > index ff126eb..dc9f8fa 100644
> > --- a/src/include/nodes/parsenodes.h
> > +++ b/src/include/nodes/parsenodes.h
> > @@ -1331,7 +1331,9 @@ typedef enum AlterTableType
> >       AT_AddOf,                                       /* OF <type_name> */
> >       AT_DropOf,                                      /* NOT OF */
> >       AT_ReplicaIdentity,                     /* REPLICA IDENTITY */
> > -     AT_GenericOptions                       /* OPTIONS (...) */
> > +     AT_GenericOptions,                      /* OPTIONS (...) */
> > +     AT_SetLogged,                           /* SET LOGGED */
> > +     AT_SetUnLogged                          /* SET UNLOGGED */
>
> Likewise.
>

Fixed.


> > --- a/src/test/regress/expected/alter_table.out
> > +++ b/src/test/regress/expected/alter_table.out
>
> Please add test cases for incoming FKs, TOAST table relpersistence,
> and TOAST index relpersistence.
>

Added.


> Thanks for working on this feature, I'm looking forward to it!
>

You're welcome!

Regards,
Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: better atomics - v0.5
Next
From: Simon Riggs
Date:
Subject: Re: tweaking NTUP_PER_BUCKET