Re: pglogical - logical replication contrib module - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: pglogical - logical replication contrib module
Date
Msg-id 56D4E01C.1060209@2ndquadrant.com
Whole thread Raw
In response to Re: pglogical - logical replication contrib module  (Steve Singer <steve@ssinger.info>)
List pgsql-hackers
Hi

On 03/02/16 03:25, Steve Singer wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:       tested, failed
> Spec compliant:           not tested
> Documentation:            tested, failed
>
> Here is some more review
>
> +- `pglogical.replication_set_add_table(set_name name, table_name regclass, synchronize boolean)`
> +  Adds a table to replication set.
> +
> +  Parameters:
> +  - `set_name` - name of the existing replication set
> +  - `table_name` - name or OID of the table to be added to the set
> +  - `synchronize` - if true, the table data is synchronized on all subscribers
> +    which are subscribed to given replication set, default false
> +
>
> The argument to this function is actually named "relation" not "table_name" though we might want to update the
functionto name the argument table_name.
 
>
> Also we don't explain what 'synchronize' means I first thought that a value of false would mean that existing data
won'tbe copied but any new changes will be.
 
> A value of false actually seems to mean that nothing will happen with the table until the synchronize function is
manuallycalled. We seem to be using the word 'synchronize' in different sense in different places I find it confusing
(iesynchronize_data and syncronize_structure in create_subscription).
 
>

False should mean exactly what you thought it would, will have to look 
what's the issue there. Obviously UPDATEs or DELETEs won't really do 
anything when there are no data but INSERTs should be replicated even 
with false.

But I agree we need to define sychronized better, as we discussed we 
also want to change status to replicated instead of synchronized. I am 
btw thinking that default value for synchronizing schema should be false 
in the create_subsription.

>
>
> *** a/contrib/pglogical/pglogical_sync.c
> --- b/contrib/pglogical/pglogical_sync.c
> + static void
> + dump_structure(PGLogicalSubscription *sub, const char *snapshot)
> + {
> +   char        pg_dump[MAXPGPATH];
> +   uint32      version;
> +   int         res;
> +   StringInfoData  command;
> +
> +   if (find_other_exec_version(my_exec_path, PGDUMP_BINARY, &version, pg_dump))
> +       elog(ERROR, "pglogical subscriber init failed to find pg_dump relative to binary %s",
> +            my_exec_path);
> +
> +   if (version / 100 != PG_VERSION_NUM / 100)
> +       elog(ERROR, "pglogical subscriber init found pg_dump with wrong major version %d.%d, expected %d.%d",
> +            version / 100 / 100, version / 100 % 100,
> +            PG_VERSION_NUM / 100 / 100, PG_VERSION_NUM / 100 % 100);
> +
> +   initStringInfo(&command);
> + #if PG_VERSION_NUM < 90500
> +   appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -N pglogical_origin -F c -f \"/tmp/pglogical-%d.dump\"
\"%s\"",
> + #else
> +   appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",
>
> 1) I am not sure we can assume/require that the pg_dump binary be in the same location as the postgres binary.  I
don'tknow think we've ever required that client binaries (ie psql, pg_dump, pg_restore ...) be in the same directory as
postgres. pg_upgrade does require this so maybe this isn't a problem in practice but I thought I'd point it out.
Ideallywouldn't need to call an external program to get a schema dump but turning pg_dump into a library is beyond the
scopeof this patch.
 
>

Well for now I don't see that as big issue, especially given that the 
pg_dump needs to be same version as the server. We can make it GUC if 
needed but that's not something that seems problematic so far. I agree 
ideal solution would be to have library but that's something that will 
take much longer I am afraid.

>
> 2) I don't think we can hard-coded /tmp as the directory for the schema dump.  I don't think will work on most
windowssystems and even on a unix system $TMPDIR might be set to something else.  Maybe writing this into pgsql_tmp
wouldbe a better choice.
 
>

Yeah I turned that into GUC.

> Furtherdown in
> pglogical_sync_subscription(PGLogicalSubscription *sub)
> +   switch (status)
> +   {
> +       /* Already synced, nothing to do except cleanup. */
> +       case SYNC_STATUS_READY:
> +           MemoryContextDelete(myctx);
> +           return;
> +       /* We can recover from crashes during these. */
> +       case SYNC_STATUS_INIT:
> +       case SYNC_STATUS_CATCHUP:
> +           break;
> +       default:
> +           elog(ERROR,
> +                "subscriber %s initialization failed during nonrecoverable step (%c), please try the setup again",
> +                sub->name, status);
> +           break;
> +   }
>
> I think the default case needs to do something to unregister the background worker.  We already discussed trying to
getthe error message to a user in a better way either way there isn't any sense in this background worker being
launchedagain if the error is nonrecoverable.
 
>

Agreed, for this specific case we can actually pretty easily put the 
error into some catalog and just disable the subscription.

>
> +
> +               tables = copy_replication_sets_data(sub->origin_if->dsn,
> +                                                   sub->target_if->dsn,
> +                                                   snapshot,
> +                                                   sub->replication_sets);
> +
> +               /* Store info about all the synchronized tables. */
> +               StartTransactionCommand();
> +               foreach (lc, tables)
>
> Shouldn't we be storing the info about the synchronized tables as part of the same transaction that does the sync?
>

Well that's complicated as we also have post copy stuff to do (creating 
indexes and stuff), so far we wan to begin from beginning I think if the 
table fails so we consider it unsynced until also post-data part is 
done. But I think the initial sync needs a lot of work in general.


>
> I'll keeping going through the code as I have time.   I think it is appropriate to move this to the next CF since the
CFis past the end date and the patch has received some review.   When you have an updated version of the patch post it,
don'twait until March.
 
>

Sorry for not being very active in this thread, I really appreciate that 
you take time to review this, I was just quite busy last few weeks (and 
stolen laptop during business trip didn't help that much either). I 
wasn't specifically waiting for March, but I have more WIP things 
(privately) on this that I wanted to submit as a whole but not enough 
time to get it to -hackers (one of those things is replica trigger 
firing that you mentioned upthread). If you are interested I have the 
"hackers preparation" branch at 
https://github.com/2ndQuadrant/postgres/tree/dev/pglogical , it does not 
have WIP stuff, mostly only things I am already happy with and it's what 
I use for git format-patch for hackers submission.


--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: amcheck (B-Tree integrity checking tool)
Next
From: Kevin Grittner
Date:
Subject: Re: snapshot too old, configured by time