Re: pg_dump and pgpool - Mailing list pgsql-general

From Tatsuo Ishii
Subject Re: pg_dump and pgpool
Msg-id 20041231.004629.57439590.t-ishii@sra.co.jp
In response to Re: pg_dump and pgpool  (Scott Marlowe <smarlowe@g2switchworks.com>)
Responses Re: pg_dump and pgpool  (Scott Marlowe <smarlowe@g2switchworks.com>)
List pgsql-general
> On Wed, 2004-12-29 at 17:30, Tom Lane wrote:
> > Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > > On Wed, 2004-12-29 at 16:56, Tom Lane wrote:
> > >> No, we'd be throwing more, and more complex, queries.  Instead of a
> > >> simple lookup there would be some kind of join, or at least a lookup
> > >> that uses a multicolumn key.
> >
> > > I'm willing to bet the performance difference is less than noise.
> >
> > [ shrug... ]  I don't have a good handle on that, and neither do you.
> > What I am quite sure about though is that pg_dump would become internally
> > a great deal messier and harder to maintain if it couldn't use OIDs.
> > Look at the DumpableObject manipulations and ask yourself what you're
> > going to do instead if you have to use a primary key that is of a
> > different kind (different numbers of columns and datatypes) for each
> > system catalog.  Ugh.
>
> Wait, do you mean it's impossible to throw a single SQL query with a
> proper join clause that USES OIDs but doesn't return them?  Or that it's
> impossible to throw a single query without joining on OIDs?  I don't
> mind joining on OIDs, I just don't want them crossing the connection, is
> all.  And yes, it might be ugly, but I can't imagine it being
> unmaintainable for some reason.
>
> > I don't think it's worth that price to support a fundamentally bogus
> > approach to backup.
>
> But it's not bogus.  It allows me to compare two databases running under
> a pgpool synchronous cluster and KNOW if there are inconsistencies in
> data between them, so it is quite useful to me.
>
> > IMHO you don't want extra layers of software in
> > between pg_dump and the database --- each one just introduces another
> > risk of getting a wrong backup.  You've yet to explain what the
> > *benefit* of putting pgpool in there is for this problem.
>
> Actually, it ensures that I get the right backup, because pgpool will
> cause the backup to fail if there are any differences between the two
> backend servers, thus telling me that I have an inconsistency.
>
> That's the primary reason I want this.  The secondary reason, which I
> can work around, is that I'm running the individual databases on
> machines that only answer to the pgpool machine's IP, so
> remote backups aren't possible, and only the pgpool machine would be
> capable of doing the backups, but we have (like so many other companies)
> a centralized backup server.  I can always allow that machine to connect
> to the database(s) to do backup, but my fear is that by allowing
> anything other than pgpool to hit those backend databases they could be
> placed out of sync with each other.  Admittedly, a backup process
> shouldn't be updating the database, so this, as I said, isn't really a
> big deal.  More of a mild kink really.  As long as all access is
> happening through pgpool, they should stay coherent to each other.

pgpool could be modified to offer a "no SELECT replication mode", in
which pgpool runs SELECT on the master server only. I could do this if
you think it's useful.

However, the problem is that pg_dump does not only run SELECT; it also
modifies the database (it advances the OID counter), i.e. it creates
temporary tables. Is this a problem for you?
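
To make the idea concrete, here is a rough sketch of the routing
decision such a mode would imply. This is not pgpool's code: the
function name route_statement, the select_replication flag, and the
backend names are all hypothetical, and the SELECT detection is
deliberately naive (it does not handle SELECT ... INTO or SELECTs that
call functions with side effects).

```python
import re

# Hypothetical helper: decide which backends receive a statement.
# Assumption: backends[0] is the master, the rest are secondaries.
_SELECT_RE = re.compile(r"\s*select\b", re.IGNORECASE)


def route_statement(sql, backends, select_replication=True):
    """Return the list of backends the statement should be sent to."""
    if not backends:
        raise ValueError("at least one backend is required")

    master = backends[0]
    is_select = _SELECT_RE.match(sql) is not None

    # In the hypothetical "no SELECT replication mode", plain SELECTs
    # go to the master only.
    if is_select and not select_replication:
        return [master]

    # Everything else (including CREATE TEMP TABLE) is still sent to
    # every backend, so the servers keep being modified in step.
    return list(backends)
```

With select_replication=False, "SELECT 1" would go to the master only,
while "CREATE TEMP TABLE t (i int)" would still go to both servers,
which is the case asked about above.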
--
Tatsuo Ishii
