Re: pg_dump and pgpool - Mailing list pgsql-general

From Scott Marlowe
Subject Re: pg_dump and pgpool
Date
Msg-id 1104427339.5893.65.camel@state.g2switchworks.com
In response to Re: pg_dump and pgpool  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-general
On Thu, 2004-12-30 at 09:46, Tatsuo Ishii wrote:
> > On Wed, 2004-12-29 at 17:30, Tom Lane wrote:
> > > Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > > > On Wed, 2004-12-29 at 16:56, Tom Lane wrote:
> > > >> No, we'd be throwing more, and more complex, queries.  Instead of a
> > > >> simple lookup there would be some kind of join, or at least a lookup
> > > >> that uses a multicolumn key.
> > >
> > > > I'm willing to bet the performance difference is less than noise.
> > >
> > > [ shrug... ]  I don't have a good handle on that, and neither do you.
> > > What I am quite sure about though is that pg_dump would become internally
> > > a great deal messier and harder to maintain if it couldn't use OIDs.
> > > Look at the DumpableObject manipulations and ask yourself what you're
> > > going to do instead if you have to use a primary key that is of a
> > > different kind (different numbers of columns and datatypes) for each
> > > system catalog.  Ugh.
> >
> > Wait, do you mean it's impossible to throw a single SQL query with a
> > proper join clause that USES OIDs but doesn't return them?  Or that it's
> > impossible to throw a single query without joining on OIDs?  I don't
> > mind joining on OIDs, I just don't want them crossing the connection is
> > all.  And yes, it might be ugly, but I can't imagine it being
> > unmaintainable for some reason.
> >
> > > I don't think it's worth that price to support a fundamentally bogus
> > > approach to backup.
> >
> > But it's not bogus.  It allows me to compare two databases running under
> > a pgpool synchronous cluster and KNOW if there are inconsistencies in
> > data between them, so it is quite useful to me.
> >
> > > IMHO you don't want extra layers of software in
> > > between pg_dump and the database --- each one just introduces another
> > > risk of getting a wrong backup.  You've yet to explain what the
> > > *benefit* of putting pgpool in there is for this problem.
> >
> > Actually, it ensures that I get the right backup, because pgpool will
> > cause the backup to fail if there are any differences between the two
> > backend servers, thus telling me that I have an inconsistency.
> >
> > That's the primary reason I want this.  The secondary reason, which I
> > can work around, is that I'm running the individual databases on
> > machines that only answer to the pgpool machine's IP, so
> > remote backups aren't possible, and only the pgpool machine would be
> > capable of doing the backups, but we have (like so many other companies)
> > a centralized backup server.  I can always allow that machine to connect
> > to the database(s) to do backup, but my fear is that by allowing
> > anything other than pgpool to hit those backend databases they could be
> > placed out of sync with each other.  Admittedly, a backup process
> > shouldn't be updating the database, so this, as I said, isn't really a
> > big deal.  More of a mild kink really.  As long as all access is
> > happening through pgpool, they should stay coherent to each other.
>
> Pgpool could be modified so that it has a "no SELECT replication mode",
> where pgpool runs SELECT on only the master server. I could do this if
> you think it's useful.
>
> However, the problem is that pg_dump is not only running SELECT but also
> modifying the database (counting up the OID counter), i.e. it creates
> temporary tables. Is this a problem for you?

Does it?  I didn't know it used temp tables.  It's not that big of a
deal, and I'm certain I can work around it.  I just really like the idea
of a cluster of pg servers running synchronously behind a redirector and
looking, for all the world, like one database.  But I think it would
take log shipping for it to work the way I'm envisioning.  I'd much
rather see work go into making pgpool run atop >2 servers than this
exercise in (_very_) likely futility.
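
For illustration only, here is a rough Python sketch of the kind of
consistency check discussed above: run the same query against both
backends, compare the results, and fail loudly on any difference.  It is
not how pgpool itself is implemented.  The names result_digest,
check_consistent and BackendMismatch are hypothetical, and the sample
rows (including the OID values) are made up.

```python
import hashlib
from typing import Iterable, Sequence


class BackendMismatch(Exception):
    """Raised when two backends return different results for one query."""


def result_digest(rows: Iterable[Sequence[object]]) -> str:
    """Hash a result set row by row, preserving row order."""
    h = hashlib.sha256()
    for row in rows:
        # repr() keeps NULL (None) distinct from the string 'None'.
        h.update(repr(tuple(row)).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def check_consistent(query: str,
                     master_rows: Iterable[Sequence[object]],
                     secondary_rows: Iterable[Sequence[object]]) -> None:
    """Raise BackendMismatch if the two result sets differ."""
    if result_digest(master_rows) != result_digest(secondary_rows):
        raise BackendMismatch(f"backends disagree on: {query}")


# Made-up catalog rows: same table names, different OIDs on each backend.
master = [(16384, "users"), (16390, "orders")]
secondary = [(17001, "users"), (17010, "orders")]

# Comparing only the non-OID columns passes without raising ...
check_consistent("SELECT relname ...",
                 [row[1:] for row in master],
                 [row[1:] for row in secondary])

# ... while letting the OIDs cross the connection trips the check.
try:
    check_consistent("SELECT oid, relname ...", master, secondary)
except BackendMismatch as exc:
    print("mismatch:", exc)
```

The second call shows why OIDs in the result set are the sticking point:
the data can be identical while the OIDs differ, so a byte-for-byte
comparison reports an inconsistency that isn't really there.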
