Thread: Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
From
"Kevin Grittner"
Date:
Tom Lane wrote: > Simon Riggs writes: >> Translating WAL is a very hard task. > > No kidding. I would think it's impossible on its face. Just for > starters, where will you get table and column names from? (Looking > at the system catalogs is cheating, and will not work reliably > anyway.) Well, the xReader approach (Aakash's GSoC project) is to serve as a proxy for a WAL stream going to a hot standby, to interpret each incoming WAL record by "cheating" and querying the HS before passing the WAL along, and then using sync rep to the HS regardless of whether the incoming stream is sync or async. What reliability problem do you see with this approach? -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane wrote: >> Simon Riggs writes: >>> Translating WAL is a very hard task. >> No kidding. I would think it's impossible on its face. > Well, the xReader approach (Aakash's GSoC project) is to serve as a > proxy for a WAL stream going to a hot standby, to interpret each > incoming WAL record by "cheating" and querying the HS before passing > the WAL along, and then using sync rep to the HS regardless of > whether the incoming stream is sync or async. What reliability > problem do you see with this approach? Well, first off, it requires a replica that *is* a physical replica (else it won't have the same table OIDs, for instance). I'm not sure I get the point of logical replication that requires a physical replica as a prerequisite. Next, it breaks immediately in the face of DDL: CREATE TABLE foo (f1 int);BEGIN;ALTER TABLE foo ALTER COLUMN f1 RENAME TO f2;INSERT INTO foo (f2) VALUES (1);COMMIT; The standby is not going to think that the ALTER is committed, so it will not report the right column name when it comes time to translate the INSERT. Actually, you can break it even more easily than that: CREATE TABLE bar AS SELECT ... What will you do with the insertions executed by this CREATE? They have to be executed before the creation of table bar is committed. Also, you'd need a query connection per database (at least, maybe one per session if you were trying to track DDL effects), which seems rather a lot of load on the HS slave. That together with the requirement for synchronous operation seems absolutely catastrophic from a performance standpoint. If you want to spend the summer building a toy, fine, but I don't see this going anywhere for production purposes. It would be enormously more performant for the master to be emitting logical replication records to start with, since it already has all the right names etc at hand at basically no cost. regards, tom lane
<div class="gmail_extra">Hello Tom,</div><div class="gmail_extra"><br /></div><span style="style">> I'm not sure I</span><brstyle="style" /><span style="style">> get the point of logical replication that requires a physical replicaas</span><br style="style" /><div class="gmail_extra"><span style="style">> a prerequisite.</span></div><div class="gmail_extra"><br/></div><div class="gmail_extra">> It would be enormously<br />> more performant for the masterto be emitting logical replication<br /> > records to start with, since it already has all the right names etc<br/>> at hand at basically no cost.<br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">I wantto emphasize that any system which uses the logical reader setup, including a replication system, tends to be very computationallyintensive on the database which it queries all so often. In most of the environments, the source databaseis the main database, and as such, any performance degradation on this database is very bad.</div><div class="gmail_extra"><br/></div><div class="gmail_extra">On the other hand, if we offload almost all the work to the physicalreplica, our source database, which is the main database, still functions at the same throughput.</div><div class="gmail_extra"><br/></div><div class="gmail_extra">Thus, at the cost of having the replication system as a whole runa little slower than it could if it were using the main database, we have made sure that our performance critical mainsource database is not affected in performance at all.</div>
On Sat, Apr 28, 2012 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Tom Lane wrote: >>> Simon Riggs writes: >>>> Translating WAL is a very hard task. >>> No kidding. I would think it's impossible on its face. > >> Well, the xReader approach (Aakash's GSoC project) is to serve as a >> proxy for a WAL stream going to a hot standby, to interpret each >> incoming WAL record by "cheating" and querying the HS before passing >> the WAL along, and then using sync rep to the HS regardless of >> whether the incoming stream is sync or async. What reliability >> problem do you see with this approach? > > Well, first off, it requires a replica that *is* a physical replica > (else it won't have the same table OIDs, for instance). It's even harder than that because the physical replica needs to be able to be shutdown/restarted. We currently restart WAL from last checkpoint, but if changes have already been made and committed then the catalog will be incorrect when we replay the WAL again. So, I think saying very hard is justified. We basically have a choice of shipping full metadata with every change or finding some way to avoid doing that. Investigating the latter seems like a worthwhile task but there are limits, as you say. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services