Re: logical changeset generation v5 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: logical changeset generation v5 |
Date | |
Msg-id | 20130903231046.GG7018@awork2.anarazel.de Whole thread Raw |
In response to | Re: logical changeset generation v5 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: logical changeset generation v5
|
List | pgsql-hackers |
On 2013-09-03 15:56:15 -0400, Robert Haas wrote: > On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> And why is every 15 seconds good enough? > > > > Waiting 15s to become consistent instead of checkpoint_timeout seems to > > be ok to me and to be a good tradeoff between overhead and waiting. We > > can certainly discuss other values or making it configurable. The latter > > seemed to be unnecessary to me, but I have don't have a problem > > implementing it. I just don't want to document it :P > > I don't think it particularly needs to be configurable, but I wonder > if we can't be a bit smarter about when we do it. For example, > suppose we logged it every 15 s but only until we log a non-overflowed > snapshot. There's actually more benefits than just overflowed snapshots (pruning of the known xids machinery, exclusive lock cleanup). > I realize that the overhead of a WAL record every 15 > seconds is fairly small, but the load on some systems is all but > nonexistent. It would be nice not to wake up the HD unnecessarily. The patch as-is only writes if there has been WAL written since the last time it logged a running_xacts. I think it's not worth building more smarts than that? > >> The WAL writer is supposed to call XLogBackgroundFlush() every time > >> WalWriterDelay expires. Yeah, it can hibernate, but if it's > >> hibernating, then we should respect that decision for this WAL record > >> type also. > > > > Why should we respect it? > > Because I don't see any reason to believe that this WAL record is any > more important or urgent than any other WAL record that might get > logged. I can't follow the logic behind that statement. Just about all WAL records are either pretty immediately flushed afterwards or are done in the context of a transaction where we flush (or do a XLogSetAsyncXactLSN) at transaction commit. XLogBackgroundFlush() won't necessarily flush the running_xacts record. Unless you've set the async xact lsn it will only flush complete blocks. So what can happen (I've seen that more than once in testing, took me a while to debug) that a checkpoint is started in a busy period but nothing happens after it finished. Since the checkpoint triggered running_xact is triggered *before* we do the smgr flush it still is overflowed. Then, after activity has died down, the bgwriter issues the running xact record, but it's filling a block and thus it never get's flushed. To me the alternatives are to do a XLogSetAsyncXactLSN() or an XLogFlush(). The latter is more aggressive and can block for a measurable amount of time, which is why I don't want to do it in the bgwriter. > >> >> I understand why logical replication needs to connect to a database, > >> >> but I don't understand why any other walsender would need to connect > >> >> to a database. > >> > > >> > Well, logical replication actually streams out data using the walsender, > >> > so that's the reason why I want to add it there. But there have been > >> > cases in the past where we wanted to do stuff in the walsender that need > >> > database access, but we couldn't do so because you cannot connect to > >> > one. > > > >> Could you be more specific? > > > > I only remember 3959.1349384333@sss.pgh.pa.us but I think it has come up > > before. > > It seems we need some more design there. Perhaps entering replication > mode could be triggered by writing either dbname=replication or > replication=yes. But then, do the replication commands simply become > SQL commands? I've certainly seen hackers use them that way. And I > can imagine that being a sensible approach, but this patch seems like > it's only covering a fairly small fraction of what really ought to be > a single commit. Yes. I think you're right that we need more input/design here. I've previously started threads about it, but nobody replied :(. The problem with using dbname=replication as a trigger for anything is that we actually allow databases to be created with that name. Perhaps that was a design mistake. I wondered about turning replication from a boolean into something like off|0, on|1, database. dbname= gets only used in the latter variant. That would be compatible with previous versions and would even support using old tools (since all of them seem to do replication=1). > But then, do the replication commands simply become > SQL commands? I've certainly seen hackers use them that way. I don't think that it's a good way at this point to make them to plain SQL. There is more infrastructure (signal handlers, permissions, different timeouts) & memory required for walsenders, so using plain SQL there seems beyond the scope of this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: