This patch is related to the "Reduce pinning in btree indexes"
patch submitted here:
http://www.postgresql.org/message-id/721615179.3351449.1423959585771.JavaMail.yahoo@mail.yahoo.com
That describes how they evolved and how they relate; I won't
duplicate that here.
Unlike the other patch, this one is more at the "proof of concept"
phase, because it requires support in the heap and each index AM to
work correctly; so far I have only had time to cover the heap and
btree indexes. In spite of that, I have thrown the worst test
cases I could think of at it (and only succeeded in uncovering a
bug which was already out there in production), and it has shown
its value in a two-day test test simulating a 300 user load with
complex real-world applications (although the only indexes it used
were btree indexes). Without the patches the database growth was
39GB per day; with the patches it was 28.5GB per day. (The test
does involve more inserts than deletes, so some growth is
expected.) At the end of the tests, pgstattuple reported eight
times as many dead tuples in the database without the patches.
More importantly, without the patches the CPU load started at 60%
and showed linear growth to 92% over the course of the first day;
with the patches it stayed at a stable 60% throughout the test.
What this patch does is add a GUC call old_snapshot_threshold. It
defaults to -1, which leaves behavior matching unpatched code.
Above that it allows tuples to be vacuumed away after the number of
transaction IDs specified by the GUC have been consumed. It also
saves the current insertion LSN into every snapshot when it is
created. When reading from the heap or any index, if the snapshot
is vulnerable to showing incorrect data because the threshold has
been crossed since it was generated, reading any page with an LSN
past the snapshot LSN causes a "snapshot too old" error to be
thrown. Since this is LSN-based, the new logic is not used for any
relation which is not WAL-logged.
Note that if you don't read data from a page which has been
modified after your snapshot was taken, the threshold doesn't
matter.
All `make installcheck` tests succeed with any setting. With a
setting of 0 (the most extreme), `make installcheck-world` sees
four isolation tests fail. Those all pass if you raise the
setting to 2. The postgres_fdw test needs a setting of 4 to
succeed. I would expect most shops would want to tune this to
something in the six-digit to eight-digit range. In the tests
mentioned above it was set to 150000 (which corresponded to just
under 4 minutes of txid consumption) and there were no "snapshot
too old" errors, even though some cursors were left open for the
entire two-day run.
The patch still lacks (as mentioned above) support for index AMs
other than btree, and lacks documentation for the new GUC. I'm
sure that there are some comments and README files that need
adjustment, too. As I said, this is still POC.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company