MVCC catalog access - Mailing list pgsql-hackers

From Robert Haas
Subject MVCC catalog access
Date
Msg-id CA+TgmoaShP2ytBXC3J10dvHzgi8tiL33TxCsAXbJtqY7z9SFQw@mail.gmail.com
Responses Re: MVCC catalog access  (Robert Haas <robertmhaas@gmail.com>)
Re: MVCC catalog access  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
We've had a number of discussions about the evils of SnapshotNow.  As
far as I can tell, nobody likes it and everybody wants it gone, but
there is concern about the performance impact.  I decided to do some
testing to measure the impact.  I was pleasantly surprised by the
results.

The attached patch is a quick hack to provide for MVCC catalog access.
 It adds a GUC called "mvcc_catalog_access".  When this GUC is set to
true, and heap_beginscan() or index_beginscan() is called with
SnapshotNow, they call GetLatestSnapshot() and use the resulting
snapshot in lieu of SnapshotNow.  As a debugging double-check, I
modified HeapTupleSatisfiesNow to elog(FATAL) when
mvcc_catalog_access is true.  Of course, both of these are dirty
hacks.  If we were actually to implement MVCC catalog access, I think
we'd probably just go through and start replacing instances of
SnapshotNow with GetLatestSnapshot(), but I wanted to make it easy to
do perf testing.

When I first made this change, I couldn't detect any real change;
indeed, it seemed that make check was running ever-so-slightly faster
than before, although that may well have been a testing artifact.  I
wrote a test case that created a schema with 100,000 functions in it
and then dropped the schema (I believe it was Tom who previously
suggested this test case as a worst-case scenario for MVCC catalog
access).  That didn't seem to be adversely affected either, even
though it must take ~700k additional MVCC snapshots with
mvcc_catalog_access = true.

MVCC Off: Create 8743.101 ms, Drop 9655.471 ms
MVCC On: Create 7462.882 ms, Drop 9515.537 ms
MVCC Off: Create 7519.160 ms, Drop 9380.905 ms
MVCC On: Create 7517.382 ms, Drop 9394.857 ms

The first "Create" seems to be artificially slow because of some kind
of backend startup overhead; I'm not sure exactly what.

After wracking my brain for a few minutes, I realized that the lack of
any apparent performance regression was probably due to the lack of
any concurrent connections, making the scans of the PGXACT array very
cheap.  So I wrote a little program to open a bunch of extra
connections.  My MacBook Pro grumbled when I tried to open more than
about 600, so I had to settle for that number.  That was enough to
show up the cost of all those extra snapshots:

MVCC Off: Create 9065.887 ms, Drop 9599.494 ms
MVCC On: Create 8384.065 ms, Drop 10532.909 ms
MVCC Off: Create 7632.197 ms, Drop 9499.502 ms
MVCC On: Create 8215.443 ms, Drop 10033.499 ms

Now, I don't know about you, but I'm having a hard time getting
agitated about those numbers.  Most people are not going to drop
100,000 objects with a single cascaded drop.  And most people are not
going to have 600 connections open when they do.  (The snapshot
overhead should be roughly proportional to the product of the number
of drops and the number of open connections, and the number of cases
where the product is as high as 60 million has got to be pretty
small.)  But suppose that someone is in that situation.  Well, then
they will take a... 10% performance penalty?  That sounds plenty
tolerable to me, if it means we can start moving in the direction of
allowing some concurrent DDL.

Am I missing an important test case here?  Are these results worse
than I think they are?  Did I boot this testing somehow?

[MVCC catalog access patch, test program to create lots of idle
connections, and pg_depend stress test case attached.]

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
