Re: Streamify more code paths - Mailing list pgsql-hackers
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Streamify more code paths |
| Date | |
| Msg-id | CABPTF7XFkUM9jm3cvcrr7sPAdmCVVW+PP=GfwUt3N90CeX3u1Q@mail.gmail.com |
| In response to | Re: Streamify more code paths (Xuneng Zhou <xunengzhou@gmail.com>) |
| List | pgsql-hackers |
On Wed, Mar 11, 2026 at 11:00 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Wed, Mar 11, 2026 at 10:23 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Wed, Mar 11, 2026 at 8:16 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2026-03-10 19:27:59 -0400, Andres Freund wrote:
> > > > > > pgstattuple_large base= 12429.3ms patch= 11916.8ms 1.04x ( 4.1%) (reads=206945->12983, io_time=6501.91->32.24ms)
> > > > >
> > > > > > pgstattuple_large base= 12642.9ms patch= 11873.5ms 1.06x ( 6.1%) (reads=206945->12983, io_time=6516.70->143.46ms)
> > > > >
> > > > > Yeah, this looks somewhat strange. The io_time has been reduced
> > > > > significantly, which should also lead to a substantial reduction in
> > > > > runtime.
> > > >
> > > > It's possible that the bottleneck just moved, e.g to the checksum computation,
> > > > if you have data checksums enabled.
> > > >
> > > > It's also worth noting that likely each of the test reps measures
> > > > something different, as likely
> > > > psql_run "$ROOT" "$PORT" -c "UPDATE heap_test SET data = data || '!' WHERE id % 5 = 0;"
> > > > leads to some out-of-page updates.
> > > >
> > > > You're probably better off deleting some of the data in a transaction that is
> > > > then rolled back. That will also unset all-visible, but won't otherwise change
> > > > the layout, no matter how many test iterations you run.
> > > >
> > > > I'd also guess that you're seeing a relatively small win because you're
> > > > updating every page. When reading every page from disk, the OS can do
> > > > efficient readahead. If there are only occasional misses, that does not work.
> > >
> > > I think that last one is a big part - if I use
> > > BEGIN; DELETE FROM heap_test WHERE id % 500 = 0; ROLLBACK;
> > > (which leaves a lot of
> > >
> > > I see much bigger wins due to the pgstattuple changes.
> > >
> > > time buffered time DIO
> > > w/o read stream 2222.078 ms 2090.239 ms
> > > w read stream 299.455 ms 155.124 ms
> > >
> > > That's with local storage. io_uring, but numbers with worker are similar.
> >
> > The results look great and interesting. This looks far better than
> > what I observed in my earlier tests. I’ll run perf for pgstattuple
> > without the switching to see what is keeping the CPU busy.
> >
> > --
> > Best,
> > Xuneng
>
> io_uring
> pgstattuple_large base= 1090.6ms patch= 143.3ms 7.61x ( 86.9%) (reads=20049→20049, io_time=1040.80→46.91ms)
>
> I observed a similar magnitude of runtime reduction after switching to
> pg_buffercache_evict_relation() and using BEGIN; DELETE FROM heap_test
> WHERE id % 500 = 0; ROLLBACK. However, I lost the original flame
> graphs after running many performance tests. I will regenerate them
> and post them later.

In the original setup, UPDATE ... WHERE id % 5 = 0 touches a large fraction of heap pages. On each touched page, PD_ALL_VISIBLE and the corresponding visibility-map bit are cleared, and UPDATE also creates new tuple versions, adding tuple-chain/page churn. Since pgstattuple_approx skips only pages still marked all-visible, it ends up reading most heap pages. With shared-buffer eviction and an OS page-cache drop before timing, many of those reads are cold misses, so much of the runtime is spent in the buffer-miss path, as seen in the original flame graph.

The rollback-delete setup changes this in two ways. First, BEGIN; DELETE ... WHERE id % 500 = 0; ROLLBACK; still clears the all-visible state on touched pages, but does not leave persistent successor tuple versions the way UPDATE does. Pages are still modified (tuple headers/page flags), but physical churn is lower. Second, the predicate is much sparser, so fewer pages lose all-visible status, and pgstattuple_approx reads fewer heap pages.
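For reference, the setup described above can be sketched as follows (table name and the sparse predicate follow the upthread example; pg_buffercache_evict_relation() requires the pg_buffercache extension, and the timed call assumes pgstattuple is installed):

```sql
-- Clear all-visible state on a sparse set of pages without
-- leaving new tuple versions behind:
BEGIN;
DELETE FROM heap_test WHERE id % 500 = 0;
ROLLBACK;

-- Drop the relation's shared buffers so the timed run exercises
-- the read path rather than buffer hits:
SELECT * FROM pg_buffercache_evict_relation('heap_test'::regclass);

-- Timed run; pgstattuple_approx skips pages still marked all-visible,
-- so only the pages touched by the rolled-back DELETE are read:
\timing on
SELECT * FROM pgstattuple_approx('heap_test');
```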
A warmup pass further stabilizes results by setting hint bits for the aborted xmax once up front, so later repetitions avoid repeating part of the transaction-status resolution during visibility checks. With fewer non-all-visible pages, fewer cold misses, and less eviction/victim churn, we can observe a more pronounced speed-up in the improved version of the pgstattuple test.

--
Best,
Xuneng