On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra <tomas@vondra.me> wrote:
> I can do some tests with forward vs. backwards scans. Of course, the
> trouble with finding these weird cases is that they may be fairly rare.
> So hitting them is a matter of luck or just happening to generate the
> right data / query. But I'll give it a try and we'll see.

I was talking more about finding "performance bugs" through a
semi-directed process of trying random things while looking out for
discrepancies. Something like that shouldn't require the usual
"benchmarking rigor", since suspicious inconsistencies should be
fairly obvious once encountered. I expect similar queries to have
similar performance, regardless of superficial differences such as
scan direction, DESC vs ASC column order, etc.

I tested this issue again (using my original pgbench_accounts query),
having rebased on top of HEAD as of today. I found that the
inconsistency seems to be much smaller now -- so much so that I don't
think that the remaining inconsistency is particularly suspicious.

I also think that performance might have improved across the board. I
see that the same TPC-C query that took 768.454 ms a few weeks back
now takes only 617.408 ms. Also, while I originally saw "I/O Timings:
shared read=138.856" with this query, I now see "I/O Timings: shared
read=46.745". That feels like a performance bug fix to me.

I wonder if today's commit b4212231 from Thomas ("Fix rare bug in
read_stream.c's split IO handling") fixed the issue, without anyone
realizing that the bug in question could manifest like this.

--
Peter Geoghegan