Re: Statistics Import and Export - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Statistics Import and Export
Date
Msg-id Z-3x2AnPCP331JA3@nathan
Whole thread Raw
In response to Re: Statistics Import and Export  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Statistics Import and Export
List pgsql-hackers
On Tue, Apr 01, 2025 at 10:44:19PM -0700, Jeff Davis wrote:
> On Tue, 2025-04-01 at 22:21 -0500, Nathan Bossart wrote:
>> We might be able to improve this by inventing a new callback that fails for
>> all formats except for custom with feesko() available.  That would at least
>> ensure hard failures if these assumptions change.  That problably wouldn't
>> be terribly invasive.  I'm curious what you think.
> 
> That sounds fine, I'd say do that if it feels reasonable, and if the
> extra callbacks get too messy, we can just document the assumptions
> instead.

I did write a version with callbacks, but it felt a bit silly because it is
very obviously intended for this one case.  So, I removed them in the
attached patch set.

>> Hm.  One thing we could do is to send the TocEntry to the callback and
>> verify that matches the one we were expecting to see next (as set by a
>> previous call).  Does that sound like a strong enough check?
> 
> Again, I'd just be practical here and do the check if it feels natural,
> and if not, improve the comments so that someone modifying the code
> would know where to look.

Okay, here is an updated patch set.  I did add some verification code,
which ended up being a really good idea because it revealed a couple of
cases we weren't handling:

* Besides custom format calling WriteToc() twice to update the data
  offsets, tar format calls WriteToc() followed by RestoreArchive() to
  write restore.sql.  I couldn't think of a great way to avoid executing
  the queries twice in this case, so I settled on allowing it for only that
  mode.  While we don't expect the second set of queries to result in
  different stats definitions, even if it did, the worst case is that the
  content of restore.sql (which isn't used by pg_restore) would be
  different.  I noticed some past discussion that seems to suggest that
  this format might be a candidate for deprecation [0], so I'm not sure
  it's worth doing anything fancier.

* Our batching code assumes that stats entries are dumped in TOC order,
  which unfortunately wasn't true for formats that use RestoreArchive() for
  dumping.  This is because RestoreArchive() does multiple passes through
  the TOC and selectively dumps certain entries each time.  This is
  particularly troublesome for index stats and a subset of matview stats;
  both are in SECTION_POST_DATA, but matview stats that depend on matview
  data are dumped in RESTORE_PASS_POST_ACL, while all other stats data is
  dumped in RESTORE_PASS_MAIN.  To deal with this, I propose moving all
  stats entries in SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which
  ensures that we always dump stats in TOC order.  One convenient side
  effect of this change is that we can revert a decent chunk of commit
  a0a4601765.  It might be possible to do better via smarter lookahead code
  or a more sophisticated cache, but it's a bit late in the game for that.

[0] https://postgr.es/m/20180727015306.fzlo4inv5i3zqr2c%40alap3.anarazel.de

-- 
nathan

Attachment

pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Fix slot synchronization with two_phase decoding enabled
Next
From: Corey Huinker
Date:
Subject: Re: Statistics Import and Export