Re: Proposal: Progressive explain - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: Proposal: Progressive explain
Date |
Msg-id | CA+TgmoZw+xy3WyrPD4X5ChP+=kJ4ncRZY8xQu0XfChDa4fa7yg@mail.gmail.com
In response to | Re: Proposal: Progressive explain (Rafael Thofehrn Castro <rafaelthca@gmail.com>)
Responses | Re: Proposal: Progressive explain
List | pgsql-hackers
On Wed, Mar 19, 2025 at 6:53 PM Rafael Thofehrn Castro <rafaelthca@gmail.com> wrote:

> The strategy I used here is to use a MemoryContextCallback
> (ProgressiveExplainReleaseFunc), configured in the memory context
> where the query is being executed, being responsible for calling
> ProgressiveExplainCleanup() if the query doesn't end gracefully.

Thanks for the pointer.

I'm a bit skeptical about what's going on here in ProgressiveExplainReleaseFunc(). It seems like we're depending on shared memory to tell us whether we need to do purely backend-local cleanup, like calling disable_timeout() and resetting ProgressiveExplainPending and activeQueryDesc. I would have expected us to keep track in local memory of whether this kind of work needs to be done. It also seems roundabout to take an LWLock, do a hash table lookup to see whether there's an entry there, release the LWLock, and then very shortly thereafter take the lock a second time to release the entry that we now know is there.

The comment in ProgressiveExplainCleanup about only detaching from the DSA if the query ended gracefully is not ideal from my point of view, because it says what the code does instead of why the code does that thing. Also, the function is seemingly called with queryDesc as an argument not because you need it for anything, but because you're going to test whether it's null; in that case, you could just pass a Boolean. Even then, something seems odd about this: why do we have to be holding ProgressiveExplainHashLock to dsa_detach() the somewhat inscrutably named area pe_a? And why are we not detaching it in case of error?

I am wondering why you chose this relatively unusual error-cleanup strategy. What I would have done is invent AtEOXact_ProgressiveExplain and AtSubEOXact_ProgressiveExplain. In some sense your approach looks simpler, because it doesn't need separate handling for transactions and subtransactions, but it's so unlike what we do in most places that it's hard for me to tell whether it's correct. I feel like the approach you've chosen here would make sense if what we wanted to do was basically release memory, or some memory-like resource closely associated with the context -- e.g. expandedrecord.c releases a TupleDesc -- but this is doing more than that.

I think the effect of this choice is that cleanup of the progressive-EXPLAIN state happens much later than it normally would. Most of the time, in the case of an abort, we would want AbortTransaction() to release as many resources as possible, leaving basically no work to do at CleanupTransaction() time. This is so that if a user types "SELECT 1/0;" we release resources, as far as possible, right away, and don't wait for them to subsequently type "ROLLBACK;". The transaction lives on only as a shell. But these resources, if I'm reading this right, are going to stick around until the transaction is actually rolled back, because memory is not freed until CleanupTransaction() time. I wonder what happens if a query inside of an explicit transaction aborts after putting an entry in the progressive-explain view. My guess is that the entry will stick around until the actual rollback happens. In fact, now that I think about it, this is probably why we don't dsa_detach() in ProgressiveExplainCleanup() in cases of error: the resource-owner cleanup will have already released the DSA segments long before the memory context is deleted.

I'm sort of inclined to say that this should be rewritten to do error cleanup in a more normal way.
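To make that concrete, here's roughly the shape I have in mind. This is a hypothetical sketch, not code from the patch: pe_hash, PROGRESSIVE_EXPLAIN_TIMEOUT, and ProgressiveExplainActive are names I made up, and I'm assuming the shared hash is keyed by backend PID.

```c
#include "postgres.h"

#include "executor/execdesc.h"	/* QueryDesc */
#include "miscadmin.h"			/* MyProcPid */
#include "storage/lwlock.h"
#include "utils/hsearch.h"
#include "utils/timeout.h"

/* Symbols from the patch, redeclared here so the sketch stands alone. */
extern bool ProgressiveExplainPending;
extern QueryDesc *activeQueryDesc;
extern LWLock *ProgressiveExplainHashLock;
extern HTAB *pe_hash;					/* invented name for the shared hash */
extern TimeoutId PROGRESSIVE_EXPLAIN_TIMEOUT;	/* invented name */

/* Backend-local flag: do we have anything that needs cleaning up? */
static bool ProgressiveExplainActive = false;

/*
 * Called from CommitTransaction() and AbortTransaction(); a matching
 * AtSubEOXact_ProgressiveExplain() would handle subtransaction abort.
 * isCommit is unused here, but kept for symmetry with other AtEOXact_*
 * routines.
 */
void
AtEOXact_ProgressiveExplain(bool isCommit)
{
	/* Consult backend-local state, not shared memory. */
	if (!ProgressiveExplainActive)
		return;

	/* Purely backend-local cleanup first; no lock needed. */
	disable_timeout(PROGRESSIVE_EXPLAIN_TIMEOUT, false);
	ProgressiveExplainPending = false;
	activeQueryDesc = NULL;

	/* Then drop our shared-memory entry, taking the lock just once. */
	LWLockAcquire(ProgressiveExplainHashLock, LW_EXCLUSIVE);
	(void) hash_search(pe_hash, &MyProcPid, HASH_REMOVE, NULL);
	LWLockRelease(ProgressiveExplainHashLock);

	ProgressiveExplainActive = false;
}
```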
It's probably more code to do it that way, but I think having it be more similar to other subsystems is probably worth quite a bit.

> > It seems like when we replace a longer entry with a shorter one, we
> > forget that it was originally longer. Hence, if the length of a
> > progressive EXPLAIN is alternately 2922 characters and 2923
> > characters, we'll reallocate on every other progressive EXPLAIN
> > instead of at most once.
>
> Are you talking about re-printing the plan in the same query execution?

Yes.

> The logic for the code, using your example, would be to allocate 2922 +
> PROGRESSIVE_EXPLAIN_FREE_SIZE (4096, currently) initially. If next plans
> alternate between 2922 and 2923 no additional allocation will be done.
> A reallocation will be needed only if the plan length ends up exceeding
> 2922+4096. At the end of query execution (or cancellation) that DSA will
> be freed and a next query execution will have to allocate again using the
> same logic.

It seems to me that if ProgressiveExplainPrint() reaches /* Update shared memory with new data */ without reallocating, strlen(pe_data->plan) can decrease. On the next trip through the function, we don't know whether the string we're seeing is the original string -- for which strlen() + PROGRESSIVE_EXPLAIN_FREE_SIZE gives us the original allocation size -- or a shorter one that was copied over the original, longer string.

PROGRESSIVE_EXPLAIN_FREE_SIZE is big enough that this probably isn't much of a problem in practice, because consecutive EXPLAIN outputs for the same query probably won't vary in size by enough to cause any reallocation. Looking at this more carefully, I think that the query plan would have to shrink in size by >4kB and then expand again in order to trigger a reallocation, which seems unlikely. But it still seems unclean to me. Normally, we track how much space we have actually allocated explicitly, instead of reconstructing that number from something else -- especially something that isn't guaranteed to produce an accurate result. I think you should just store the number of available payload bytes at the beginning of the chunk and then reallocate if it isn't big enough to hold the payload that you have got.

> Regarding the execProcNode wrapper strategy: I used it precisely because
> of the discussion in that other patch. I actually tried not using it here,
> and called ProgressiveExplainPrint() in the timeout callback. This resulted
> in sporadic crashes, confirming the suspicion that it is not a good
> idea.

Makes sense, but we need adequate documentation of what we did and why it works (or at least why we think it works).

Another thing I just noticed is that pg_stat_progress_explain() uses beentry->st_procpid == ped->pid as the permissions check, but a more typical test is HAS_PGSTAT_PERMISSIONS(beentry->st_userid). I know that macro exists only in pgstatfuncs.c, but I think it would be OK to duplicate the associated test here (i.e. has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role)). I don't really see a reason why this shouldn't use the same permission rules as other pg_stat_ things, in particular pg_stat_get_activity().
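Something like this, say -- just a sketch that duplicates the pgstatfuncs.c test; pe_has_stats_permissions is a name I made up, and I'm assuming the caller can pass beentry->st_userid for the entry being examined:

```c
#include "postgres.h"

#include "catalog/pg_authid.h"	/* ROLE_PG_READ_ALL_STATS */
#include "miscadmin.h"			/* GetUserId() */
#include "utils/acl.h"			/* has_privs_of_role() */

/*
 * Same test that pgstatfuncs.c wraps in HAS_PGSTAT_PERMISSIONS():
 * members of pg_read_all_stats see everything; otherwise you see only
 * entries belonging to roles whose privileges you have (including your
 * own).
 */
static bool
pe_has_stats_permissions(Oid role)
{
	return has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS) ||
		has_privs_of_role(GetUserId(), role);
}
```

The existing beentry->st_procpid == ped->pid test in pg_stat_progress_explain() would then become pe_has_stats_permissions(beentry->st_userid).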
--
Robert Haas
EDB: http://www.enterprisedb.com