Re: SQL/JSON features for v15 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: SQL/JSON features for v15 |
Date | |
Msg-id | 20220815223853.itrlpqgpqheg6r6a@awork3.anarazel.de Whole thread Raw |
In response to | SQL/JSON features for v15 ("Jonathan S. Katz" <jkatz@postgresql.org>) |
Responses |
Re: SQL/JSON features for v15
Re: SQL/JSON features for v15 Re: SQL/JSON features for v15 |
List | pgsql-hackers |
Hi, Continuation from the thread at https://postgr.es/m/20220811171740.m5b4h7x63g4lzgrk%40awork3.anarazel.de On 2022-08-11 10:17:40 -0700, Andres Freund wrote: > On 2022-08-11 13:08:27 -0400, Jonathan S. Katz wrote: > > With RMT hat on, Andres do you have any thoughts on this? > > I think I need to prototype how it'd look like to give a more detailed > answer. I have a bunch of meetings over the next few hours, but after that I > can give it a shot. I started hacking on this Friday. I think there's some relatively easy improvements that make the code substantially more understandable, at least for me, without even addressing the structural stuff. One thing I could use help understanding is the logic behind ExecEvalJsonNeedsSubTransaction() - there's no useful comments, so it's hard to follow. bool ExecEvalJsonNeedsSubTransaction(JsonExpr *jsexpr, struct JsonCoercionsState *coercions) { /* want error to be raised, so clearly no subtrans needed */ if (jsexpr->on_error->btype == JSON_BEHAVIOR_ERROR) return false; if (jsexpr->op == JSON_EXISTS_OP && !jsexpr->result_coercion) return false; if (!coercions) return true; return false; } I guess the !coercions bit is just about the planner, where we want to be pessimistic about when subtransactions are used, for the purpose of parallelism? Because that's the only place that passes in NULL. What really baffles me is that last 'return false' - it seems to indicate that there's no paths during query execution where ExecEvalJsonNeedsSubTransaction() returns true. And indeed, tests pass with an Assert(!needSubtrans) added to ExecEvalJson() (and then unsurprisingly also after removing the ExecEvalJsonExprSubtrans() indirection). What's going on here? We, somewhat confusingly, still rely on subtransactions, heavily so. Responsible for that is this hunk of code: bool throwErrors = jexpr->on_error->btype == JSON_BEHAVIOR_ERROR; [...] cxt.error = throwErrors ? NULL : &error; cxt.coercionInSubtrans = !needSubtrans && !throwErrors; Assert(!needSubtrans || cxt.error); So basically we start a subtransaction inside ExecEvalJsonExpr(), to coerce the result type, whenever !needSubtrans (which is always!), unless ERROR ON ERROR is used. Which then also explains the theory behind the EXISTS_OP check in ExecEvalJsonNeedsSubTransaction(). In that case ExecEvalJsonExpr() returns early, before doing a return value coercion, thus not starting a subtransaction. I don't think it's sane from a performance view to start a subtransaction for every coercion, particularly because most coercion paths will never trigger an error, leaving things like out-of-memory or interrupts aside. And those are re-thrown by ExecEvalJsonExprSubtrans(). A quick and dirty benchmark shows ERROR ON ERROR nearly 2xing speed. I'm worried about the system impact of using subtransactions this heavily, it's not exactly the best performing system - the only reason it's kind of ok here is that it's going to be very rare to allocate a subxid, I think. Next question: /* * We should catch exceptions of category ERRCODE_DATA_EXCEPTION and * execute the corresponding ON ERROR behavior then. */ oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; Assert(error); BeginInternalSubTransaction(NULL); /* Want to execute expressions inside function's memory context */ MemoryContextSwitchTo(oldcontext); PG_TRY(); { res = func(op, econtext, res, resnull, p, error); /* Commit the inner transaction, return to outer xact context */ ReleaseCurrentSubTransaction(); MemoryContextSwitchTo(oldcontext); CurrentResourceOwner = oldowner; } PG_CATCH(); { ErrorData *edata; int ecategory; /* Save error info in oldcontext */ MemoryContextSwitchTo(oldcontext); edata = CopyErrorData(); FlushErrorState(); /* Abort the inner transaction */ RollbackAndReleaseCurrentSubTransaction(); MemoryContextSwitchTo(oldcontext); CurrentResourceOwner = oldowner; Two points: 1) I suspect it's not safe to switch to oldcontext before calling func(). On error we'll have leaked memory into oldcontext and we'll just continue on. It might not be very consequential here, because the calling context presumably isn't very long lived, but that's probably not something we should rely on. Also, are we sure that the context will be in a clean state when it's used within an erroring subtransaction? I think the right thing here would be to stay in the subtransaction context and then copy the datum out to the surrounding context in the success case. 2) If there was an out-of-memory error, it'll have been in oldcontext. So switching back to it before calling CopyErrorData() doesn't seem good - we'll just hit OOM issues again. I realize that both of these issues are present in plenty other code (see e.g. plperl_spi_exec()). So I'm curious why they are ok? Greetings, Andres Freund
pgsql-hackers by date: