Thread: RelOptInfo -> Relation
Is there some good reason why get_relation_info() copies all of the
data that will be needed later in planning into the RelOptInfo instead
of just storing a pointer to the Relation directly into the
RelOptInfo?  Of course, if we did that, then it would have to leave
the relation open instead of closing it, and similarly for the
indexes, but so what?  As far as I understand it, we're keeping a lock
on the relation, so none of this stuff should be able to change under
it, and pointing to data is faster than copying it.

I think that this problem has gradually gotten worse as we've added
features to the system.  Between 7.4 and master, we've added the
following stuff to what this function needs to propagate into the
RelOptInfo: tablespace, attr_needed, attr_width, allvisfrac,
rel_parallel_workers, statlist, serverid, fdwroutine, fkey_list,
part_scheme, boundinfo, nparts, partexprs.  That's a fair amount of
stuff, and IndexOptInfo includes more things now, too.

The whole thing seems pretty inefficient (a sketch of the pattern
follows this message).  The relcache stores a list of statistics
object OIDs, and then, in get_relation_statistics(), we build a
StatisticExtInfo for each one.  That, however, means that the
StatisticExtInfo list is getting rebuilt for every query.  If the
RelOptInfo could point directly into the relcache, then we could build
that stuff once when the relcache entry was created, or maybe on
demand, and then keep it forever.  But if we did that today it
wouldn't work too well: we'd still have to copy the result from the
relcache into the RelOptInfo.  The just-copy-it approach is what
set_relation_partition_info() does for rel->boundinfo, and that
similarly seems like a waste of cycles.

I have a feeling we're all cargo-culting each new feature into a
system that we may have outgrown, but maybe there's some reason behind
this that I'm missing.  Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
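A minimal sketch of the copy-out pattern Robert describes, loosely
modeled on plancat.c's get_relation_info().  This is a simplified
illustration, not the actual function: the real code fills in many
more fields, and get_relation_info_sketch() is an invented name.

#include "postgres.h"
#include "access/heapam.h"
#include "nodes/relation.h"
#include "utils/rel.h"

/* Simplified sketch of the existing copy-everything approach. */
static void
get_relation_info_sketch(Oid relationObjectId, RelOptInfo *rel)
{
    /* NoLock: an adequate lock was already taken upstream of the planner */
    Relation    relation = heap_open(relationObjectId, NoLock);

    /* Copy what the planner needs out of the relcache entry ... */
    rel->reltablespace = RelationGetForm(relation)->reltablespace;
    rel->pages = relation->rd_rel->relpages;
    rel->tuples = relation->rd_rel->reltuples;
    /* ... plus attr_needed, statlist, boundinfo, etc., each copied/built */

    /* ... and close the rel, after which the entry is free to be rebuilt */
    heap_close(relation, NoLock);
}

The alternative being floated would instead keep the relation open and
store the Relation pointer itself in the RelOptInfo, trading the
copies for the pointer-stability questions discussed downthread.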
Robert Haas <robertmhaas@gmail.com> writes:
> Is there some good reason why get_relation_info() copies all of the
> data that will be needed later in planning into the RelOptInfo instead
> of just storing a pointer to the Relation directly into the
> RelOptInfo?  Of course, if we did that, then it would have to leave
> the relation open instead of closing it, and similarly for the
> indexes, but so what?  As far as I understand it, we're keeping a lock
> on the relation, so none of this stuff should be able to change under
> it, and pointing to data is faster than copying it.

We are not keeping a lock strong enough to prevent the relcache entry
from changing altogether, e.g. an ANALYZE could commit midway through
and change, say, the relpages/reltuples info.  The consequences of
that for consistent planner behavior are not clear to me, but they're
probably not something to dismiss out of hand.

Also, there's a big difference between the entry's contents being
logically unchanging and being physically unchanging: we could get a
relcache inval event that forces a rebuild, and only very limited
parts of the entry are guaranteed not to move around if that happens.
There's a general rule that code outside the relcache shouldn't keep
pointers to most parts of a relcache entry's infrastructure, and for
those parts where we do provide a guarantee (like the tupdesc), it
doesn't come for free.  Maybe the planner's uses of the info wouldn't
be any more dangerous than anything else's, but again it's not clear.

I'm disinclined to monkey with the way this works without someone
presenting hard evidence that it creates enough of a performance
problem to be worth spending a significant amount of time changing it.
Up to now I don't think I've ever noticed plancat.c being a large
fraction of planner runtime, though of course that might depend on the
test case.

			regards, tom lane
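To make the pointer-stability hazard Tom describes concrete, here is a
hypothetical fragment, not real planner code; the API calls are real,
but relcache_pointer_hazard() itself is only an illustration.

#include "postgres.h"
#include "access/heapam.h"
#include "nodes/pg_list.h"
#include "utils/rel.h"
#include "utils/relcache.h"

/* Hypothetical illustration of the relcache-pointer hazard. */
static void
relcache_pointer_hazard(Oid relid)
{
    Relation    rel = heap_open(relid, AccessShareLock);

    /* RelationGetIndexList() hands back a copy, so keeping it is safe. */
    List       *indexoids = RelationGetIndexList(rel);

    /* RelationGetDescr() returns a pointer into the relcache entry itself. */
    TupleDesc   tupdesc = RelationGetDescr(rel);

    /*
     * If a relcache invalidation is processed from here on (any catalog
     * access can call AcceptInvalidationMessages()), the entry is rebuilt.
     * tupdesc remains valid only because the relcache deliberately keeps
     * the old TupleDesc when the rebuilt one is equal; most other
     * sub-structures carry no such guarantee, which is why callers copy
     * them rather than point at them.
     */
    elog(DEBUG1, "%d attributes, %d indexes",
         tupdesc->natts, list_length(indexoids));

    list_free(indexoids);
    heap_close(rel, AccessShareLock);
}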
On Fri, Feb 2, 2018 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We are not keeping a lock strong enough to prevent the relcache entry
> from changing altogether, e.g. an ANALYZE could commit midway through
> and change, say, the relpages/reltuples info.  The consequences of
> that for consistent planner behavior are not clear to me, but they're
> probably not something to dismiss out of hand.

Right -- stuff that could actually change during planning should
probably be copied, but things like the partition key and partition
bounds are annoyingly large to copy and cannot change during planning.

> Also, there's a big difference between the entry's contents being
> logically unchanging and being physically unchanging: we could get a
> relcache inval event that forces a rebuild, and only very limited
> parts of the entry are guaranteed not to move around if that happens.
> There's a general rule that code outside the relcache shouldn't keep
> pointers to most parts of a relcache entry's infrastructure, and for
> those parts where we do provide a guarantee (like the tupdesc), it
> doesn't come for free.  Maybe the planner's uses of the info wouldn't
> be any more dangerous than anything else's, but again it's not clear.

A good number of bits of the relcache entry defend against this sort
of hazard by building a new entry, comparing it to the old one, and
swapping in the new one only if there is a difference.  This technique
is used for the tupdesc, for rules, for policies, and for the
partition key and descriptor, and I suspect it could be profitably
used for other things that we don't currently bother to cache in the
relcache because each query would have to copy/rebuild them anyway
(see the sketch after this message).

> I'm disinclined to monkey with the way this works without someone
> presenting hard evidence that it creates enough of a performance
> problem to be worth spending a significant amount of time changing it.
> Up to now I don't think I've ever noticed plancat.c being a large
> fraction of planner runtime, though of course that might depend on the
> test case.

Well, I got concerned about this looking at Amit's patch for faster
partition pruning.  It tried to work around this problem by reopening
the relation elsewhere in the planner, but I didn't think you'd like
that approach, either.  (I don't.)  We can certainly handle that
problem in the same way we've handled previous ones - viz. just copy
more stuff - but at some point that's going to start costing enough to
matter.  And I'm quite sure that the only reason we have to do that
copying is that we close the relation, because we've already done the
work to keep the relevant structures stable as long as the relation is
open and locked.  The executor is already relying on that guarantee.

If we're going to have to change this at some point (and I bet we
are), I'd rather do it before people jam even more stuff into the
current system rather than wait until it gets totally out of hand.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
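A hedged sketch of the rebuild-compare-swap technique Robert mentions,
loosely modeled on relcache.c.  build_fresh_entry() is a hypothetical
stand-in for the real rebuild logic, and the actual code swaps fields
in place one by one rather than memcpy'ing the whole struct:

#include "postgres.h"
#include "access/tupdesc.h"
#include "utils/rel.h"

static Relation build_fresh_entry(Oid relid);   /* hypothetical */

/* Simplified sketch; not the actual relcache.c rebuild code. */
static void
rebuild_relcache_entry_sketch(Relation relation)
{
    Relation    newrel = build_fresh_entry(RelationGetRelid(relation));

    /* If the rebuilt tupdesc is logically identical, keep the old pointer */
    if (equalTupleDescs(relation->rd_att, newrel->rd_att))
    {
        FreeTupleDesc(newrel->rd_att);
        newrel->rd_att = relation->rd_att;  /* callers' pointers stay valid */
    }

    /*
     * Install the new contents without changing the Relation pointer that
     * outside code may hold.
     */
    memcpy(relation, newrel, sizeof(RelationData));
    pfree(newrel);
}

The same build-compare-keep dance could, as Robert suggests, be
applied to other derived structures so that pointers handed out to the
planner would stay stable across invalidations.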
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Feb 2, 2018 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm disinclined to monkey with the way this works without someone
>> presenting hard evidence that it creates enough of a performance
>> problem to be worth spending a significant amount of time changing
>> it.  Up to now I don't think I've ever noticed plancat.c being a
>> large fraction of planner runtime, though of course that might
>> depend on the test case.

> If we're going to have to change this at some point (and I bet we
> are), I'd rather do it before people jam even more stuff into the
> current system rather than wait until it gets totally out of hand.

While I'm prepared to believe that this *could* be a problem, I repeat
that you've offered no hard evidence that it actually *is* a problem,
or might become one in the future.  We could easily expend a
significant amount of effort here for no real return.

			regards, tom lane
On Tue, Feb 6, 2018 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If we're going to have to change this at some point (and I bet we
>> are), I'd rather do it before people jam even more stuff into the
>> current system rather than wait until it gets totally out of hand.
>
> While I'm prepared to believe that this *could* be a problem, I repeat
> that you've offered no hard evidence that it actually *is* a problem,
> or might become one in the future.  We could easily expend a
> significant amount of effort here for no real return.

Sure.  The point of the email was to ask whether there was some
consideration I hadn't thought of that makes this utterly unthinkable;
the answer at least so far seems to be "no".  I think it will be
interesting to see what happens with the faster partition pruning
patch.  If it doesn't run into a problem with this, then there's no
urgent need to do anything here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company