On 2017/08/08 4:34, Robert Haas wrote:
> On Mon, Aug 7, 2017 at 2:54 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> As long as find_all_inheritors() is a place only to determine the order in
>> which partitions will be locked, it's fine. My concern is about the time
>> of actual locking, which in the current planner implementation is too soon
>> that we end up needlessly locking all the partitions.
>
> I don't think avoiding that problem is going to be easy. We need a
> bunch of per-relation information, like the size of each relation, and
> what indexes it has, and how big they are, and the statistics for each
> one. It was at one point proposed by someone that every partition
> should be required to have the same indexes, but (1) we didn't
> implement it like that and (2) if we had done that it wouldn't solve
> this problem anyway because the sizes are still going to vary.
Sorry, I didn't mean to say we shouldn't lock and open partitions at all.
We do need their relation descriptors for planning and there is no doubt
about that. I was just saying that we should do that only for the
partitions that are not pruned. But, as you say, I can see that the
planner changes required to be able to do that might be hard.
>> The locking-partitions-too-soon issue, I think, is an important one and
>> ISTM, we'd want to lock the partitions after we've determined the specific
>> ones a query needs to scan using the information returned by
>> RelationGetPartitionDispatchInfo. That means the latter had better locked
>> the relations whose cached partition descriptors will be used to determine
>> the result that it produces. One way to do that might be to lock all the
>> tables in the list returned by find_all_inheritors that are partitioned
>> tables before calling RelationGetPartitionDispatchInfo. It seems what the
>> approach you've outlined below will let us do that.
>
> Yeah, I think so. I think we could possibly open and lock partitioned
> children only, then prune away leaf partitions that we can determine
> aren't needed, then open and lock the leaf partitions that are needed.
Yes.
>> BTW, IIUC, there will be two lists of OIDs we'll have: one in the
>> find_all_inheritors order, say, L1 and the other determined by using
>> partitioning-specific information for the given query, say L2.
>>
>> To lock, we iterate L1 and if a given member is in L2, we lock it. It
>> might be possible to make it as cheap as O(nlogn).
>
> Commonly, we'll prune no partitions or all but one; and we should be
> able to make those cases very fast.
Agreed.
>> Maybe, we can make the initial patch use syscache to get the relkind for a
>> given child. If the syscache bloat is unbearable, we go with the
>> denormalization approach.
>
> Yeah. Maybe if you write that patch, you can also test it to see how
> bad the bloat is.
I will try and see, but maybe the syscache solution doesn't get us past
the proof-of-concept stage.
Thanks,
Amit