On Wed, 7 Sept 2022 at 13:33, Levi Aul <levi@covalenthq.com> wrote:
> To be clear, this isn't a bug report. There is no bug—everything is working exactly as it should. The partitions are
> not being pruned because the workload consists of OLAP aggregations that fetch a small number of rows spread across all
> partitions in the set, relying for speed on an index that isn't prefixed with the partitioning key (nor can it be.)

The -hackers mailing list would probably be a better place to discuss
design ideas for new features. -general is more for general help with
using the software, not hacking on it.

The main reason individual partitions need to be locked is that
they can still be referenced by queries directly as if they were just
a normal table. To get around that we'd either need to have the
locking groups, as you describe, or remove the ability to access a
partition directly rather than through the top-level partitioned
table. The ship has probably sailed on the latter, but it could
probably be done as an opt-in feature if the former were too difficult
or impractical.

FWIW, I'm not quite seeing why you need "sealed" partitions for the
group locking idea. I understand the other parts you mentioned about
conversion to a table AM which is more optimized for non-transactional
workloads, but that seems like a different problem that you're mixing
in and adding complexity to the whole thing. If that's true, then it
might be better not to mix that in and confuse / complicate your
explanation of the problem and proposed solution.

I'd suggest posting to -hackers and stating that your queries can't
make use of partition pruning, that currently all partitions are
being locked, and that you believe this is a bottleneck. Include some
examples of perf output to show how large the locking overhead is.
Extra points for hacking up some crude code that skips obtaining the
partition locks, to show what the performance could be if we didn't
lock all the partitions. That'll help show you have a worthy cause.
FWIW, I'm surprised that executor startup / shutdown for a plan which
accesses a large number of partitions is not drowning out the locking
overheads.
As far as I knew, this problem was only visible when run-time
partition pruning removed the large majority of the Append/MergeAppend
subnodes and made executor startup/shutdown significantly faster.
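
As a starting point for such a post, a rough sketch of a reproduction
script might look like the one below. It only generates SQL text. The
table name, columns, partition count and the literal 42 are all made
up for illustration and are not taken from your workload. The final
query counts relation-level locks held by the session, so the number
will also include indexes and the pg_locks view itself.

```python
def repro_sql(n_partitions: int, table: str = "events") -> str:
    """Build a SQL script that creates a hash-partitioned table and then
    counts the relation locks held after a query that cannot be pruned.

    All identifiers here are hypothetical placeholders.
    """
    stmts = [
        # Partition on "id" so a filter on "account_id" cannot prune.
        f"CREATE TABLE {table} (id bigint, account_id bigint, amount numeric) "
        f"PARTITION BY HASH (id);",
    ]
    for i in range(n_partitions):
        stmts.append(
            f"CREATE TABLE {table}_p{i} PARTITION OF {table} "
            f"FOR VALUES WITH (MODULUS {n_partitions}, REMAINDER {i});"
        )
    stmts += [
        # Index that is not prefixed with the partitioning key.
        f"CREATE INDEX ON {table} (account_id);",
        # Locks are held until the transaction ends, so count them
        # inside the same transaction as the query.
        "BEGIN;",
        f"SELECT sum(amount) FROM {table} WHERE account_id = 42;",
        "SELECT count(*) FROM pg_locks "
        "WHERE pid = pg_backend_pid() AND locktype = 'relation';",
        "ROLLBACK;",
    ]
    return "\n".join(stmts)


if __name__ == "__main__":
    # Print a script for 1000 partitions; pipe it into psql to run it.
    print(repro_sql(1000))
```

Running the generated script at a few different partition counts, and
profiling the backend with perf while the aggregate runs, should give
the sort of numbers worth including in the -hackers post.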

David