Record a Bitmapset of non-pruned partitions - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Record a Bitmapset of non-pruned partitions |
Date | |
Msg-id | CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com Whole thread Raw |
Responses |
Re: Record a Bitmapset of non-pruned partitions
|
List | pgsql-hackers |
Back when working on 959d00e9d, to allow ordered partition scans for LIST and RANGE partitioned tables, I mentioned [1] that if we had a field that recorded a Bitmapset of the non-pruned partitions, we could use that to do a more thorough check to see if ordered scans are possible. At the moment these ordered scans are possible if the order by matches the partition key and, for LIST partitioned tables we must also ensure that we don't have something like partition1 FOR VALUES IN(1,3) and partition2 FOR VALUES(2,4). Here we can't scan partition1 first before we'd get 3s before the 2s that are in partition2. Since we don't record the list of partitions that survived pruning, I just made that check to ensure all partitions only allow a single value to be stored. That means the optimisation is not done in cases where it's possible to do it. To make this work, as I mentioned in [1], we really need a live_parts field to track which partitions survived pruning. If we have that then we can ensure that just those partitions have no instances where a lower value could appear in a later partition. In the attached patch, I've only added the live_parts field and made use of it in a few locations where the knowledge is useful as an optimisation. Right now apply_scanjoin_target_to_paths() appears in profiles when planning a query to a partitioned table with many partitions and just 1 or a few survive pruning. The reason for this is simple; looping over a large almost entirely empty array is just slow and using the Bitmapset as a sort of index into the interesting elements of that array speeds it up, quite a bit. Since we've already put quite a bit of effort into making that fast, then I think it might be worth adding this field even if it was just for that purpose. With 10k empty hash partitions and 1 random one surviving partition pruning, I see: Master: 18.13% postgres postgres [.] apply_scanjoin_target_to_paths 3.66% postgres postgres [.] AllocSetAlloc 2.05% postgres postgres [.] hash_search_with_hash_value 1.95% postgres libc-2.33.so [.] __memset_avx2_unaligned_erms 1.88% postgres postgres [.] SearchCatCacheInternal 1.55% postgres postgres [.] base_yyparse 0.74% postgres postgres [.] get_relation_info 0.68% postgres [kernel.kallsyms] [k] __d_lookup_rcu 0.61% postgres postgres [.] palloc Patched: 3.72% postgres postgres [.] AllocSetAlloc 2.30% postgres postgres [.] hash_search_with_hash_value 2.22% postgres postgres [.] SearchCatCacheInternal 2.02% postgres libc-2.33.so [.] __memset_avx2_unaligned_erms 1.88% postgres postgres [.] base_yyparse 1.08% postgres postgres [.] palloc 0.92% postgres [kernel.kallsyms] [k] __d_lookup_rcu $ cat setup.sql create table hp (a int primary key, b int not null) partition by hash(a); select 'create table hp'||x|| ' partition of hp for values with (modulus 10000, remainder '||x||');' from generate_series(0,9999) x; \gexec $ cat select.sql \set p random(1,2000000) select * from hp where a = :p master $ pgbench -n -f select.sql -c 1 -j 1 -T 60 postgres tps = 2608.704218 (without initial connection time) tps = 2607.641247 (without initial connection time) tps = 2583.017011 (without initial connection time) patched $ pgbench -n -f select.sql -c 1 -j 1 -T 60 postgres tps = 2715.993495 (without initial connection time) tps = 2701.527640 (without initial connection time) tps = 2707.343009 (without initial connection time) Does anyone have any thoughts about if this is worth a new field in RelOptInfo? David [1] https://www.postgresql.org/message-id/CAKJS1f9W7sg1sb3SXiTQUovs%3DwDMrHATXv68F5dSbe5fuHH%2BiQ%40mail.gmail.com
Attachment
pgsql-hackers by date: