Thread: [PATCH] Enable min/max optimization for bool_and/bool_or/every
Hi list, As discussed on the pgsql-general list, the bool_and() and bool_or() aggregate functions behave exactly like min() and max() would over booleans. While it's not likely that people would have an appropriate index on a boolean column, it seems it wouldn't cost us anything to take advantage of this optimization, as it requires no code changes at all, simply value changes in the pg_aggregate catalog. Before: db=# explain analyze select bool_and(b) from bools;Aggregate (cost=1693.01..1693.02 rows=1 width=1) -> Seq Scan on bools (cost=0.00..1443.01 rows=100001 width=1)Total runtime: 29.736 ms After: db=# explain analyze select bool_and(b) from bools;Result (cost=0.03..0.04 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.00..0.03 rows=1 width=1) -> Index Scan using bools_b_idx on bools (cost=0.00..3300.28 rows=100001 width=1) Index Cond: (b IS NOT NULL)Total runtime: 0.109 ms Original discussion here: http://archives.postgresql.org/message-id/CABRT9RAGwQEP+EFhVpZ6=B4cJEcUE2-QCpb_ZdrNPgQNa8xKuA@mail.gmail.com PS: It seems that the min/max optimization isn't documented in the manual (apart from release notes), so I didn't include any doc changes in this patch. Regards, Marti
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote: > PS: It seems that the min/max optimization isn't documented in the > manual (apart from release notes), so I didn't include any doc changes > in this patch. I don't see a patch attached to this email, so either you forgot to attach it, or the list ate it somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote: >> PS: It seems that the min/max optimization isn't documented in the >> manual (apart from release notes), so I didn't include any doc changes >> in this patch. > > I don't see a patch attached to this email, so either you forgot to > attach it, or the list ate it somehow. I forgot to attach it, sorry. Here it is. Regards, Marti
Attachment
On Thu, Dec 22, 2011 at 11:52 AM, Marti Raudsepp <marti@juffo.org> wrote: > On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote: >>> PS: It seems that the min/max optimization isn't documented in the >>> manual (apart from release notes), so I didn't include any doc changes >>> in this patch. >> >> I don't see a patch attached to this email, so either you forgot to >> attach it, or the list ate it somehow. > > I forgot to attach it, sorry. Here it is. Nice. It doesn't get much simpler than that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Marti Raudsepp <marti@juffo.org> writes: > On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote: >>> PS: It seems that the min/max optimization isn't documented in the >>> manual (apart from release notes), so I didn't include any doc changes >>> in this patch. >> I don't see a patch attached to this email, so either you forgot to >> attach it, or the list ate it somehow. > I forgot to attach it, sorry. Here it is. I applied this patch, since I was busy applying catalog changes from you anyway ;-). I did think of a possible reason to reject the patch: with this change, the planner will take longer to plan queries involving bool_and() et al, since planagg.c will spend time looking (usually fruitlessly) for an index-based plan. I tried this simple test case: create table t (f1 bool);\timingexplain select bool_and(f1) from t; Best-case timings for the EXPLAIN were about 0.480 ms without the patch and 0.500 ms with, so about a 4% penalty. On more complicated queries I think the fractional cost would be less. This seemed acceptable to me, so I went ahead and applied the change, but if anyone wants to argue about it now's the time. regards, tom lane
On Wed, Feb 8, 2012 at 19:48, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I applied this patch, since I was busy applying catalog changes from you > anyway ;-). Thanks :) > I did think of a possible reason to reject the patch: with this change, > the planner will take longer to plan queries involving bool_and() et al, > since planagg.c will spend time looking (usually fruitlessly) for an > index-based plan. Good point, I should have done those measurements up front. Anyway, since I've often noticed \timing to be unreliable for short queries, I decided to retry your test with pgbench. Long story short, I measured 27% overhead in the un-indexed column case and 33% overhead for an indexed column. That's a lot more than I expected. I even rebuilt and retried a few times to make sure I hadn't botched something. The benchmark script is attached. UNPATCHED select bool_and(b) from unindexed; tps = 13787.023113 (excluding connections establishing) tps = 13880.484788 (excluding connections establishing) tps = 13784.654542 (excluding connections establishing) select bool_and(b) from indexed; tps = 12536.650703 (excluding connections establishing) tps = 12647.767993 (excluding connections establishing) tps = 12500.956407 (excluding connections establishing) PATCHED select bool_and(b) from unindexed; tps = 10096.834678 (excluding connections establishing) tps = 10110.182425 (excluding connections establishing) tps = 10103.904500 (excluding connections establishing) select bool_and(b) from indexed; tps = 8373.631407 (excluding connections establishing) tps = 8659.917173 (excluding connections establishing) tps = 8473.899896 (excluding connections establishing) Regards, Marti