Thread: [PATCH] Enable min/max optimization for bool_and/bool_or/every

[PATCH] Enable min/max optimization for bool_and/bool_or/every

From
Marti Raudsepp
Date:
Hi list,

As discussed on the pgsql-general list, the bool_and() and bool_or()
aggregate functions behave exactly like min() and max() would over
booleans. While it's not likely that people would have an appropriate
index on a boolean column, it seems it wouldn't cost us anything to
take advantage of this optimization, as it requires no code changes at
all, simply value changes in the pg_aggregate catalog.

Before:
db=# explain analyze select bool_and(b) from bools;Aggregate  (cost=1693.01..1693.02 rows=1 width=1) ->  Seq Scan on
bools (cost=0.00..1443.01 rows=100001 width=1)Total runtime: 29.736 ms
 

After:
db=# explain analyze select bool_and(b) from bools;Result  (cost=0.03..0.04 rows=1 width=0) InitPlan 1 (returns $0)
-> Limit  (cost=0.00..0.03 rows=1 width=1)         ->  Index Scan using bools_b_idx on bools
 
(cost=0.00..3300.28 rows=100001 width=1)               Index Cond: (b IS NOT NULL)Total runtime: 0.109 ms

Original discussion here:
http://archives.postgresql.org/message-id/CABRT9RAGwQEP+EFhVpZ6=B4cJEcUE2-QCpb_ZdrNPgQNa8xKuA@mail.gmail.com

PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.

Regards,
Marti


Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every

From
Robert Haas
Date:
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote:
> PS: It seems that the min/max optimization isn't documented in the
> manual (apart from release notes), so I didn't include any doc changes
> in this patch.

I don't see a patch attached to this email, so either you forgot to
attach it, or the list ate it somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every

From
Marti Raudsepp
Date:
On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote:
>> PS: It seems that the min/max optimization isn't documented in the
>> manual (apart from release notes), so I didn't include any doc changes
>> in this patch.
>
> I don't see a patch attached to this email, so either you forgot to
> attach it, or the list ate it somehow.

I forgot to attach it, sorry. Here it is.

Regards,
Marti

Attachment

Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every

From
Robert Haas
Date:
On Thu, Dec 22, 2011 at 11:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
> On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote:
>>> PS: It seems that the min/max optimization isn't documented in the
>>> manual (apart from release notes), so I didn't include any doc changes
>>> in this patch.
>>
>> I don't see a patch attached to this email, so either you forgot to
>> attach it, or the list ate it somehow.
>
> I forgot to attach it, sorry. Here it is.

Nice.  It doesn't get much simpler than that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every

From
Tom Lane
Date:
Marti Raudsepp <marti@juffo.org> writes:
> On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <marti@juffo.org> wrote:
>>> PS: It seems that the min/max optimization isn't documented in the
>>> manual (apart from release notes), so I didn't include any doc changes
>>> in this patch.

>> I don't see a patch attached to this email, so either you forgot to
>> attach it, or the list ate it somehow.

> I forgot to attach it, sorry. Here it is.

I applied this patch, since I was busy applying catalog changes from you
anyway ;-).

I did think of a possible reason to reject the patch: with this change,
the planner will take longer to plan queries involving bool_and() et al,
since planagg.c will spend time looking (usually fruitlessly) for an
index-based plan.  I tried this simple test case:
create table t (f1 bool);\timingexplain select bool_and(f1) from t;

Best-case timings for the EXPLAIN were about 0.480 ms without the patch
and 0.500 ms with, so about a 4% penalty.  On more complicated queries
I think the fractional cost would be less.  This seemed acceptable to
me, so I went ahead and applied the change, but if anyone wants to
argue about it now's the time.
        regards, tom lane


Re: [PATCH] Enable min/max optimization for bool_and/bool_or/every

From
Marti Raudsepp
Date:
On Wed, Feb 8, 2012 at 19:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I applied this patch, since I was busy applying catalog changes from you
> anyway ;-).

Thanks :)

> I did think of a possible reason to reject the patch: with this change,
> the planner will take longer to plan queries involving bool_and() et al,
> since planagg.c will spend time looking (usually fruitlessly) for an
> index-based plan.

Good point, I should have done those measurements up front. Anyway,
since I've often noticed \timing to be unreliable for short queries, I
decided to retry your test with pgbench.

Long story short, I measured 27% overhead in the un-indexed column
case and 33% overhead for an indexed column. That's a lot more than I
expected. I even rebuilt and retried a few times to make sure I hadn't
botched something. The benchmark script is attached.

UNPATCHED
select bool_and(b) from unindexed;
tps = 13787.023113 (excluding connections establishing)
tps = 13880.484788 (excluding connections establishing)
tps = 13784.654542 (excluding connections establishing)
select bool_and(b) from indexed;
tps = 12536.650703 (excluding connections establishing)
tps = 12647.767993 (excluding connections establishing)
tps = 12500.956407 (excluding connections establishing)

PATCHED
select bool_and(b) from unindexed;
tps = 10096.834678 (excluding connections establishing)
tps = 10110.182425 (excluding connections establishing)
tps = 10103.904500 (excluding connections establishing)
select bool_and(b) from indexed;
tps = 8373.631407 (excluding connections establishing)
tps = 8659.917173 (excluding connections establishing)
tps = 8473.899896 (excluding connections establishing)

Regards,
Marti

Attachment