Thread: [NOVICE] Boolean column in multicolumn index

[NOVICE] Boolean column in multicolumn index

From
Dima Pavlov
Date:
Test table and indexes:

------------------------

CREATE TABLE public.t (id serial, cb boolean, ci integer, co integer)

INSERT INTO t(cb, ci, co) 
SELECT ((round(random()*1))::int)::boolean, round(random()*100), round(random()*100)
FROM generate_series(1, 1000000)

CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
CREATE INDEX wrong ON public.t USING btree (ci, co);
CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);

-------------------------

The problem is that I can't force PostgreSQL to use the "right" index. The next query uses the "wrong" index. It's not optimal because it uses "Filter" (condition: cb = TRUE) and so reads more data from memory (and execution becomes longer):

explain (analyze, buffers)
SELECT * FROM t WHERE cb = TRUE AND ci = 46 ORDER BY co LIMIT 1000

"Limit  (cost=0.42..4063.87 rows=1000 width=13) (actual time=0.057..4.405 rows=1000 loops=1)"
"  Buffers: shared hit=1960"
"  ->  Index Scan using wrong on t  (cost=0.42..21784.57 rows=5361 width=13) (actual time=0.055..4.256 rows=1000 loops=1)"
"        Index Cond: (ci = 46)"
"        Filter: cb"
"        Rows Removed by Filter: 967"
"        Buffers: shared hit=1960"
"Planning time: 0.318 ms"
"Execution time: 4.530 ms"

------------------------------

But when I cast bool column to int, that works fine. This is unclear, because selectivity of both indexes (right and right_hack) remains the same.

explain (analyze, buffers)
SELECT * FROM t WHERE cb::int = 1 AND ci = 46 ORDER BY co LIMIT 1000

"Limit  (cost=0.42..2709.91 rows=1000 width=13) (actual time=0.027..1.484 rows=1000 loops=1)"
"  Buffers: shared hit=1003"
"  ->  Index Scan using right_hack on t  (cost=0.42..14525.95 rows=5361 width=13) (actual time=0.025..1.391 rows=1000 loops=1)"
"        Index Cond: ((ci = 46) AND ((cb)::integer = 1))"
"        Buffers: shared hit=1003"
"Planning time: 0.202 ms"
"Execution time: 1.565 ms"

-----------------------------

Are there any limitations of using boolean column inside multicolumn index?

Re: [NOVICE] Boolean column in multicolumn index

From
Tom Lane
Date:
Dima Pavlov <imyfess@gmail.com> writes:
> Test table and indexes:
> ------------------------

> CREATE TABLE public.t (id serial, cb boolean, ci integer, co integer)

> INSERT INTO t(cb, ci, co)
> SELECT ((round(random()*1))::int)::boolean, round(random()*100),
> round(random()*100)
> FROM generate_series(1, 1000000)

> CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
> CREATE INDEX wrong ON public.t USING btree (ci, co);
> CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);

> The problem is that I can't force PostgreSQL to use the "right" index.

Hmm.  Poking at this, it seems not to realize that the cb column is
rendered irrelevant to the index ordering, ie it doesn't notice that
using "right" would allow skipping the sort step.  That's a bug,
likely due to the hacking that goes on to allow "cb" and "cb = true"
to both be considered indexable conditions.

But probably the reason nobody's noticed before is that it's quite
uncommon to have boolean columns in indexes.  If you're only concerned
about doing this with "cb = TRUE", you might consider

CREATE INDEX partial ON public.t USING btree (ci, co) WHERE cb;

            regards, tom lane


Re: [NOVICE] Boolean column in multicolumn index

From
Dima Pavlov
Date:
Will that bug be fixed? 

The problem is that if I use partial indexes I will have to create 3 of them:

CREATE INDEX partial_1 ON public.t USING btree (ci, co) WHERE cb IS TRUE;
CREATE INDEX partial_2 ON public.t USING btree (ci, co) WHERE cb IS FALSE;
CREATE INDEX partial_3 ON public.t USING btree (ci, co) WHERE cb IS NULL;

And if I have 2 booleans columns in index, then it will be 9 combinations.

On Sun, Dec 11, 2016 at 5:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dima Pavlov <imyfess@gmail.com> writes:
> Test table and indexes:
> ------------------------

> CREATE TABLE public.t (id serial, cb boolean, ci integer, co integer)

> INSERT INTO t(cb, ci, co)
> SELECT ((round(random()*1))::int)::boolean, round(random()*100),
> round(random()*100)
> FROM generate_series(1, 1000000)

> CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
> CREATE INDEX wrong ON public.t USING btree (ci, co);
> CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);

> The problem is that I can't force PostgreSQL to use the "right" index.

Hmm.  Poking at this, it seems not to realize that the cb column is
rendered irrelevant to the index ordering, ie it doesn't notice that
using "right" would allow skipping the sort step.  That's a bug,
likely due to the hacking that goes on to allow "cb" and "cb = true"
to both be considered indexable conditions.

But probably the reason nobody's noticed before is that it's quite
uncommon to have boolean columns in indexes.  If you're only concerned
about doing this with "cb = TRUE", you might consider

CREATE INDEX partial ON public.t USING btree (ci, co) WHERE cb;

                        regards, tom lane

Re: [NOVICE] Boolean column in multicolumn index

From
Gerald Cheves
Date:
On 12/10/2016 7:04 PM, Tom Lane wrote:
CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
CREATE INDEX wrong ON public.t USING btree (ci, co);
CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);
The problem is that I can't force PostgreSQL to use the "right" index.

Why is only right in parenthesis but the other indices are not?

-- 
siamo arrivati sani e salvi

Re: [NOVICE] Boolean column in multicolumn index

From
Gerald Cheves
Date:
On 12/10/2016 7:04 PM, Tom Lane wrote:
CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
CREATE INDEX wrong ON public.t USING btree (ci, co);
CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);
The problem is that I can't force PostgreSQL to use the "right" index.

I meant why is right in quotes but the other indices are not in quotes?

-- 
siamo arrivati sani e salvi

Re: [NOVICE] Boolean column in multicolumn index

From
Thomas Kellerer
Date:
Gerald Cheves schrieb am 11.12.2016 um 16:20:
>>> CREATE INDEX "right" ON public.t USING btree (ci, cb, co);
>>> CREATE INDEX wrong ON public.t USING btree (ci, co);
>>> CREATE INDEX right_hack ON public.t USING btree (ci, (cb::integer), co);
>>> The problem is that I can't force PostgreSQL to use the "right" index.
>
> I meant why isright in quotes but the other indices are not in quotes?

Because "right" is a reserved keyword and can only be used as an identifier if you quote it:

https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS




Re: [NOVICE] Boolean column in multicolumn index

From
Tom Lane
Date:
Dima Pavlov <imyfess@gmail.com> writes:
> Will that bug be fixed?

Eventually.  I've written a draft patch here:
https://www.postgresql.org/message-id/1788.1481605684%40sss.pgh.pa.us

I'm not sure we'd risk back-patching it into stable branches, though.

            regards, tom lane