Thread: Partial index on JSON column

Partial index on JSON column

From
Samuel Williams
Date:
Hello

I have a table with ~3 billion events.

Of this, there are a small subset of events which match the following query:

CREATE INDEX index_user_event_for_suggestion_notification ON public.user_event USING btree ((((parameters ->> 'suggestion_id'::text))::integer), what) WHERE ((parameters ->> 'suggestion_id'::text) IS NOT NULL)

When I do this query:

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->> 'suggestion_id'::text)::integer = 26) AND what = 'suggestion_notification';


Aggregate  (cost=7115959.48..7115959.49 rows=1 width=8)
  ->  Bitmap Heap Scan on user_event  (cost=37360.24..7115907.56 rows=20771 width=0)
        Recheck Cond: ((what)::text = 'suggestion_notification'::text)
        Filter: (((parameters ->> 'suggestion_id'::text))::integer = 26)
        ->  Bitmap Index Scan on index_user_event_for_clustering  (cost=0.00..37355.05 rows=4154273 width=0)
              Index Cond: ((what)::text = 'suggestion_notification'::text)

It's slow. I need to explicitly add the NULL constraint:

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->> 'suggestion_id'::text)::integer = 26) AND ((parameters ->> 'suggestion_id'::text) IS NOT NULL) AND what = 'suggestion_notification';


Aggregate  (cost=38871.48..38871.49 rows=1 width=8)
  ->  Index Scan using index_user_event_for_suggestion_notification on user_event  (cost=0.42..38819.81 rows=20668 width=0)
        Index Cond: ((((parameters ->> 'suggestion_id'::text))::integer = 26) AND ((what)::text = 'suggestion_notification'::text))

I feel like the null constraint should be implicit.

That being said:

- Is my partial index wrong? Should I write it differently so the optimiser knows this?
- Is my query wrong? How can I make the most use of this index without being explicit?
- Any other suggestions for how I can do this?

Thanks
Samuel

Re: Partial index on JSON column

From
Tom Lane
Date:
Samuel Williams <space.ship.traveller@gmail.com> writes:
> When I do this query:

> EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
> 'suggestion_id'::text)::integer = 26) AND what = 'suggestion_notification';

> It's slow. I need to explicitly add the NULL constraint:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold.  In any
case you're better off without the runtime type conversion: that
isn't doing much for you except raising the odds of getting an error.

            regards, tom lane


Re: Partial index on JSON column

From
Samuel Williams
Date:
Thanks for the quick reply Tom,

I will try your advice.

The reason why I used ::integer for the INDEX is because I assumed it would be more efficient both in space and performance.

In the JSONB field, it is actually an integer, i.e. {"location_age": 1, "suggestion_id": 26}

So, now that I think about it, maybe the way I'm using ::text is wrong.

Any further advice is most appreciated.

Kind regards,
Samuel

On Wed, 20 Feb 2019 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Samuel Williams <space.ship.traveller@gmail.com> writes:
> When I do this query:

> EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
> 'suggestion_id'::text)::integer = 26) AND what = 'suggestion_notification';

> It's slow. I need to explicitly add the NULL constraint:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold.  In any
case you're better off without the runtime type conversion: that
isn't doing much for you except raising the odds of getting an error.

                        regards, tom lane

Re: Partial index on JSON column

From
Michael Lewis
Date:

On Tue, Feb 19, 2019 at 1:41 PM Samuel Williams <space.ship.traveller@gmail.com> wrote:
Hello

I have a table with ~3 billion events.

Of this, there are a small subset of events which match the following query:

CREATE INDEX index_user_event_for_suggestion_notification ON public.user_event USING btree ((((parameters ->> 'suggestion_id'::text))::integer), what) WHERE ((parameters ->> 'suggestion_id'::text) IS NOT NULL)

You may want to use this-
CREATE INDEX index_user_event_for_suggestion_notification ON public.user_event USING btree (parameters ->> 'suggestion_id', what) WHERE parameters ? 'suggestion_id';


-Michael 

Re: Partial index on JSON column

From
Michael Lewis
Date:


On Wed, 20 Feb 2019 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Samuel Williams <space.ship.traveller@gmail.com> writes:
> When I do this query:

> EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
> 'suggestion_id'::text)::integer = 26) AND what = 'suggestion_notification';

> It's slow. I need to explicitly add the NULL constraint:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold.  In any
case you're better off without the runtime type conversion: that
isn't doing much for you except raising the odds of getting an error.

For run-time safe comparisons of something stored in jsonb that I expect to only have booleans (and I don't trust input to always be valid), I have a utility function like this-

CREATE OR REPLACE FUNCTION public.util_to_bool( pInput TEXT )
RETURNS BOOL AS
$BODY$
BEGIN
RETURN pInput::BOOL;
EXCEPTION WHEN OTHERS THEN
RETURN FALSE;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;


This is probably not best practice however.

 

                        regards, tom lane

Re: Partial index on JSON column

From
Samuel Williams
Date:
So, I checked my original code, and I didn't add `::text`, it was actually added by Postgres:

EXPLAIN for: SELECT "user_event".* FROM "user_event" WHERE "user_event"."what" = $1 AND (((parameters ->> 'suggestion_id')::integer) = 119 AND ((parameters ->> 'suggestion_id') IS NOT NULL)) [["what", "suggestion_notification"]]
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_user_event_for_suggestion_notification on user_event  (cost=0.42..77193.95 rows=20669 width=138)
   Index Cond: ((((parameters ->> 'suggestion_id'::text))::integer = 119) AND ((what)::text = 'suggestion_notification'::text))
(2 rows)

Is there some way to directly use the integer value in the index with minimal type coercions?

Thanks
Samuel

On Wed, 20 Feb 2019 at 10:24, Samuel Williams <space.ship.traveller@gmail.com> wrote:
Thanks for the quick reply Tom,

I will try your advice.

The reason why I used ::integer for the INDEX is because I assumed it would be more efficient both in space and performance.

In the JSONB field, it is actually an integer, i.e. {"location_age": 1, "suggestion_id": 26}

So, now that I think about it, maybe the way I'm using ::text is wrong.

Any further advice is most appreciated.

Kind regards,
Samuel

On Wed, 20 Feb 2019 at 10:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Samuel Williams <space.ship.traveller@gmail.com> writes:
> When I do this query:

> EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
> 'suggestion_id'::text)::integer = 26) AND what = 'suggestion_notification';

> It's slow. I need to explicitly add the NULL constraint:

Try it like

EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

I don't think we assume that CoerceViaIO is strict, and without that
the deduction that the value couldn't be null doesn't hold.  In any
case you're better off without the runtime type conversion: that
isn't doing much for you except raising the odds of getting an error.

                        regards, tom lane

Re: Partial index on JSON column

From
Tom Lane
Date:
I wrote:
> Try it like

> EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
> 'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

> I don't think we assume that CoerceViaIO is strict, and without that
> the deduction that the value couldn't be null doesn't hold.

Hmm ... on closer inspection, we do assume that CoerceViaIO is strict,
but not everyplace knows that, so there's something that could be
improved here.  In the meantime, try it as stated above.

            regards, tom lane


Re: Partial index on JSON column

From
Samuel Williams
Date:
Thanks Tom, I did solve the problem by adding the null constraint for now, it's a quick solution, and I look forward to the future where this case is handled appropriately.

On Wed, 20 Feb 2019 at 12:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Try it like

> EXPLAIN SELECT COUNT(*) FROM "user_event" WHERE ((parameters ->>
> 'suggestion_id'::text) = '26') AND what = 'suggestion_notification';

> I don't think we assume that CoerceViaIO is strict, and without that
> the deduction that the value couldn't be null doesn't hold.

Hmm ... on closer inspection, we do assume that CoerceViaIO is strict,
but not everyplace knows that, so there's something that could be
improved here.  In the meantime, try it as stated above.

                        regards, tom lane