Thread: Performance on JSONB select

Performance on JSONB select

From
PegoraroF10
Date:
select * from MyTable where
  ((JsonBField->>'status'='descartada' and
To_Date(JsonBField->'descartada'->>'data','yyyy-mm-dd') > Current_Date) or
   (JsonBField->>'status'='contrato' and
To_Date(JsonBField->'contrato'->>'data','yyyy-mm-dd') > Current_Date-7) or
   (JsonBField->>'status'='naoatribuido'));

Considering this table has a million records, I would like to use a proper
index, but how can I use an index when using operator >. 




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Performance on JSONB select

From
Michael Lewis
Date:
Much of indexing strategy depends on knowing the data like how many distinct values and what the distribution is like. Is JsonBField->>'status' always set? Are those three values mentioned in this query common or rare? Can you re-write this query to avoid using an OR in the where clause? Are you just wanting to add a GIN index for the jsonb paths? Or do you want indexed like below that are a bit stylized to this query?

CREATE INDEX idx_MyTable_status USING btree( JsonBField->>'status' );
CREATE INDEX idx_MyTable_descartada_date USING btree( To_Date(JsonBField->'descartada'->>'data','yyyy-mm-dd') );
CREATE INDEX idx_MyTable_contrato_date USING btree( To_Date(JsonBField->'contrato'->>'data','yyyy-mm-dd') );

Re: Performance on JSONB select

From
PegoraroF10
Date:
ok, my select performed better but I had to create 8 indices to speed up my
query.

I would love to create just one index using GIN(JsonBField jsonb_ops) but
using version 11 I cannot use operators like > and  <. 

I see on docs that version 12 has jsonpath Filter Expression Elements and
they include > and <. So, the question is, will be possible to have just one
index and use it with "jsonpath Filter Expression Elements ?



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Performance on JSONB select

From
Stephen Frost
Date:
Greetings,

* Michael Lewis (mlewis@entrata.com) wrote:
> Much of indexing strategy depends on knowing the data like how many
> distinct values and what the distribution is like. Is JsonBField->>'status'
> always set? Are those three values mentioned in this query common or rare?
> Can you re-write this query to avoid using an OR in the where clause? Are
> you just wanting to add a GIN index for the jsonb paths? Or do you want
> indexed like below that are a bit stylized to this query?

If you know a field is going to always be there, you're better off, by
far, by just having a regular column for that value and a straight up
btree for it.  This saves a significant amount of space and makes it
much easier to index and work with.

Thanks,

Stephen

Attachment

Re: Performance on JSONB select

From
PegoraroF10
Date:
I think I solved my problem. I didn´t know it was possible but I´ve created
an almost dynamic index, because it´ll index for status and corresponding
datevalue of an object with that status value.

Postgres, I love you.

Just one B-Tree index with 2 fields, Status and DateTime of that respective
Status object. My Json has always a status and a respective object of that
status with other values. So I know if it has that status it has a Date
Value or not on its correspondent object.

{
    "status": "visitadescartada",
    "contrato": {},
    "atribuido": {"datevalue": "2019-09-05 14:47:11.149095-03"},
    "trabalhando": {},
    "visitaagendada": {"datevalue": "2019-09-05
15:06:24.255548-03","caption": "Agendado"},
    "visitadescartada": {"datevalue": "2019-09-12
11:47:17.45782-03","caption": "Desagendado"},
    "digitacaodescartada": {}
}

create index IndexByStatusAndDate on MyTable (
(JsonBField->>$$status$$),
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue'))
where (JsonBField ? $$status$$);

select * from MyTable where (JsonBField ? $$status$$) and
case when JsonBField->>$$status$$=$$cadastrodescartado$$ then
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue')
> castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$visitadescartada$$ then
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue')
> castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$contrato$$ then
castimmutabletimestamp(funilvendas->(JsonBField->>$$status$$)->>'data') >
castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$naoatribuido$$ then True end;

And performance now is great because I´m using both fields on index.



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Performance on JSONB select

From
PegoraroF10
Date:
I think I solved my problem. I didn´t know it was possible but I´ve created
an almost dynamic index, because it´ll index for status and corresponding
datevalue of an object with that status value.

Postgres, I love you.

Just one B-Tree index with 2 fields, Status and DateTime of that respective
Status object. My Json has always a status and a respective object of that
status with other values. So I know if it has that status it has a Date
Value or not on its correspondent object.

{
    "status": "visitadescartada",
    "contrato": {},
    "atribuido": {"datevalue": "2019-09-05 14:47:11.149095-03"},
    "trabalhando": {},
    "visitaagendada": {"datevalue": "2019-09-05
15:06:24.255548-03","caption": "Agendado"},
    "visitadescartada": {"datevalue": "2019-09-12
11:47:17.45782-03","caption": "Desagendado"},
    "digitacaodescartada": {}
}

create index IndexByStatusAndDate on MyTable (
(JsonBField->>$$status$$),
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue'))
where (JsonBField ? $$status$$);

select * from MyTable where (JsonBField ? $$status$$) and
case when JsonBField->>$$status$$=$$cadastrodescartado$$ then
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue')
> castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$visitadescartada$$ then
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue')
> castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$contrato$$ then
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'data') >
castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$naoatribuido$$ then True end;

And performance now is great because I´m using both fields on index.




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Performance on JSONB select

From
Pavel Stehule
Date:
Hi

so 5. 10. 2019 v 13:34 odesílatel PegoraroF10 <marcos@f10.com.br> napsal:
I think I solved my problem. I didn´t know it was possible but I´ve created
an almost dynamic index, because it´ll index for status and corresponding
datevalue of an object with that status value.

you created multicolumn functional index (there are no any dynamic index :))

Pavel
 

Postgres, I love you.

Just one B-Tree index with 2 fields, Status and DateTime of that respective
Status object. My Json has always a status and a respective object of that
status with other values. So I know if it has that status it has a Date
Value or not on its correspondent object.

{
    "status": "visitadescartada",
    "contrato": {},
    "atribuido": {"datevalue": "2019-09-05 14:47:11.149095-03"},
    "trabalhando": {},
    "visitaagendada": {"datevalue": "2019-09-05
15:06:24.255548-03","caption": "Agendado"},
    "visitadescartada": {"datevalue": "2019-09-12
11:47:17.45782-03","caption": "Desagendado"},
    "digitacaodescartada": {}
}

create index IndexByStatusAndDate on MyTable (
(JsonBField->>$$status$$),
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue'))
where (JsonBField ? $$status$$);

select * from MyTable where (JsonBField ? $$status$$) and
case when JsonBField->>$$status$$=$$cadastrodescartado$$ then
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue')
> castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$visitadescartada$$ then
castimmutabletimestamp(JsonBField->(JsonBField->>$$status$$)->>'datevalue')
> castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$contrato$$ then
castimmutabletimestamp(funilvendas->(JsonBField->>$$status$$)->>'data') >
castimmutabletimestamp($$2019-10-01$$)
     when JsonBField->>$$status$$=$$naoatribuido$$ then True end;

And performance now is great because I´m using both fields on index.



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


Re: Performance on JSONB select

From
PegoraroF10
Date:
I told it was almost dynamic because it uses DateValue from an object or
another, depending on value of staus key.



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Performance on JSONB select

From
Fabrízio de Royes Mello
Date:

Em sáb, 5 de out de 2019 às 11:49, PegoraroF10 <marcos@f10.com.br> escreveu:
I told it was almost dynamic because it uses DateValue from an object or
another, depending on value of staus key.


Actually it’s named partial index.

Regards,
--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: Performance on JSONB select

From
PegoraroF10
Date:
No, no, no, partial index is the where clause of it and I´m not talking about that.
I´m talking about datevalue, which will be used one or another, depending on status value

This record iindex will be status and datevalue from "visitadescartada" object
{
  "status": "visitadescartada",
  "atribuido": {"datevalue": "2019-09-05 14:47:11"},
  "visitadescartada": {"datevalue": "2019-09-12 11:47:17"},
  "digitacaodescartada": {}
}

And here the index record will be status and datevalue from "atribuido" object
{
  "status": "atribuido",
  "atribuido": {"datevalue": "2019-09-05 14:47:11"},
  "visitadescartada": {"datevalue": "2019-09-12 11:47:17"},
  "digitacaodescartada": {}
}

Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Performance on JSONB select

From
Michael Lewis
Date:
My Json has always a status

Why declare the "where" clause when creating the index? It would not seem needed if status is always set and so your index will reference all rows in the table.

Thanks for sharing the trick of having the second column in the index determine the key based on the first column. I don't know if/when I might need this, but an interesting solution. Generically, it seems like the below-

create index idx_mytable_jsonb_dependent_fields on public.mytable ( (JsonBField->>'primary_field_to_filter_on'), JsonBField->(JsonBField->>'primary_field_to_filter_on')->>'secondary_field' );

Re: Performance on JSONB select

From
PegoraroF10
Date:
Sorry, I told you that the status was always populated but not, I need that
filter for the index because not all records are using that json structure.
When json is not null then yes, Status is always there. 
I have maybe 20 or 25% of records having json populated, so, I really need
that filter.

Yes, that solution is fine. Selectivity is fine, speed is fine and the index
is "almost dynamic"



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html