Home > mailing lists

Thread: Indexing on JSONB field not working

Indexing on JSONB field not working

From

"Zhihong Zhang"

Date:

20 December 2019, 21:52:17

I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2

ON public.assets USING btree

(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)

TABLESPACE pg_default;

However query doesn’t use it,

explain select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 3;

Limit (cost=0.00..3.24 rows=3 width=53)

-> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)

Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)

The version of the database,

"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit"

However, the index works for text field in JSONB.

Let me know if I can provide more information.

Zhihong Zhang

zhihong@gmail.com

Re: Indexing on JSONB field not working

From

Tomas Vondra

Date:

20 December 2019, 22:30:05

On Fri, Dec 20, 2019 at 04:52:17PM -0500, Zhihong Zhang wrote:
>I have an index on JSONB fields like this,
>
>
>
>CREATE INDEX float_number_index_path2
>
>    ON public.assets USING btree
>
>    (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
>
>    TABLESPACE pg_default;
>
>
>
>However query doesn't use it,
>
>
>
>explain select id, _doc->>'floatValue' from assets where (_doc #>
>'{floatValue}'::text[])::double precision < 3.0 limit 3;
>
>
>
>Limit  (cost=0.00..3.24 rows=3 width=53)
>
>  ->  Seq Scan on assets  (cost=0.00..936605.40 rows=867607 width=53)
>
>        Filter: (((_doc #> '{floatValue}'::text[]))::double precision <
>'3'::double precision)
>
>

The index scan is likely expected to be more expensive than the plain
sequential scan with the LIMIT interrupting it pretty much right away
(it's expected to scan only ~0.0003% of the table.

You can probably push the database to use the index by disabling
sequential scans, i.e.

   SET enable_seqscan = off;

and then doing the explain again.

The interesting question however is which of the plans is faster. It's
quite possible the database is making the right choice - index scans are
not necessarily faster.

>
>The version of the database,
>
>
>
>"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
>20140120 (Red Hat 4.8.2-16), 64-bit"
>
>
>
>However, the index works for text field in JSONB.
>

Well, the text field probably has different statistics, so the
sequential scan would have so scan much larger part of the table. Who
knows - you haven't shared the execution plans.

>
>
>Let me know if I can provide more information.
>

Show us explain analyze for both queries, with both index-scan and
seq-scan (you'll have to use enable_seqscan and enable_indexscan to
force the plan choice).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

20 December 2019, 22:57:37

On Fri, Dec 20, 2019 at 5:12 PM Zhihong Zhang <zhihong@gmail.com> wrote:

I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2
    ON public.assets USING btree
    (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
    TABLESPACE pg_default;

However query doesn’t use it,

Did you analyze the table after building the index? Expression indexes have their own statistics, but they don't get populated until the table is analyzed.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

21 December 2019, 00:39:28

A few clarifications,

1. The index is very effective. If I treat it as text, the index works. With index, the result returns in 1 second. Otherwise, it’s 5 minutes.

2. Removing limit doesn’t change the behavior.

3. I ran Analyze multiple times after indexing.

Zhihong

On Dec 20, 2019, at 5:57 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Fri, Dec 20, 2019 at 5:12 PM Zhihong Zhang <zhihong@gmail.com> wrote:
I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2
    ON public.assets USING btree
    (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
    TABLESPACE pg_default;

However query doesn’t use it,

Did you analyze the table after building the index? Expression indexes have their own statistics, but they don't get populated until the table is analyzed.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

21 December 2019, 01:15:39

Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!

SET enable_seqscan = off;

explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"

" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"

" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"

"Planning Time: 1.617 ms"

"Execution Time: 0.276 ms"

SET enable_seqscan = on;

explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"

" -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)"

" Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"

" Rows Removed by Filter: 2602824"

"Planning Time: 0.283 ms"

"Execution Time: 313501.777 ms"

Zhihong

On Dec 20, 2019, at 5:30 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On Fri, Dec 20, 2019 at 04:52:17PM -0500, Zhihong Zhang wrote:
I have an index on JSONB fields like this,

CREATE INDEX float_number_index_path2

  ON public.assets USING btree

  (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)

  TABLESPACE pg_default;

However query doesn't use it,

explain select id, _doc->>'floatValue' from assets where (_doc #>
'{floatValue}'::text[])::double precision < 3.0 limit 3;

Limit (cost=0.00..3.24 rows=3 width=53)

-> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)

      Filter: (((_doc #> '{floatValue}'::text[]))::double precision <
'3'::double precision)

The index scan is likely expected to be more expensive than the plain
sequential scan with the LIMIT interrupting it pretty much right away
(it's expected to scan only ~0.0003% of the table.

You can probably push the database to use the index by disabling
sequential scans, i.e.

SET enable_seqscan = off;

and then doing the explain again.

The interesting question however is which of the plans is faster. It's
quite possible the database is making the right choice - index scans are
not necessarily faster.

The version of the database,

"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
20140120 (Red Hat 4.8.2-16), 64-bit"

However, the index works for text field in JSONB.

Well, the text field probably has different statistics, so the
sequential scan would have so scan much larger part of the table. Who
knows - you haven't shared the execution plans.

Let me know if I can provide more information.

Show us explain analyze for both queries, with both index-scan and
seq-scan (you'll have to use enable_seqscan and enable_indexscan to
force the plan choice).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

21 December 2019, 02:32:16

On Fri, Dec 20, 2019 at 7:39 PM Zhihong Zhang <zhihong@gmail.com> wrote:

A few clarifications,

1. The index is very effective. If I treat it as text, the index works. With index, the result returns in 1 second. Otherwise, it’s 5 minutes.

An index is over a datatype. You can't just "treat an index like text". Either it is an index over text, or it is not. It you cast an expression to text while defining the index, then that is a different index than if you defined it with a cast to double precision.

2. Removing limit doesn’t change the behavior.

I have no problem getting your index to be used with or without the LIMIT , as long as the expression "(_doc #> '{floatValue}'::text[])::double precision < 3.0" is selective enough that it appears to be worth using an index for it. For example:

insert into assets (id,_doc) select x, jsonb_build_object('floatValue',random()*1000) from generate_series(1,1000000) f(x);

We can't comment on how two queries might differ, then we have only seen one of them.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Tomas Vondra

Date:

21 December 2019, 11:40:03

On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!
>
>
>SET enable_seqscan = off;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <
3.0limit 100;

>
>"Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>"  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual
time=0.089..0.230rows=7 loops=1)"

>"        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"Planning Time: 1.617 ms"
>"Execution Time: 0.276 ms"
>
>SET enable_seqscan = on;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <
3.0limit 100;

>
>"Limit  (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
>"  ->  Seq Scan on assets  (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7
loops=1)"
>"        Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"        Rows Removed by Filter: 2602824"
>"Planning Time: 0.283 ms"
>"Execution Time: 313501.777 ms"
>

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
that mismatches the actual performance. The question is why ...

For LIMIT queries, I can think of two common issues - the cost estimate
is computed as a simple linear approximation in the input relation. For
example, we know the seqscan is expected to produce 867607 rows with a
total cost of 936605, so the cost of producing just 100 rows is

    100 * 936605 / 867607 = 107.95

But that assumes a number of things: (a) that the seqscan row estimate
is correct, and that (b) the matching rows are uniformly distributed in
the table. If it's misestimated, or if the rows are towards the end of
the relation (i.e. after doing a most of the costed work) this estimate
may be quite off.

Can you do explain analyze of the query without the LIMIT?

BTW a LIMIT without an ORDER BY is a bit strange. Also, maybe you could
do try using a partial index (if the where condition does not change).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

21 December 2019, 15:17:26

On Sat, Dec 21, 2019 at 7:00 AM Zhihong Zhang <zhihong@gmail.com> wrote:

Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!

SET enable_seqscan = off;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
"Planning Time: 1.617 ms"
"Execution Time: 0.276 ms"

313 is a pretty high estimate for fetching an estimated 100 rows. It must think that nearly every row fetched from the table is going to be a random page fetch. Which means that it must think the correlation between assets_floatvalue_idx and physical table order is close to 0.

SET enable_seqscan = on;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
" -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)"
" Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
" Rows Removed by Filter: 2602824"
"Planning Time: 0.283 ms"
"Execution Time: 313501.777 ms"

It thinks it will find 867607 rows which meet the <3.0 condition, but really it only finds 7. It has to scan the full table, because with only 7 rows it can never stop early due to the LIMIT 100. Why is the estimate wrong by a factor of over 100,000? It should be using the statistics from the expression index here (even though it is not using the index during execution), and so should have pretty good statistics.

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Tom Lane

Date:

21 December 2019, 15:49:16

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>> "Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>> "  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual
time=0.089..0.230rows=7 loops=1)" 
>> "        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"

> Well, this confirms what I suspected before - the optimizer believes the
> seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
> that mismatches the actual performance. The question is why ...

The planner evidently believes that 867607 rows will match the query
condition, so it expects that the scan will stop (after collecting
100 rows) very quickly.  In reality only 7 rows match, so the scan
has to run to completion.  This is what's bollixing the plan choice.

I suspect that 867607 is just a default estimate, but if ANALYZE has
been run then there should be stats for the index column, so why isn't
it doing better?  When I try a similar case here, I get good estimates:

regression=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
(1 row)

regression=# create table public.assets(_doc jsonb);
CREATE TABLE
regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST);
CREATE INDEX
regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x;
INSERT 0 10000
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <
3.0limit 100; 
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7.71 rows=100 width=32)
   ->  Seq Scan on assets  (cost=0.00..293.44 rows=3808 width=32)
         Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

regression=# analyze assets;
ANALYZE
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <
3.0limit 100; 
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..8.33 rows=2 width=32)
   ->  Index Scan using assets_float8_idx on assets  (cost=0.29..8.33 rows=2 width=32)
         Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

The "3808" estimate is just a default for '<' with no stats, but
with stats I get a dead-on estimate.

            regards, tom lane

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

26 December 2019, 15:17:51

Just came back from the long break and I couldn’t quite follow the threads. Is this a bug or something I am doing
wrong?

If it’s a bug, can I have a tracking or ticket number?

Thanks!

Zhihong


> On Dec 21, 2019, at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>>> "Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>>> "  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual
time=0.089..0.230rows=7 loops=1)" 
>>> "        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>
>> Well, this confirms what I suspected before - the optimizer believes the
>> seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
>> that mismatches the actual performance. The question is why ...
>
> The planner evidently believes that 867607 rows will match the query
> condition, so it expects that the scan will stop (after collecting
> 100 rows) very quickly.  In reality only 7 rows match, so the scan
> has to run to completion.  This is what's bollixing the plan choice.
>
> I suspect that 867607 is just a default estimate, but if ANALYZE has
> been run then there should be stats for the index column, so why isn't
> it doing better?  When I try a similar case here, I get good estimates:
>
> regression=# select version();
>                                                 version
> ---------------------------------------------------------------------------------------------------------
> PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
> (1 row)
>
> regression=# create table public.assets(_doc jsonb);
> CREATE TABLE
> regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST);
> CREATE INDEX
> regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x;
> INSERT 0 10000
> regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision
<3.0 limit 100; 
>                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..7.71 rows=100 width=32)
>   ->  Seq Scan on assets  (cost=0.00..293.44 rows=3808 width=32)
>         Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
> (3 rows)
>
> regression=# analyze assets;
> ANALYZE
> regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision
<3.0 limit 100; 
>                                             QUERY PLAN
> ----------------------------------------------------------------------------------------------------
> Limit  (cost=0.29..8.33 rows=2 width=32)
>   ->  Index Scan using assets_float8_idx on assets  (cost=0.29..8.33 rows=2 width=32)
>         Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
> (3 rows)
>
> The "3808" estimate is just a default for '<' with no stats, but
> with stats I get a dead-on estimate.
>
>             regards, tom lane

Re: Indexing on JSONB field not working

From

Tom Lane

Date:

26 December 2019, 15:27:27

Zhihong Zhang <zhihong@gmail.com> writes:
> Just came back from the long break and I couldn’t quite follow the threads. Is this a bug or something I am doing
wrong?

We're wondering why the optimizer doesn't derive a better rowcount
estimate, and waiting on you to provide information about what the
relevant pg_stats entries are.

            regards, tom lane

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

26 December 2019, 16:30:47

That command yields nothing. If I use the tablename ‘assets’, it only returns stats on columns, nothing about indexes. The indexed field is in JSONB field ‘_doc’. This is the stats on the column,

select * from pg_stats where tablename='assets' and attname='_doc';
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_
common_elem_freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------
------------------+----------------------
 public     | assets    | _doc    | f         |         0 |        18 |         -1 |                  |                   |                  |             |                   |
                  |
(1 row)

Let me know if you want any other information.

Zhihong

On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

26 December 2019, 16:50:24

On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.

On Thu, Dec 26, 2019 at 11:30 AM Zhihong Zhang <zhihong@gmail.com> wrote:

That command yields nothing.

Did you use the actual name of the index (in case its actual name differs from what you showed us in the CREATE INDEX)? If you did and it doesn't show anything, then I don't think the table has been ANALYZED since the index was created. Note that ANALYZE and EXPLAIN ANALYZE are different things. You need to run "ANALYZE assets;" or "VACUUM ANALYZE assets;"

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

26 December 2019, 16:58:27

The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some are created when table is created. None of them is in pg_stats.

The pg_stats only contains a row for each column, nothing else.

The database is on AWS RDS. Does that make any difference?

Zhihong

On Dec 26, 2019, at 11:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.

On Thu, Dec 26, 2019 at 11:30 AM Zhihong Zhang <zhihong@gmail.com> wrote:
That command yields nothing.

Did you use the actual name of the index (in case its actual name differs from what you showed us in the CREATE INDEX)? If you did and it doesn't show anything, then I don't think the table has been ANALYZED since the index was created. Note that ANALYZE and EXPLAIN ANALYZE are different things. You need to run "ANALYZE assets;" or "VACUUM ANALYZE assets;"

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Tom Lane

Date:

26 December 2019, 18:58:35

Zhihong Zhang <zhihong@gmail.com> writes:
> The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some
arecreated when table is created. None of them is in pg_stats. 

You still haven't shown us exactly what you're doing, only asserted that
you're doing the right thing, which seems somewhat unlikely given that
nobody has been able to reproduce this behavior based on what you've
told us so far.

One possible gotcha is indicated by this comment in analyze.c:

     * Open all indexes of the relation, and see if there are any analyzable
     * columns in the indexes.  We do not analyze index columns if there was
     * an explicit column list in the ANALYZE command, however.  If we are

that is, if you did something like "ANALYZE assets(_doc);" it would
not result in creation of stats for any expression indexes.

> The database is on AWS RDS. Does that make any difference?

Wouldn't have thought that their version of Postgres deviates much
from community Postgres.  If you were talking about Aurora or Redshift,
I can't speak to what those do.

            regards, tom lane

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

26 December 2019, 19:22:30

I simply ran ‘analyze;’ without table name and I assume that analyzes everything. I just ran ‘analyze assets;’ again
butstill no stats on indexes. 

I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling this?


Thanks!

Zhihong


> On Dec 26, 2019, at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Zhihong Zhang <zhihong@gmail.com> writes:
>> The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some
arecreated when table is created. None of them is in pg_stats. 
>
> You still haven't shown us exactly what you're doing, only asserted that
> you're doing the right thing, which seems somewhat unlikely given that
> nobody has been able to reproduce this behavior based on what you've
> told us so far.
>
> One possible gotcha is indicated by this comment in analyze.c:
>
>     * Open all indexes of the relation, and see if there are any analyzable
>     * columns in the indexes.  We do not analyze index columns if there was
>     * an explicit column list in the ANALYZE command, however.  If we are
>
> that is, if you did something like "ANALYZE assets(_doc);" it would
> not result in creation of stats for any expression indexes.
>
>> The database is on AWS RDS. Does that make any difference?
>
> Wouldn't have thought that their version of Postgres deviates much
> from community Postgres.  If you were talking about Aurora or Redshift,
> I can't speak to what those do.
>
>             regards, tom lane

Re: Indexing on JSONB field not working

From

Tom Lane

Date:

26 December 2019, 20:49:18

Zhihong Zhang <zhihong@gmail.com> writes:
> I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling
this?

No, I don't think so.  We only collect stats on index expressions, though,
not simple columns (since those would be duplicative of the underlying
column's stats).

Hmmm ... looking at the pg_stats view, it has a filter

  WHERE NOT a.attisdropped AND
    has_column_privilege(c.oid, a.attnum, 'select'::text) AND
    (c.relrowsecurity = false OR NOT row_security_active(c.oid))

The has_column_privilege test might be getting in the way if you're
not superuser; it will probably think you have no access privileges
for the index.  I now recall somebody complaining about that before [1],
but no fix has been accepted as yet.

Having said that, though, that only accounts for you not seeing the
entries in the pg_stats view; it doesn't explain why the optimizer
doesn't see them, assuming they're actually there in pg_statistic,
which they surely should be.

As I recall, RDS doesn't give out superuser access, so it may be
hard for you to learn more about what's happening :-(

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/6369212.CF36pTLAQO%40peanuts2

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

26 December 2019, 22:25:22

I am not superuser. Let me see how I can get access to the superuser on RDS.

I will get back to you.

Thanks!

Zhihong


> On Dec 26, 2019, at 3:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Zhihong Zhang <zhihong@gmail.com> writes:
>> I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling
this?
>
> No, I don't think so.  We only collect stats on index expressions, though,
> not simple columns (since those would be duplicative of the underlying
> column's stats).
>
> Hmmm ... looking at the pg_stats view, it has a filter
>
>  WHERE NOT a.attisdropped AND
>    has_column_privilege(c.oid, a.attnum, 'select'::text) AND
>    (c.relrowsecurity = false OR NOT row_security_active(c.oid))
>
> The has_column_privilege test might be getting in the way if you're
> not superuser; it will probably think you have no access privileges
> for the index.  I now recall somebody complaining about that before [1],
> but no fix has been accepted as yet.
>
> Having said that, though, that only accounts for you not seeing the
> entries in the pg_stats view; it doesn't explain why the optimizer
> doesn't see them, assuming they're actually there in pg_statistic,
> which they surely should be.
>
> As I recall, RDS doesn't give out superuser access, so it may be
> hard for you to learn more about what's happening :-(
>
>             regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/6369212.CF36pTLAQO%40peanuts2

Re: Indexing on JSONB field not working

From

Tomas Vondra

Date:

26 December 2019, 23:41:24

On Thu, Dec 26, 2019 at 05:25:22PM -0500, Zhihong Zhang wrote:
>I am not superuser. Let me see how I can get access to the superuser on RDS.
>

You can't - that's the point of am environment managed by someone else.

Perhaps you could inspect the data directly in pg_statistic, not through
the pg_stats view. But I haven't tried if that works on RDS.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Indexing on JSONB field not working

From

Tom Lane

Date:

26 December 2019, 23:51:34

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Thu, Dec 26, 2019 at 05:25:22PM -0500, Zhihong Zhang wrote:
>> I am not superuser. Let me see how I can get access to the superuser on RDS.

> You can't - that's the point of am environment managed by someone else.
> Perhaps you could inspect the data directly in pg_statistic, not through
> the pg_stats view. But I haven't tried if that works on RDS.

He definitely shouldn't be able to do that, if not superuser.

However, after experimenting a bit it seems like pg_stats does work for
indexes as long as you are the owner of the index's table, which probably
explains why it took so long for anybody to notice the problem.  Maybe the
OP is trying to look at the stats as a non-owner that's just been GRANT'ed
table access?

            regards, tom lane

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

26 December 2019, 23:57:52

I created the index myself using the same account. Should I be the owner?

Zhihong

> On Dec 26, 2019, at 6:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Thu, Dec 26, 2019 at 05:25:22PM -0500, Zhihong Zhang wrote:
>>> I am not superuser. Let me see how I can get access to the superuser on RDS.
>
>> You can't - that's the point of am environment managed by someone else.
>> Perhaps you could inspect the data directly in pg_statistic, not through
>> the pg_stats view. But I haven't tried if that works on RDS.
>
> He definitely shouldn't be able to do that, if not superuser.
>
> However, after experimenting a bit it seems like pg_stats does work for
> indexes as long as you are the owner of the index's table, which probably
> explains why it took so long for anybody to notice the problem.  Maybe the
> OP is trying to look at the stats as a non-owner that's just been GRANT'ed
> table access?
>
>             regards, tom lane

Re: Indexing on JSONB field not working

From

Tomas Vondra

Date:

27 December 2019, 02:30:59

On Thu, Dec 26, 2019 at 06:57:52PM -0500, Zhihong Zhang wrote:
>I created the index myself using the same account. Should I be the owner?
>

But Tom explicitly said "owner of the index's table" so it's not
important who created the index. But it's easy to check the owner of
tables (e.g. using \d+ in psql).

BTW please reply in-line (instead of top-posting), it's easier to follow
the discussion that way.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

RE: Indexing on JSONB field not working

From

"Zhihong Zhang"

Date:

27 December 2019, 17:28:38

I forgot to emphasize that this problem only happens with JSONB index. The index always works if the same field is copied to a column. That’s how we have been coping with this issue, simply moving the field to a column but now we got too many columns to deal with.

To prove this, I just created a new column 'floatValue' and copied the value from JSONB column (_doc).

alter table assets add column floatValue double precision;

update assets set floatValue = (_doc #> '{floatValue}'::text[])::double precision;

create index floatValue_idx on assets (floatValue);

analyze;

Now the query on the column automatically uses the index,

explain analyze select id, _doc->>'floatValue' from assets where floatValue < 3.0 limit 100;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------

Limit (cost=0.43..28.55 rows=6 width=53) (actual time=0.571..19.557 rows=7 loops=1)

-> Index Scan using floatvalue_idx on assets (cost=0.43..28.55 rows=6 width=53) (actual time=0.569..19.551 rows=7 loops=1)

Index Cond: (floatvalue < '3'::double precision)

Planning Time: 0.322 ms

Execution Time: 19.583 ms

(5 rows)

Since I can’t get stats on the index, wonder if the stats on the column is of any help.

select * from pg_stats where tablename='assets' and attname='floatvalue'

;

-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

schemaname | public

tablename | assets

attname | floatvalue

inherited | f

null_frac | 3.33333e-05

avg_width | 8

n_distinct | -0.999967

most_common_vals |

most_common_freqs |

histogram_bounds | {29.3762423098087,11544.743552804,22327.7248442173,31746.9625733793,42020.0857333839,53353.4693531692,62746.0754476488,71412.5884696841,82250.3692470491,92105.7304367423,101821.479853243,111723.829060793,122510.320041329,132316.885981709,142764.132469893,153306.225780398,162952.687591314,174045.353196561,184753.799811006,195106.332655996,205185.326747596,215598.905924708,225703.200791031,235017.722006887,244950.931984931,255437.318701297,265914.741437882,275904.242414981,286097.948905081,296642.523724586,307137.258350849,317328.266333789,328417.54052788,337816.804181784,348164.874594659,358840.349595994,368590.213824064,378130.563534796,387764.143757522,398006.183560938,407868.383917958,417695.778422058,427624.609787017,437821.539118886,447861.317079514,457633.044105023,467341.55761078,477082.30279386,486012.877896428,496032.587718219,507104.239426553,517111.513298005,526737.637352198,536167.487502098,545932.489912957,555894.987657666,565314.109902829,574438.83176893,584489.227738231,593775.179702789,603509.965352714,613848.30949828,624749.070033431,635505.4769665,644715.944770724,654815.05356729,664933.282416314,673889.273777604,683789.047412574,693903.816863894,703375.991433859,713953.300379217,724327.767267823,733478.933107108,744359.858334064,755353.817716241,764262.578450143,773850.545287132,783374.56099689,793155.808001757,803004.199638963,812858.935445547,822967.322077602,832813.539542258,843135.526403785,853274.697437882,862586.844246835,872572.991997004,882436.727173626,891319.798305631,900905.375834554,910936.662461609,921490.86529389,930878.716986626,940496.629569679,949594.719801098,959338.55464682,969633.623026311,980396.131519228,989476.628601551,999952.361918986}

correlation | -0.00371463

most_common_elems |

most_common_elem_freqs |

elem_count_histogram |

Thanks!

Zhihong

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

27 December 2019, 17:45:03

On Thu, Dec 26, 2019 at 11:58 AM Zhihong Zhang <zhihong@gmail.com> wrote:

The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some are created when table is created. None of them is in pg_stats.

The pg_stats only contains a row for each column, nothing else.

The database is on AWS RDS. Does that make any difference?

I used an RDS 11.5 (not Aurora) instance, and had no trouble getting or seeing the statistics on the expressional index when logged in as the table owner. They were populated by `ANALYZE assets;`, or by `ANALYZE;`, or by just letting auto-analyze run (if there was enough turn over after the index was created to trigger auto-analyze).

Can you do `pg_dump -s -t assets ...` to get a full description of the table? If you don't want to share the full description, make sure whatever simplifications you do to it don't prevent the problem from reproducing.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Tom Lane

Date:

27 December 2019, 17:45:32

"Zhihong Zhang" <zhihong@gmail.com> writes:
> I forgot to emphasize that this problem only happens with JSONB index. The index always works if the same field is
copiedto a column. That’s how we have been coping with this issue, simply moving the field to a column but now we got
toomany columns to deal with.

Well, we're still up against the question of why this doesn't work for
you when it does work for everyone else (or at least, everybody who's
responded to this thread). You have yet to show us anything concrete
about your table design, and it seems likely that the explanation is
in some detail that you haven't mentioned.

One thought that comes to mind is an aspect of the permissions
issue: the optimizer won't use index stats unless you have
permissions to select the whole table:

* For simplicity, we insist on the whole
* table being selectable, rather than trying
* to identify which column(s) the index
* depends on. Also require all rows to be
* selectable --- there must be no
* securityQuals from security barrier views
* or RLS policies.

I think we can eliminate the RLS-is-blocking-it idea, because that
would also apply to stats on simple columns. But maybe your user only
has select privilege on some columns of the table in question?

regards, tom lane

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

27 December 2019, 19:43:29

On Dec 27, 2019, at 12:45 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Can you do `pg_dump -s -t assets ...` to get a full description of the table? If you don't want to share the full description, make sure whatever simplifications you do to it don't prevent the problem from reproducing.

I created a new table ‘assets_copy’ with only 3 columns. I can still reproduce the problem. The pg_dump output is attached.

Thanks!

Zhihong

--
-- PostgreSQL database dump
---- Dumped from database version 11.4
-- Dumped by pg_dump version 11.6SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;SET default_tablespace = '';SET default_with_oids = false;--
-- Name: assets_copy; Type: TABLE; Schema: public; Owner: postgres
--CREATE TABLE public.assets_copy (
    id character(20),
    _doc jsonb,
    floatvalue double precision
);ALTER TABLE public.assets_copy OWNER TO postgres;--
-- Name: column_float_value_idx; Type: INDEX; Schema: public; Owner: postgres
--CREATE INDEX column_float_value_idx ON public.assets_copy USING btree (floatvalue);--
-- Name: jsonb_float_value_idx; Type: INDEX; Schema: public; Owner: postgres
--CREATE INDEX jsonb_float_value_idx ON public.assets_copy USING btree ((((_doc #> '{floatValue}'::text[]))::double precision));--
-- PostgreSQL database dump complete
--

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

27 December 2019, 19:52:23

On Dec 27, 2019, at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think we can eliminate the RLS-is-blocking-it idea, because that
would also apply to stats on simple columns. But maybe your user only
has select privilege on some columns of the table in question?

The user is the default/vanilla RDS account. We didn’t set any special permissions.

Let me know if you want check any specific permissions. Here are all the permissions I know of,

SELECT grantee, privilege_type FROM information_schema.role_table_grants WHERE table_name='assets_copy';
 grantee  | privilege_type
----------+----------------
 postgres | INSERT
 postgres | SELECT
 postgres | UPDATE
 postgres | DELETE
 postgres | TRUNCATE
 postgres | REFERENCES
 postgres | TRIGGER
(7 rows)

Thanks!

Zhihong

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

27 December 2019, 20:26:17

Finally got stats on the new table assets_copy. Still don’t know what’s the magic. I have 2 indexes in this new table, only this one got stats.

The stats looks the same as the column stats.

Zhihong

select * from pg_stats where tablename='jsonb_float_value_idx';
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | jsonb_float_value_idx
attname                | float8
inherited              | f
null_frac              | 0
avg_width              | 8
n_distinct             | -1
most_common_vals       |
most_common_freqs      |
histogram_bounds       | {11.9074247777462,9841.89659357071,20004.3972581625,30086.5592435002,40730.0088554621,50593.008287251,58767.0910172164,67316.9470392168,77683.9102618396,88379.8920549452,98935.4504272342,109736.490063369,120142.996311188,129663.615487516,139033.944346011,149255.915079266,159277.160651982,168800.76052621,178707.448765635,190234.981942922,200536.492280662,209703.133907169,220037.896186113,230601.808056235,239625.298418105,249577.831011266,259359.933435917,269170.144107193,279207.427985966,289696.279447526,299808.374606073,309782.688505948,319680.196698755,329276.353120804,339375.476818532,349651.550408453,359881.41130656,369298.731908202,380326.490849257,390578.283462673,400972.062721848,411068.622488528,420153.666753322,430431.842803955,439484.844449908,449541.503563523,459902.50306204,469946.910161525,481110.869906843,491827.534046024,501586.132217199,511425.472330302,522126.020863652,532380.575779825,542049.761861563,552384.272217751,562474.489212036,573650.59107542,584696.998354048,594522.320665419,604293.577373028,614212.908316404,623957.396950573,634345.385245979,644869.87143755,654195.504263043,664599.217940122,674438.172951341,683769.1613473,693582.618143409,702202.791813761,712556.770537049,721915.236674249,733364.793937653,743519.58045736,753806.136548519,763560.635503381,772320.672404021,781465.635169297,790334.59816128,801681.783981621,812449.718359858,821786.699350923,832279.333844781,841624.361462891,851525.46595782,861103.327944875,871528.446674347,880865.939892828,889850.093051791,899233.678821474,909849.165473133,919194.979593158,928674.942348152,939167.145173997,949835.113249719,959980.852901936,969834.96658504,979897.553101182,990309.921558946,999946.910422295}
correlation            | 0.00552317
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |

Re: Indexing on JSONB field not working

From

Tom Lane

Date:

27 December 2019, 21:21:09

Zhihong Zhang <zhihong@gmail.com> writes:
>> On Dec 27, 2019, at 12:45 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Can you do  `pg_dump -s -t assets ...` to get a full description of the table?  If you don't want to share the full
description,make sure whatever simplifications you do to it don't prevent the problem from reproducing. 

> I created a new table ‘assets_copy’ with only 3 columns. I can still reproduce the problem. The pg_dump output is
attached.

I'm a little suspicious of this because it says the table owner is
"postgres", but you say you're not superuser.  Does RDS create a
non-superuser "postgres" account?

This is odd too:

> -- Dumped from database version 11.4
> -- Dumped by pg_dump version 11.6

But anyway, I loaded this dump into v11 as a non-superuser, and
inserted some data, specifically

insert into assets_copy select 'id', ('{"floatValue": ' || random()*1e6 || ', "junk": "z"}')::jsonb, random()*1e6 from
generate_series(1,100000)x; 

and I find that either auto-analyze or a manual ANALYZE will create
stats that I can see in pg_stats and that lead to sane planner
estimates.

Now, I'm testing with a recent 11-branch build, so it's theoretically
possible that some post-11.4 bug fix has changed the behavior, but
I've looked through the git logs quite carefully and I see nothing
that would have affected a query referencing the table directly.
(If you were going through a view or an inheritance/partitioning
parent, that might be a different story.)

It seems like the only remaining theory is that the RDS version
of Postgres differs from community Postgres in some relevant way.
Even that theory has problems explaining why Jeff couldn't duplicate
the behavior on his RDS instance, but we're really out of reasons for
this not to work.  Perhaps it's time for you to take up this question
with RDS support.

            regards, tom lane

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

28 December 2019, 19:10:40

On Fri, Dec 27, 2019 at 3:26 PM Zhihong Zhang <zhihong@gmail.com> wrote:

Finally got stats on the new table assets_copy. Still don’t know what’s the magic. I have 2 indexes in this new table, only this one got stats.

Only one of your indexes is over an expression. The index over a plain column will never have stats, as column stats are stored with the table, not the index.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

28 December 2019, 19:15:29

On Fri, Dec 27, 2019 at 4:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Zhihong Zhang <zhihong@gmail.com> writes:
>> On Dec 27, 2019, at 12:45 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Can you do `pg_dump -s -t assets ...` to get a full description of the table? If you don't want to share the full description, make sure whatever simplifications you do to it don't prevent the problem from reproducing.

> I created a new table ‘assets_copy’ with only 3 columns. I can still reproduce the problem. The pg_dump output is attached.

I'm a little suspicious of this because it says the table owner is
"postgres", but you say you're not superuser. Does RDS create a
non-superuser "postgres" account?

Yes, the default setting for RDS is to have a user named "postgres" which is as-super-as RDS allows you to get.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

28 December 2019, 19:59:07

Only one of your indexes is over an expression. The index over a plain column will never have stats, as column stats are stored with the table, not the index.

Oh. That explains it. I tweaked a few server parameters, wonder if that made the difference. I changed,

pg_stat_statements.track: ALL

track_activity_query_size: 2048

Now we have stats, can anyone see why the index is not used?

Thanks!

Zhihong

Re: Indexing on JSONB field not working

From

Tomas Vondra

Date:

28 December 2019, 23:56:04

On Sat, Dec 28, 2019 at 02:59:07PM -0500, Zhihong Zhang wrote:
>>
>> Only one of your indexes is over an expression.  The index over a
>> plain column will never have stats, as column stats are stored with
>> the table, not the index.
>>
>
>Oh. That explains it. I tweaked a few server parameters, wonder if that
>made the difference. I changed,
>
>pg_stat_statements.track:    ALL    
>
>track_activity_query_size: 2048
>
>Now we have stats, can anyone see why the index is not used?
>

I don't follow. Neither of these configuration options should affect
which indexes have stats. The expression index should have stats no
matter how these two GUCs are set.

But once again, you're making claims without actually showing us the
data. How do you know we have stats now? Show us pg_stats data and
the explain plans.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

29 December 2019, 01:40:24

The stats was sent in a previous email. Let me know if you can’t find it.

Zhihong

> On Dec 28, 2019, at 6:56 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Sat, Dec 28, 2019 at 02:59:07PM -0500, Zhihong Zhang wrote:
>>>
>>> Only one of your indexes is over an expression.  The index over a
>>> plain column will never have stats, as column stats are stored with
>>> the table, not the index.
>>>
>>
>> Oh. That explains it. I tweaked a few server parameters, wonder if that
>> made the difference. I changed,
>>
>> pg_stat_statements.track:    ALL
>>
>> track_activity_query_size: 2048
>>
>> Now we have stats, can anyone see why the index is not used?
>>
>
> I don't follow. Neither of these configuration options should affect
> which indexes have stats. The expression index should have stats no
> matter how these two GUCs are set.
>
> But once again, you're making claims without actually showing us the
> data. How do you know we have stats now? Show us pg_stats data and
> the explain plans.
>
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

29 December 2019, 16:11:39

On Sat, Dec 28, 2019 at 2:59 PM Zhihong Zhang <zhihong@gmail.com> wrote:

Now we have stats, can anyone see why the index is not used?

You will have to redo the EXPLAIN (ANALYZE, BUFFERS) now that you have stats, under both settings of enable_seqscan, and show us those.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

30 December 2019, 16:31:27

You will have to redo the EXPLAIN (ANALYZE, BUFFERS) now that you have stats, under both settings of enable_seqscan, and show us those.

It’s working now! I really don’t know what’s changed. Other than tweaking a few unrelated parameters, all I did was bouncing the server a few times.

The missing stats for the jsonb index is back also.

We had the same problem on another RDS server in a different cloud also. Let me see if I can get access there and do some comparison.

Here is the explain for the same query that took minutes before,

SET enable_seqscan = on;
SET
catalog_load_test=> explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..28.55 rows=6 width=53) (actual time=0.046..0.177 rows=7 loops=1)
   ->  Index Scan using float_number_index_path2 on assets  (cost=0.43..28.55 rows=6 width=53) (actual time=0.045..0.172 rows=7 loops=1)
         Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
 Planning Time: 0.274 ms
 Execution Time: 0.205 ms
(5 rows)

SET enable_seqscan = on;
SET
catalog_load_test=> explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..28.55 rows=6 width=53) (actual time=0.046..0.177 rows=7 loops=1)
   ->  Index Scan using float_number_index_path2 on assets  (cost=0.43..28.55 rows=6 width=53) (actual time=0.045..0.172 rows=7 loops=1)
         Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
 Planning Time: 0.274 ms
 Execution Time: 0.205 ms
(5 rows)

Thanks for all your help! I will keep digging till I find the cause.

Zhihong

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

02 January 2020, 21:49:01

I think the root cause of my problem is that the “CREATE INDEX” with expression wouldn’t generate stats automatically. Running ‘Analyze’ manually solves the problem in most cases but sometimes I have to restart the server.

So I like to report 2 bugs,

1. “CREATE INDEX” with expression should generate stats automatically.

2. Auto-analyze should generate stats if it is not available, regardless of the number of records changed.

We have a large JSONB column. We don’t really know which fields are used by customer so we create indexes for those fields on demand when there is enough usage but it never worked. We were on the brink of switching to another database. Now with those findings, I can implement some workaround. Thanks for your help!

Zhihong

On Dec 30, 2019, at 11:31 AM, Zhihong Zhang <zhihong@gmail.com> wrote:

You will have to redo the EXPLAIN (ANALYZE, BUFFERS) now that you have stats, under both settings of enable_seqscan, and show us those.

It’s working now! I really don’t know what’s changed. Other than tweaking a few unrelated parameters, all I did was bouncing the server a few times.

The missing stats for the jsonb index is back also.

We had the same problem on another RDS server in a different cloud also. Let me see if I can get access there and do some comparison.

Here is the explain for the same query that took minutes before,

SET enable_seqscan = on;
SET
catalog_load_test=> explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..28.55 rows=6 width=53) (actual time=0.046..0.177 rows=7 loops=1)
   ->  Index Scan using float_number_index_path2 on assets  (cost=0.43..28.55 rows=6 width=53) (actual time=0.045..0.172 rows=7 loops=1)
         Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
 Planning Time: 0.274 ms
 Execution Time: 0.205 ms
(5 rows)

SET enable_seqscan = on;
SET
catalog_load_test=> explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..28.55 rows=6 width=53) (actual time=0.046..0.177 rows=7 loops=1)
   ->  Index Scan using float_number_index_path2 on assets  (cost=0.43..28.55 rows=6 width=53) (actual time=0.045..0.172 rows=7 loops=1)
         Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
 Planning Time: 0.274 ms
 Execution Time: 0.205 ms
(5 rows)

Thanks for all your help! I will keep digging till I find the cause.

Zhihong

Re: Indexing on JSONB field not working

From

Tom Lane

Date:

02 January 2020, 21:58:14

Zhihong Zhang <zhihong@gmail.com> writes:
> So I like to report 2 bugs,

> 1. “CREATE INDEX” with expression should generate stats automatically.

It does.

> 2. Auto-analyze should generate stats if it is not available, regardless of the number of records changed.

This doesn't square with anyone else's reality either.

You haven't given us one single bit of information about what you are
doing differently that might trigger such unexpected behavior.  But without
that, there's not much we can do to investigate this report.

            regards, tom lane

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

02 January 2020, 22:07:28

> On Jan 2, 2020, at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Zhihong Zhang <zhihong@gmail.com> writes:
>> So I like to report 2 bugs,
>
>> 1. “CREATE INDEX” with expression should generate stats automatically.
>
> It does.
I created an index with JSONB field yesterday. It still didnn’t have stats today till I ran ‘analyze’.

>
>> 2. Auto-analyze should generate stats if it is not available, regardless of the number of records changed.
>
> This doesn't square with anyone else's reality either.

We have auto-analyze running every night and it didn’t create the stats either. I think it’s probably related to the
logicthat it skips analyzing if not enough records changed. There was no database changes at all in our case. 

I can reproduce those problems easily. Just tell me what information you need to help debugging and I will get it for
you.Thanks! 

Zhihong

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

02 January 2020, 22:30:23

On Jan 2, 2020, at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You haven't given us one single bit of information about what you are
doing differently that might trigger such unexpected behavior. But without
that, there's not much we can do to investigate this report.

Attached is the screen capture of my test sequence. This is what I did,

1. Start with the table without the JSONB index. Run select, no stats as expected.

2. Create index

3. Wait days, I actually did step 2 last year :)

4. Run select and still no stats. “Create index” or auto vacuum should take care of this, right?

4. Run ‘analyze’.

5. Run select again and the stats shows up.

Let me know if I should have done anything differently to get the desired result.

Thanks!

Zhihong

catalog_load_test=> select * from pg_stats where tablename='float_number_index_path2';
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem
_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+-----
-----------------
(0 rows)catalog_load_test=> CREATE INDEX float_number_index_path2
catalog_load_test->     ON public.assets USING btree
catalog_load_test->     (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
catalog_load_test->     TABLESPACE pg_default;
CREATE INDEX------------------ Days later -------------------------catalog_load_test=> select * from pg_stats where tablename='float_number_index_path2';
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem
_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+-----
-----------------
(0 rows)
catalog_load_test=> analyze;
WARNING:  skipping "pg_authid" --- only superuser can analyze it
WARNING:  skipping "pg_subscription" --- only superuser can analyze it
WARNING:  skipping "pg_database" --- only superuser can analyze it
WARNING:  skipping "pg_db_role_setting" --- only superuser can analyze it
WARNING:  skipping "pg_tablespace" --- only superuser can analyze it
WARNING:  skipping "pg_pltemplate" --- only superuser can analyze it
WARNING:  skipping "pg_auth_members" --- only superuser can analyze it
WARNING:  skipping "pg_shdepend" --- only superuser can analyze it
WARNING:  skipping "pg_shdescription" --- only superuser can analyze it
WARNING:  skipping "pg_replication_origin" --- only superuser can analyze it
WARNING:  skipping "pg_shseclabel" --- only superuser can analyze it
NOTICE:  no non-null/empty features, unable to compute statistics
NOTICE:  no non-null/empty features, unable to compute statistics
ANALYZE
catalog_load_test=> select * from pg_stats where tablename='float_number_index_path2';
 schemaname |        tablename         | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |                                                                                                                                                                histogram_bounds                                                                                                                                                                                                      | correl
ation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+--------------------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+--------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------
------+-------------------+------------------------+----------------------
 public     | float_number_index_path2 | float8  | f         |         0 |         8 |  -0.994275 |                  |                   | {67.9828226566315,9880.58233633637,19484.4105280936,29129.853006452
3,38355.4133586586,47072.1824094653,57639.1913928092,67335.2866433561,76439.0416443348,86354.6561449766,96093.4087634087,106792.015489191,117061.504628509,125703.127589077,135128.147900105,145247.848238796,
155961.862299591,166463.26566115,176670.81207037,187789.324205369,197144.483681768,207021.271344274,216668.588574976,227840.67876637,237936.515826732,247922.076378018,257857.841439545,267939.695157111,27702
5.00205487,287662.557791919,297872.847877443,309003.66185233,319923.490285873,330532.386898994,339468.085207045,349151.492118835,358672.737609595,369132.092688233,378833.52348581,388865.450397134,399013.585
876673,407844.387926161,418233.385775238,427318.078000098,438366.677146405,448453.094344586,458905.486389995,468482.088763267,478278.840426356,487119.99412626,495660.125277936,505299.935583025,515099.597163
498,525450.137443841,536038.665566593,546619.640663266,556582.688819617,566576.57166943,576369.696762413,587215.536739677,597477.190662175,606301.207095385,616701.394319534,626550.197601318,635750.317480415
,646002.440713346,655989.156104624,667213.554959744,677401.37828514,687671.223655343,698001.290205866,707360.081840307,716366.450302303,726345.336064696,735977.729782462,745276.737492532,755845.261737704,76
5410.838183016,774972.880259156,785988.09055984,795699.819922447,805251.396726817,815074.041485786,824808.841571212,835754.215717316,846041.257493198,856068.658642471,866092.296782881,876734.56966877,885931
.862983853,896236.94261536,907013.318967074,916161.817498505,927834.809292108,938955.033197999,948850.627522916,958572.782110423,968669.227790087,978541.388176382,989408.961031586,999981.255270541} |  -0.01
26945 |                   |                        |
(1 row)

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

03 January 2020, 13:54:23

On Thu, Jan 2, 2020 at 4:49 PM Zhihong Zhang <zhihong@gmail.com> wrote:

I think the root cause of my problem is that the “CREATE INDEX” with expression wouldn’t generate stats automatically. Running ‘Analyze’ manually solves the problem in most cases but sometimes I have to restart the server.

You told us vociferously that ANALYZE was run after the index was created.

So I like to report 2 bugs,

1. “CREATE INDEX” with expression should generate stats automatically.

I thought we did do something about that in a recent release, having the CREATE INDEX send some message to the stats collector which would cause the autovacuum launcher to analyze it the next time nap-time was up. But I can't find it in the commit log, nor find the behavior. Maybe this was discussed but not implemented.

2. Auto-analyze should generate stats if it is not available, regardless of the number of records changed.

If the table gets auto-analyzed, then it does generate the stats. But it doesn't fire auto-analyze just to generate those stats. Is your complaint that it doesn't run, or that it does run but doesn't collect the stats? If the latter, then no one can reproduce it, other than through permissions issues which apparently don't apply to you.

Cheers,

Jeff

Re: Indexing on JSONB field not working

From

Jeff Janes

Date:

03 January 2020, 14:14:48

On Thu, Jan 2, 2020 at 5:07 PM Zhihong Zhang <zhihong@gmail.com> wrote:

We have auto-analyze running every night and it didn’t create the stats either.

Auto-analyze (ANALYZE fired by the autovacuum launcher) is automatic. How can you choose the time at which it runs ("every night")?

Are you doing something non-standard, like dropping autovacuum_analyze_scale_factor every evening and then increasing it again every morning?

I think it’s probably related to the logic that it skips analyzing if not enough records changed.

Yes. The creation of an index is not something which triggers an auto-analyze. Perhaps it should.

Re: Indexing on JSONB field not working

From

Zhihong Zhang

Date:

03 January 2020, 15:29:04

On Jan 3, 2020, at 8:54 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

You told us vociferously that ANALYZE was run after the index was created.

I did have that problem but can’t reproduce it right now. I am just trying to concentrate on the issues I can reproduce easily.

We have been dealing with the slow query issue for months. The first advice is always to run vacuum/analyze. That didn’t help. I think things changed when I accidentally restarted the server by applying some configuration changes. Now, bouncing the server always brings the stats back, without running ‘analyze’.

We had the same issue on another server. I made a snapshot of the database before I restarted the server. Anything I can capture to help nail the issue?

Thanks!

Zhihong