Thread: Indexing on JSONB field not working
I have an index on JSONB fields like this,
CREATE INDEX float_number_index_path2
ON public.assets USING btree
(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
TABLESPACE pg_default;
However query doesn’t use it,
explain select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 3;
Limit (cost=0.00..3.24 rows=3 width=53)
-> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)
Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
The version of the database,
"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit"
However, the index works for text field in JSONB.
Let me know if I can provide more information.
Zhihong Zhang
On Fri, Dec 20, 2019 at 04:52:17PM -0500, Zhihong Zhang wrote: >I have an index on JSONB fields like this, > > > >CREATE INDEX float_number_index_path2 > > ON public.assets USING btree > > (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST) > > TABLESPACE pg_default; > > > >However query doesn't use it, > > > >explain select id, _doc->>'floatValue' from assets where (_doc #> >'{floatValue}'::text[])::double precision < 3.0 limit 3; > > > >Limit (cost=0.00..3.24 rows=3 width=53) > > -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53) > > Filter: (((_doc #> '{floatValue}'::text[]))::double precision < >'3'::double precision) > > The index scan is likely expected to be more expensive than the plain sequential scan with the LIMIT interrupting it pretty much right away (it's expected to scan only ~0.0003% of the table. You can probably push the database to use the index by disabling sequential scans, i.e. SET enable_seqscan = off; and then doing the explain again. The interesting question however is which of the plans is faster. It's quite possible the database is making the right choice - index scans are not necessarily faster. > >The version of the database, > > > >"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2 >20140120 (Red Hat 4.8.2-16), 64-bit" > > > >However, the index works for text field in JSONB. > Well, the text field probably has different statistics, so the sequential scan would have so scan much larger part of the table. Who knows - you haven't shared the execution plans. > > >Let me know if I can provide more information. > Show us explain analyze for both queries, with both index-scan and seq-scan (you'll have to use enable_seqscan and enable_indexscan to force the plan choice). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I have an index on JSONB fields like this,
CREATE INDEX float_number_index_path2
ON public.assets USING btree
(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
TABLESPACE pg_default;
However query doesn’t use it,
On Dec 20, 2019, at 5:57 PM, Jeff Janes <jeff.janes@gmail.com> wrote:On Fri, Dec 20, 2019 at 5:12 PM Zhihong Zhang <zhihong@gmail.com> wrote:I have an index on JSONB fields like this,
CREATE INDEX float_number_index_path2
ON public.assets USING btree
(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
TABLESPACE pg_default;
However query doesn’t use it,
Did you analyze the table after building the index? Expression indexes have their own statistics, but they don't get populated until the table is analyzed.Cheers,Jeff
On Dec 20, 2019, at 5:30 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:On Fri, Dec 20, 2019 at 04:52:17PM -0500, Zhihong Zhang wrote:I have an index on JSONB fields like this,
CREATE INDEX float_number_index_path2
ON public.assets USING btree
(((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
TABLESPACE pg_default;
However query doesn't use it,
explain select id, _doc->>'floatValue' from assets where (_doc #>
'{floatValue}'::text[])::double precision < 3.0 limit 3;
Limit (cost=0.00..3.24 rows=3 width=53)
-> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53)
Filter: (((_doc #> '{floatValue}'::text[]))::double precision <
'3'::double precision)
The index scan is likely expected to be more expensive than the plain
sequential scan with the LIMIT interrupting it pretty much right away
(it's expected to scan only ~0.0003% of the table.
You can probably push the database to use the index by disabling
sequential scans, i.e.
SET enable_seqscan = off;
and then doing the explain again.
The interesting question however is which of the plans is faster. It's
quite possible the database is making the right choice - index scans are
not necessarily faster.
The version of the database,
"PostgreSQL 11.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
20140120 (Red Hat 4.8.2-16), 64-bit"
However, the index works for text field in JSONB.
Well, the text field probably has different statistics, so the
sequential scan would have so scan much larger part of the table. Who
knows - you haven't shared the execution plans.
Let me know if I can provide more information.
Show us explain analyze for both queries, with both index-scan and
seq-scan (you'll have to use enable_seqscan and enable_indexscan to
force the plan choice).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
A few clarifications,1. The index is very effective. If I treat it as text, the index works. With index, the result returns in 1 second. Otherwise, it’s 5 minutes.
2. Removing limit doesn’t change the behavior.
On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote: >Run those 2 EXPLAINs with seqscan off and on. See what difference it makes! > > >SET enable_seqscan = off; >explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0limit 100; > >"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)" >" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230rows=7 loops=1)" >" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)" >"Planning Time: 1.617 ms" >"Execution Time: 0.276 ms" > >SET enable_seqscan = on; >explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0limit 100; > >"Limit (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)" >" -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)" >" Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)" >" Rows Removed by Filter: 2602824" >"Planning Time: 0.283 ms" >"Execution Time: 313501.777 ms" > Well, this confirms what I suspected before - the optimizer believes the seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but that mismatches the actual performance. The question is why ... For LIMIT queries, I can think of two common issues - the cost estimate is computed as a simple linear approximation in the input relation. For example, we know the seqscan is expected to produce 867607 rows with a total cost of 936605, so the cost of producing just 100 rows is 100 * 936605 / 867607 = 107.95 But that assumes a number of things: (a) that the seqscan row estimate is correct, and that (b) the matching rows are uniformly distributed in the table. If it's misestimated, or if the rows are towards the end of the relation (i.e. after doing a most of the costed work) this estimate may be quite off. Can you do explain analyze of the query without the LIMIT? BTW a LIMIT without an ORDER BY is a bit strange. Also, maybe you could do try using a partial index (if the where condition does not change). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!SET enable_seqscan = off;explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;"Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"" -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"" Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)""Planning Time: 1.617 ms""Execution Time: 0.276 ms"
SET enable_seqscan = on;explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;"Limit (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"" -> Seq Scan on assets (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)"" Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"" Rows Removed by Filter: 2602824""Planning Time: 0.283 ms""Execution Time: 313501.777 ms"
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote: >> "Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)" >> " -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230rows=7 loops=1)" >> " Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)" > Well, this confirms what I suspected before - the optimizer believes the > seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but > that mismatches the actual performance. The question is why ... The planner evidently believes that 867607 rows will match the query condition, so it expects that the scan will stop (after collecting 100 rows) very quickly. In reality only 7 rows match, so the scan has to run to completion. This is what's bollixing the plan choice. I suspect that 867607 is just a default estimate, but if ANALYZE has been run then there should be stats for the index column, so why isn't it doing better? When I try a similar case here, I get good estimates: regression=# select version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit (1 row) regression=# create table public.assets(_doc jsonb); CREATE TABLE regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST); CREATE INDEX regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x; INSERT 0 10000 regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0limit 100; QUERY PLAN ------------------------------------------------------------------------------------------------ Limit (cost=0.00..7.71 rows=100 width=32) -> Seq Scan on assets (cost=0.00..293.44 rows=3808 width=32) Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision) (3 rows) regression=# analyze assets; ANALYZE regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0limit 100; QUERY PLAN ---------------------------------------------------------------------------------------------------- Limit (cost=0.29..8.33 rows=2 width=32) -> Index Scan using assets_float8_idx on assets (cost=0.29..8.33 rows=2 width=32) Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision) (3 rows) The "3808" estimate is just a default for '<' with no stats, but with stats I get a dead-on estimate. regards, tom lane
Just came back from the long break and I couldn’t quite follow the threads. Is this a bug or something I am doing wrong? If it’s a bug, can I have a tracking or ticket number? Thanks! Zhihong > On Dec 21, 2019, at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote: >>> "Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)" >>> " -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230rows=7 loops=1)" >>> " Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)" > >> Well, this confirms what I suspected before - the optimizer believes the >> seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but >> that mismatches the actual performance. The question is why ... > > The planner evidently believes that 867607 rows will match the query > condition, so it expects that the scan will stop (after collecting > 100 rows) very quickly. In reality only 7 rows match, so the scan > has to run to completion. This is what's bollixing the plan choice. > > I suspect that 867607 is just a default estimate, but if ANALYZE has > been run then there should be stats for the index column, so why isn't > it doing better? When I try a similar case here, I get good estimates: > > regression=# select version(); > version > --------------------------------------------------------------------------------------------------------- > PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit > (1 row) > > regression=# create table public.assets(_doc jsonb); > CREATE TABLE > regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST); > CREATE INDEX > regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x; > INSERT 0 10000 > regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <3.0 limit 100; > QUERY PLAN > ------------------------------------------------------------------------------------------------ > Limit (cost=0.00..7.71 rows=100 width=32) > -> Seq Scan on assets (cost=0.00..293.44 rows=3808 width=32) > Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision) > (3 rows) > > regression=# analyze assets; > ANALYZE > regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <3.0 limit 100; > QUERY PLAN > ---------------------------------------------------------------------------------------------------- > Limit (cost=0.29..8.33 rows=2 width=32) > -> Index Scan using assets_float8_idx on assets (cost=0.29..8.33 rows=2 width=32) > Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision) > (3 rows) > > The "3808" estimate is just a default for '<' with no stats, but > with stats I get a dead-on estimate. > > regards, tom lane
Zhihong Zhang <zhihong@gmail.com> writes: > Just came back from the long break and I couldn’t quite follow the threads. Is this a bug or something I am doing wrong? We're wondering why the optimizer doesn't derive a better rowcount estimate, and waiting on you to provide information about what the relevant pg_stats entries are. regards, tom lane
select * from pg_stats where tablename='assets' and attname='_doc';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_
common_elem_freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------
------------------+----------------------
public | assets | _doc | f | 0 | 18 | -1 | | | | | |
|
(1 row)
On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com> wrote:Can you show the output of:select * from pg_stats where tablename ='float_number_index_path2'For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.
On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com> wrote:Can you show the output of:select * from pg_stats where tablename ='float_number_index_path2'For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.
That command yields nothing.
On Dec 26, 2019, at 11:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:On Dec 21, 2019, at 10:17 AM, Jeff Janes <jeff.janes@gmail.com> wrote:Can you show the output of:select * from pg_stats where tablename ='float_number_index_path2'For readability, use the output format which shows the columns down the screen, not across. In psql, that would toggled on with \x.On Thu, Dec 26, 2019 at 11:30 AM Zhihong Zhang <zhihong@gmail.com> wrote:That command yields nothing.Did you use the actual name of the index (in case its actual name differs from what you showed us in the CREATE INDEX)? If you did and it doesn't show anything, then I don't think the table has been ANALYZED since the index was created. Note that ANALYZE and EXPLAIN ANALYZE are different things. You need to run "ANALYZE assets;" or "VACUUM ANALYZE assets;"Cheers,Jeff
Zhihong Zhang <zhihong@gmail.com> writes: > The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some arecreated when table is created. None of them is in pg_stats. You still haven't shown us exactly what you're doing, only asserted that you're doing the right thing, which seems somewhat unlikely given that nobody has been able to reproduce this behavior based on what you've told us so far. One possible gotcha is indicated by this comment in analyze.c: * Open all indexes of the relation, and see if there are any analyzable * columns in the indexes. We do not analyze index columns if there was * an explicit column list in the ANALYZE command, however. If we are that is, if you did something like "ANALYZE assets(_doc);" it would not result in creation of stats for any expression indexes. > The database is on AWS RDS. Does that make any difference? Wouldn't have thought that their version of Postgres deviates much from community Postgres. If you were talking about Aurora or Redshift, I can't speak to what those do. regards, tom lane
I simply ran ‘analyze;’ without table name and I assume that analyzes everything. I just ran ‘analyze assets;’ again butstill no stats on indexes. I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling this? Thanks! Zhihong > On Dec 26, 2019, at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Zhihong Zhang <zhihong@gmail.com> writes: >> The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some arecreated when table is created. None of them is in pg_stats. > > You still haven't shown us exactly what you're doing, only asserted that > you're doing the right thing, which seems somewhat unlikely given that > nobody has been able to reproduce this behavior based on what you've > told us so far. > > One possible gotcha is indicated by this comment in analyze.c: > > * Open all indexes of the relation, and see if there are any analyzable > * columns in the indexes. We do not analyze index columns if there was > * an explicit column list in the ANALYZE command, however. If we are > > that is, if you did something like "ANALYZE assets(_doc);" it would > not result in creation of stats for any expression indexes. > >> The database is on AWS RDS. Does that make any difference? > > Wouldn't have thought that their version of Postgres deviates much > from community Postgres. If you were talking about Aurora or Redshift, > I can't speak to what those do. > > regards, tom lane
Zhihong Zhang <zhihong@gmail.com> writes: > I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling this? No, I don't think so. We only collect stats on index expressions, though, not simple columns (since those would be duplicative of the underlying column's stats). Hmmm ... looking at the pg_stats view, it has a filter WHERE NOT a.attisdropped AND has_column_privilege(c.oid, a.attnum, 'select'::text) AND (c.relrowsecurity = false OR NOT row_security_active(c.oid)) The has_column_privilege test might be getting in the way if you're not superuser; it will probably think you have no access privileges for the index. I now recall somebody complaining about that before [1], but no fix has been accepted as yet. Having said that, though, that only accounts for you not seeing the entries in the pg_stats view; it doesn't explain why the optimizer doesn't see them, assuming they're actually there in pg_statistic, which they surely should be. As I recall, RDS doesn't give out superuser access, so it may be hard for you to learn more about what's happening :-( regards, tom lane [1] https://www.postgresql.org/message-id/flat/6369212.CF36pTLAQO%40peanuts2
I am not superuser. Let me see how I can get access to the superuser on RDS. I will get back to you. Thanks! Zhihong > On Dec 26, 2019, at 3:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Zhihong Zhang <zhihong@gmail.com> writes: >> I looked at pg_stats for all our databases. None of them have stats on indexes. Are there any settings disabling this? > > No, I don't think so. We only collect stats on index expressions, though, > not simple columns (since those would be duplicative of the underlying > column's stats). > > Hmmm ... looking at the pg_stats view, it has a filter > > WHERE NOT a.attisdropped AND > has_column_privilege(c.oid, a.attnum, 'select'::text) AND > (c.relrowsecurity = false OR NOT row_security_active(c.oid)) > > The has_column_privilege test might be getting in the way if you're > not superuser; it will probably think you have no access privileges > for the index. I now recall somebody complaining about that before [1], > but no fix has been accepted as yet. > > Having said that, though, that only accounts for you not seeing the > entries in the pg_stats view; it doesn't explain why the optimizer > doesn't see them, assuming they're actually there in pg_statistic, > which they surely should be. > > As I recall, RDS doesn't give out superuser access, so it may be > hard for you to learn more about what's happening :-( > > regards, tom lane > > [1] https://www.postgresql.org/message-id/flat/6369212.CF36pTLAQO%40peanuts2
On Thu, Dec 26, 2019 at 05:25:22PM -0500, Zhihong Zhang wrote: >I am not superuser. Let me see how I can get access to the superuser on RDS. > You can't - that's the point of am environment managed by someone else. Perhaps you could inspect the data directly in pg_statistic, not through the pg_stats view. But I haven't tried if that works on RDS. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Thu, Dec 26, 2019 at 05:25:22PM -0500, Zhihong Zhang wrote: >> I am not superuser. Let me see how I can get access to the superuser on RDS. > You can't - that's the point of am environment managed by someone else. > Perhaps you could inspect the data directly in pg_statistic, not through > the pg_stats view. But I haven't tried if that works on RDS. He definitely shouldn't be able to do that, if not superuser. However, after experimenting a bit it seems like pg_stats does work for indexes as long as you are the owner of the index's table, which probably explains why it took so long for anybody to notice the problem. Maybe the OP is trying to look at the stats as a non-owner that's just been GRANT'ed table access? regards, tom lane
I created the index myself using the same account. Should I be the owner? Zhihong > On Dec 26, 2019, at 6:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On Thu, Dec 26, 2019 at 05:25:22PM -0500, Zhihong Zhang wrote: >>> I am not superuser. Let me see how I can get access to the superuser on RDS. > >> You can't - that's the point of am environment managed by someone else. >> Perhaps you could inspect the data directly in pg_statistic, not through >> the pg_stats view. But I haven't tried if that works on RDS. > > He definitely shouldn't be able to do that, if not superuser. > > However, after experimenting a bit it seems like pg_stats does work for > indexes as long as you are the owner of the index's table, which probably > explains why it took so long for anybody to notice the problem. Maybe the > OP is trying to look at the stats as a non-owner that's just been GRANT'ed > table access? > > regards, tom lane
On Thu, Dec 26, 2019 at 06:57:52PM -0500, Zhihong Zhang wrote: >I created the index myself using the same account. Should I be the owner? > But Tom explicitly said "owner of the index's table" so it's not important who created the index. But it's easy to check the owner of tables (e.g. using \d+ in psql). BTW please reply in-line (instead of top-posting), it's easier to follow the discussion that way. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I forgot to emphasize that this problem only happens with JSONB index. The index always works if the same field is copied to a column. That’s how we have been coping with this issue, simply moving the field to a column but now we got too many columns to deal with.
To prove this, I just created a new column 'floatValue' and copied the value from JSONB column (_doc).
alter table assets add column floatValue double precision;
update assets set floatValue = (_doc #> '{floatValue}'::text[])::double precision;
create index floatValue_idx on assets (floatValue);
analyze;
Now the query on the column automatically uses the index,
explain analyze select id, _doc->>'floatValue' from assets where floatValue < 3.0 limit 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..28.55 rows=6 width=53) (actual time=0.571..19.557 rows=7 loops=1)
-> Index Scan using floatvalue_idx on assets (cost=0.43..28.55 rows=6 width=53) (actual time=0.569..19.551 rows=7 loops=1)
Index Cond: (floatvalue < '3'::double precision)
Planning Time: 0.322 ms
Execution Time: 19.583 ms
(5 rows)
Since I can’t get stats on the index, wonder if the stats on the column is of any help.
select * from pg_stats where tablename='assets' and attname='floatvalue'
;
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | assets
attname | floatvalue
inherited | f
null_frac | 3.33333e-05
avg_width | 8
n_distinct | -0.999967
most_common_vals |
most_common_freqs |
histogram_bounds | {29.3762423098087,11544.743552804,22327.7248442173,31746.9625733793,42020.0857333839,53353.4693531692,62746.0754476488,71412.5884696841,82250.3692470491,92105.7304367423,101821.479853243,111723.829060793,122510.320041329,132316.885981709,142764.132469893,153306.225780398,162952.687591314,174045.353196561,184753.799811006,195106.332655996,205185.326747596,215598.905924708,225703.200791031,235017.722006887,244950.931984931,255437.318701297,265914.741437882,275904.242414981,286097.948905081,296642.523724586,307137.258350849,317328.266333789,328417.54052788,337816.804181784,348164.874594659,358840.349595994,368590.213824064,378130.563534796,387764.143757522,398006.183560938,407868.383917958,417695.778422058,427624.609787017,437821.539118886,447861.317079514,457633.044105023,467341.55761078,477082.30279386,486012.877896428,496032.587718219,507104.239426553,517111.513298005,526737.637352198,536167.487502098,545932.489912957,555894.987657666,565314.109902829,574438.83176893,584489.227738231,593775.179702789,603509.965352714,613848.30949828,624749.070033431,635505.4769665,644715.944770724,654815.05356729,664933.282416314,673889.273777604,683789.047412574,693903.816863894,703375.991433859,713953.300379217,724327.767267823,733478.933107108,744359.858334064,755353.817716241,764262.578450143,773850.545287132,783374.56099689,793155.808001757,803004.199638963,812858.935445547,822967.322077602,832813.539542258,843135.526403785,853274.697437882,862586.844246835,872572.991997004,882436.727173626,891319.798305631,900905.375834554,910936.662461609,921490.86529389,930878.716986626,940496.629569679,949594.719801098,959338.55464682,969633.623026311,980396.131519228,989476.628601551,999952.361918986}
correlation | -0.00371463
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
Thanks!
Zhihong
The index name is correct and I ran ‘analyze' multiple times since the index is created. I have dozen indexes, some are created when table is created. None of them is in pg_stats.The pg_stats only contains a row for each column, nothing else.The database is on AWS RDS. Does that make any difference?
"Zhihong Zhang" <zhihong@gmail.com> writes: > I forgot to emphasize that this problem only happens with JSONB index. The index always works if the same field is copiedto a column. That’s how we have been coping with this issue, simply moving the field to a column but now we got toomany columns to deal with. Well, we're still up against the question of why this doesn't work for you when it does work for everyone else (or at least, everybody who's responded to this thread). You have yet to show us anything concrete about your table design, and it seems likely that the explanation is in some detail that you haven't mentioned. One thought that comes to mind is an aspect of the permissions issue: the optimizer won't use index stats unless you have permissions to select the whole table: * For simplicity, we insist on the whole * table being selectable, rather than trying * to identify which column(s) the index * depends on. Also require all rows to be * selectable --- there must be no * securityQuals from security barrier views * or RLS policies. I think we can eliminate the RLS-is-blocking-it idea, because that would also apply to stats on simple columns. But maybe your user only has select privilege on some columns of the table in question? regards, tom lane
On Dec 27, 2019, at 12:45 PM, Jeff Janes <jeff.janes@gmail.com> wrote:Can you do `pg_dump -s -t assets ...` to get a full description of the table? If you don't want to share the full description, make sure whatever simplifications you do to it don't prevent the problem from reproducing.
--
-- PostgreSQL database dump
---- Dumped from database version 11.4
-- Dumped by pg_dump version 11.6SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;SET default_tablespace = '';SET default_with_oids = false;--
-- Name: assets_copy; Type: TABLE; Schema: public; Owner: postgres
--CREATE TABLE public.assets_copy (
id character(20),
_doc jsonb,
floatvalue double precision
);ALTER TABLE public.assets_copy OWNER TO postgres;--
-- Name: column_float_value_idx; Type: INDEX; Schema: public; Owner: postgres
--CREATE INDEX column_float_value_idx ON public.assets_copy USING btree (floatvalue);--
-- Name: jsonb_float_value_idx; Type: INDEX; Schema: public; Owner: postgres
--CREATE INDEX jsonb_float_value_idx ON public.assets_copy USING btree ((((_doc #> '{floatValue}'::text[]))::double precision));--
-- PostgreSQL database dump complete
--
On Dec 27, 2019, at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:I think we can eliminate the RLS-is-blocking-it idea, because that
would also apply to stats on simple columns. But maybe your user only
has select privilege on some columns of the table in question?
SELECT grantee, privilege_type FROM information_schema.role_table_grants WHERE table_name='assets_copy';
grantee | privilege_type
----------+----------------
postgres | INSERT
postgres | SELECT
postgres | UPDATE
postgres | DELETE
postgres | TRUNCATE
postgres | REFERENCES
postgres | TRIGGER
(7 rows)
select * from pg_stats where tablename='jsonb_float_value_idx';
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | jsonb_float_value_idx
attname | float8
inherited | f
null_frac | 0
avg_width | 8
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds | {11.9074247777462,9841.89659357071,20004.3972581625,30086.5592435002,40730.0088554621,50593.008287251,58767.0910172164,67316.9470392168,77683.9102618396,88379.8920549452,98935.4504272342,109736.490063369,120142.996311188,129663.615487516,139033.944346011,149255.915079266,159277.160651982,168800.76052621,178707.448765635,190234.981942922,200536.492280662,209703.133907169,220037.896186113,230601.808056235,239625.298418105,249577.831011266,259359.933435917,269170.144107193,279207.427985966,289696.279447526,299808.374606073,309782.688505948,319680.196698755,329276.353120804,339375.476818532,349651.550408453,359881.41130656,369298.731908202,380326.490849257,390578.283462673,400972.062721848,411068.622488528,420153.666753322,430431.842803955,439484.844449908,449541.503563523,459902.50306204,469946.910161525,481110.869906843,491827.534046024,501586.132217199,511425.472330302,522126.020863652,532380.575779825,542049.761861563,552384.272217751,562474.489212036,573650.59107542,584696.998354048,594522.320665419,604293.577373028,614212.908316404,623957.396950573,634345.385245979,644869.87143755,654195.504263043,664599.217940122,674438.172951341,683769.1613473,693582.618143409,702202.791813761,712556.770537049,721915.236674249,733364.793937653,743519.58045736,753806.136548519,763560.635503381,772320.672404021,781465.635169297,790334.59816128,801681.783981621,812449.718359858,821786.699350923,832279.333844781,841624.361462891,851525.46595782,861103.327944875,871528.446674347,880865.939892828,889850.093051791,899233.678821474,909849.165473133,919194.979593158,928674.942348152,939167.145173997,949835.113249719,959980.852901936,969834.96658504,979897.553101182,990309.921558946,999946.910422295}
correlation | 0.00552317
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
Zhihong Zhang <zhihong@gmail.com> writes: >> On Dec 27, 2019, at 12:45 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> Can you do `pg_dump -s -t assets ...` to get a full description of the table? If you don't want to share the full description,make sure whatever simplifications you do to it don't prevent the problem from reproducing. > I created a new table ‘assets_copy’ with only 3 columns. I can still reproduce the problem. The pg_dump output is attached. I'm a little suspicious of this because it says the table owner is "postgres", but you say you're not superuser. Does RDS create a non-superuser "postgres" account? This is odd too: > -- Dumped from database version 11.4 > -- Dumped by pg_dump version 11.6 But anyway, I loaded this dump into v11 as a non-superuser, and inserted some data, specifically insert into assets_copy select 'id', ('{"floatValue": ' || random()*1e6 || ', "junk": "z"}')::jsonb, random()*1e6 from generate_series(1,100000)x; and I find that either auto-analyze or a manual ANALYZE will create stats that I can see in pg_stats and that lead to sane planner estimates. Now, I'm testing with a recent 11-branch build, so it's theoretically possible that some post-11.4 bug fix has changed the behavior, but I've looked through the git logs quite carefully and I see nothing that would have affected a query referencing the table directly. (If you were going through a view or an inheritance/partitioning parent, that might be a different story.) It seems like the only remaining theory is that the RDS version of Postgres differs from community Postgres in some relevant way. Even that theory has problems explaining why Jeff couldn't duplicate the behavior on his RDS instance, but we're really out of reasons for this not to work. Perhaps it's time for you to take up this question with RDS support. regards, tom lane
Finally got stats on the new table assets_copy. Still don’t know what’s the magic. I have 2 indexes in this new table, only this one got stats.
Zhihong Zhang <zhihong@gmail.com> writes:
>> On Dec 27, 2019, at 12:45 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Can you do `pg_dump -s -t assets ...` to get a full description of the table? If you don't want to share the full description, make sure whatever simplifications you do to it don't prevent the problem from reproducing.
> I created a new table ‘assets_copy’ with only 3 columns. I can still reproduce the problem. The pg_dump output is attached.
I'm a little suspicious of this because it says the table owner is
"postgres", but you say you're not superuser. Does RDS create a
non-superuser "postgres" account?
Only one of your indexes is over an expression. The index over a plain column will never have stats, as column stats are stored with the table, not the index.
On Sat, Dec 28, 2019 at 02:59:07PM -0500, Zhihong Zhang wrote: >> >> Only one of your indexes is over an expression. The index over a >> plain column will never have stats, as column stats are stored with >> the table, not the index. >> > >Oh. That explains it. I tweaked a few server parameters, wonder if that >made the difference. I changed, > >pg_stat_statements.track: ALL > >track_activity_query_size: 2048 > >Now we have stats, can anyone see why the index is not used? > I don't follow. Neither of these configuration options should affect which indexes have stats. The expression index should have stats no matter how these two GUCs are set. But once again, you're making claims without actually showing us the data. How do you know we have stats now? Show us pg_stats data and the explain plans. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
The stats was sent in a previous email. Let me know if you can’t find it. Zhihong > On Dec 28, 2019, at 6:56 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sat, Dec 28, 2019 at 02:59:07PM -0500, Zhihong Zhang wrote: >>> >>> Only one of your indexes is over an expression. The index over a >>> plain column will never have stats, as column stats are stored with >>> the table, not the index. >>> >> >> Oh. That explains it. I tweaked a few server parameters, wonder if that >> made the difference. I changed, >> >> pg_stat_statements.track: ALL >> >> track_activity_query_size: 2048 >> >> Now we have stats, can anyone see why the index is not used? >> > > I don't follow. Neither of these configuration options should affect > which indexes have stats. The expression index should have stats no > matter how these two GUCs are set. > > But once again, you're making claims without actually showing us the > data. How do you know we have stats now? Show us pg_stats data and > the explain plans. > > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Now we have stats, can anyone see why the index is not used?
You will have to redo the EXPLAIN (ANALYZE, BUFFERS) now that you have stats, under both settings of enable_seqscan, and show us those.
SET enable_seqscan = on;
SET
catalog_load_test=> explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..28.55 rows=6 width=53) (actual time=0.046..0.177 rows=7 loops=1)
-> Index Scan using float_number_index_path2 on assets (cost=0.43..28.55 rows=6 width=53) (actual time=0.045..0.172 rows=7 loops=1)
Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
Planning Time: 0.274 ms
Execution Time: 0.205 ms
(5 rows)
SET enable_seqscan = on;
SET
catalog_load_test=> explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..28.55 rows=6 width=53) (actual time=0.046..0.177 rows=7 loops=1)
-> Index Scan using float_number_index_path2 on assets (cost=0.43..28.55 rows=6 width=53) (actual time=0.045..0.172 rows=7 loops=1)
Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
Planning Time: 0.274 ms
Execution Time: 0.205 ms
(5 rows)
On Dec 30, 2019, at 11:31 AM, Zhihong Zhang <zhihong@gmail.com> wrote:You will have to redo the EXPLAIN (ANALYZE, BUFFERS) now that you have stats, under both settings of enable_seqscan, and show us those.It’s working now! I really don’t know what’s changed. Other than tweaking a few unrelated parameters, all I did was bouncing the server a few times.The missing stats for the jsonb index is back also.We had the same problem on another RDS server in a different cloud also. Let me see if I can get access there and do some comparison.Here is the explain for the same query that took minutes before,SET enable_seqscan = on;
SET
catalog_load_test=> explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..28.55 rows=6 width=53) (actual time=0.046..0.177 rows=7 loops=1)
-> Index Scan using float_number_index_path2 on assets (cost=0.43..28.55 rows=6 width=53) (actual time=0.045..0.172 rows=7 loops=1)
Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
Planning Time: 0.274 ms
Execution Time: 0.205 ms
(5 rows)SET enable_seqscan = on;
SET
catalog_load_test=> explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..28.55 rows=6 width=53) (actual time=0.046..0.177 rows=7 loops=1)
-> Index Scan using float_number_index_path2 on assets (cost=0.43..28.55 rows=6 width=53) (actual time=0.045..0.172 rows=7 loops=1)
Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
Planning Time: 0.274 ms
Execution Time: 0.205 ms
(5 rows)Thanks for all your help! I will keep digging till I find the cause.Zhihong
Zhihong Zhang <zhihong@gmail.com> writes: > So I like to report 2 bugs, > 1. “CREATE INDEX” with expression should generate stats automatically. It does. > 2. Auto-analyze should generate stats if it is not available, regardless of the number of records changed. This doesn't square with anyone else's reality either. You haven't given us one single bit of information about what you are doing differently that might trigger such unexpected behavior. But without that, there's not much we can do to investigate this report. regards, tom lane
> On Jan 2, 2020, at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Zhihong Zhang <zhihong@gmail.com> writes: >> So I like to report 2 bugs, > >> 1. “CREATE INDEX” with expression should generate stats automatically. > > It does. I created an index with JSONB field yesterday. It still didnn’t have stats today till I ran ‘analyze’. > >> 2. Auto-analyze should generate stats if it is not available, regardless of the number of records changed. > > This doesn't square with anyone else's reality either. We have auto-analyze running every night and it didn’t create the stats either. I think it’s probably related to the logicthat it skips analyzing if not enough records changed. There was no database changes at all in our case. I can reproduce those problems easily. Just tell me what information you need to help debugging and I will get it for you.Thanks! Zhihong
On Jan 2, 2020, at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:You haven't given us one single bit of information about what you are
doing differently that might trigger such unexpected behavior. But without
that, there's not much we can do to investigate this report.
catalog_load_test=> select * from pg_stats where tablename='float_number_index_path2';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem
_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+-----
-----------------
(0 rows)catalog_load_test=> CREATE INDEX float_number_index_path2
catalog_load_test-> ON public.assets USING btree
catalog_load_test-> (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST)
catalog_load_test-> TABLESPACE pg_default;
CREATE INDEX------------------ Days later -------------------------catalog_load_test=> select * from pg_stats where tablename='float_number_index_path2';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem
_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+-----
-----------------
(0 rows)
catalog_load_test=> analyze;
WARNING: skipping "pg_authid" --- only superuser can analyze it
WARNING: skipping "pg_subscription" --- only superuser can analyze it
WARNING: skipping "pg_database" --- only superuser can analyze it
WARNING: skipping "pg_db_role_setting" --- only superuser can analyze it
WARNING: skipping "pg_tablespace" --- only superuser can analyze it
WARNING: skipping "pg_pltemplate" --- only superuser can analyze it
WARNING: skipping "pg_auth_members" --- only superuser can analyze it
WARNING: skipping "pg_shdepend" --- only superuser can analyze it
WARNING: skipping "pg_shdescription" --- only superuser can analyze it
WARNING: skipping "pg_replication_origin" --- only superuser can analyze it
WARNING: skipping "pg_shseclabel" --- only superuser can analyze it
NOTICE: no non-null/empty features, unable to compute statistics
NOTICE: no non-null/empty features, unable to compute statistics
ANALYZE
catalog_load_test=> select * from pg_stats where tablename='float_number_index_path2';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correl
ation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+--------------------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+--------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------
------+-------------------+------------------------+----------------------
public | float_number_index_path2 | float8 | f | 0 | 8 | -0.994275 | | | {67.9828226566315,9880.58233633637,19484.4105280936,29129.853006452
3,38355.4133586586,47072.1824094653,57639.1913928092,67335.2866433561,76439.0416443348,86354.6561449766,96093.4087634087,106792.015489191,117061.504628509,125703.127589077,135128.147900105,145247.848238796,
155961.862299591,166463.26566115,176670.81207037,187789.324205369,197144.483681768,207021.271344274,216668.588574976,227840.67876637,237936.515826732,247922.076378018,257857.841439545,267939.695157111,27702
5.00205487,287662.557791919,297872.847877443,309003.66185233,319923.490285873,330532.386898994,339468.085207045,349151.492118835,358672.737609595,369132.092688233,378833.52348581,388865.450397134,399013.585
876673,407844.387926161,418233.385775238,427318.078000098,438366.677146405,448453.094344586,458905.486389995,468482.088763267,478278.840426356,487119.99412626,495660.125277936,505299.935583025,515099.597163
498,525450.137443841,536038.665566593,546619.640663266,556582.688819617,566576.57166943,576369.696762413,587215.536739677,597477.190662175,606301.207095385,616701.394319534,626550.197601318,635750.317480415
,646002.440713346,655989.156104624,667213.554959744,677401.37828514,687671.223655343,698001.290205866,707360.081840307,716366.450302303,726345.336064696,735977.729782462,745276.737492532,755845.261737704,76
5410.838183016,774972.880259156,785988.09055984,795699.819922447,805251.396726817,815074.041485786,824808.841571212,835754.215717316,846041.257493198,856068.658642471,866092.296782881,876734.56966877,885931
.862983853,896236.94261536,907013.318967074,916161.817498505,927834.809292108,938955.033197999,948850.627522916,958572.782110423,968669.227790087,978541.388176382,989408.961031586,999981.255270541} | -0.01
26945 | | |
(1 row)
I think the root cause of my problem is that the “CREATE INDEX” with expression wouldn’t generate stats automatically. Running ‘Analyze’ manually solves the problem in most cases but sometimes I have to restart the server.
So I like to report 2 bugs,1. “CREATE INDEX” with expression should generate stats automatically.
I thought we did do something about that in a recent release, having the CREATE INDEX send some message to the stats collector which would cause the autovacuum launcher to analyze it the next time nap-time was up. But I can't find it in the commit log, nor find the behavior. Maybe this was discussed but not implemented.
2. Auto-analyze should generate stats if it is not available, regardless of the number of records changed.
We have auto-analyze running every night and it didn’t create the stats either.
I think it’s probably related to the logic that it skips analyzing if not enough records changed.
On Jan 3, 2020, at 8:54 AM, Jeff Janes <jeff.janes@gmail.com> wrote:You told us vociferously that ANALYZE was run after the index was created.