BUG #19443: GIN jsonb_path_ops index not used when many rows contain nested null values (PostgreSQL 18.3) - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #19443: GIN jsonb_path_ops index not used when many rows contain nested null values (PostgreSQL 18.3)
Date
Msg-id 19443-a8d2aef8b11ea452@postgresql.org
Whole thread Raw
Responses Re: BUG #19443: GIN jsonb_path_ops index not used when many rows contain nested null values (PostgreSQL 18.3)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      19443
Logged by:          Xianghang Zheng
Email address:      zheng_xianghang@163.com
PostgreSQL version: 18.3
Operating system:   Linux x86_64
Description:

I believe I've found a planner bug in PostgreSQL 18.3 where a GIN index
using
jsonb_path_ops is not selected for @? queries when the jsonb column contains
a large number of rows with nested null values.
Problem Description
-------------------
When a jsonb column has a high proportion of rows with {"a": null}, and a
small
subset with valid nested paths like {"a":{"b":{"c":1}}}, the query planner
chooses a sequential scan instead of using a jsonb_path_ops GIN index for
the
path query @? '$.a.b.c' LIMIT 10.
This leads to unexpected performance degradation.
Version Information
-------------------
PostgreSQL 18.3
Platform
--------
Linux x86_64
Complete Reproducible Test Case
-------------------------------
-- Create test table
DROP TABLE IF EXISTS test_jsonb;
CREATE TABLE test_jsonb (
    id serial primary key,
    data jsonb not null
);
-- Insert 100,000 rows: 90% null, 10% valid nested JSON
INSERT INTO test_jsonb (data)
SELECT
    CASE WHEN random() < 0.1 THEN
        '{"a":{"b":{"c":1}}}'::jsonb
    ELSE
        '{"a":null}'::jsonb
    END
FROM generate_series(1,100000);
-- Create jsonb_path_ops index
CREATE INDEX idx_test_jsonb ON test_jsonb USING gin (data jsonb_path_ops);
-- Query that should use the index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_jsonb
WHERE data @? '$.a.b.c'
LIMIT 10;
Actual Output
-------------
 Limit  (cost=0.00..19.37 rows=10 width=36) (actual time=0.067..0.117
rows=10 loops=1)
   Buffers: shared hit=2
   ->  Seq Scan on test_jsonb  (cost=0.00..1937.00 rows=1000 width=36)
(actual time=0.060..0.107 rows=10 loops=1)
         Filter: (data @? '$."a"."b"."c"'::jsonpath)
         Rows Removed by Filter: 110
         Buffers: shared hit=2
 Planning Time: 0.592 ms
 Execution Time: 0.152 ms
The planner chooses a sequential scan even though a valid jsonb_path_ops
index exists.
Expected Behavior
-----------------
The planner should use the GIN index for the path query.
Additional Observations
-----------------------
1. The index works correctly when forced with enable_seqscan = off.
2. The default jsonb_ops opclass does not have this problem.
3. The issue appears to be caused by null scalar values not being indexed by
jsonb_path_ops,
   which makes the planner avoid the index due to misestimation.
Impact
------
This bug affects production systems using jsonb_path_ops for path queries
on tables with many null-containing JSON structures, causing unexpected
full table scans.
Workaround
----------
1. Use the default jsonb_ops opclass
2. Use a partial index excluding nulls
3. Force index usage with enable_seqscan = off
I'm happy to provide more details or run further tests if needed.





pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #19442: PL/pgSQL: domain over composite type bypasses type validation when assigning NULL (PostgreSQL 18.3)
Next
From: PG Bug reporting form
Date:
Subject: BUG #19444: conkey field empty for domain NOT NULL constraint in pg_constraint (18.3)