Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents
Msg-id CAHyXU0zY8EwNSFYaNrzOpK-vuWruBTeZSyjLcV07VREAPkFs4g@mail.gmail.com
In response to performance issue with bitmap index scans on huge amounts of big jsonb documents  (Marc-Olaf Jaschke <marc-olaf.jaschke@s24.com>)
List pgsql-performance
On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke
<marc-olaf.jaschke@s24.com> wrote:
> Hi,
>
> I have a performance issue with bitmap index scans on huge amounts of big jsonb documents.
>
>
> ===== Background =====
>
> - table with big jsonb documents
> - gin index on these documents
> - queries using index conditions with low selectivity
>
>
> ===== Example =====
>
> select version();
>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit
>
> show work_mem;
>> 1GB
>
> -- setup test data
> create table bitmap_scan_test as
> select
>     i,
>     (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) from generate_series(0, 100) j) big_jsonb
> from
>     generate_series(0, 100000) i;
>
> create index on bitmap_scan_test using gin (big_jsonb);
>
> analyze bitmap_scan_test;
>
>
> -- query with bitmap scan
> explain analyze
> select
>     count(*)
> from
>     bitmap_scan_test
> where
>     big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 rows=1 loops=1)
>   ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 width=0) (actual time=16.496..617.431 rows=50000 loops=1)
>         Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
>         Heap Blocks: exact=637
>         ->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=50000 loops=1)
>               Index Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Planning time: 0.106 ms
> Execution time: 622.334 ms
>
>
> perf top -p... shows heavy usage of pglz_decompress:
>
> Overhead  Shared Object     Symbol
>   51,06%  postgres          [.] pglz_decompress
>    7,33%  libc-2.12.so      [.] memcpy

Another thing to look at is configuring the column not to compress,
since over half the time is spent decompressing the data.  See:
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
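
For example, something along these lines (an untested sketch against
the test table above; EXTERNAL keeps the jsonb out of line in the
TOAST table but stores it uncompressed):

alter table bitmap_scan_test
    alter column big_jsonb set storage external;

-- SET STORAGE only governs how future values are stored; existing
-- rows stay compressed until they are rewritten.  Round-tripping
-- through text is one way to force fresh, uncompressed datums (a
-- plain "set big_jsonb = big_jsonb" may just reuse the old toast
-- pointer):
update bitmap_scan_test
set big_jsonb = big_jsonb::text::jsonb;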

Naturally this is a huge tradeoff, so do some careful analysis before
making the change.

merlin

