Performance issue on GIN index with gin_trgm_ops index column - Mailing list pgsql-general

From Lars Vonk
Subject Performance issue on GIN index with gin_trgm_ops index column
Date
Msg-id CAMX1ThhJswV7Mi19qUArLwHQqWoTsXgqMLuGnHoe6kAdbZf+hA@mail.gmail.com
Whole thread Raw
List pgsql-general
 Hi all,

 We are investigating a performance issue with searching on a GIN with gin_trgm_ops indexed column. This specific (organization_id,aggregate_type) has in total 19K records and the (organization_id) has in total 30K records. The search record table has in total 38M records.

 The table and index are defined as follows:

       Column      |       Type        | Collation | Nullable |                               Default
  -----------------+-------------------+-----------+----------+---------------------------------------------------------------------
   id              | bigint            |           | not null | nextval('search_records_id_seq'::regclass)
   organization_id | uuid              |           | not null |
   aggregate_id    | uuid              |           | not null |
   aggregate_type  | character varying |           | not null |
   document        | text              |           |          |
  Indexes:
      "search_records_pkey" PRIMARY KEY, btree (id)
      "search_records_keys" UNIQUE, btree (organization_id, aggregate_id, aggregate_type)
      "search_records_btree_gin" gin ((organization_id::character varying), aggregate_type, document gin_trgm_ops)


The query we execute is:

  select aggregate_id from search_records where organization_id::varchar = '975097c5-e760-4603-9236-fcf2e8580a7c' and aggregate_type = 'FooRecord' and document ilike '%user.name@gmail.com%';

Resulting in the following plan:

   Bitmap Heap Scan on search_records  (cost=2184.00..2188.02 rows=1 width=104) (actual time=4332.007..4332.008 rows=1 loops=1)
     Recheck Cond: ((((organization_id)::character varying)::text = '975097c5-e760-4603-9236-fcf2e8580a7c'::text) AND ((aggregate_type)::text = 'FooRecord'::text) AND (document ~~* '%user.name@gmail.com%'::text))
     Heap Blocks: exact=1
     Buffers: shared hit=23920 read=9752
     I/O Timings: read=4017.360
     ->  Bitmap Index Scan on search_records_btree_gin  (cost=0.00..2184.00 rows=1 width=0) (actual time=4331.987..4331.987 rows=1 loops=1)
           Index Cond: ((((organization_id)::character varying)::text = '975097c5-e760-4603-9236-fcf2e8580a7c'::text) AND ((aggregate_type)::text = 'FooRecord'::text) AND (document ~~* '%user.name@gmail.com%'::text))
           Buffers: shared hit=23920 read=9751
           I/O Timings: read=4017.355
   Planning Time: 0.268 ms
   Execution Time: 4332.030 ms
  (11 rows)


We are running on Postgres RDS with engine version 12.8 with 32GB memory and 8GB shared_buffer. We have 442GB of 2000GB diskspace left.

Sometimes we also have queries for this particular customer which take more than 20 seconds. The content of the document in those cases are similar to:

   User Name Kees postgresstreet Amsterdam 1000 AA user.name@gmail.com 1234

Are we doing something wrong? I find the I/O timings quite high, does this mean that it took 4000MS to read the 9752 blocks from the disk?

Any other tips and or suggestions are welcome.

Kind regards,
Lars Vonk

pgsql-general by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Feature request: psql --idle
Next
From: Michael Nolan
Date:
Subject: Re: Feature request: psql --idle