Re: How to speed up pg_trgm / gin index scan - Mailing list pgsql-general

From Jaime Casanova
Subject Re: How to speed up pg_trgm / gin index scan
Date
Msg-id CAJKUy5hOcKZe6dtZPuyt8+wCBUOyRu-68uLTZxUuUs_1wnk09Q@mail.gmail.com
Whole thread Raw
In response to How to speed up pg_trgm / gin index scan  (Christian Ramseyer <rc@networkz.ch>)
Responses Re: How to speed up pg_trgm / gin index scan
List pgsql-general
On Mon, Jun 22, 2015 at 6:51 AM, Christian Ramseyer <rc@networkz.ch> wrote:
>
> DM=# \d+ logs_01
>
>     Column    |            Type             |
> --------------+-----------------------------+
>  host         | character varying(255)      |
>  facility     | character varying(10)       |
>  priority     | character varying(10)       |
>  tag          | character varying(255)      |
>  log_date     | timestamp without time zone |
>  program      | character varying(255)      |
>  msg          | text                        |
>  seq          | bigint                      |
>
> Indexes:
>     "logs_01_pkey" PRIMARY KEY, btree (seq)
>     "idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops)
>     "logs_01_date_index" btree (log_date)
>     "tridx_logs_01_msg" gin (msg gin_trgm_ops)
>
>
> DM=# select count(*) from logs_01;
>   count
> ----------
>  83052864
>
>
> I'd like to provide a fast "like %x%" search on the msg column, hence I added a trigram based gin index on it. It is
around60 GB on the 35 GB table: 
>
> DM=# select count(*) from logs_01;
>   count
> ----------
>  83052864
>
>
> DM=# \dt+ logs_01
>                      List of relations
>  Schema |  Name   | Type  |  Owner   | Size  | Description
> --------+---------+-------+----------+-------+-------------
>  public | logs_01 | table | postgres | 35 GB |
>
> DM=# \di+ tridx_logs_01_msg
>                                List of relations
>  Schema |       Name        | Type  |  Owner   |  Table  | Size  | Description
> --------+-------------------+-------+----------+---------+-------+-------------
>  public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB |
>
>

What version of postgres is this? GIN indexes improved a lot in 9.4,
they use less than half the space and have doubled the speed (on
average).

Now, whatever version you have; a GIN index has two data structures,
the main one in which the index entries are stored as key-value pairs
(please someone correct my description of the situation) and a pending
list, which is a temporary unsorted list of pending entries in which
all the newly inserted tuples arrive until a VACUUM (or until the
pending list grows upto work_mem) moves that list into the main
structure.

That happens to avoid the penalty of inserting new rows in the main
structure which could be expensive.
But while the pending list grows the speed of the index decreases. And
because you have work_mem in 16Gb your pending list is possibly
growing without control.

if you have 9.3 or superior you can know how big is that pending list
installing pgstattuple.

CREATE EXTENSION pgstattuple;
SELECT * FROM pgstatginindex('tridx_logs_01_msg');

NOTE: remember that pending_pages is expressed in 8kb-pages

if that is the problem or if you are in <= 9.2 then try VACUUM the table

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


pgsql-general by date:

Previous
From: Christian Ramseyer
Date:
Subject: Re: How to speed up pg_trgm / gin index scan
Next
From: Suresh Raja
Date:
Subject: Re: extracting PII data and transforming it across table.