Re: How to speed up pg_trgm / gin index scan - Mailing list pgsql-general
From | Jaime Casanova |
---|---|
Subject | Re: How to speed up pg_trgm / gin index scan |
Date | |
Msg-id | CAJKUy5hOcKZe6dtZPuyt8+wCBUOyRu-68uLTZxUuUs_1wnk09Q@mail.gmail.com Whole thread Raw |
In response to | How to speed up pg_trgm / gin index scan (Christian Ramseyer <rc@networkz.ch>) |
Responses |
Re: How to speed up pg_trgm / gin index scan
|
List | pgsql-general |
On Mon, Jun 22, 2015 at 6:51 AM, Christian Ramseyer <rc@networkz.ch> wrote: > > DM=# \d+ logs_01 > > Column | Type | > --------------+-----------------------------+ > host | character varying(255) | > facility | character varying(10) | > priority | character varying(10) | > tag | character varying(255) | > log_date | timestamp without time zone | > program | character varying(255) | > msg | text | > seq | bigint | > > Indexes: > "logs_01_pkey" PRIMARY KEY, btree (seq) > "idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops) > "logs_01_date_index" btree (log_date) > "tridx_logs_01_msg" gin (msg gin_trgm_ops) > > > DM=# select count(*) from logs_01; > count > ---------- > 83052864 > > > I'd like to provide a fast "like %x%" search on the msg column, hence I added a trigram based gin index on it. It is around60 GB on the 35 GB table: > > DM=# select count(*) from logs_01; > count > ---------- > 83052864 > > > DM=# \dt+ logs_01 > List of relations > Schema | Name | Type | Owner | Size | Description > --------+---------+-------+----------+-------+------------- > public | logs_01 | table | postgres | 35 GB | > > DM=# \di+ tridx_logs_01_msg > List of relations > Schema | Name | Type | Owner | Table | Size | Description > --------+-------------------+-------+----------+---------+-------+------------- > public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB | > > What version of postgres is this? GIN indexes improved a lot in 9.4, they use less than half the space and have doubled the speed (on average). Now, whatever version you have; a GIN index has two data structures, the main one in which the index entries are stored as key-value pairs (please someone correct my description of the situation) and a pending list, which is a temporary unsorted list of pending entries in which all the newly inserted tuples arrive until a VACUUM (or until the pending list grows upto work_mem) moves that list into the main structure. That happens to avoid the penalty of inserting new rows in the main structure which could be expensive. But while the pending list grows the speed of the index decreases. And because you have work_mem in 16Gb your pending list is possibly growing without control. if you have 9.3 or superior you can know how big is that pending list installing pgstattuple. CREATE EXTENSION pgstattuple; SELECT * FROM pgstatginindex('tridx_logs_01_msg'); NOTE: remember that pending_pages is expressed in 8kb-pages if that is the problem or if you are in <= 9.2 then try VACUUM the table -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación
pgsql-general by date: