Re: insert and query performance on big string table with pg_trgm - Mailing list pgsql-performance

From Matthew Hall
Subject Re: insert and query performance on big string table with pg_trgm
Date
Msg-id 351E3FBB-7DFC-4BB0-841E-E7AB16DD57A3@mhcomputing.net
In response to Re: insert and query performance on big string table with pg_trgm  (Matthew Hall <mhall@mhcomputing.net>)
Responses Re: insert and query performance on big string table with pg_trgm  (Sergei Kornilov <sk@zsrv.org>)
List pgsql-performance
On Nov 21, 2017, at 12:05 AM, Matthew Hall <mhall@mhcomputing.net> wrote:
>> Do you really need the artificial primary key, when you already have another column that would be used as the primary key?  If you need to use this as a foreign key in another table, then you very well might.  But maintaining two unique indexes doesn't come free.
>
> OK, fair enough, I'll test with it removed and see what happens.

With the integer primary key removed, it still takes ~9 hours to load the table, so it didn't seem to make a big
difference.
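For reference, the drop looked roughly like this (the constraint name follows PostgreSQL's default naming, and the column name is an assumption; check with \d huge_table first):

```sql
-- Assumed names: huge_table_pkey (default PK constraint name), id (surrogate key column).
ALTER TABLE huge_table DROP CONSTRAINT huge_table_pkey;
ALTER TABLE huge_table DROP COLUMN id;
```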

> Fixed it; I will re-test w/ 1 GB. Have you got any recommended values so I don't screw it up?

I also made this change, setting maintenance_work_mem to 1 GB.
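For anyone following along, this is the session-level version of the change (it can also go in postgresql.conf to apply globally):

```sql
-- Give index builds and other maintenance operations up to 1 GB of memory.
SET maintenance_work_mem = '1GB';
```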

Queries on the table still take a long time with the PK removed:

# explain (analyze, buffers) select * from huge_table where value ilike '%yahoo%';

 Bitmap Heap Scan on huge_table  (cost=593.72..68828.97 rows=18803 width=25) (actual time=3224.100..70059.839 rows=20909 loops=1)
   Recheck Cond: ((value)::text ~~* '%yahoo%'::text)
   Rows Removed by Index Recheck: 17
   Heap Blocks: exact=6682
   Buffers: shared hit=544 read=6760 dirtied=4034
   I/O Timings: read=69709.611
   ->  Bitmap Index Scan on huge_table_value_trgm_idx  (cost=0.00..589.02 rows=18803 width=0) (actual time=3216.545..3216.545 rows=20926 loops=1)
         Index Cond: ((value)::text ~~* '%yahoo%'::text)
         Buffers: shared hit=352 read=270
         I/O Timings: read=3171.872
 Planning time: 0.283 ms
 Execution time: 70065.157 ms
(12 rows)
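Nearly all of the 70 seconds is disk reads during the heap recheck (I/O Timings: read=69709.611 out of 70065.157 ms total), not trigram matching. One thing I may try is warming the table into cache before querying, e.g. with the pg_prewarm extension (a sketch, using the table name from above):

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- Read the whole heap into shared buffers (spilling to OS cache).
SELECT pg_prewarm('huge_table');
```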

The slow process during inserts is:

postgres: username dbname [local] INSERT

The slow statement example is:

2017-12-06 04:27:11 UTC [16085-10378] username@dbname LOG:  duration: 5028.190 ms  plan:
        Query Text: INSERT INTO huge_table (value) VALUES
        .... 5000 values at once ...
        ON CONFLICT (value) DO NOTHING
        Insert on huge_table  (cost=0.00..75.00 rows=5000 width=40)
          Conflict Resolution: NOTHING
          Conflict Arbiter Indexes: huge_table_value_idx
          ->  Values Scan on "*VALUES*"  (cost=0.00..75.00 rows=5000 width=40)
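If the arbiter-index probes turn out to dominate, an alternative I could test is to COPY each batch into an unindexed staging table and de-duplicate in one statement, rather than multi-row VALUES inserts (a sketch, not what I'm currently running; the file path is illustrative):

```sql
CREATE TEMP TABLE huge_table_stage (value varchar);
COPY huge_table_stage (value) FROM '/path/to/batch.csv';  -- illustrative path

INSERT INTO huge_table (value)
SELECT DISTINCT value FROM huge_table_stage
ON CONFLICT (value) DO NOTHING;
```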

> What is the size of the table and the gin index?

The table is 10 GB. The gin index is 5.8 GB.
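For completeness, those sizes can be read back like this:

```sql
SELECT pg_size_pretty(pg_table_size('huge_table'))                   AS table_size,
       pg_size_pretty(pg_relation_size('huge_table_value_trgm_idx')) AS gin_index_size;
```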

> [From Gabor Szucs] [H]ow about adding a hash value column and creating the unique index on that one? May block some false duplicates but the unique index would be way smaller, speeding up inserts.

The mean length of the input items is about 18 bytes. The max length of the input items is about 67 bytes. The size of the md5 would of course be 16 bytes. I'm testing it now, and I'll write another update.
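The simplest variant I can see of Gabor's idea, avoiding a new column entirely, is a unique expression index over the hash (a sketch; as he notes, a collision would reject some distinct values):

```sql
-- Replace the wide unique index on value with one keyed on its 16-byte md5.
DROP INDEX IF EXISTS huge_table_value_idx;
CREATE UNIQUE INDEX huge_table_value_md5_idx
    ON huge_table (decode(md5(value), 'hex'));
```

The insert would then need ON CONFLICT ((decode(md5(value), 'hex'))) DO NOTHING so the new index can serve as the arbiter.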

Matthew.
