Re: Is it possible to have a "fast-write" Index? - Mailing list pgsql-hackers

From deavid
Subject Re: Is it possible to have a "fast-write" Index?
Date
Msg-id CAFR-75teFn5Uneq=gTaDrmYhEyUqSVGRosMWfsf7iw1AAJoNEg@mail.gmail.com
Whole thread Raw
In response to Re: Is it possible to have a "fast-write" Index?  (deavid <deavidsedice@gmail.com>)
Responses Re: Is it possible to have a "fast-write" Index?  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
I did another try on BRIN and GIN indexes, and I compared to regular btree indexes. Now i have 16M rows to do the test.

The numbers seem to be good. Both GIN and BRIN seem good options for certain tables with more writes than reads (Specially BRIN is very good)

I want to share with you my test; I used real-world data, but i didn't had time to do something accurate or real-word uses. I know the methodology is not enough, and maybe some calculations on the spreadsheet are wrong. I tried to do my best.

I'm using an SSD and I'm trying to compare CPU cost, not I/O.

In short, the results were: (compared to btree)
- INSERT: GIN is 50% faster; BRIN is 6x faster. This is the best scenario.
- UPDATE: each case has a winner; for big updates BRIN is 10x faster and GIN is 25x faster. For small updates (most real world cases) BTREE is always the winner; but BRIN gives some good results too.
- DELETE: Almost no difference between the three. 
- SELECT: BTREE here is the winner. BRIN is 10% slower, and GIN performance seems a bit random.

VACUUM, ANALYZE and other tasks are 6x faster with BRIN, 50% faster with GIN. 
Index sizes are 50% smaller with GIN, but with BRIN they are very very small   

Hope you find useful these numbers.     


El sáb., 13 jun. 2015 a las 11:41, deavid (<deavidsedice@gmail.com>) escribió:
Sorry; Because some misconfiugration vacuum and analyze were'nt working. Now I'm getting better numbers for BRIN indexes where there are zero rows to match.

El sáb., 13 jun. 2015 a las 3:17, deavid (<deavidsedice@gmail.com>) escribió:
So I just ran a test case for hash, btree, gin_btree and brin indexes. Also without indexes, and without primary keys.
* Testing "deliverynotes" table. 
  - Definition and use case:
It is a table contaning real delivery note headers of several years
It consists of 300k rows, 128 columns, 63 indexes, 243Mb of data 
excluding indexes. Since is a table visible for users, almost every 
column can be searched so we need lots of indexes. We do not need 
searches to be the fastest possible, we only need to accelerate a 
bit our user searches; without harming too much writes.
  - Things to test:
- measure index creation times.
- measure index space.
- with indexes but without primary key
- with everything
- Create fully, delete everything and Insert again data in blocks
- Test updates for recent data

I attached the logs for every test, if anyone wants to see what i'm exactly testing.
This was tested on my i5 laptop with 4Gb of RAM and a 120Gb SSD (OCZ Agility 3). I'm trying to measure CPU time, not I/O time, so some configurations and tests are specific to avoid as much as IO as I can. 
I'm using a dev build for Postgresql 9.5 downloaded from git sources.

Conclusions:
- Gin_btree seems slower in almost every case. It's writes are marginally better than regular btrees even when using work_mem=160MB. (May be 20% faster than btree). They are smaller than I thought.
- BRIN indexes seem very fast for writes. For selects maybe is a blend between having indexes and don't having them. They don't recognize that some values are simply out of range of indexed values, and that's a pity. If the values we want are packed together I guess I would get even better results.
- Primary keys and uniqueness checks doesn't seem to make any difference here.
- Having no indexes at all is faster than I imagined. (Sometimes it beats BRIN or Btree) Maybe because the IO here is faster than usual.
- Hash indexes: i tried to do something, but they take too much time to build and i don't know why. If creates are slow, updates should be slow too. I'm not going to test them again.

And finally, don't know why but i couldn't vacuum or analyze tables. It always get stalled without doing anything; so i had to comment every vacuum. Maybe there is a bug in this dev version or i misconfigured something.

El vie., 12 jun. 2015 a las 7:27, Simon Riggs (<simon@2ndquadrant.com>) escribió:
On 5 June 2015 at 18:07, deavid <deavidsedice@gmail.com> wrote:
There are several use cases where I see useful an index, but adding it will slow too much inserts and updates.
For example, when we have 10 million rows on a table, and it's a table which has frequent updates, we need several index to speed up selects, but then we'll slow down updates a lot, specially when we have 10 or more indexes.
Other cases involve indexes for text search, which are used only for user search and aren't that important, so we want to have them, but we don't want the overload they put whenever we write on the table. 
I know different approaches that already solve some of those problems in some ways (table partitioning, partial indexes, etc), but i don't feel they are the solution to every problem of this kind.

Some people already asked for "delayed write" indexes, but the idea gets discarded because the index could get out of sync, so it can omit results and this is unacceptable. But i think maybe that could be fixed in several ways and we can have a fast and reliable index (but maybe not so fast on selects). 

This is exactly the use case and mechanism for BRIN indexes.
 
--
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Inheritance planner CPU and memory usage change since 9.3.2
Next
From: Jim Nasby
Date:
Subject: Re: Is it possible to have a "fast-write" Index?