Thread: Is it possible to have a "fast-write" Index?

Is it possible to have a "fast-write" Index?

From: deavid
There are several use cases where an index would be useful, but adding it would slow down inserts and updates too much.
For example, when we have 10 million rows in a table that receives frequent updates, we need several indexes to speed up selects, but those slow down updates a lot, especially when there are 10 or more indexes.
Other cases involve indexes for text search, which are used only for user searches and aren't that important; we want to have them, but we don't want the overhead they impose on every write to the table.
I know different approaches that already solve some of these problems in some ways (table partitioning, partial indexes, etc.), but I don't feel they are the solution to every problem of this kind.

Some people already asked for "delayed write" indexes, but the idea gets discarded because the index could get out of sync, so it could omit results, which is unacceptable. But I think that could be fixed in several ways, and we could have a fast and reliable index (though maybe not so fast on selects).

Since I do not know every internal of Postgres, it seems simpler to share the idea here and ask what can or cannot be done.

Let's imagine there is a new type of index called "weird_btree", where we trade simplicity for speed. In almost every mode, we rely on VACUUM to put the index back into its optimal state.

Mode 1: on "aminsert", mark the index as INVALID. So, if you modified the table, you need to run REINDEX/CREATE INDEX CONCURRENTLY before doing a SELECT. This is almost the same as CREATE INDEX CONCURRENTLY; the main difference is that you don't have to remember to drop the index before writing. (I don't see much benefit here.)

Mode 2: on "aminsert", put the new entry in a plain, unordered list instead of the btree. Appending to a list should be faster than inserting into a big btree, and you'll know later which entries still need indexing.

Mode 2.a: on index scan (amrescan, amgettuple), pretend that the list comes after the btree and output every row, out of order. You have to tell Postgres that the index isn't sorted, and it will have to recheck every row.

Mode 2.b: mark the index invalid instead. On the next vacuum, sort the list and insert it into the btree in a bulk operation. If that succeeds, mark the index valid again.

Mode 3: on "aminsert", put the new entry in a second btree, leaving the first one untouched. Because the second btree is new, it will be small, and writes should be faster. When doing an index scan, read tuples from both at the same time (like a merge sort). On vacuum, merge the second btree into the first. In this mode the index is sorted and there is no need to recheck.
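A hypothetical sketch of how such an index might be declared; the "weird_btree" access method and its "delay_mode" option are invented here for illustration and do not exist:

    -- Hypothetical syntax: neither the access method nor the option exists.
    -- Mode 3 would maintain a small staging btree next to the main one.
    CREATE INDEX invoices_customer_idx ON invoices
        USING weird_btree (customer_id)
        WITH (delay_mode = 'staging_btree');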

Does anyone think this would be an interesting feature for PostgreSQL?
Did I miss something?

PS: Maybe it's also possible to take advantage of clustering and have indexes whose entries are ranges of TIDs; but I'm not sure whether this is too exotic or whether it would make a difference.

Sincerely,
David.

Re: Is it possible to have a "fast-write" Index?

From: Jim Nasby
On 6/5/15 11:07 AM, deavid wrote:
> Did I miss something?

These are interesting ideas, but the problem here is far too hypothetical. You're trying to defer index maintenance cost in a case where, if there's any real problem, the index pages are already in memory. So if it's too slow it's not because of IO... but then why is it too slow?

If you have significantly more than 10M rows then IO would be much more 
likely to be a problem, but at that point you should probably just be 
partitioning anyway.

If you want to attract attention here I think you'll need to come up 
with some concrete scenarios and provide data on where all the 
performance hit actually is.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Is it possible to have a "fast-write" Index?

From: Tom Lane
deavid <deavidsedice@gmail.com> writes:
> Some people already asked for "delayed write" indexes, but the idea gets
> discarded because the index could get out of sync, so it could omit
> results, which is unacceptable. But I think that could be fixed in several
> ways, and we could have a fast and reliable index (though maybe not so
> fast on selects).

FWIW, GIN indexes already implement something that's like your mode 2 but
a bit better: there's an unordered "pending insertions" list that has to
be scanned by every search in addition to checking the main index body.
Every so often the pending insertions list is automatically flushed into
the main index body.

The reason we do this for GIN is that that index type puts a huge premium
on doing inserts "in bulk"; it's a lot more efficient if you push many
rows into the index at once, because frequently they'll be inserting into
the same per-key posting lists.  I do not see much opportunity for a
corresponding gain for btree.

So I really doubt that anyone would have any enthusiasm for saddling btree
with a similar mechanism.  It's complicated (and has been the cause of
multiple bugs); it's hard to figure out when is the optimal time to flush
the pending insertions; and it slows down searches in favor of making
inserts cheap, which is generally not the way to bet --- if that's the
tradeoff you want, why not drop the index altogether?

But anyway, since you can use contrib/btree_gin to get more or less btree
semantics for GIN indexes (except for uniqueness enforcement), you might
try whether just replacing your btree indexes with GIN indexes provides
any win for your insertions.
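A minimal sketch of what that looks like (table and column names are placeholders):

    -- btree_gin provides GIN operator classes with btree-like semantics
    -- for ordinary scalar types (no uniqueness enforcement, though).
    CREATE EXTENSION btree_gin;

    CREATE INDEX orders_customer_gin ON orders USING gin (customer_id);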
        regards, tom lane



Re: Is it possible to have a "fast-write" Index?

From: Claudio Freire
On Fri, Jun 5, 2015 at 2:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, GIN indexes already implement something that's like your mode 2 but
> a bit better: there's an unordered "pending insertions" list that has to
> be scanned by every search in addition to checking the main index body.
> Every so often the pending insertions list is automatically flushed into
> the main index body.
>
> The reason we do this for GIN is that that index type puts a huge premium
> on doing inserts "in bulk"; it's a lot more efficient if you push many
> rows into the index at once, because frequently they'll be inserting into
> the same per-key posting lists.  I do not see much opportunity for a
> corresponding gain for btree.


A forest of btrees (call it mode 2.c) may not be a bad idea. When tables
grow steadily, the I/O cost is usually dominated by full-page writes and
random I/O due to the wide spread of index updates. I don't have numbers,
but on the databases I've handled it certainly was so.

If you had a btree_forest AM consisting of several btrees that follow the
GIN pattern, except that instead of an unordered pending list you have an
ordered btree (which also simplifies merging), you should gain a lot.

The big btrees will be read-only, so they will be compact (100% fill
rate), you will generate less WAL (updates are all local on the small
"staging" btree) and even the disk may perform better with that
pattern.

It is in fact a pattern used by inverted indexes already, so it
wouldn't be too far-fetched.

It is however hard to figure out when compaction has to happen.
Concurrency shouldn't be an issue though, since all but the smallest
btree would be read-only, so you only need a lock while modifying the
forest structure (adding a new btree, swapping components with merged
versions, etc).

It would indeed, though, require a lot of extra storage to perform
compaction. An alternative would be to implement compaction as a
massive insert/delete instead. Certainly, how exactly compaction gets
implemented would be key in deciding whether the approach breaks even.



Re: Is it possible to have a "fast-write" Index?

From: Alvaro Herrera
deavid wrote:
> There are several use cases where an index would be useful, but adding it
> would slow down inserts and updates too much.

Maybe try a BRIN index.  You can't use them for text search currently,
or many other cases for that matter, but there are enough interesting
cases in which they are useful that perhaps you don't need anything
extra.
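A minimal sketch (placeholder names); BRIN is new in the 9.5 sources being discussed here:

    -- BRIN stores one summary (e.g. min/max) per range of heap pages,
    -- so inserts are cheap and the index stays tiny.
    CREATE INDEX deliveries_date_brin ON deliveries USING brin (delivery_date);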

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Is it possible to have a "fast-write" Index?

From: Gavin Flower
On 06/06/15 04:07, deavid wrote:
> There are several use cases where an index would be useful, but adding it
> would slow down inserts and updates too much.
> [...]
How about a hybrid indexing system, with 2 parts:

(1) existing index system which is checked first and has been mostly 
optimised for speed of reading.  If there are only a few inserts/updates 
and the system is not heavily loaded, then it gets modified 
immediately.  The threshold for being too busy, and few enough changes, 
could be configurable.

(2) overflow index optimised for writing.  Possibly in memory and not 
backed by permanent storage.  A crash would require a complete index 
rebuild - but only when there were entries in it (or at least more than 
some configurable threshold, to allow for cases where some missing index 
entries are acceptable).

So where the index is needed for a query, part 1 is checked first, and 
then part 2 if necessary.

Have a background process that moves entries from part 2 to part 1 when 
the system is less busy.


Cheers,
Gavin






Re: Is it possible to have a "fast-write" Index?

From: deavid
Thanks to everybody for answering. I wasn't expecting this attention; this is a great community :-)

Jim asked me about something real. Well, this problem showed up more than five years ago and keeps popping up from time to time in different circumstances. I solved it in a different way each time, depending on the exact use case. I wanted to generalize, because it seems a good feature for several situations; and I don't expect a solution just for myself, as each time I hit this I found some way to sort it out.
As Jim said, what we need here are figures from real examples, and I don't have them yet. I'll do my "homework" and email back with exact problems and exact timings. Give me a week or two.

Also, some of you are talking about IO. Well, it's hard to say without the figures, but I'm pretty sure I'm hitting CPU time only. We use SSDs on those big databases, and in my tests I also tried setting fsync=off.

So the problem is: I see low iowait, and CPU time for one core is at 80-90% most of the time. I can buy more RAM, better disks, or CPUs with more cores. But one CPU core will have more or less the same speed no matter how much money I invest.

Wanting a delayed-write index is similar to setting "synchronous_commit = off". We want to give an OK to the backend as soon as possible and do the work in the background. But we also want some reliability against crashes.
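For comparison, the existing knob the analogy refers to:

    -- Commits are acknowledged before the WAL flush reaches disk; a crash
    -- may lose the last few transactions, but the database stays consistent.
    SET synchronous_commit = off;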

Also, if the task is done in the background it could be done by another backend, so several indexes could probably be updated at once using different backend processes. We could use the entire CPU if our index writes weren't tied to the session that wrote the row.

PS: I'm very interested in existing approaches like GIN or BRIN (this one is new to me). Thanks a lot; I'll try them in my tests.

Re: Is it possible to have a "fast-write" Index?

From: "ktm@rice.edu"
On Fri, Jun 05, 2015 at 11:54:01PM +0000, deavid wrote:
> [...]

Hi David,

Here is an interesting read comparing LSM and Fractal Tree indexing:

http://highscalability.com/blog/2014/8/6/tokutek-white-paper-a-comparison-of-log-structured-merge-lsm.html

Regards,
Ken



Re: Is it possible to have a "fast-write" Index?

From: deavid
Hi again. I tried to run some tests on my office computer, but after spending 2-3 hours I gave up. I'm going to need a real SSD to try these things. 100k rows of my "delivery notes" table use 100MB of disk, and 2GB of RAM may not be enough to emulate fast IO. (I was disabling fsync, activating write caches, etc.)

I downloaded PostgreSQL 9.5-dev from git sources, compiled everything and restored two client databases there (10GB each). My goal was to test gin_btree and brin indexes as well, but I gave up before completing the gin test.

Now I have a better plan: I'm going to use my laptop (Intel i5, 4GB of RAM) and put in a spare SSD I wasn't using (OCZ Agility 2 120GB). Hopefully this time I can get figures closer to production.

So far my results were a bit disappointing (comparing gin_btree against a regular btree for a column with very low cardinality):
- create index and updates: about 10-20% faster (I had a primary key, so btree unique checks may be blurring the results)
- selects: about 2-5 times slower
- index size: about 2 times smaller
 
What I've found is that I was wrong about fillfactor. (Maybe something has changed here since PostgreSQL 8.1.) I believed a fillfactor lower than 80 would do more harm than good; at least that was the case 5 years ago. Now I get a noticeable speedup with fillfactor=50 when updating the whole table.
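A sketch of the kind of index tested (placeholder names):

    -- Leave half of each leaf page free so future inserts and updates
    -- land in existing pages instead of forcing page splits.
    CREATE INDEX deliveries_date_idx ON deliveries (delivery_date)
        WITH (fillfactor = 50);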

Hope I can set up this laptop soon and get those tests done.

On Sat, Jun 6, 2015 at 13:07, ktm@rice.edu (<ktm@rice.edu>) wrote:
> Here is an interesting read comparing LSM and Fractal Tree indexing:
> http://highscalability.com/blog/2014/8/6/tokutek-white-paper-a-comparison-of-log-structured-merge-lsm.html
> [...]

Re: Is it possible to have a "fast-write" Index?

From: Claudio Freire
On Wed, Jun 10, 2015 at 6:01 PM, deavid <deavidsedice@gmail.com> wrote:
> So far my results were a bit disappointing (comparing gin_btree against a
> regular btree for a column with very low cardinality):
> - create index and updates: about 10-20% faster (I had a primary key, so
> btree unique checks may be blurring the results)

That could be the effect of GIN's buffering (let's call it LSM? it's similar), so pure btrees with similar buffering could get a similar speedup.

On Wed, Jun 10, 2015 at 6:01 PM, deavid <deavidsedice@gmail.com> wrote:
> What I've found is that I was wrong about fillfactor. (Maybe something has
> changed here since PostgreSQL 8.1.) I believed a fillfactor lower than 80
> would do more harm than good; at least that was the case 5 years ago. Now I
> get a noticeable speedup with fillfactor=50 when updating the whole table.

8.1 didn't have HOT. I'd bet it's that.
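If HOT is indeed the explanation, note that it's the heap fillfactor that matters: HOT needs free space on the same heap page and no indexed column changing. A sketch with placeholder names:

    -- HOT (heap-only tuple) updates skip index maintenance entirely, but
    -- only when the new row version fits on the same heap page and no
    -- indexed column changes; a lower heap fillfactor leaves that room.
    ALTER TABLE deliveries SET (fillfactor = 70);
    -- Only newly written pages honor it; a rewrite applies it everywhere.
    VACUUM FULL deliveries;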



Re: Is it possible to have a "fast-write" Index?

From: Jim Nasby
On 6/5/15 6:54 PM, deavid wrote:
> [...]
> Also, if the task is done in the background it could be done by another
> backend, so several indexes could probably be updated at once using
> different backend processes. We could use the entire CPU if our index
> writes weren't tied to the session that wrote the row.

Something that might help here would be doing the index maintenance in 
parallel via background workers. There's now enough parallelism 
infrastructure that it shouldn't be too hard to hack up a quick test of 
that.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Is it possible to have a "fast-write" Index?

From: Qingqing Zhou
On Fri, Jun 5, 2015 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I really doubt that anyone would have any enthusiasm for saddling btree
> with a similar mechanism.  It's complicated (and has been the cause of
> multiple bugs); it's hard to figure out when is the optimal time to flush
> the pending insertions; and it slows down searches in favor of making
> inserts cheap, which is generally not the way to bet --- if that's the
> tradeoff you want, why not drop the index altogether?
>
I have seen a case of a major fact table with up to 7 indexes, with
large amounts of data loaded every 15-60 minutes and concurrent seeks
against the indexes at the same time. We can play with partitioning, or
sacrifice some application semantics, to alleviate the pressure, but it
would be good to see if we can improve: sorting and batching inserts
into a btree helps IO and locking behavior. So can we guard against the
cases that are hard to handle (e.g., indexes enforcing constraints such
as uniqueness) and improve the loading scenario?

Hint bit updates are also painful in the above case, but that is off-topic here.

Thanks,
Qingqing



Re: Is it possible to have a "fast-write" Index?

From: Simon Riggs
On 5 June 2015 at 18:07, deavid <deavidsedice@gmail.com> wrote:
> [...]
> Some people already asked for "delayed write" indexes, but the idea gets
> discarded because the index could get out of sync, so it could omit
> results, which is unacceptable. But I think that could be fixed in several
> ways, and we could have a fast and reliable index (though maybe not so
> fast on selects).

This is exactly the use case and mechanism for BRIN indexes.
 
--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Is it possible to have a "fast-write" Index?

From: deavid
So I just ran a test case for hash, btree, gin_btree and brin indexes, as well as without indexes and without primary keys.
* Testing "deliverynotes" table. 
  - Definition and use case:
It is a table containing real delivery note headers from several years.
It consists of 300k rows, 128 columns, 63 indexes, and 243MB of data 
excluding indexes. Since it is a table visible to users, almost every 
column can be searched, so we need lots of indexes. We do not need 
searches to be the fastest possible; we only need to accelerate our 
user searches a bit, without harming writes too much.
  - Things to test:
- measure index creation times.
- measure index space.
- with indexes but without primary key
- with everything
- create fully, delete everything and insert the data again in blocks
- test updates on recent data

I attached the logs for every test, in case anyone wants to see what exactly I'm testing.
This was tested on my i5 laptop with 4GB of RAM and a 120GB SSD (OCZ Agility 3). I'm trying to measure CPU time, not I/O time, so some configurations and tests are specifically set up to avoid as much IO as I can.
I'm using a dev build of PostgreSQL 9.5 downloaded from git sources.
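A sketch of one round of the harness (\timing is psql's client-side timer; the index and column names are placeholders):

    \timing on

    -- index creation time (swap the USING clause per index type)
    CREATE INDEX dn_date_btree ON deliverynotes (note_date);

    -- write cost with the index in place
    UPDATE deliverynotes SET amount = amount * 1.0
     WHERE note_date >= date '2015-01-01';

    -- resulting index size
    SELECT pg_size_pretty(pg_relation_size('dn_date_btree'));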

Conclusions:
- Gin_btree seems slower in almost every case. Its writes are marginally better than regular btrees' even when using work_mem=160MB (maybe 20% faster than btree). The indexes are smaller than I thought.
- BRIN indexes seem very fast for writes. For selects they are maybe a blend between having indexes and not having them. They don't recognize that some values are simply out of the range of indexed values, and that's a pity. If the values we want are packed together, I guess I would get even better results.
- Primary keys and uniqueness checks don't seem to make any difference here.
- Having no indexes at all is faster than I imagined. (Sometimes it beats BRIN or btree.) Maybe because the IO here is faster than usual.
- Hash indexes: I tried to do something, but they take too much time to build and I don't know why. If creates are slow, updates should be slow too. I'm not going to test them again.

And finally, I don't know why, but I couldn't vacuum or analyze tables. It always gets stalled without doing anything, so I had to comment out every vacuum. Maybe there is a bug in this dev version, or I misconfigured something.

On Fri, Jun 12, 2015 at 7:27, Simon Riggs (<simon@2ndquadrant.com>) wrote:
> This is exactly the use case and mechanism for BRIN indexes.
> [...]
Attachment

Re: Is it possible to have a "fast-write" Index?

From: deavid
Sorry; because of some misconfiguration, vacuum and analyze weren't working. Now I'm getting better numbers for BRIN indexes when there are zero rows to match.

On Sat, Jun 13, 2015 at 3:17, deavid (<deavidsedice@gmail.com>) wrote:
> [...]

Re: Is it possible to have a "fast-write" Index?

From: deavid
I did another round of tests on BRIN and GIN indexes, comparing them against regular btree indexes. Now I have 16M rows for the test.

The numbers seem good. Both GIN and BRIN seem good options for certain tables with more writes than reads (BRIN especially is very good).

I want to share my test with you; I used real-world data, but I didn't have time to do anything rigorous or to model real-world usage. I know the methodology is not sufficient, and maybe some calculations in the spreadsheet are wrong. I did my best.

I'm using an SSD, and I'm trying to compare CPU cost, not I/O.

In short, the results were (compared to btree):
- INSERT: GIN is 50% faster; BRIN is 6x faster. This is the best scenario.
- UPDATE: each case has a winner; for big updates BRIN is 10x faster and GIN is 25x faster. For small updates (most real-world cases) btree is always the winner, but BRIN gives some good results too.
- DELETE: almost no difference between the three.
- SELECT: btree is the winner here. BRIN is 10% slower, and GIN performance seems a bit random.

VACUUM, ANALYZE and other tasks are 6x faster with BRIN, 50% faster with GIN.
Index sizes are 50% smaller with GIN, and with BRIN they are very, very small.
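For anyone reproducing this, a sketch of how the size comparison can be pulled from the catalogs (the name pattern is a placeholder):

    -- Compare on-disk index sizes per access method.
    SELECT c.relname, am.amname, pg_size_pretty(pg_relation_size(c.oid))
    FROM pg_class c
    JOIN pg_am am ON am.oid = c.relam
    WHERE c.relkind = 'i' AND c.relname LIKE 'dn_%'
    ORDER BY pg_relation_size(c.oid) DESC;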

Hope you find these numbers useful.


On Sat, Jun 13, 2015 at 11:41, deavid (<deavidsedice@gmail.com>) wrote:
> [...]
Attachment

Re: Is it possible to have a "fast-write" Index?

From: Jim Nasby
On 6/12/15 9:17 PM, deavid wrote:
>    (upper(codagencia::text) );

FWIW, you should consider using citext for these cases; it would let you 
cut down on the number of indexes (by 50%?).
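A sketch of that (citext is the case-insensitive text type from contrib; 
"codagencia" is the column from the quoted index):

    CREATE EXTENSION citext;

    -- A citext column compares case-insensitively by itself, so the
    -- expression index on upper(codagencia::text) becomes unnecessary.
    ALTER TABLE deliverynotes ALTER COLUMN codagencia TYPE citext;
    CREATE INDEX dn_codagencia_idx ON deliverynotes (codagencia);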
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Is it possible to have a "fast-write" Index?

From: Alvaro Herrera
deavid wrote:
> I did another try on BRIN and GIN indexes, and I compared to regular btree
> indexes. Now i have 16M rows to do the test.
> 
> The numbers seem to be good. Both GIN and BRIN seem good options for
> certain tables with more writes than reads (Specially BRIN is very good)

Thanks, that's very nice to hear.

So you may or may not have realized two important details regarding
BRIN.  One is that when you insert into a table, the values are not
immediately put into the index but only as part of a later vacuum, or a
brin_summarize_new_values() function call.  Either of those things scan
the not-already-summarized part of the table and insert the summarized
values into the index.  If the new values are not in the index, then a
query would have to read that part of the table completely instead of
excluding it from the scan.

The other thing is that in BRIN you can tell the system to make the
summary information very detailed or coarsely detailed -- say one
summary tuple for every page, or one summary tuple for every 128 pages.
The latter is the default.  Obviously, the more detailed it is, the more
you can skip when scanning the table.  If the values are perfectly
correlated, then there's no point to the extra detail, but if there's
some variability then it could be worthwhile.  You change this by
specifying "WITH (pages_per_range=16)" to the CREATE INDEX command, or
by doing ALTER INDEX SET (pages_per_range=16) and then REINDEX it.
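A sketch of both knobs (placeholder names):

    -- More detailed summaries: one summary tuple per 16 heap pages
    -- instead of the default 128.
    CREATE INDEX deliveries_date_brin ON deliveries
        USING brin (delivery_date) WITH (pages_per_range = 16);

    -- Summarize rows inserted since the last vacuum, without waiting
    -- for one.
    SELECT brin_summarize_new_values('deliveries_date_brin'::regclass);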

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Is it possible to have a "fast-write" Index?

From: Nicolas Barbier
2015-06-05 deavid <deavidsedice@gmail.com>:

> Mode 3: on "aminsert", put the new entry in a second btree, leaving the
> first one untouched. Because the second btree is new, it will be small, and
> writes should be faster. When doing an index scan, read tuples from both at
> the same time (like a merge sort). On vacuum, merge the second btree into
> the first. In this mode the index is sorted and there is no need to recheck.

You might be interested in reading the thread “Fast insertion indexes:
why no developments” (2013), starting at
1383033222.73186.YahooMailNeo@web172602.mail.ir2.yahoo.com .

That thread talks mostly about reducing the (delayed) I/O caused by
inserting in a super-big index at a continuously high rate, while you
seem more interested in the delay problem caused by the CPU time used
when inserting in multiple indexes (which should be quite fast
already, as most of the writing is delayed).

Your problem (if it is really about delay and not about continuous
throughput) might be better served by a delay-reducing solution such
as writing a logical “row X is inserted, please make sure that all
indexes are up-to-date” to the WAL, instead of the physical “row X is
inserted into table A, part Y of index Z must be updated like this,
part Q of index S must be updated like so, etc” as is done now.
Updating the indexes (not only the writing itself) would then be
performed in a delayed fashion. Reading of an index must always
additionally scan the in-memory queue of logical WAL records that is
kept.

Of course, switching the WAL wholesale from a physical description of
the changes that must be performed to a logical description is
probably not feasible. Therefore, one could think about some new kind
of “logical WAL” that gets logged separately (or even mixed in with
the normal, physical WAL), where first the logical WAL is written
(“insert row X in table A”), after which other operations can continue
and the logical WAL is converted to physical WAL asynchronously (by
“performing” the changes as is currently done, but by a background
process). Recovery then would first need to replay the physical WAL,
and then replay the logical WAL.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: Is it possible to have a "fast-write" Index?

From: Robert Haas
> So I really doubt that anyone would have any enthusiasm for saddling btree
> with a similar mechanism.  It's complicated (and has been the cause of
> multiple bugs); it's hard to figure out when is the optimal time to flush
> the pending insertions; and it slows down searches in favor of making
> inserts cheap, which is generally not the way to bet --- if that's the
> tradeoff you want, why not drop the index altogether?

I'm not sure you're right about that.  MySQL has a feature called
secondary index buffering:

https://dev.mysql.com/doc/refman/5.0/en/innodb-insert-buffering.html

Now that might not be exactly what we want to do for one reason or
another, but I think it would be silly to think that they implemented
that for any reason other than performance, so there may be some
performance to be gained there.

Consider that on a table with multiple indexes, we've got to insert
into all of them.  If it turns out that the first leaf page we need
isn't in shared buffers, we'll wait for it to be read in.  We won't
start the second index insertion until we've completed the first one,
and so on.  So the whole thing is serial.  In the system MySQL has
implemented, the foreground process would proceed unimpeded and any
indexes whose pages were not in the buffer pool would get updated in
the background.

Ignoring for the moment the complexities of whether they've got the
right design and how to implement it, that's sort of cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Is it possible to have a "fast-write" Index?

From: Simon Riggs
On 19 June 2015 at 14:30, Robert Haas <robertmhaas@gmail.com> wrote:
> https://dev.mysql.com/doc/refman/5.0/en/innodb-insert-buffering.html
> [...]

Interesting.

Reading that URL, it shows that they would need to write WAL to insert into the buffer and then again to insert into the index. You might get away with skipping WAL logging on the index buffer if you had a special WAL record to record the event "all indexes updated for xid NNNN", but since that would be written lazily, it would significantly complicate the lazy update mechanism to track it.

It doesn't say anything about there being only one index buffer per table, nor do I think it would make sense to do it that way. So ISTM that the foreground process still has to insert serially into N index buffers, with each insert being WAL-logged.

So the only saving for the foreground process is the random I/O from inserting into the indexes, which means the value of the technique is in the case where you have many very large secondary indexes - which is now covered by BRIN.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Is it possible to have a "fast-write" Index?

From: Merlin Moncure
On Thu, Jun 11, 2015 at 9:32 PM, Qingqing Zhou
<zhouqq.postgres@gmail.com> wrote:
> Hint bit updates are also painful in the above case, but that is off-topic here.

Are your records spread out over many transactions, or do you tend to
have large batches all with the same xid?

merlin



Re: Is it possible to have a "fast-write" Index?

From: deavid


On Fri, Jun 19, 2015 at 15:06, Simon Riggs (<simon@2ndquadrant.com>) wrote:
> It doesn't say anything about there being only one index buffer per table, nor do I think it would make sense to do it that way. So ISTM that the foreground process still has to insert serially into N index buffers, with each insert being WAL-logged.
>
> So the only saving for the foreground process is the random I/O from inserting into the indexes, which means the value of the technique is in the case where you have many very large secondary indexes - which is now covered by BRIN.

I'm still learning how PostgreSQL works, but you're assuming that inserting into an index in bulk would require the same amount of CPU cycles and the same amount of kB written as doing it row by row.

Most memory-based indexes have a bulk-load technique that relies on having the data pre-sorted. Sorting pure random data and then bulk-inserting it into the index is faster than classic insertion (less CPU time; no idea about IO).

Database indexes are disk-based, and there are some points (regarding IO performance) that are hard for me to fully understand. But it seems logical that it would be faster to scan the index only once from beginning to end and do something like a merge sort between the pre-sorted input and the index.

So I guess I missed something. Maybe WAL logging is the problem? If so, could this work for TEMP/UNLOGGED tables?

Lots of heavily written tables are materialized views (or they perform more or less the same role), so they could be refreshed in case of server failure. I hope bulk inserts could double the performance; otherwise, this idea may not be worth it.

About BRIN indexes, I'm really impressed. They are several times faster than I could have imagined, and they perform very well on selects too. I have to test them more, with more complex queries (would they work when used in JOIN clauses?). If select times are good enough even in those cases, then there's no need for bulk inserts with btree.