Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id CA+U5nMJ=RrTXUY0g1VS7KcR9DVz9DezvN7Z2QE_R7TyTZCh36A@mail.gmail.com
In response to Re: Fast insertion indexes: why no developments  (Leonardo Francalanci <m_lists@yahoo.it>)
Responses Re: Fast insertion indexes: why no developments  (Leonardo Francalanci <m_lists@yahoo.it>)
Re: Fast insertion indexes: why no developments  (Kaare Rasmussen <kaare@jasonic.dk>)
List pgsql-hackers
On 30 October 2013 10:35, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> Presumably the data you are inserting isn't actually random. Please
>> describe the use case you are considering in more detail and some view
>> on how frequent that is, with some examples. Once we understand the
>> use case and agree it is important, we might solve problems.
>
>
> Collecting calls data for mobile network operators (and no, I don't work for the NSA...)
> Easily 5,000-10,000 inserts per second. Indexes on timestamp and ID (not a problem: always increasing, so no btree issues) and on called #, calling #, IMSI, IMEI. The last four are effectively random, drawn from millions of possible values.
> After the first few million records, the disks can't keep up with the amount of random writing in the indexes. Workaround: the table is partitioned every 15 minutes, and indexes are created in bulk after we "start" the new 15-minute partition. Searches on the current 15 minutes are not allowed (as it is not yet indexed), and searches on older data cost K*log(N) (where K is the number of partitions).
> Yes, I could throw more disks at it, use SSDs, shard more, etc. But I still think btrees just aren't a fit for this kind of problem. I don't delete data, I don't update data, and there's not much concurrency going on. I would sacrifice search speed (K*log(N) is already much slower than "regular" btree usage) for realtime insertion.
>
> I don't think I'm the only one having a big system to be indexed by random values.
>
> In fact, I didn't want to turn this thread into a "help me with this workload" thread. I just wanted to know whether there were other known issues with these "different indexes" beyond "not enough time to implement them correctly": I was afraid that someone had already dismissed them as "good in theory, bad in practice"...
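The K*log(N) search cost described above can be sketched numerically: with K separately indexed partitions of roughly N rows each, a point lookup must binary-search every partition's index, whereas a single btree over all K*N rows needs only one log2(K*N) probe. A rough back-of-the-envelope illustration (the partition count and row counts below are hypothetical, not taken from the poster's system):

```python
import math

def partitioned_lookup_cost(k, n):
    # K partitions, each with its own index over ~N rows:
    # one binary search (~log2(N) comparisons) per partition.
    return k * math.log2(n)

def single_index_lookup_cost(k, n):
    # One btree over all K*N rows: a single binary search.
    return math.log2(k * n)

# Hypothetical example: 96 fifteen-minute partitions (one day),
# ~5M rows per partition at 5,000-10,000 inserts/second.
k, n = 96, 5_000_000
print(f"partitioned: ~{partitioned_lookup_cost(k, n):.0f} comparisons")
print(f"single index: ~{single_index_lookup_cost(k, n):.0f} comparisons")
```

The gap grows linearly with the number of partitions, which is the trade-off the poster is accepting: much slower reads in exchange for sequential, bulk-built index writes.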

What is the reason for needing such fast access to individual groups
of records? Sure sounds like the NSA or similar ;-)

Sacrificing timeliness for efficiency is a common solution. I'm seeing
lots of areas where being able to specify the timeliness that is
acceptable in a query leads to various optimisations of this kind.

Indexes are a declarative solution.  We would need to be able to
specify the tolerances to be able to do this. (You can write your own
index...)

In terms of generality, do you think it's worth a man-year of developer
effort to replicate what you have already achieved? Who would pay?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


