Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Leonardo Francalanci
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id 1383129310.94766.YahooMailNeo@web172601.mail.ir2.yahoo.com
Whole thread Raw
In response to Re: Fast insertion indexes: why no developments  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Fast insertion indexes: why no developments  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Fast insertion indexes: why no developments  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
> Presumably the data you are inserting isn't actually random. Please
> describe the use case you are considering in more detail and some view
> on how frequent that is, with some examples. Once we understand the
> use case and agree it is important, we might solve problems.


Collecting calls data for mobile network operators (and no, I don't work for the NSA...)
Easily 5000-10000 inserts per second. Indexes in timestamp and ID (not a problem, always increasing so no btree issues)
andin called #, calling #, imsi, imei. The last four obviously are random, out of millions of possible values. 
After the few first millions of records, the disks can't keep up with the amount of random writing in the indexes.
Workaround:the table is partitioned every 15 minutes, and indexes created in bulk after we "start" the new 15-minutes
partition.Searches on current 15 minutes are not allowed (as it is not indexed), and searches on older data are
K*log(N)(where K is the number of partitions).  
Yes, I could throw more disks, use ssd, sharding more, etc etc. But I still think that btree just aren't fit for this
kindof problem. I don't delete data, I don't update data, there's not that much concurrency going on. I would sacrifice
searchspeed (K*log(N) is already much slower than "regular" btree usage) for realtime insertion. 

I don't think I'm the only one having a big system to be indexed by random values. 

In fact, I didn't want to turn this thread into a "help me with this workload" thread. I just wanted to know if there
wassome other known issues with these "different indexes" other than "not enough time to implement them correctly": I
wasafraid that someone already dismissed them as "good in theory, bad in practice"... 




pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Fast insertion indexes: why no developments
Next
From: Simon Riggs
Date:
Subject: Re: Fast insertion indexes: why no developments