Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Leonardo Francalanci
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id 1384333676597-5778092.post@n5.nabble.com
Whole thread Raw
In response to Re: Fast insertion indexes: why no developments  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Fast insertion indexes: why no developments  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Fast insertion indexes: why no developments  (Jeremy Harris <jgh@wizmail.org>)
List pgsql-hackers
Simon Riggs wrote
> On 5 November 2013 14:28, Leonardo Francalanci <

> m_lists@

> > wrote:
> 
>> Either my sql is not correct (likely), or my understanding of the minmax
>> index is
>> not correct (even more likely), or the minmax index is not usable in a
>> random inputs
>> scenario.
> 
> Please show the real world SQL you intend to run, so we can comment on
> it. Inventing a use case that breaks effectiveness of any optimisation
> is always easy, but the question is whether the use case is likely or
> even desirable.

The use case is pretty simple.
Think it as the NSA, as it would be much easier.
Collect all calls made/received by any user on a mobile network.
(in fact, it's something more than calls, so in fact is 2-10 rows per call).
Keep the data for 20 days.
That's the insert part.

Query:
search calls made/received by the user using IMSI (caller id) or IMEI
(phone id). Date range is usually days (past 4 days, from 10 days ago
to 5 days ago...)

The result is just a very small percentage of the rows present in the
table: a single user doesn't call that much!
Searches are made by a human, so no that many request per second.

It's not a "write mostly" scenario, it's a 99% write 1% read scenario.

Problem? having 4 btree indexes on random values (imsi+imei * 2,
since we have calling and caller) kills the performance in insertion
after a while.
Solution so far? partition every 15 minutes, create the indexes in bulk.


Simon Riggs wrote
> If we have a query to show the most recent calls by a particular caller
> 
> SELECT *
> FROM cdr
> WHERE callerid = X
> ORDER BY call_timestamp DESC
> LIMIT 100
> 
> then this could potentially be optimised using a minmax index, by
> traversing the data ranges in call_timestamp order. That is not part
> of the code in this initial release, since the main use case is for
> WHERE call_timestamp >= X, or WHERE primarykey = Y

I don't understand how a index on call_timestamp would help
in the query above.


Simon Riggs wrote
> I don't believe there is a credible business case for running that
> same query but without the ORDER BY and LIMIT, since it could
> potentially return gazillions of rows


Gazillion of rows??? We're talking about calls made/received by
one user here. How many calls do you make in 10 days???


Simon Riggs wrote
>  so it isn't surprising at all
> that it would access a large % of the table. 

In fact, the query I use return a fraction of the table, and only
a very small amount of users get searched.

Simon, you keep on talking about these minmax indexes, and
I still don't see any reference to some performance tests.

And, again, I think that random values insertion is the worst
use case for minmax indexes.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778092.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



pgsql-hackers by date:

Previous
From: Nicolas Barbier
Date:
Subject: Re: Fast insertion indexes: why no developments
Next
From: "Colin 't Hart"
Date:
Subject: Re: TABLE not synonymous with SELECT * FROM?