Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers
From: Leonardo Francalanci
Subject: Re: Fast insertion indexes: why no developments
Date:
Msg-id: 1384333676597-5778092.post@n5.nabble.com
In response to: Re: Fast insertion indexes: why no developments (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Fast insertion indexes: why no developments
           Re: Fast insertion indexes: why no developments
List: pgsql-hackers
Simon Riggs wrote:
> On 5 November 2013 14:28, Leonardo Francalanci <m_lists@> wrote:
>
>> Either my sql is not correct (likely), or my understanding of the minmax
>> index is not correct (even more likely), or the minmax index is not
>> usable in a random inputs scenario.
>
> Please show the real world SQL you intend to run, so we can comment on
> it. Inventing a use case that breaks effectiveness of any optimisation
> is always easy, but the question is whether the use case is likely or
> even desirable.

The use case is pretty simple. Think of it as the NSA, as that makes it much
easier to picture. Collect all calls made/received by any user on a mobile
network (in fact it's somewhat more than calls, so it comes to 2-10 rows per
call). Keep the data for 20 days. That's the insert part.

Query: search the calls made/received by a user, looking up by IMSI (the
subscriber identity) or IMEI (the handset identity). The date range is
usually a few days (the past 4 days, from 10 days ago to 5 days ago, ...).
The result is just a very small percentage of the rows in the table: a single
user doesn't call that much! Searches are made by a human, so there aren't
that many requests per second. It's not a "write mostly" scenario, it's a
99% write / 1% read scenario.

The problem? Having 4 btree indexes on random values (imsi + imei, times 2,
since we index both the calling and the called party) kills insertion
performance after a while.

The solution so far? Partition every 15 minutes and create the indexes in
bulk (see the sketch at the end of this message).

Simon Riggs wrote:
> If we have a query to show the most recent calls by a particular caller
>
>   SELECT *
>   FROM cdr
>   WHERE callerid = X
>   ORDER BY call_timestamp DESC
>   LIMIT 100
>
> then this could potentially be optimised using a minmax index, by
> traversing the data ranges in call_timestamp order. That is not part
> of the code in this initial release, since the main use case is for
> WHERE call_timestamp >= X, or WHERE primarykey = Y

I don't understand how an index on call_timestamp would help in the query
above.

Simon Riggs wrote:
> I don't believe there is a credible business case for running that
> same query but without the ORDER BY and LIMIT, since it could
> potentially return gazillions of rows

Gazillions of rows? We're talking about the calls made/received by one user
here. How many calls do you make in 10 days?

Simon Riggs wrote:
> so it isn't surprising at all
> that it would access a large % of the table.

In fact, the query I use returns only a small fraction of the table, and only
a very small number of users ever get searched.

Simon, you keep talking about these minmax indexes, and I still haven't seen
any reference to performance tests. And, again, I think that insertion of
random values is the worst possible use case for minmax indexes.
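For reference, here is a minimal sketch of the 15-minute partition /
bulk-index scheme described above, using plain inheritance partitioning. The
schema is simplified and made up: the table and column names, the CHECK
bounds, and the IMSI value are illustrative only; the real table has more
columns and more rows per call.

    -- Parent table: queried directly, never loaded or indexed itself.
    CREATE TABLE cdr (
        call_timestamp  timestamptz NOT NULL,
        calling_imsi    text        NOT NULL,
        calling_imei    text        NOT NULL,
        called_imsi     text        NOT NULL,
        called_imei     text        NOT NULL
    );

    -- One child table per 15-minute slice; rows are bulk-loaded into the
    -- current slice with no indexes in place, so inserts stay fast.
    CREATE TABLE cdr_20131113_0900 (
        CHECK (call_timestamp >= '2013-11-13 09:00'
           AND call_timestamp <  '2013-11-13 09:15')
    ) INHERITS (cdr);

    -- Once the slice is closed, build the four btree indexes in bulk.
    CREATE INDEX cdr_20131113_0900_calling_imsi ON cdr_20131113_0900 (calling_imsi);
    CREATE INDEX cdr_20131113_0900_calling_imei ON cdr_20131113_0900 (calling_imei);
    CREATE INDEX cdr_20131113_0900_called_imsi  ON cdr_20131113_0900 (called_imsi);
    CREATE INDEX cdr_20131113_0900_called_imei  ON cdr_20131113_0900 (called_imei);

    -- Read side: one subscriber's calls over a few days. With constant
    -- date bounds, constraint exclusion skips the child tables outside the
    -- range, and the per-partition btree indexes find the few matching rows.
    SELECT *
    FROM   cdr
    WHERE  call_timestamp >= '2013-11-09' AND call_timestamp < '2013-11-13'
    AND   (calling_imsi = '222015551234567' OR called_imsi = '222015551234567');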