Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers
From: Leonardo Francalanci
Subject: Re: Fast insertion indexes: why no developments
Date:
Msg-id: 1384345889267-5778124.post@n5.nabble.com
In response to: Re: Fast insertion indexes: why no developments (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Fast insertion indexes: why no developments
List: pgsql-hackers
Simon Riggs wrote
> So in the use case you describe, the min max index would require a
> scan of only 25% of the table, not the 80% described earlier for
> random inserts. In my experience, people wish to keep data for much
> longer periods and so the percentage of scan required would drop lower
> than 25%, possibly to 5% or less for many applications.
>
> The plan would use sequential I/O so could still work quickly; given
> the low read rate, longer query times could be acceptable.

"Quickly"??? Seq-scanning 25% of a multi-TB table is not "quick".

Simon Riggs wrote
> Minmax indexes are simply a way to make this use case happen
> automatically, without the need for manual partitioning of the table.

Your logic assumes we index nothing but call_timestamp. That would lead to huge query response times. Plus: a btree has no big problem keeping up in a sequential-insertion scenario (such as call_timestamp). So I still don't see the gain in using minmax indexes; again, could you point me to some performance tests of *any* use case?

Simon Riggs wrote
> They are not the answer to every prayer, but with respect they are
> better than you had claimed they would be. (25% not 80%, in your use
> case). I saw this was likely to be the case and this is why I
> challenged you to describe in more detail. Thank you.

I claimed they would scan 80% of the table because I assumed I had to use them on the random fields, not on the call_timestamp field. I don't need a better index on call_timestamp: it's sequential, so I have no problem there. But a minmax index on it is useless to me: I don't want to seq-scan 25% of a multi-TB table.

Simon Riggs wrote
> Performance tests are only possible with a clear use case.

Well, so I could add my weird_index patch to the PostgreSQL source code and it would be committed right away??? I assumed you had to prove somehow that a new index is better than what is already available, at least for some cases. Or, in other words: what are you going to write in the minmax index documentation, "try them and see if they work better for you"?

Simon Riggs wrote
> Please see that Alvaro and I have gone out of our way to provide a new
> facility to help you and others, and that it requires changing how we
> think about the solution. I accept it may not provide for every case
> but it requires careful analysis before deciding that is so.

If I came across as too rough, I ask your pardon. I always appreciate people taking their time to help someone else for free.

Plus, I'm *very* interested in the minmax index, especially for call_timestamp (some queries are date-range only, such as "give me all the calls in this particular 2-second range") or for the id column I have. But, at the same time, I don't see any evidence that they work better than btrees (except for the size of the index). I would like to see some numbers.

I worked a little on the bitmap index implementation, and I stopped because on the large tables these indexes are supposed to be used on, the heap lookup took so much time that the (slightly) faster index access didn't really help: it was only a fraction of the query time... I'm afraid it would be the same with minmax indexes; that's why I wanted to see some numbers.

Simon Riggs wrote
>> And, again, I think that random values insertion is the worst
>> use case for minmax indexes.
>
> I agree, it is. What we have disagreed on is the extent to which that
> scenario exists for use cases on very large tables, which are
> typically "append-mostly" with most queries accessing a subset of the
> table, e.g. date range.

Mhh... maybe this is the point where we don't understand each other: I query the table by userid + date range. The date range is *not* selective enough (it covers, as you said, 25% of the multi-TB table). The userid is *very* selective. I'm looking for a "better" index for the userid column(s). The "new" indexes I mentioned in the OP claim to be better in this scenario (but I don't blindly believe them...).
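To make the shape of the workload concrete, here's a sketch (table, column, and index names are invented for illustration; this is not my real schema):

    -- Illustrative schema: the shape matches the workload above.
    CREATE TABLE calls (
        call_timestamp timestamptz NOT NULL,  -- arrives in (almost) sequential order
        userid         bigint      NOT NULL,  -- effectively random across inserts
        duration       interval
    );

    -- The index that is hard to maintain at high insert rates: each new
    -- row lands on a more-or-less random btree leaf page.
    CREATE INDEX calls_userid_idx ON calls (userid, call_timestamp);

    -- The typical query: very selective on userid, much less so on the
    -- date range (which alone matches ~25% of the table).
    SELECT *
    FROM   calls
    WHERE  userid = 12345
      AND  call_timestamp >= timestamptz '2013-11-01'
      AND  call_timestamp <  timestamptz '2013-11-08';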
I don't see how indexing call_timestamp alone could be of any interest, since it would require seq-scanning 25% of a huge table for every query.
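Concretely, the call_timestamp-only scheme would look something like this (the index syntax below is my guess at the patch's grammar, purely for illustration):

    -- Hypothetical minmax syntax (the actual patch grammar may differ):
    -- one [min,max] summary per block range instead of one entry per row.
    CREATE INDEX calls_ts_minmax ON calls USING minmax (call_timestamp);

    -- Maintenance is cheap (call_timestamp arrives in order), and a
    -- date-range query can skip every block range whose summary doesn't
    -- overlap the predicate:
    SELECT count(*)
    FROM   calls
    WHERE  call_timestamp >= timestamptz '2013-11-13 10:00:00'
      AND  call_timestamp <  timestamptz '2013-11-13 10:00:02';

    -- But for the userid + date-range query above, the date predicate
    -- still matches ~25% of the block ranges, so the scan degrades into
    -- a sequential read of 25% of a multi-TB table.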