Re: Including Snapshot Info with Indexes - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: Including Snapshot Info with Indexes |
Date | |
Msg-id | 1193124048.28269.19.camel@hannu-laptop |
In response to | Re: Including Snapshot Info with Indexes ("Luke Lonergan" <llonergan@greenplum.com>) |
List | pgsql-hackers |
On Sat, 2007-10-20 at 10:19, Luke Lonergan wrote:
> Hi Hannu,
>
> On 10/14/07 12:58 AM, "Hannu Krosing" <hannu@skype.net> wrote:
>
> > What has happened in reality, is that the speed difference between CPU,
> > RAM and disk speeds has _increased_ tremendously
>
> Yes.
>
> > which makes it even more important to _decrease_ the size of stored
> > data if you want good performance
>
> Or bring the CPU processing closer to the data it's using (or both).
>
> By default, the trend you mention first will continue in an unending way -
> the consequence is that the "distance" between a processor and its target
> data will continue to increase ad infinitum.

The emergence of solid-state (flash) disks may help a little here, but in
general it is true.

> By contrast, you can only decrease the data volume so much - so in the end
> you'll be left with the same problem - the data needs to be closer to the
> processing. This is the essence of parallel / shared nothing architecture.
>
> Note that we've done this at Greenplum. We're also implementing a DSM-like
> capability and are investigating a couple of different hybrid row / column
> store approaches.

Have you tried moving the whole visibility part of tuples out to a
separate heap?

Especially in OLAP/ETL scenarios, the distribution of tuples loaded in one
transaction should be very good for visibility-info compression.

I'd suspect that you could compress hundreds of pages' worth of visibility
info into a single RLE encoding unit (xmin=N, xmax=not_yet, start_ctid=X,
end_ctid=Y), and it will stay in L1 cache most of the time you process the
corresponding relation. And the relation itself will be smaller, index-only
access (actually index-only + a lookup inside L1 cache) becomes possible,
and so on. (A rough sketch of what such a unit could look like is at the
end of this mail.)

OTOH, if you load it in millions of small transactions, you can run VACUUM
FREEZE _on_ the visibility heap only, which will make all visibility info
look similar and thus RLE-compressible, and again make it fit in L1 cache -
as long as you don't have lots of failed loads interleaved with successful
ones.

> Bitmap index with index-only access does provide nearly all of the
> advantages of a column store from a speed standpoint BTW. Even though
> Vertica is touting speed advantages - our parallel engine plus bitmap index
> will crush them in benchmarks when they show up with real code.
>
> Meanwhile they're moving on to new ideas - I kid you not "Horizontica" is
> Dr. Stonebraker's new idea :-)

Sounds like a result of a marketroid brainstorming session :P

> So - bottom line - some ideas from column store make sense, but it's not a
> cure-all.
>
> > There is also a MonetDB/X100 project, which tries to make MonetDB
> > order(s) of magnitude faster by doing in-page compression in order to
> > get even more performance, see:
>
> Actually, the majority of the points made by the MonetDB team involve
> decreasing the abstractions in the processing path to improve the IPC
> (instructions per clock) efficiency of the executor.

The X100 part was about doing in-page compression, so that the efficiency
of the disk-to-L1-cache pathway would increase: with data compressed to
half its size, the CPU gets twice the data throughput.

> We are also planning to do this by operating on data in vectors of projected
> rows in the executor, which will increase the IPC by reducing I-cache misses
> and improving D-cache locality. Tight loops will make a much bigger
> difference when long runs of data are the target operands.
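PS. To make the RLE idea above a bit more concrete, here is a rough,
self-contained sketch in plain C. It is not actual PostgreSQL code:
VisibilityRun, visrun_lookup and the simplified ItemPointer/TransactionId
types are invented for illustration only.

/*
 * Rough sketch only.  The idea: instead of keeping xmin/xmax in every
 * heap tuple header, keep a separate, RLE-compressed "visibility heap".
 * Tuples inserted by one transaction into a contiguous ctid range
 * share a single run.
 */
#include <stdint.h>
#include <stddef.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId 0          /* "xmax not yet set" */

typedef struct ItemPointer              /* simplified ctid: (block, offset) */
{
    uint32_t block;
    uint16_t offset;
} ItemPointer;

typedef struct VisibilityRun
{
    TransactionId xmin;                 /* inserting transaction */
    TransactionId xmax;                 /* deleting transaction, or Invalid */
    ItemPointer   start_ctid;           /* first tuple covered by this run */
    ItemPointer   end_ctid;             /* last tuple covered by this run */
} VisibilityRun;

/* total order on ctids, so runs can be binary-searched */
static int
ctid_cmp(ItemPointer a, ItemPointer b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/*
 * Find the run covering 'ctid' in an array of runs sorted by ctid range.
 * Returns NULL if no visibility info is recorded for that ctid.
 */
static const VisibilityRun *
visrun_lookup(const VisibilityRun *runs, int nruns, ItemPointer ctid)
{
    int lo = 0, hi = nruns - 1;

    while (lo <= hi)
    {
        int mid = (lo + hi) / 2;

        if (ctid_cmp(ctid, runs[mid].start_ctid) < 0)
            hi = mid - 1;
        else if (ctid_cmp(ctid, runs[mid].end_ctid) > 0)
            lo = mid + 1;
        else
            return &runs[mid];
    }
    return NULL;
}

In the bulk-load case the runs[] array for a whole relation would be only a
handful of entries, so visrun_lookup() is effectively an L1-cache lookup;
and after running VACUUM FREEZE over the visibility heap, adjacent frozen
runs could be merged the same way.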
> - Luke