Re: [PATCHES] Including Snapshot Info with Indexes - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: [PATCHES] Including Snapshot Info with Indexes |
Date | |
Msg-id | 1193143233.17735.21.camel@hannu-laptop |
In response to | Re: [PATCHES] Including Snapshot Info with Indexes ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
Responses |
Re: [PATCHES] Including Snapshot Info with Indexes
Re: [PATCHES] Including Snapshot Info with Indexes |
List | pgsql-hackers |
On Tue, 2007-10-23 at 13:04, Heikki Linnakangas wrote:
> Gokulakannan Somasundaram wrote:
> > Say, with a normal index, you need to goto the table for checking the
> > snapshot. So you would be loading both the index pages + table pages, in
> > order to satisfy a certain operations. Whereas in thick index you occupy 16
> > bytes per tuple more in order to avoid going to the table. So memory
> > management is again better. But i can run the load test, if that's
> > required.
>
> Yes, performance testing is required for any performance-related patch.
>
> Remember that you're competing against DSM. We're going to want some
> kind of a DSM anyway because it allows skipping unmodified parts of the
> heap in vacuum.

I would suggest that you use just an additional heap with decoupled
visibility fields as DSM.

For a large number of usage scenarios this will be highly compressible
and will mostly stay in processor caches.

You can start slow, and have the info duplicated in both the main heap
and the visibility heap (aka DSM).

There are several advantages to keeping a separate visibility heap:

1) it is usually highly compressible: at least you can throw away
cmin/cmax quite soon, and usually you can also FREEZE and RLE encode
the rest.

2) faster access, more tightly packed data pages.

3) index-only scans

4) superfast VACUUM FREEZE

5) makes VACUUM faster even for worst cases (interleaving live and dead
tuples)

6) any index scan will be faster due to fetching only visible rows from
the main heap.

> > Even when all the tuples are in memory, index only scans are
> > almost 40-60% faster than the index scans with thin indexes.
>
> Have you actually benchmarked that? What kind of query was that? I don't
> believe for a second that fetching the heap tuple when the page is in
> memory accounts for 40-60% of the overhead of regular index scans.

It depends heavily on the type of memory (postgresql page or disk cache)
it is in.
I remember doing Slony subscribes in the early days, and the speed
difference on loading a table with an active PK index was several times,
depending on the shared_buffers setting.

That was for a table where both heap and index did fit in the 2G of
memory which was available, the only difference being whether or not
pages were shuffled between the postgresql buffer and the linux system
cache.

> BTW, another issue you'll have to tackle, that a DSM-based patch will
> have to solve as well, is how to return tuples from an index. In b-tree,
> we scan pages page at a time, keeping a list of all tids that match the
> scanquals in BTScanOpaque. If we need to return not only the tids of the
> matching tuples, but the tuples as well, where do we store them? You
> could make a palloc'd copy of them all, but that seems quite expensive.

Have you considered returning them as "already visibility-checked pages",
similar to what views or set-returning functions return?

-------------------
Hannu
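A minimal sketch of point 1 in the list of advantages above, showing why a separate visibility heap could compress well once most tuples are frozen. It is written in Python for brevity. The names `Vis`, `FROZEN`, `rle_encode`, `lookup` and `visible` are hypothetical, and the visibility rule is deliberately simplified: it ignores in-progress and aborted transactions, cmin/cmax and xid wraparound, so it should not be read as how PostgreSQL does or would implement this.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple

FROZEN = 0  # hypothetical marker: tuple is visible to every snapshot


@dataclass(frozen=True)
class Vis:
    """Visibility fields of one heap tuple, kept apart from the tuple data."""
    xmin: int             # inserting transaction id, or FROZEN
    xmax: Optional[int]   # deleting transaction id, None while the tuple is live


def rle_encode(entries: List[Vis]) -> List[Tuple[Vis, int]]:
    """Collapse consecutive identical entries into (entry, run_length) pairs."""
    return [(v, sum(1 for _ in grp)) for v, grp in groupby(entries)]


def lookup(runs: List[Tuple[Vis, int]], slot: int) -> Vis:
    """Find the visibility entry for a tuple slot by walking the runs."""
    for v, n in runs:
        if slot < n:
            return v
        slot -= n
    raise IndexError("slot out of range")


def visible(v: Vis, snapshot_xid: int) -> bool:
    """Simplified rule: inserted before the snapshot and not deleted before it."""
    inserted = v.xmin == FROZEN or v.xmin < snapshot_xid
    deleted = v.xmax is not None and v.xmax < snapshot_xid
    return inserted and not deleted


# 1502 tuples, of which only two were touched recently.
entries = (
    [Vis(FROZEN, None)] * 1000
    + [Vis(120, None), Vis(120, 130)]
    + [Vis(FROZEN, None)] * 500
)
runs = rle_encode(entries)
print(len(entries), "entries stored as", len(runs), "runs")
```

With such a structure an index scan could check `visible(lookup(runs, slot), snapshot_xid)` without touching the main heap page, which is the idea behind points 3 and 6. A real design would also need a faster lookup than a linear walk over the runs.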