Re: [PATCHES] Including Snapshot Info with Indexes - Mailing list pgsql-hackers

From Gokulakannan Somasundaram
Subject Re: [PATCHES] Including Snapshot Info with Indexes
Date
Msg-id 9362e74e0710230435r4553454eg4258e0fea2a03198@mail.gmail.com
Whole thread Raw
In response to Re: [PATCHES] Including Snapshot Info with Indexes  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Responses Re: [PATCHES] Including Snapshot Info with Indexes
List pgsql-hackers


On 10/23/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Please keep the list cc'd.

Gokulakannan Somasundaram wrote:
> On 10/23/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>> Gokulakannan Somasundaram wrote:
>> I have also enabled the display of Logical Reads. In order to see that,
>> set
>>> log_statement_stats on.
>> You should start benchmarking, to verify that you're really getting the
>> kind of speed up you're looking for, before you spend any more effort on
>> that. Reduction in logical reads alone isn't enough. Remember that for a
>> big change like that, the gain has to be big as well.
>
> I have done the benchmark. I have done the benchmark with Logical reads, as
> they turn out to be potential physical reads. Try turning on the
> log_statement_stats in postgresql.conf. try firing some queries, which can
> satisfied by the index. You would see the difference.

I would see a decrease in the number of logical reads, that's all. You
need to demonstrate a real increase in throughput and/or reduction in
response times.

Note that even though you reduce the number of logical reads, with a
thick index a logical read is *more* likely to be a physical read,
because the index is larger and therefore consumes more cache.

Say, with a normal index, you need to goto the table for checking the snapshot. So you would be loading both the index pages + table pages, in order to satisfy a certain operations. Whereas in thick index you occupy 16 bytes per tuple more in order to avoid going to the table. So memory management is again better. But i can run the load test, if that's required.  Even when all the tuples are in memory, index only scans are almost 40-60% faster than the index scans with thin indexes.

> As a first test, I'd like to see results from SELECTs on different sized
>> tables. On tables that fit in cache, and on tables that don't. Tables
>> large enough that the index doesn't fit in cache. And as a special case,
>> on a table just the right size that a normal index fits in cache, but a
>> thick one doesn't.
>
> I have not done a Load test. That's a good idea. Are you guys using Apache
> JMeter?

You can use whatever you want, as long as you can get the relevant
numbers out of it. contrib/pgbench is a good place to start.

DBT-2 is another test people often use for patches like this. It's quite
tedious to set up and operate, but it'll give you nice very graphs.

Make sure you control vacuums, checkpoints etc., so that you get
repeatable results.


Sure i will do that. Thanks for the advice.

> Also i think you might have noted that the thick indexes are not affected by
> updates, if the updated column is not in the index. I think that add on to
> one more advantage of thick indexes against DSM.

That cannot possibly work. Imagine that you have a table

ctid | id | data
-----+----+-----
(0,1)| 1  | foo
(0,2)| 1  | bar

where (0,2) is an updated version of (0,1). If you don't update the
index, there will be no index pointer to (0,2), so a regular index scan,
not an index-only scan, will not find the updated tuple.

Or did you mean that the index is not updated on HOT updates? That's an
interesting observation. We could do index-only scans with the DSM as
well, even if there's HOT updates, if we define the bit in the bitmap to
mean "all tuples in this page are visible to everyone, or there's only
HOT updates". That works, because an index-only-scan doesn't access any
of the updated columns. It probably isn't worth it, though. Seems like a
pretty narrow use case, and makes it more complicated.


I think i was not understood. An update transaction is not degraded by thick index. Update = Delete + insert. If you don't update the columns in index, then we would goto the same index page for both delete and insert. i have done a small optimization there to cache the BTStack. you do not need to do any more I/O. So effectively update performance in thick index = update performance in thin index (if indexed columns are not updated).
Hope i am clear..

What do you thick about not maintaining pins in case of thick indexes?

Thanks,
Gokul,
CertoSQL Project,
Allied Solution Groups.
( www.alliedgroups.com)

pgsql-hackers by date:

Previous
From: Rafael Martinez
Date:
Subject: Re: PostgreSQL performance issues
Next
From: "Heikki Linnakangas"
Date:
Subject: Re: [PATCHES] Including Snapshot Info with Indexes