Re: Performance improvement hints + measurement - Mailing list pgsql-hackers

From devik@cdi.cz
Subject Re: Performance improvement hints + measurement
Date
Msg-id 39BF8314.8A790C34@cdi.cz
Whole thread Raw
In response to Performance improvement hints  (devik@cdi.cz)
Responses Re: Performance improvement hints + measurement
List pgsql-hackers
> >    But indexscan always lookups actual record in heap even if
> >    all needed attributes are contained in the index.
> >    Oracle and even MSSQL reads attributes directly from index
> >    without looking for actual tuple at heap.
> 
> Doesn't work in Postgres' storage management scheme --- the heap
> tuple must be consulted to see if it's still valid.

yes, I just spent another day by looking into sources and
it seems that we need xmin, xmax stuff.
What do you think about this approach:

1) add all validity & tx fields from heap tuple into   index tuple too
2) when generating plan for index scan try to determine  whether we can satisfy target list using only data  from index
tuples,if yes then compute cost without  accounting random heap page reads - it will lead into  much lower cost
 
3) whenever you update/delete heap tuple's tx fields, update  then also in indices (you don't have to delete them from
index)

It will cost more storage space and slightly more work when
updating indices but should give excelent performance when
index is used. 

Measurements:
I've table with about 2 mil. rows declared as
bigrel(namex varchar(50),cnt integer,sale datetime). 
I regulary need to run this query against it:
select nazev,sum(cnt) from bigrel group by name;
It took (in seconds):

Server\Index   YES      NO
pg7.01 linux   58       264
MSSQL7 winnt   17       22

I compared on the same machine (PII/375,128RAM) using
WINNT under VMWARE and native linux 2.2. pq was 
vaccum analyzed.
Why is pgsql so slow ? The mssql plan without index uses
hash aggregating but pg sorts while relation.
With index, in pg there is a big overhead of heap tuple
reading - mssql uses data directly from scanned index.

Also I noticed another problem, when I added 
where nazev<'0' it took 110ms on pg when I used
set enable_seqscan=on;.
Without is, planner still tried to use seqscan+sort
which took 27s in this case.

I'm not sure how complex the proposed changes are. Another
way would be to implement another aggregator like HashAgg
which will use hashing. 
But it could be even more complicated as one has to use
temp relation to store all hash buckets ..

Still I think that direct index reads should give us huge
speed improvement for all indexed queries.
I'm prepared to implement it but I'd like to know your
hints/complaints.

Regards, devik



pgsql-hackers by date:

Previous
From: Denis Perchine
Date:
Subject: Fwd: Problems inserting data
Next
From: Philip Warner
Date:
Subject: RE: current is broken