Home > mailing lists

Re: Performance improvement hints + measurement - Mailing list pgsql-hackers

From	devik@cdi.cz
Subject	Re: Performance improvement hints + measurement
Date	September 13, 2000 07:41:34
Msg-id	39BF8314.8A790C34@cdi.cz Whole thread Raw
In response to	Performance improvement hints (devik@cdi.cz)
Responses	Re: Performance improvement hints + measurement
List	pgsql-hackers

Tree view

> >    But indexscan always lookups actual record in heap even if
> >    all needed attributes are contained in the index.
> >    Oracle and even MSSQL reads attributes directly from index
> >    without looking for actual tuple at heap.
> 
> Doesn't work in Postgres' storage management scheme --- the heap
> tuple must be consulted to see if it's still valid.

yes, I just spent another day by looking into sources and
it seems that we need xmin, xmax stuff.
What do you think about this approach:

1) add all validity & tx fields from heap tuple into   index tuple too
2) when generating plan for index scan try to determine  whether we can satisfy target list using only data  from index
tuples,if yes then compute cost without  accounting random heap page reads - it will lead into  much lower cost
 
3) whenever you update/delete heap tuple's tx fields, update  then also in indices (you don't have to delete them from
index)

It will cost more storage space and slightly more work when
updating indices but should give excelent performance when
index is used. 

Measurements:
I've table with about 2 mil. rows declared as
bigrel(namex varchar(50),cnt integer,sale datetime). 
I regulary need to run this query against it:
select nazev,sum(cnt) from bigrel group by name;
It took (in seconds):

Server\Index   YES      NO
pg7.01 linux   58       264
MSSQL7 winnt   17       22

I compared on the same machine (PII/375,128RAM) using
WINNT under VMWARE and native linux 2.2. pq was 
vaccum analyzed.
Why is pgsql so slow ? The mssql plan without index uses
hash aggregating but pg sorts while relation.
With index, in pg there is a big overhead of heap tuple
reading - mssql uses data directly from scanned index.

Also I noticed another problem, when I added 
where nazev<'0' it took 110ms on pg when I used
set enable_seqscan=on;.
Without is, planner still tried to use seqscan+sort
which took 27s in this case.

I'm not sure how complex the proposed changes are. Another
way would be to implement another aggregator like HashAgg
which will use hashing. 
But it could be even more complicated as one has to use
temp relation to store all hash buckets ..

Still I think that direct index reads should give us huge
speed improvement for all indexed queries.
I'm prepared to implement it but I'd like to know your
hints/complaints.

Regards, devik

pgsql-hackers by date:

From: Denis Perchine
Date: 13 September 2000, 07:02:13
Subject: Fwd: Problems inserting data

From: Philip Warner
Date: 13 September 2000, 07:42:12
Subject: RE: current is broken

Re: Performance improvement hints + measurement - Mailing list pgsql-hackers

Previous

Next