Thread: speeding up a select with C function?
Hi, we need to optimize / speed up a simple select: explain analyze select ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value)) from job_property t0, job_property t1 where t0.id_job_profile = 5 and t1.id_job_profile = 6 and t1.id_job_attribute = t0.id_job_attribute and t1.int_value < t0.int_value; the result from explain analyze is: first run: ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------------ Merge Join (cost=8314.36..8336.21 rows=258 width=8) (actual time=226.544..226.890 rows=43 loops=1) Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute) Join Filter: ("inner".int_value < "outer".int_value) -> Sort (cost=4157.18..4159.75 rows=1026 width=8) (actual time=113.781..113.826 rows=232 loops=1) Sort Key: t0.id_job_attribute -> Index Scan using job_property__id_job_profile__fk_index on job_property t0 (cost=0.00..4105.87 rows=1026 width=8) (actual time=0.045..113.244 rows=232 loops=1) Index Cond: (id_job_profile = 5) -> Sort (cost=4157.18..4159.75 rows=1026 width=8) (actual time=112.504..112.544 rows=254 loops=1) Sort Key: t1.id_job_attribute -> Index Scan using job_property__id_job_profile__fk_index on job_property t1 (cost=0.00..4105.87 rows=1026 width=8) (actual time=0.067..112.090 rows=254 loops=1) Index Cond: (id_job_profile = 6) Total runtime: 227.120 ms (12 rows) second run: ------------------------------------------------------------------------ ------------------------------------------------------------------------ ---------------------------- Merge Join (cost=8314.36..8336.21 rows=258 width=8) (actual time=4.323..4.686 rows=43 loops=1) Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute) Join Filter: ("inner".int_value < "outer".int_value) -> Sort (cost=4157.18..4159.75 rows=1026 width=8) (actual time=2.666..2.700 rows=232 loops=1) Sort Key: t0.id_job_attribute -> Index Scan using job_property__id_job_profile__fk_index on job_property t0 (cost=0.00..4105.87 rows=1026 width=8) (actual time=0.279..2.354 rows=232 loops=1) Index Cond: (id_job_profile = 5) -> Sort (cost=4157.18..4159.75 rows=1026 width=8) (actual time=1.440..1.477 rows=254 loops=1) Sort Key: t1.id_job_attribute -> Index Scan using job_property__id_job_profile__fk_index on job_property t1 (cost=0.00..4105.87 rows=1026 width=8) (actual time=0.040..1.133 rows=254 loops=1) Index Cond: (id_job_profile = 6) Total runtime: 4.892 ms (12 rows) I have run vacuum analyze before executing the statements. I wonder now if there is any chance to speed this up. Could we use a C function to access the indexes faster or is there any other chance to speed this up? The Server is a dual G5/2GHZ with 8 GB of RAM and a 3.5 TB fiberchannel RAID. The job_property table is about 1 GB large (checked with dbsize) and has about 6.800.000 rows. regards David
> I have run vacuum analyze before executing the statements. I wonder now > if there is any chance to speed this up. Is this an active table for writes? You may want to take a look at CLUSTER. In some circumstances, it can take an order of magnitude off the query time by allowing less pages to be retrieved from disk. Other than that, if you're willing to drop performance of all queries not hitting the table to speed up this one, you can pin the index and table into memory (cron job running a select periodically to ensure it sticks). Shrink the actual data size (Drop the OID column, use a smallint instead of an integer, etc.) One final option is to alter PostgreSQL into possibly doing a sudo-sequential scan on the table when reading indexes, rather than pulling data from the table in a random order as it is found in the index. This is a rather complex project, but doable. http://momjian.postgresql.org/cgi-bin/pgtodo?performance
> explain analyze select > ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value)) > from job_property t0, job_property t1 > where t0.id_job_profile = 5 > and t1.id_job_profile = 6 > and t1.id_job_attribute = t0.id_job_attribute > and t1.int_value < t0.int_value; Don't bother with C function, use SQL function instead. You could get a 50% speedup. Chris
On Sun, 7 Mar 2004, David Teran wrote: > we need to optimize / speed up a simple select: > > explain analyze select > ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value)) > from job_property t0, job_property t1 > where t0.id_job_profile = 5 > and t1.id_job_profile = 6 > and t1.id_job_attribute = t0.id_job_attribute > and t1.int_value < t0.int_value; Try to add an index on (id_job_profile, id_job_attribute) or maybe even (id_job_profile, id_job_attribute, int_value) -- /Dennis Björklund
Hi, On 08.03.2004, at 02:29, Christopher Kings-Lynne wrote: >> explain analyze select >> ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value)) >> from job_property t0, job_property t1 >> where t0.id_job_profile = 5 >> and t1.id_job_profile = 6 >> and t1.id_job_attribute = t0.id_job_attribute >> and t1.int_value < t0.int_value; > > Don't bother with C function, use SQL function instead. You could get > a 50% speedup. > Is this always the case when using SQL instead of the C API to get values or only the function 'call' itself? We are thinking to use C functions which are optimized for the G5 altivec unit. regards David
Hi Dennis, >> we need to optimize / speed up a simple select: >> >> explain analyze select >> ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value)) >> from job_property t0, job_property t1 >> where t0.id_job_profile = 5 >> and t1.id_job_profile = 6 >> and t1.id_job_attribute = t0.id_job_attribute >> and t1.int_value < t0.int_value; > > Try to add an index on (id_job_profile, id_job_attribute) or maybe even > (id_job_profile, id_job_attribute, int_value) > Tried this but the index is not used. I know the same problem was true with a FrontBase database so i wonder how i can force that the index is used. As i was not sure in which order the query is executed i decided to create indexes for all variations: id_job_profile, id_job_attribute, int_value id_job_profile, int_value, id_job_attribute int_value, id_job_attribute, id_job_profile, int_value, id_job_profile, id_job_attribute .... here is the output: ------------------------------------------------------------------------ ------------------------------------------------------------------------ --------------------------------------- Merge Join (cost=5369.08..5383.14 rows=150 width=4) (actual time=2.527..2.874 rows=43 loops=1) Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute) Join Filter: ("inner".int_value < "outer".int_value) -> Sort (cost=2684.54..2686.37 rows=734 width=6) (actual time=1.140..1.177 rows=232 loops=1) Sort Key: t0.id_job_attribute -> Index Scan using job_property_short__id_job_profile__fk_index on job_property_short t0 (cost=0.00..2649.60 rows=734 width=6) (actual time=0.039..0.820 rows=232 loops=1) Index Cond: (id_job_profile = 5) -> Sort (cost=2684.54..2686.37 rows=734 width=6) (actual time=1.175..1.223 rows=254 loops=1) Sort Key: t1.id_job_attribute -> Index Scan using job_property_short__id_job_profile__fk_index on job_property_short t1 (cost=0.00..2649.60 rows=734 width=6) (actual time=0.023..0.878 rows=254 loops=1) Index Cond: (id_job_profile = 6) Total runtime: 3.065 ms (12 rows) So the question is how to tell Postgres to use the index. regards David
>> Don't bother with C function, use SQL function instead. You could get >> a 50% speedup. >> > Is this always the case when using SQL instead of the C API to get > values or only the function 'call' itself? We are thinking to use C > functions which are optimized for the G5 altivec unit. SQL functions are stored prepared, so there is less per-call query planning overhead. I'm not sure there'd be much advantage to doing them in C... Chris
David Teran <david.teran@cluster9.com> writes: > Merge Join (cost=5369.08..5383.14 rows=150 width=4) (actual > time=2.527..2.874 rows=43 loops=1) > Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute) > Join Filter: ("inner".int_value < "outer".int_value) > -> Sort (cost=2684.54..2686.37 rows=734 width=6) (actual > time=1.140..1.177 rows=232 loops=1) > Sort Key: t0.id_job_attribute > -> Index Scan using > job_property_short__id_job_profile__fk_index on job_property_short t0 > (cost=0.00..2649.60 rows=734 width=6) (actual time=0.039..0.820 > rows=232 loops=1) > Index Cond: (id_job_profile = 5) > -> Sort (cost=2684.54..2686.37 rows=734 width=6) (actual > time=1.175..1.223 rows=254 loops=1) > Sort Key: t1.id_job_attribute > -> Index Scan using > job_property_short__id_job_profile__fk_index on job_property_short t1 > (cost=0.00..2649.60 rows=734 width=6) (actual time=0.023..0.878 > rows=254 loops=1) > Index Cond: (id_job_profile = 6) > Total runtime: 3.065 ms > (12 rows) > So the question is how to tell Postgres to use the index. Er, which part of that do you think is not using an index? More generally, it is not necessarily the case that a join *should* use an index. I'm a bit surprised that the above bothers to sort; I'd expect a hash join to be more appropriate. Have you tried experimenting with enable_mergejoin and the other planner-testing settings? regards, tom lane