Thread: speeding up a select with C function?

speeding up a select with C function?

From

David Teran

Date:

07 March 2004, 07:42:30

Hi,

  we need to optimize / speed up a simple select:

explain analyze select
((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value))
from job_property t0, job_property t1
where t0.id_job_profile = 5
and t1.id_job_profile = 6
and t1.id_job_attribute = t0.id_job_attribute
and t1.int_value < t0.int_value;

the result from explain analyze is:

first run:
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------
  Merge Join  (cost=8314.36..8336.21 rows=258 width=8) (actual
time=226.544..226.890 rows=43 loops=1)
    Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute)
    Join Filter: ("inner".int_value < "outer".int_value)
    ->  Sort  (cost=4157.18..4159.75 rows=1026 width=8) (actual
time=113.781..113.826 rows=232 loops=1)
          Sort Key: t0.id_job_attribute
          ->  Index Scan using job_property__id_job_profile__fk_index on
job_property t0  (cost=0.00..4105.87 rows=1026 width=8) (actual
time=0.045..113.244 rows=232 loops=1)
                Index Cond: (id_job_profile = 5)
    ->  Sort  (cost=4157.18..4159.75 rows=1026 width=8) (actual
time=112.504..112.544 rows=254 loops=1)
          Sort Key: t1.id_job_attribute
          ->  Index Scan using job_property__id_job_profile__fk_index on
job_property t1  (cost=0.00..4105.87 rows=1026 width=8) (actual
time=0.067..112.090 rows=254 loops=1)
                Index Cond: (id_job_profile = 6)
  Total runtime: 227.120 ms
(12 rows)

second run:
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------
  Merge Join  (cost=8314.36..8336.21 rows=258 width=8) (actual
time=4.323..4.686 rows=43 loops=1)
    Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute)
    Join Filter: ("inner".int_value < "outer".int_value)
    ->  Sort  (cost=4157.18..4159.75 rows=1026 width=8) (actual
time=2.666..2.700 rows=232 loops=1)
          Sort Key: t0.id_job_attribute
          ->  Index Scan using job_property__id_job_profile__fk_index on
job_property t0  (cost=0.00..4105.87 rows=1026 width=8) (actual
time=0.279..2.354 rows=232 loops=1)
                Index Cond: (id_job_profile = 5)
    ->  Sort  (cost=4157.18..4159.75 rows=1026 width=8) (actual
time=1.440..1.477 rows=254 loops=1)
          Sort Key: t1.id_job_attribute
          ->  Index Scan using job_property__id_job_profile__fk_index on
job_property t1  (cost=0.00..4105.87 rows=1026 width=8) (actual
time=0.040..1.133 rows=254 loops=1)
                Index Cond: (id_job_profile = 6)
  Total runtime: 4.892 ms
(12 rows)


I have run vacuum analyze before executing the statements. I wonder now
if there is any chance to speed this up. Could we use a C function to
access the indexes faster or is there any other chance to speed this
up?

The Server is a dual G5/2GHZ with 8 GB of RAM and a 3.5 TB fiberchannel
RAID. The job_property table is about 1 GB large (checked with dbsize)
and has about 6.800.000 rows.


regards David

Re: speeding up a select with C function?

From

Rod Taylor

Date:

07 March 2004, 11:02:14

> I have run vacuum analyze before executing the statements. I wonder now
> if there is any chance to speed this up.

Is this an active table for writes? You may want to take a look at
CLUSTER. In some circumstances, it can take an order of magnitude off
the query time by allowing less pages to be retrieved from disk.

Other than that, if you're willing to drop performance of all queries
not hitting the table to speed up this one, you can pin the index and
table into memory (cron job running a select periodically to ensure it
sticks).

Shrink the actual data size (Drop the OID column, use a smallint instead
of an integer, etc.)


One final option is to alter PostgreSQL into possibly doing a
sudo-sequential scan on the table when reading indexes, rather than
pulling data from the table in a random order as it is found in the
index. This is a rather complex project, but doable.
http://momjian.postgresql.org/cgi-bin/pgtodo?performance

Re: speeding up a select with C function?

From

Christopher Kings-Lynne

Date:

07 March 2004, 21:25:25

> explain analyze select
> ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value))
> from job_property t0, job_property t1
> where t0.id_job_profile = 5
> and t1.id_job_profile = 6
> and t1.id_job_attribute = t0.id_job_attribute
> and t1.int_value < t0.int_value;

Don't bother with C function, use SQL function instead.  You could get a
50% speedup.

Chris

Re: speeding up a select with C function?

From

Dennis Bjorklund

Date:

08 March 2004, 02:22:08

On Sun, 7 Mar 2004, David Teran wrote:

>   we need to optimize / speed up a simple select:
>
> explain analyze select
> ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value))
> from job_property t0, job_property t1
> where t0.id_job_profile = 5
> and t1.id_job_profile = 6
> and t1.id_job_attribute = t0.id_job_attribute
> and t1.int_value < t0.int_value;

Try to add an index on (id_job_profile, id_job_attribute) or maybe even
(id_job_profile, id_job_attribute, int_value)

--
/Dennis Björklund

Re: speeding up a select with C function?

From

David Teran

Date:

09 March 2004, 05:54:19

Hi,


On 08.03.2004, at 02:29, Christopher Kings-Lynne wrote:

>> explain analyze select
>> ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value))
>> from job_property t0, job_property t1
>> where t0.id_job_profile = 5
>> and t1.id_job_profile = 6
>> and t1.id_job_attribute = t0.id_job_attribute
>> and t1.int_value < t0.int_value;
>
> Don't bother with C function, use SQL function instead.  You could get
> a 50% speedup.
>
Is this always the case when using SQL instead of the C API to get
values or only the function 'call' itself? We are thinking to use C
functions which are optimized for the G5 altivec unit.

regards David

Re: speeding up a select with C function?

From

David Teran

Date:

09 March 2004, 08:02:50

Hi Dennis,

>>   we need to optimize / speed up a simple select:
>>
>> explain analyze select
>> ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value))
>> from job_property t0, job_property t1
>> where t0.id_job_profile = 5
>> and t1.id_job_profile = 6
>> and t1.id_job_attribute = t0.id_job_attribute
>> and t1.int_value < t0.int_value;
>
> Try to add an index on (id_job_profile, id_job_attribute) or maybe even
> (id_job_profile, id_job_attribute, int_value)
>

Tried this but the index is not used. I know the same problem was true
with a FrontBase database so i wonder how i can force that the index is
used. As i was not sure in which order the query is executed i decided
to create indexes for all variations:

id_job_profile, id_job_attribute, int_value
id_job_profile, int_value, id_job_attribute
int_value, id_job_attribute, id_job_profile,
int_value, id_job_profile, id_job_attribute
....


here is the output:

------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------
  Merge Join  (cost=5369.08..5383.14 rows=150 width=4) (actual
time=2.527..2.874 rows=43 loops=1)
    Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute)
    Join Filter: ("inner".int_value < "outer".int_value)
    ->  Sort  (cost=2684.54..2686.37 rows=734 width=6) (actual
time=1.140..1.177 rows=232 loops=1)
          Sort Key: t0.id_job_attribute
          ->  Index Scan using
job_property_short__id_job_profile__fk_index on job_property_short t0
(cost=0.00..2649.60 rows=734 width=6) (actual time=0.039..0.820
rows=232 loops=1)
                Index Cond: (id_job_profile = 5)
    ->  Sort  (cost=2684.54..2686.37 rows=734 width=6) (actual
time=1.175..1.223 rows=254 loops=1)
          Sort Key: t1.id_job_attribute
          ->  Index Scan using
job_property_short__id_job_profile__fk_index on job_property_short t1
(cost=0.00..2649.60 rows=734 width=6) (actual time=0.023..0.878
rows=254 loops=1)
                Index Cond: (id_job_profile = 6)
  Total runtime: 3.065 ms
(12 rows)




So the question is how to tell Postgres to use the index.

regards David

Re: speeding up a select with C function?

From

Christopher Kings-Lynne

Date:

09 March 2004, 10:46:35

>> Don't bother with C function, use SQL function instead.  You could get
>> a 50% speedup.
>>
> Is this always the case when using SQL instead of the C API to get
> values or only the function 'call' itself? We are thinking to use C
> functions which are optimized for the G5 altivec unit.

SQL functions are stored prepared, so there is less per-call query
planning overhead.  I'm not sure there'd be much advantage to doing them
in C...

Chris

Re: speeding up a select with C function?

From

Tom Lane

Date:

09 March 2004, 11:37:41

David Teran <david.teran@cluster9.com> writes:
>   Merge Join  (cost=5369.08..5383.14 rows=150 width=4) (actual
> time=2.527..2.874 rows=43 loops=1)
>     Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute)
>     Join Filter: ("inner".int_value < "outer".int_value)
>     ->  Sort  (cost=2684.54..2686.37 rows=734 width=6) (actual
> time=1.140..1.177 rows=232 loops=1)
>           Sort Key: t0.id_job_attribute
>           ->  Index Scan using
> job_property_short__id_job_profile__fk_index on job_property_short t0
> (cost=0.00..2649.60 rows=734 width=6) (actual time=0.039..0.820
> rows=232 loops=1)
>                 Index Cond: (id_job_profile = 5)
>     ->  Sort  (cost=2684.54..2686.37 rows=734 width=6) (actual
> time=1.175..1.223 rows=254 loops=1)
>           Sort Key: t1.id_job_attribute
>           ->  Index Scan using
> job_property_short__id_job_profile__fk_index on job_property_short t1
> (cost=0.00..2649.60 rows=734 width=6) (actual time=0.023..0.878
> rows=254 loops=1)
>                 Index Cond: (id_job_profile = 6)
>   Total runtime: 3.065 ms
> (12 rows)

> So the question is how to tell Postgres to use the index.

Er, which part of that do you think is not using an index?

More generally, it is not necessarily the case that a join *should* use
an index.  I'm a bit surprised that the above bothers to sort; I'd
expect a hash join to be more appropriate.  Have you tried experimenting
with enable_mergejoin and the other planner-testing settings?

            regards, tom lane