Interesting performance for function between 7.4.6 and 8.0rc2 - Mailing list pgsql-general

From Joshua D. Drake
Subject Interesting performance for function between 7.4.6 and 8.0rc2
Date
Msg-id 41C89680.2000209@commandprompt.com
Whole thread Raw
List pgsql-general
Hello,

First some specifics:

OS Linux FC1
1 Gig of Ram
IDE RAID 1 array
AMD 2200 XP

Running both 8.0rc2 and 7.4.6 for testing. The function is a plphp
function. The code for the function is below:


CREATE OR REPLACE FUNCTION get_users(bigint[]) RETURNS SETOF
resolved_users_type AS '
     $userData = array();
     $temp = array();
     foreach ($args[0] as $value) { $temp[] = $value."::bigint"; }
     $res = spi_exec_query("SELECT id, realname, email FROM users WHERE
id IN (".implode(",", $temp).")");
     if ($res[''processed''] > 0) {
         while ($row = spi_fetch_row($res)) {
             $userData[] = $row;
         }
     }
return ($userData) ? $userData : NULL;
' LANGUAGE 'plphp';


Basically we take an array of values and look up a list of users based
on the elements in the array and return the results. There is a static
cast for 7.4.6 sake which may not be required for 8.

Here is a sample dataset for the users table:

  id  |      realname      |            email
------+--------------------+------------------------------
     2 | Jonathan Daugherty | jdaugherty@commandprompt.com
     4 | Steven1 Klassen    | sklassen1@commandprompt.com
     8 | Steven1 Klassen    | sklassen1@commandprompt.com
     9 | Steven1 Klassen    | sklassen1@commandprompt.com
    12 | Steven1 Klassen    | sklassen1@commandprompt.com
    56 | Jonathan Daugherty | jdaugherty@commandprompt.com
  2000 | Steven Klassen     | sklassen@commandprompt.com
    23 | Steven1 Klassen    | sklassen1@commandprompt.com
   690 | Steven Klassen     | sklassen@commandprompt.com
  4000 | Steven Klassen     | sklassen@commandprompt.com
  1243 | Steven Klassen     | sklassen@commandprompt.com
     5 | Steven1 Klassen    | sklassen1@commandprompt.com
     6 | Steven1 Klassen    | sklassen1@commandprompt.com
   230 | Steven Klassen     | sklassen@commandprompt.com
   330 | Steven Klassen     | sklassen@commandprompt.com
   430 | Steven Klassen     | sklassen@commandprompt.com
   440 | Steven Klassen     | sklassen@commandprompt.com
   550 | Steven Klassen     | sklassen@commandprompt.com
   660 | Steven Klassen     | sklassen@commandprompt.com
   770 | Steven Klassen     | sklassen@commandprompt.com


Here is the functional_query we are running:

SELECT * FROM
get_users('{2,4,8,9,12,56,2000,23,690,4000,1243,5,6,230,330,430,440,550,660,770}');

Statistics is set to 50 on the id column (which is way to high). It has
about 51000 rows.

8.0 first run: 109.664 ms
8.0 second run: 2.939 ms

7.4.6 first run: 121.416ms
7.4.6 second run: 8.665 ms

So this shows a significant increase in functions performance for second
+ runs and a little difference for first runs. If I enable preload
libraries the numbers are the following:

8.0 first run: 107.689ms
8.0 second run: 2.915 ms

7.4.6 first run: 119.400 ms
7.4.6 second run: 8.629 ms

It doesn't look like preload libraries helps as much as I would like but
every little bit helps. The really good news of course is the amazing
increase in performance from 7.4.6 to 8.0. I assume this is because
we are now keeping statistics for functions.

The changes in 8.0 should help projects like TSearch2 quite a bit.

One thing I did note that is very odd is:

sklassen=# explain analyze select * from
get_users('{2,4,8,9,12,56,2000,23,690,4000,1243,5,6,230,330,430,440,550,660,770}');
                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------
  Function Scan on get_users  (cost=0.00..12.50 rows=1000 width=72)
(actual time=2.464..2.488 rows=20 loops=1)
  Total runtime: 2.520 ms


SELECT id, realname, email FROM users WHERE id IN

(2::bigint,4::bigint,8::bigint,9::bigint,12::bigint,56::bigint,2000::bigint,23::bigint,690::bigint,4000::bigint,1243::bigint,5::bigint,6::bigint,230::bigint,330::bigint,430::bigint,440::bigint,550::bigint,660::bigint,770::bigint)"
LOG:  duration: 2.937 ms


Notice the two durations, they are different but for the exact same run
on the query. Is the duration being calculated between psql and the backend?


Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

pgsql-general by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Scheduler in Postgres
Next
From: "Jim C. Nasby"
Date:
Subject: PostgreSQL training curriculum