Thread: Query performance problem

Query performance problem

From
Phil Daintree
Date:
Dear psqlers,

I need your help!

I administer/develop an open source PHP accounting software project (webERP)
that was originally mysql only. Since Christmas I and another member of the
team lower cased all the sql and changed some elements of the SQL to allow it
to use postgres as well. All appears to work beautifully with just a single
but important exception.....

I am using PostgreSQL 7.4.6 running on :5432 on Mandrake 10.1

We wish to avoid lock in to a single database and have a single code base. We
elected to commonise the sql to ansi standard so far as possible in
preference to using stored procs. Whilst this will mean some compromises to
the performance, the sql is quite simple througout as this is a design goal
and I like having the sql inside the code for readability purposes ie its all
in one place. (I know I am wrong from a computer scientist's perspective!)

There are 2 tables used in the sql we need to optimise .....

CREATE TABLE chartdetails (
    accountcode integer DEFAULT 0 NOT NULL,
    period integer DEFAULT 0 NOT NULL,
    budget double precision DEFAULT (0)::double precision NOT NULL,
    actual double precision DEFAULT (0)::double precision NOT NULL,
    bfwd double precision DEFAULT (0)::double precision NOT NULL,
    bfwdbudget double precision DEFAULT (0)::double precision NOT NULL
);


CREATE INDEX idxperiod ON chartdetails USING btree (period);
ALTER TABLE ONLY chartdetails
    ADD CONSTRAINT chartdetails_pkey PRIMARY KEY (accountcode, period);
ALTER TABLE ONLY chartdetails
    ADD CONSTRAINT cnt001251 FOREIGN KEY (accountcode) REFERENCES
chartmaster(accountcode);
ALTER TABLE ONLY chartdetails
    ADD CONSTRAINT cnt001252 FOREIGN KEY (period) REFERENCES
periods(periodno);



AND the second table:


CREATE TABLE gltrans (
    counterindex serial NOT NULL,
    "type" integer DEFAULT 0 NOT NULL,
    typeno bigint DEFAULT (1)::bigint NOT NULL,
    chequeno integer DEFAULT 0 NOT NULL,
    trandate date,
    periodno integer DEFAULT 0 NOT NULL,
    account integer DEFAULT 0 NOT NULL,
    narrative text DEFAULT ''::text NOT NULL,
    amount double precision DEFAULT (0)::double precision NOT NULL,
    posted integer DEFAULT 0 NOT NULL,
    jobref text DEFAULT ''::text NOT NULL
);


CREATE INDEX idxaccount ON gltrans USING btree (account);
CREATE INDEX idxchequeno ON gltrans USING btree (chequeno);
CREATE INDEX idxgtperiodno ON gltrans USING btree (periodno);
CREATE INDEX idxposted ON gltrans USING btree (posted);
CREATE INDEX idxgttrandate ON gltrans USING btree (trandate);
CREATE INDEX idxgttypeno ON gltrans USING btree (typeno);
CREATE INDEX idxtype_and_number ON gltrans USING btree ("type", typeno);
CREATE INDEX idxgtjobref ON gltrans USING btree (jobref);
ALTER TABLE ONLY gltrans
    ADD CONSTRAINT gltrans_pkey PRIMARY KEY (counterindex);
ALTER TABLE ONLY gltrans
    ADD CONSTRAINT cnt001296 FOREIGN KEY (account) REFERENCES
chartmaster(accountcode);
ALTER TABLE ONLY gltrans
    ADD CONSTRAINT cnt001297 FOREIGN KEY ("type") REFERENCES systypes(typeid);
ALTER TABLE ONLY gltrans
    ADD CONSTRAINT cnt001298 FOREIGN KEY (periodno) REFERENCES
periods(periodno);


So there is a chartdetail record for every period for every general ledger
account.  So if there are 5 years x 12 periods (months) and 200 general
ledger accounts this table will be 12,000 records.

There is a gltrans record for every side of a journal entry. This can get to
be quite a significant table - easily more than 200,000 per annum - depending
on the size of the business obviously.


Can anyone tell me why does the following code chokes ... literally - this
works almost invisbly under mysql - pg takes more than an hour even on a very
small 30 record database.

(The table chartmaster is just a list of general ledger accounts accountcode
and accountdescription. PK = accountcode)


    $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);

    While ($AccountRow = DB_fetch_array($ChartAccounts)){

        for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {

            echo '<LI>' . _('Period Number') . ' ' . $PeriodNo . '</LI>';

            // Check if there is an chart details record set up
            $sql = 'SELECT count(*) FROM chartdetails
                    WHERE accountcode='.$AccountRow['accountcode'].'
                    AND period=' . $PeriodNo;
            $InsChartDetails = DB_query($sql,$db,'','','',false);
            $CountRows = DB_fetch_row($InsChartDetails);
            $AccountExistsAlready = $CountRows[0];
            DB_free_result($InsChartDetails);
            if(! $AccountExistsAlready) {
                $sql = 'INSERT INTO chartdetails (accountcode,
                            period)
                    VALUES (' . $AccountRow['accountcode'] . ',
                        ' . $PeriodNo . ')';
                $InsChartDetails = DB_query($sql,$db);
                DB_free_result($InsChartDetails);
            }

        }

    /*Now run through each of the new chartdetail records created for each
account and update them with the B/Fwd and B/Fwd budget no updates would be
required where there were previously no chart details set up ie
FirstPeriodPostedTo > 0 */

        for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {

            $sql = 'SELECT accountcode,
                    period,
                    actual + bfwd AS cfwd,
                    budget + bfwdbudget AS cfwdbudget
                FROM chartdetails WHERE period =' . ($PeriodNo  - 1);
            $ChartDetailsCFwd = DB_query($sql,$db);

            while ($myrow = DB_fetch_array($ChartDetailsCFwd)){

                $sql = 'UPDATE chartdetails SET bfwd =' . $myrow['cfwd'] . ',
                                bfwdbudget =' . $myrow['cfwdbudget'] . '
                        WHERE accountcode = ' . $myrow['accountcode'] . '
                        AND period >=' . $PeriodNo;
                $UpdChartDetails = DB_query($sql,$db, '', '', '', false);
                DB_free_result($UpdChartDetails);
            }
            DB_free_result($ChartDetailsCFwd);
        }

    }


function DB_query ($SQL,
        &$Conn,
        $ErrorMessage='',
        $DebugMessage= '',
        $Transaction=false,
        $TrapErrors=true){

    global $debug;

    $result = pg_query($Conn, $SQL);
    if ($DebugMessage == '') {
        $DebugMessage = _('The SQL that failed was:');
    }
    //if (DB_error_no($Conn) != 0){
    if ( !$result AND $TrapErrors){
        prnMsg($ErrorMessage.'<BR>' . DB_error_msg($Conn),'error', _('DB ERROR:'));
        if ($debug==1){
            echo '<BR>' . $DebugMessage. "<BR>$SQL<BR>";
        }
        if ($Transaction){
            $SQL = 'rollback';
            $Result = DB_query($SQL,$Conn);
            if (DB_error_no($Conn) !=0){
                prnMsg('<br />'.  _('Error Rolling Back Transaction!!'), '', _('DB
DEBUG:') );
            }
        }
        if ($TrapErrors){
            include('includes/footer.inc');
            exit;
        }
    }
    return $result;

}


I am hoping that someone will be able to see an alternative simpler method or
suggest a method of indexing the pg tables to optmise the required queries. I
would appreciate any help here men.

Many thanks in advance ....

--
Phil Daintree
webERP Project Admin

Re: Query performance problem

From
Richard Huxton
Date:
Phil Daintree wrote:
>
> There are 2 tables used in the sql we need to optimise .....
>
> CREATE TABLE chartdetails (
> CREATE TABLE gltrans (

> So there is a chartdetail record for every period for every general ledger
> account.  So if there are 5 years x 12 periods (months) and 200 general
> ledger accounts this table will be 12,000 records.
>
> There is a gltrans record for every side of a journal entry. This can get to
> be quite a significant table - easily more than 200,000 per annum - depending
> on the size of the business obviously.
>
>
> Can anyone tell me why does the following code chokes ... literally - this
> works almost invisbly under mysql - pg takes more than an hour even on a very
> small 30 record database.

There's nothing obvious in the code below. You could probably do it all
with one (more complex) sql query (maybe two), but it shouldn't be too
bad as it stands.

The code doesn't seem to use gltrans at all though (unless I'm missing
something). Is there something missing?

Anyway, the first step will be to get some accurate figures. Can you
turn statement logging on in postgresql.conf and also timestamps (unless
it's logging to syslogd, in which case you'll get them automatically).
That way we can identify exactly where the time is being spent.

--
   Richard Huxton
   Archonet Ltd

Re: Query performance problem

From
Paul Tillotson
Date:
See the syntax for INSERT ... SELECT shown here:
http://www.postgresql.org/docs/8.0/static/sql-insert.html

Instead of doing a nested loop to INSERT new records, do it like this:

For ($period = start; $period < end; $period++)
{
    INSERT INTO chartdetails (accountcode, period)
          SELECT accountcode, $period FROM chartdetails WHERE
(accountcode, $period) NOT IN (
              SELECT accountcode, period FROM chardetails WHERE period =
$period
          );
}

Or if you have some table that has 1 row for each period (call it
"periods") then you could simply do:

INSERT INTO chartdetails (accountcode, period)
     SELECT accountcode, period FROM accountcode, period
         WHERE (period BETWEEN $start AND $end) AND (accountcode,
period) NOT IN (
             SELECT accountcode, period FROM chartdetails WHERE period
BETWEEN $start AND $end
          );

Note to others: see the legitimate use of an unconstrained CROSS JOIN?

----------------------------------------------------------------------

Postgres's SELECT count(*) is slow if many records meet the WHERE clause
being used.  It looks like you're only using testing for 0 or >0 in your
query, so you could use:

SELECT EXISTS (SELECT 1 FROM chartdetails WHERE <where clause goes here>);

This will be much faster since with EXISTS, postgres only runs the query
long enough to find out whether even one row would be returned--if so,
it stops.

Regards,
Paul Tillotson


Phil Daintree wrote:

>Dear psqlers,
>
>I need your help!
>
>I administer/develop an open source PHP accounting software project (webERP) [snip....]
>
>
>    $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
>
>    While ($AccountRow = DB_fetch_array($ChartAccounts)){
>
>        for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
>
>            echo '<LI>' . _('Period Number') . ' ' . $PeriodNo . '</LI>';
>
>            // Check if there is an chart details record set up
>            $sql = 'SELECT count(*) FROM chartdetails
>                    WHERE accountcode='.$AccountRow['accountcode'].'
>                    AND period=' . $PeriodNo;
>            $InsChartDetails = DB_query($sql,$db,'','','',false);
>            $CountRows = DB_fetch_row($InsChartDetails);
>            $AccountExistsAlready = $CountRows[0];
>            DB_free_result($InsChartDetails);
>            if(! $AccountExistsAlready) {
>                $sql = 'INSERT INTO chartdetails (accountcode,
>                            period)
>                    VALUES (' . $AccountRow['accountcode'] . ',
>                        ' . $PeriodNo . ')';
>                $InsChartDetails = DB_query($sql,$db);
>                DB_free_result($InsChartDetails);
>            }
>
>        }
>
>    /*Now run through each of the new chartdetail records created for each
>account and update them with the B/Fwd and B/Fwd budget no updates would be
>required where there were previously no chart details set up ie
>FirstPeriodPostedTo > 0 */
>
>        for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
>
>            $sql = 'SELECT accountcode,
>                    period,
>                    actual + bfwd AS cfwd,
>                    budget + bfwdbudget AS cfwdbudget
>                FROM chartdetails WHERE period =' . ($PeriodNo  - 1);
>            $ChartDetailsCFwd = DB_query($sql,$db);
>
>            while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
>
>                $sql = 'UPDATE chartdetails SET bfwd =' . $myrow['cfwd'] . ',
>                                bfwdbudget =' . $myrow['cfwdbudget'] . '
>                        WHERE accountcode = ' . $myrow['accountcode'] . '
>                        AND period >=' . $PeriodNo;
>                $UpdChartDetails = DB_query($sql,$db, '', '', '', false);
>                DB_free_result($UpdChartDetails);
>            }
>            DB_free_result($ChartDetailsCFwd);
>        }
>
>    }
>
>
>


Re: Query performance problem

From
Paul Tillotson
Date:
Note: If you want to know WHY this takes so long, please tell us how
many times each loop executes and how long each query takes.

Be sure to post an EXPLAIN ANALYZE for each of your queries that you are
running.  This will show what plans the planner is using and how long
they are actually taking.

----------------------

Advice:

Are you running this inside a transaction?  Do so, because if you don't,
then each UPDATE or INSERT or SELECT runs inside its own transaction,
and committing each transaction has overhead associated with it.

This block of code is INSIDE a while loop that loops once for each row
in chartmaster:

        for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {

            $sql = 'SELECT accountcode,
                    period,
                    actual + bfwd AS cfwd,
                    budget + bfwdbudget AS cfwdbudget
                FROM chartdetails WHERE period =' . ($PeriodNo  - 1);
            $ChartDetailsCFwd = DB_query($sql,$db);

            while ($myrow = DB_fetch_array($ChartDetailsCFwd)){

                $sql = 'UPDATE chartdetails SET bfwd =' . $myrow['cfwd'] . ',
                                bfwdbudget =' . $myrow['cfwdbudget'] . '
                        WHERE accountcode = ' . $myrow['accountcode'] . '
                        AND period >=' . $PeriodNo;
                $UpdChartDetails = DB_query($sql,$db, '', '', '', false);
                DB_free_result($UpdChartDetails);
            }
            DB_free_result($ChartDetailsCFwd);
        }


It looks like you're updating the same row in chartmaster multiple
times.  In postgres, because of MVCC, this will cause unnecessary table
bloat, as EVERY TIME you update the same row, it will leave behind
another dead tuple that must be reclaimed by VACUUM.  (In other words,
if you update every row row in chartdetails 10 times without VACUUMing,
then this will cause the table to grow 10x.)

As I am still unsure what this code is supposed to do, please tell me:
how many times is each row in chartdetail getting updated?  If it is
getting updated more than once, could you find a way to update it only once?

First, why is this UPDATE statement inside the while loop at all?  It
doesn't look like it references any members of $AccountRow, which is the
variable that governs the WHILE loop that it is within.  Can you take it
out of the while loop and still get the same results?

Second, could you write that update statement to say WHERE period =
$periodno rather than WHERE period >= $period?  If not, why not?

Regards,
Paul Tillotson

Phil Daintree wrote:

>Dear psqlers,
>
>I need your help!
>
>I administer/develop an open source PHP accounting software project (webERP)
>that was originally mysql only. Since Christmas I and another member of the
>team lower cased all the sql and changed some elements of the SQL to allow it
>to use postgres as well. All appears to work beautifully with just a single
>but important exception.....
>
>
>
[snip]

>Can anyone tell me why does the following code chokes ... literally - this
>works almost invisbly under mysql - pg takes more than an hour even on a very
>small 30 record database.
>
>(The table chartmaster is just a list of general ledger accounts accountcode
>and accountdescription. PK = accountcode)
>
>
>    $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
>
>    While ($AccountRow = DB_fetch_array($ChartAccounts)){
>
>        for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
>
>            echo '<LI>' . _('Period Number') . ' ' . $PeriodNo . '</LI>';
>
>            // Check if there is an chart details record set up
>            $sql = 'SELECT count(*) FROM chartdetails
>                    WHERE accountcode='.$AccountRow['accountcode'].'
>                    AND period=' . $PeriodNo;
>            $InsChartDetails = DB_query($sql,$db,'','','',false);
>            $CountRows = DB_fetch_row($InsChartDetails);
>            $AccountExistsAlready = $CountRows[0];
>            DB_free_result($InsChartDetails);
>            if(! $AccountExistsAlready) {
>                $sql = 'INSERT INTO chartdetails (accountcode,
>                            period)
>                    VALUES (' . $AccountRow['accountcode'] . ',
>                        ' . $PeriodNo . ')';
>                $InsChartDetails = DB_query($sql,$db);
>                DB_free_result($InsChartDetails);
>            }
>
>        }
>
>    /*Now run through each of the new chartdetail records created for each
>account and update them with the B/Fwd and B/Fwd budget no updates would be
>required where there were previously no chart details set up ie
>FirstPeriodPostedTo > 0 */
>
>        for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
>
>            $sql = 'SELECT accountcode,
>                    period,
>                    actual + bfwd AS cfwd,
>                    budget + bfwdbudget AS cfwdbudget
>                FROM chartdetails WHERE period =' . ($PeriodNo  - 1);
>            $ChartDetailsCFwd = DB_query($sql,$db);
>
>            while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
>
>                $sql = 'UPDATE chartdetails SET bfwd =' . $myrow['cfwd'] . ',
>                                bfwdbudget =' . $myrow['cfwdbudget'] . '
>                        WHERE accountcode = ' . $myrow['accountcode'] . '
>                        AND period >=' . $PeriodNo;
>                $UpdChartDetails = DB_query($sql,$db, '', '', '', false);
>                DB_free_result($UpdChartDetails);
>            }
>            DB_free_result($ChartDetailsCFwd);
>        }
>
>    }
>
>
>
>
[snip]

>I am hoping that someone will be able to see an alternative simpler method or
>suggest a method of indexing the pg tables to optmise the required queries. I
>would appreciate any help here men.
>
>Many thanks in advance ....
>
>
>


Re: Query performance problem

From
Paul Tillotson
Date:
Phil Daintree wrote:

>Appreciated you help Paul - many thanks for taking the time.
>
>
>
I view this as merely passing on all the pearls of wisdom I have gleaned
from this list.  : )

>>Advice:
>>
>>Are you running this inside a transaction?  Do so, because if you don't,
>>then each UPDATE or INSERT or SELECT runs inside its own transaction,
>>and committing each transaction has overhead associated with it.
>>
>>
>
>
>It looks like just putting this code inside a transaction has dramatically
>reduced the problem. Of course I knew this but it needed me to be embarassed
>to actually do it :-)
>
>
>
Glad to hear that this helped.  In case you are interested, the reason
this makes such a dramatic difference is that each transaction's commit
record must be logged to the commit log, which is a sequentially written
file.  (Thus, you can only commit one transaction per revolution of the
disk, and so if you have a 7200 rpm disk, you can't get more than 120
transactions / second on a safely configured system unless your drive
has a battery-backed write cache.)

>>This block of code is INSIDE a while loop that loops once for each row
>>in chartmaster:
>>
>>        for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
>>
>>            $sql = 'SELECT accountcode,
>>                    period,
>>                    actual + bfwd AS cfwd,
>>                    budget + bfwdbudget AS cfwdbudget
>>                FROM chartdetails WHERE period =' . ($PeriodNo  - 1);
>>            $ChartDetailsCFwd = DB_query($sql,$db);
>>
>>            while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
>>
>>                $sql = 'UPDATE chartdetails SET bfwd =' . $myrow['cfwd'] . ',
>>                                bfwdbudget =' . $myrow['cfwdbudget'] . '
>>                        WHERE accountcode = ' . $myrow['accountcode'] . '
>>                        AND period >=' . $PeriodNo;
>>                $UpdChartDetails = DB_query($sql,$db, '', '', '', false);
>>                DB_free_result($UpdChartDetails);
>>            }
>>            DB_free_result($ChartDetailsCFwd);
>>        }
>>
>>
>>It looks like you're updating the same row in chartmaster multiple
>>times.
>>
>>
>
>chartmaster is not being updated ...
>
Sorry--I mean chartdetails.

One tip to remember:  If you have a statement like this:
UPDATE footable SET foocol = 'value' WHERE fooid = 'id';

And it is the case that foocol might already equal value, then write
this instead:
UPDATE footable SET foocol = 'value' WHERE fooid = 'id' AND foocol <>
'value';

This will help because no disk write will actually happen if foocol
happens to already equal value.

>chartdetails is - the chartdetails
>relating to each chartmaster record for all periods >= $PeriodNo I have to
>update all the following periods as the balance b/fwd for all successive
>periods has now increased if we post transaction back in time normally there
>might only be a few chartdetails records for the chartmaster account under
>review with chartdetails records with a period later than the one being
>posted.
>
>
>
Am I correct in thinking that bfwd is basically a running tally of
actual, and bfwdbudget is a running tally of budget, as one might
normally find in a spreadsheet?

If so, you could use this view to calculate the correct value for every
location in the table:

CREATE VIEW newtotals AS SELECT
    thismonth.accountcode,
    thismonth.periodno,
    (SELECT SUM(actual) FROM chartdetails AS q1 WHERE q1.accountcode =
accountcode AND q1.periodno < thismonth.periodno) as cfwd,
    (SELECT SUM(budget) FROM chartdetails AS q1 WHERE q1.accountcode =
accountcode AND q1.periodno < thismonth.periodno) as cfwdbudget,
FROM chartdetails AS thismonth;

And then you could use an update statement:

UPDATE chartdetails
SET bfwd = cfwd, bfwdbudget = cfwdbudget
FROM newtotals
WHERE chartdetails.accountcode = newtotals.accountcode AND
chartdetails.periodno = newtotals.periodno -- JOIN condition
    AND period BETWEEN $CreateTo AND $CreateFrom
    AND (bfwd <> cfwd OR bfwdbudget <> cfwdbudget); -- AVOID needless
updates.

Since I don't have your tables to work with, this might need minor
syntax tweaking, but I'm pretty sure it will work.
I think MySQL doesn't support views yet, but you could replace
"newtotals" in the above update with a big subselect (which I think they
are supposed to support in the latest 4.x version.)

Also: if you don't already have one, that UPDATE statement would
probably use a compound index on (accountcode, periodno).

Now I begin to comprehend why CompiereERP doesn't support MySQL.  ; )

>a row in chartdetails will be updated every time there is a gltrans posting to
>the period and account of the chartdetails ie quite often.
>
>
If it gets updated often it will need vacuuming often as well.

My rule of thumb is that if more than 10% of the data in a table is
getting updated, vacuum immediately before and immediately after the the
code that does all this updating.

Regards,

Paul Tillotson

Re: Query performance problem

From
Phil Daintree
Date:
First time I ran it it took 5127.243 ms ...... then I did a full vacuum.

then ...

SQL executed.

Total runtime: 33.707 ms


I am keen to just have the one lot of code all in the scripts ... so  I was
pleased when the identical sql also worked on mysql!!!

Your SQL-query has been executed successfully (Query took 0.0350 sec)

SQL-query : [Edit] [Explain SQL] [Create PHP Code]

SELECT chartmaster.accountcode, periods.periodno
FROM chartmaster, periods
WHERE (
periods.periodno
BETWEEN 1 AND 12
) AND (
chartmaster.accountcode, periods.periodno
) NOT IN
(SELECT accountcode, period
FROM chartdetails
WHERE period
BETWEEN 1 AND 12
) LIMIT 0 , 30

You'll notice the discrepancy on the timings though!

Whilst pg is not performing the way mysql does with innodb - it is at least
usable this way. I am guessing there is some gremlin with my install - I'll
try an upgrade to v 8.

Phil


On Fri, 18 Mar 2005 14:07, you wrote:
> See the syntax for INSERT ... SELECT shown here:
> http://www.postgresql.org/docs/8.0/static/sql-insert.html
>
> Instead of doing a nested loop to INSERT new records, do it like this:
>
> For ($period = start; $period < end; $period++)
> {
>     INSERT INTO chartdetails (accountcode, period)
>           SELECT accountcode, $period FROM chartdetails WHERE
> (accountcode, $period) NOT IN (
>               SELECT accountcode, period FROM chardetails WHERE period =
> $period
>           );
> }
>
> Or if you have some table that has 1 row for each period (call it
> "periods") then you could simply do:
>
> INSERT INTO chartdetails (accountcode, period)
>      SELECT accountcode, period FROM accountcode, period
>          WHERE (period BETWEEN $start AND $end) AND (accountcode,
> period) NOT IN (
>              SELECT accountcode, period FROM chartdetails WHERE period
> BETWEEN $start AND $end
>           );
>
> Note to others: see the legitimate use of an unconstrained CROSS JOIN?
>
> ----------------------------------------------------------------------
>
> Postgres's SELECT count(*) is slow if many records meet the WHERE clause
> being used.  It looks like you're only using testing for 0 or >0 in your
> query, so you could use:
>
> SELECT EXISTS (SELECT 1 FROM chartdetails WHERE <where clause goes here>);
>
> This will be much faster since with EXISTS, postgres only runs the query
> long enough to find out whether even one row would be returned--if so,
> it stops.
>
> Regards,
> Paul Tillotson
>
> Phil Daintree wrote:
> >Dear psqlers,
> >
> >I need your help!
> >
> >I administer/develop an open source PHP accounting software project
> > (webERP) [snip....]
> >
> >
> >    $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
> >
> >    While ($AccountRow = DB_fetch_array($ChartAccounts)){
> >
> >        for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
> >
> >            echo '<LI>' . _('Period Number') . ' ' . $PeriodNo . '</LI>';
> >
> >            // Check if there is an chart details record set up
> >            $sql = 'SELECT count(*) FROM chartdetails
> >                    WHERE accountcode='.$AccountRow['accountcode'].'
> >                    AND period=' . $PeriodNo;
> >            $InsChartDetails = DB_query($sql,$db,'','','',false);
> >            $CountRows = DB_fetch_row($InsChartDetails);
> >            $AccountExistsAlready = $CountRows[0];
> >            DB_free_result($InsChartDetails);
> >            if(! $AccountExistsAlready) {
> >                $sql = 'INSERT INTO chartdetails (accountcode,
> >                            period)
> >                    VALUES (' . $AccountRow['accountcode'] . ',
> >                        ' . $PeriodNo . ')';
> >                $InsChartDetails = DB_query($sql,$db);
> >                DB_free_result($InsChartDetails);
> >            }
> >
> >        }
> >
> >    /*Now run through each of the new chartdetail records created for each
> >account and update them with the B/Fwd and B/Fwd budget no updates would
> > be required where there were previously no chart details set up ie
> >FirstPeriodPostedTo > 0 */
> >
> >        for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
> >
> >            $sql = 'SELECT accountcode,
> >                    period,
> >                    actual + bfwd AS cfwd,
> >                    budget + bfwdbudget AS cfwdbudget
> >                FROM chartdetails WHERE period =' . ($PeriodNo  - 1);
> >            $ChartDetailsCFwd = DB_query($sql,$db);
> >
> >            while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
> >
> >                $sql = 'UPDATE chartdetails SET bfwd =' . $myrow['cfwd'] . ',
> >                                bfwdbudget =' . $myrow['cfwdbudget'] . '
> >                        WHERE accountcode = ' . $myrow['accountcode'] . '
> >                        AND period >=' . $PeriodNo;
> >                $UpdChartDetails = DB_query($sql,$db, '', '', '', false);
> >                DB_free_result($UpdChartDetails);
> >            }
> >            DB_free_result($ChartDetailsCFwd);
> >        }
> >
> >    }

--
Phil Daintree
webERP Project Admin

Re: Query performance problem

From
Phil Daintree
Date:
I can also do the same thing without sub-queries - I messed about some more
since I was keen to ensure backward compatibility with prior versions of
mysql that have left/right joins but no subqueries ... quite a bit quicker
still!

Query took 0.0037 sec - 1/10th of the sub-query time.

SELECT chartmaster.accountcode, periods.periodno
FROM chartmaster INNER JOIN periods ON True
LEFT JOIN chartdetails ON chartmaster.accountcode = chartdetails.accountcode
AND periods.periodno = chartdetails.period
WHERE periods.periodno >=1 AND periods.periodno <=63 AND
chartdetails.accountcode IS NULL  LIMIT 0 , 30


In postgres:

SQL executed.

Total runtime: 12.241 ms

Still this is a third of the time of the sub-query route but 4 times longer
than mysql - this must be an install issue?


Thanks again for this idea Paul

phil


On Fri, 18 Mar 2005 14:07, you wrote:
> See the syntax for INSERT ... SELECT shown here:
> http://www.postgresql.org/docs/8.0/static/sql-insert.html
>
> Instead of doing a nested loop to INSERT new records, do it like this:
>
> For ($period = start; $period < end; $period++)
> {
>     INSERT INTO chartdetails (accountcode, period)
>           SELECT accountcode, $period FROM chartdetails WHERE
> (accountcode, $period) NOT IN (
>               SELECT accountcode, period FROM chardetails WHERE period =
> $period
>           );
> }
>
> Or if you have some table that has 1 row for each period (call it
> "periods") then you could simply do:
>
> INSERT INTO chartdetails (accountcode, period)
>      SELECT accountcode, period FROM accountcode, period
>          WHERE (period BETWEEN $start AND $end) AND (accountcode,
> period) NOT IN (
>              SELECT accountcode, period FROM chartdetails WHERE period
> BETWEEN $start AND $end
>           );
>
> Note to others: see the legitimate use of an unconstrained CROSS JOIN?
>
> ----------------------------------------------------------------------
>
> Postgres's SELECT count(*) is slow if many records meet the WHERE clause
> being used.  It looks like you're only using testing for 0 or >0 in your
> query, so you could use:
>
> SELECT EXISTS (SELECT 1 FROM chartdetails WHERE <where clause goes here>);
>
> This will be much faster since with EXISTS, postgres only runs the query
> long enough to find out whether even one row would be returned--if so,
> it stops.
>
> Regards,
> Paul Tillotson
>
> Phil Daintree wrote:
> >Dear psqlers,
> >
> >I need your help!
> >
> >I administer/develop an open source PHP accounting software project
> > (webERP) [snip....]
> >
> >
> >    $ChartAccounts = DB_query('SELECT accountcode FROM chartmaster',$db);
> >
> >    While ($AccountRow = DB_fetch_array($ChartAccounts)){
> >
> >        for ($PeriodNo=$CreateFrom;$PeriodNo <= $CreateTo;$PeriodNo++) {
> >
> >            echo '<LI>' . _('Period Number') . ' ' . $PeriodNo . '</LI>';
> >
> >            // Check if there is an chart details record set up
> >            $sql = 'SELECT count(*) FROM chartdetails
> >                    WHERE accountcode='.$AccountRow['accountcode'].'
> >                    AND period=' . $PeriodNo;
> >            $InsChartDetails = DB_query($sql,$db,'','','',false);
> >            $CountRows = DB_fetch_row($InsChartDetails);
> >            $AccountExistsAlready = $CountRows[0];
> >            DB_free_result($InsChartDetails);
> >            if(! $AccountExistsAlready) {
> >                $sql = 'INSERT INTO chartdetails (accountcode,
> >                            period)
> >                    VALUES (' . $AccountRow['accountcode'] . ',
> >                        ' . $PeriodNo . ')';
> >                $InsChartDetails = DB_query($sql,$db);
> >                DB_free_result($InsChartDetails);
> >            }
> >
> >        }
> >
> >    /*Now run through each of the new chartdetail records created for each
> >account and update them with the B/Fwd and B/Fwd budget no updates would
> > be required where there were previously no chart details set up ie
> >FirstPeriodPostedTo > 0 */
> >
> >        for ($PeriodNo=$CreateFrom;$PeriodNo<=$CreateTo; $PeriodNo++) {
> >
> >            $sql = 'SELECT accountcode,
> >                    period,
> >                    actual + bfwd AS cfwd,
> >                    budget + bfwdbudget AS cfwdbudget
> >                FROM chartdetails WHERE period =' . ($PeriodNo  - 1);
> >            $ChartDetailsCFwd = DB_query($sql,$db);
> >
> >            while ($myrow = DB_fetch_array($ChartDetailsCFwd)){
> >
> >                $sql = 'UPDATE chartdetails SET bfwd =' . $myrow['cfwd'] . ',
> >                                bfwdbudget =' . $myrow['cfwdbudget'] . '
> >                        WHERE accountcode = ' . $myrow['accountcode'] . '
> >                        AND period >=' . $PeriodNo;
> >                $UpdChartDetails = DB_query($sql,$db, '', '', '', false);
> >                DB_free_result($UpdChartDetails);
> >            }
> >            DB_free_result($ChartDetailsCFwd);
> >        }
> >
> >    }

--
Phil Daintree
webERP Project Admin

Re: Query performance problem

From
Paul Tillotson
Date:
Phil,

Just about any query will usually take a few milliseconds (try SELECT 1;
to see the absolute lowest), and so 12 ms is probably about as good as
you can  get.  For my own part, I consider 50 ms good enough for any
query that is not run inside of a loop.  If you want to write suitably
efficient code/SQL for this, I suggest filling your tables with more
data (say, 10 times as much as you have now) and then see how the
timings work.

Are you already working with what you would consider a "typical" data
size?  Or is it smaller than what someone would typically have?

If you post any more timings on this list, please post the EXPLAIN
ANALYZE as well.  This allows us to see what plan the planner picked,
how much time each step took, and how many rows were actually affected.
To get the EXPLAIN ANALYZE, just type EXPLAIN ANALYZE <your query goes
here> and copy the output.

Regards,

Paul Tillotson

Phil Daintree wrote:

>I can also do the same thing without sub-queries - I messed about some more
>since I was keen to ensure backward compatibility with prior versions of
>mysql that have left/right joins but no subqueries ... quite a bit quicker
>still!
>
>Query took 0.0037 sec - 1/10th of the sub-query time.
>
>SELECT chartmaster.accountcode, periods.periodno
>FROM chartmaster INNER JOIN periods ON True
>LEFT JOIN chartdetails ON chartmaster.accountcode = chartdetails.accountcode
>AND periods.periodno = chartdetails.period
>WHERE periods.periodno >=1 AND periods.periodno <=63 AND
>chartdetails.accountcode IS NULL  LIMIT 0 , 30
>
>
>In postgres:
>
>SQL executed.
>
>Total runtime: 12.241 ms
>
>Still this is a third of the time of the sub-query route but 4 times longer
>than mysql - this must be an install issue?
>
>
>Thanks again for this idea Paul
>
>phil
>
>
>

Re: Query performance problem

From
Greg Stark
Date:
Paul Tillotson <pntil@shentel.net> writes:

> >Total runtime: 12.241 ms
> >
> > Still this is a third of the time of the sub-query route but 4 times longer
> > than mysql - this must be an install issue?
>
> Just about any query will usually take a few milliseconds (try SELECT 1; to see
> the absolute lowest), and so 12 ms is probably about as good as you can  get.
> For my own part, I consider 50 ms good enough for any query that is not run
> inside of a loop.

Consider that typical drive seek times are on the order of 10ms. So if you're
getting anything better than that from MySQL from *any* query it's purely
because all the data is cached in RAM. If you can afford to keep your entire
data set cached in RAM and are worried about guaranteeing response like 1-3ms
then perhaps you should consider whether a database is the right solution for
you. Perhaps something simpler like libdb or memcached would be more
appropriate.

--
greg