Thread: bad select performance for where (x=1 or x=3)

bad select performance for where (x=1 or x=3)

From: George Young
[PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66]

I have a 'select' including 'where (x=1 or x=3)' that takes 16 times as long
as with just 'where x=1'.  Here's a (somewhat simplified) example:

table opset_steps
      (name text, opset_id int2, opset_ver int2) [1400 rows]
      non-unique index on (opset_id, opset_ver)

table run_opsets
      (status int2, opset_id int2, opset_ver int2, run_id int2, seq int2) [17000 rows]
      pkey is (opset_id, seq), second index on (status, opset_id, opset_ver, run_id)
      select count(*) from run_opsets where status=1; --> 187
      select count(*) from run_opsets where status=3; --> 10564

table runs
      (run_name text, run_id int2, status int2) [900 rows]
      pkey is run_name, second index on (run_id, status)

I have vacuum analyzed all relevant tables.
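
For reference, here is the schema above reconstructed as DDL -- an approximate sketch, since the original definitions weren't posted and the index names are invented:

  CREATE TABLE opset_steps (name text, opset_id int2, opset_ver int2);
  CREATE INDEX opset_steps_idx ON opset_steps (opset_id, opset_ver);

  CREATE TABLE run_opsets (status int2, opset_id int2, opset_ver int2,
                           run_id int2, seq int2);
  CREATE UNIQUE INDEX run_opsets_pkey ON run_opsets (opset_id, seq);
  CREATE INDEX run_opsets_idx ON run_opsets (status, opset_id, opset_ver, run_id);

  CREATE TABLE runs (run_name text, run_id int2, status int2);
  CREATE UNIQUE INDEX runs_pkey ON runs (run_name);
  CREATE INDEX runs_idx ON runs (run_id, status);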

This query takes 16 seconds (without the explain, of course):

  explain select os.name, r.run_name, ro.status
  from opset_steps os, runs r, run_opsets ro
  where (ro.status=1 or ro.status=3)
    and ro.opset_id=os.opset_id and ro.opset_ver=os.opset_ver
    and ro.run_id=r.run_id and r.status=1;

Hash Join  (cost=1793.58 rows=14560 width=38)
  ->  Hash Join  (cost=1266.98 rows=14086 width=24)
        ->  Seq Scan on run_opsets ro  (cost=685.51 rows=13903 width=8)
        ->  Hash  (cost=70.84 rows=1389 width=16)
              ->  Seq Scan on opset_steps os  (cost=70.84 rows=1389 width=16)
  ->  Hash  (cost=47.43 rows=374 width=14)
        ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)


This query takes just under one second [the only difference is status=1
instead of (status=1 or status=3)]:

  explain select os.name, r.run_name, ro.status
  from opset_steps os, runs r, run_opsets ro
  where ro.status=1
    and ro.opset_id=os.opset_id and ro.opset_ver=os.opset_ver
    and ro.run_id=r.run_id and r.status=1;

Hash Join  (cost=1359.57 rows=7719 width=38)
  ->  Hash Join  (cost=1051.39 rows=7467 width=24)
        ->  Seq Scan on run_opsets ro  (cost=685.51 rows=7370 width=8)
        ->  Hash  (cost=70.84 rows=1389 width=16)
              ->  Seq Scan on opset_steps os  (cost=70.84 rows=1389 width=16)
  ->  Hash  (cost=47.43 rows=374 width=14)
        ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)


Why should it take over 16 times as long for (status=1 or status=3) as for status=1?
I have indexes on all the fields used in the where clause.
I tried "status in (1,3)" with no improvement.


George Young,  Rm. L-204        gry@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St.
Lexington, Massachusetts  02420-9108    (781) 981-2756

Re: [SQL] bad select performance for where (x=1 or x=3)

From: Tom Lane

George Young <gry@ll.mit.edu> writes:
> table run_opsets
>       (status int2, opset_id int2, opset_ver int2, run_id int2, seq int2) [17000 rows]
>       pkey is (opset_id, seq), second index on (status, opset_id, opset_ver, run_id)

>       select count(*) from run_opsets where status=1; --> 187
>       select count(*) from run_opsets where status=3; --> 10564

> Why should it take over 16 times as long for (status=1 or status=3) as for status=1?

Offhand it looks like the former would produce 57 times as many possible
rows from the run_opsets table as the latter (187+10564 vs 187), which
the system would then have to try to match against the other tables.
You didn't say how many tuples actually get returned, but certainly the
number of iterations through each of the join loops is likely to be much
higher.  I'm surprised the cost differential isn't more than 16:1.

A more interesting question might be "why doesn't the system's cost
estimator realize that the second case will be much cheaper?"  The
answer to that is that VACUUM ANALYZE doesn't keep sufficiently detailed
statistics to let it realize that there are far more x=3 than x=1 rows.
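
[To illustrate with the counts reported above -- a sketch of the skew the planner cannot see:]

  -- actual distribution (figures from earlier in the thread):
  select status, count(*) from run_opsets group by status;
  --   status=1 ->   187 rows
  --   status=3 -> 10564 rows
  -- Without per-value statistics the planner treats both values alike,
  -- so it cannot tell that (status=1 or status=3) feeds ~57 times as
  -- many rows into the joins as status=1 alone.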

            regards, tom lane

Bad update performance?

From: "Gunnar Ingvi Thorisson"

Hi there,

I've a table with about 142000 rows, as shown below, and I want to set the
field "divis" to "unknown" by executing the following update command:
update ipacct set divis = 'unknown';

However this seems to take hours. I've a PII 350MHz with 192MB memory and a
Quantum UIDE disk; is there a way for me to speed up the update process
other than switching to a SCSI hard drive? Does indexing the field "divis"
speed up the update performance?

The load on the machine is about 1.5 to 2 all the time, and it's hammering
the hard drive like hell.
I have the same problem when I do "UPDATE ipacct SET traf = (src+dst)"; it
takes hours. Is this normal? Is there another way to do this?

However, reading from the database with select is very fast.

Many thanks in advance for any hint,
best regards,
Gunnar Ingvi Thorisson

CCIE 4826
Iceland
gunni@if.is



venus:~$ psql ipacct
Welcome to the POSTGRESQL interactive sql monitor:
  Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.0 on i686-pc-linux-gnulibc1, compiled by gcc 2.7.2.3]

[cut]

ipacct=> select * from ipacct where id = 1;
id| datetime|pcol|saddr            |sport|daddr       |dport|pcnt| size|if  |login  |mac         |src|dst|traf|depart|divis
--+---------+----+-----------------+-----+------------+-----+----+-----+----+-------+------------+---+---+----+------+-----
 1|928368167|   6|209.85.127.151/32|   80|10.10.2.3/32| 2448|  57|74856|eth0|unknown|00805FC1525C|100|  3| 103|      |
(1 row)

ipacct=> update ipacct set divis = 'unknown';

The table:
   ID               INT4 UNIQUE,
   DATETIME         TEXT,
   PCOL             INTEGER,
   SADDR            CIDR,
   SPORT            INTEGER,
   DADDR            CIDR,
   DPORT            INTEGER,
   PCNT             INT4,
   SIZE             INT4,
   IF               TEXT,
   LOGIN            TEXT,
   MAC              TEXT,
   SRC              INTEGER,
   DST              INTEGER,
   TRAF             INTEGER,
   DEPART           TEXT,
   DIVIS            TEXT




Re: [SQL] Bad update performance?

From: tjk@tksoft.com

Gunnar,
You would be better off moving the field into its own table.

The table is stored as one big file, and when postgres modifies it,
it has to do a heck of a lot of seeking and rewriting within that
file, which ends up being very time consuming.

When you use a separate table for the field you are
isolating the field to its own file, which means
that when it's modified only a much smaller file
needs to be worked on.
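
[As a hypothetical sketch of that split (the new table and index names are invented), using the ipacct table from this thread:]

  -- move divis into its own table, keyed by the ipacct id:
  CREATE TABLE ipacct_divis (id INT4 UNIQUE, divis TEXT);
  INSERT INTO ipacct_divis SELECT id, divis FROM ipacct;
  -- the bulk update now rewrites only the small table:
  UPDATE ipacct_divis SET divis = 'unknown';
  -- reads must join the two tables back together
  -- (the old divis column in ipacct would then be ignored):
  SELECT a.id, d.divis FROM ipacct a, ipacct_divis d WHERE a.id = d.id;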

It might be a nice idea to split large tables into smaller pieces
within postgres, in order to speed up updates. (Whether this would
make sense is really a question for somebody more knowledgeable
about the backend.)

Because you are indexing the field, searches are fast: postgres
uses the index as a guide to where to seek inside the file.



Troy

Troy Korjuslommi                Tksoft OY, Inc.
tjk@tksoft.com                  Software Development
                                Open Source Solutions
                                Hosting Services







> [original message quoted in full -- snipped]

Re: [SQL] Bad update performance?

From: Herouth Maoz

At 05:28 +0300 on 21/07/1999, tjk@tksoft.com wrote:


> The table is stored as one big file, and when postgres modifies it,
> it has to do a heck of a lot of seeking and rewriting within that
> file, which ends up being very time consuming.
>
> When you use a separate table for the field you are
> isolating the field to its own file, which means
> that when it's modified only a much smaller file
> needs to be worked on.

This will give a penalty on selects, however. When you split a table
in two, you need a relation between the separated field and the
original table, and from then on you can only see them together using
a join. A join takes more time than a search on one table.

I think the update is slow because it has to read each record and
write its new form at the end of the file. It does a sequential scan,
plus a jump to the end of the file and a write. That's essentially
what you said, too. In addition, it has to re-index each new entry.

But in my opinion, the way to make this faster is to select the new
values into a new table, drop the old table, rename the new table, and
add the indices after that. If your table is called A:

CREATE TABLE B ...; -- Same schema and constraints as A
INSERT INTO B SELECT f1, f2, 'unknown', f4... FROM A; -- Note new value
DROP TABLE A;
ALTER TABLE B RENAME TO A;
CREATE INDEX .....; -- Recreate the indices

When defining the table, don't use SERIAL or PRIMARY KEY, as these create
a sequence and an index. You would do better to control these yourself, so
as not to break the sequences already in use, and not to create an index
before all the values are inserted.
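
[As a concrete sketch of this recipe against the ipacct table from this thread -- untested, with the new table and index names invented; SELECT ... INTO copies the data without constraints, in line with the advice above:]

  -- copy every column, substituting the new divis value:
  SELECT id, datetime, pcol, saddr, sport, daddr, dport, pcnt,
         size, if, login, mac, src, dst, traf, depart,
         'unknown'::text AS divis
    INTO ipacct_new
    FROM ipacct;
  DROP TABLE ipacct;
  ALTER TABLE ipacct_new RENAME TO ipacct;
  -- recreate the unique index last, after all rows are in place:
  CREATE UNIQUE INDEX ipacct_id_key ON ipacct (id);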

One last comment: indices always accelerate retrieval, at the expense of
slowing inserts and updates.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma




Re: [SQL] Bad update performance?

From: Tom Lane

"Gunnar Ingvi Thorisson" <gunni@if.is> writes:
> I�ve a table with about 142000 rows like shown below and I want to
> set field "divis" to "unknown" by executing following update command:
> update ipacct set divis = 'unknown';
> However this seems to take hours,

Well, updating 142000 rows is going to take a little while...

Are you starting the postmaster with -o -F ?  That makes for a pretty
considerable speedup in most cases (at the cost of trouble if you
have a system crash during an update).

> Does indexing the field "divis" speed up the update performance?

No, it would not help a query like that --- though if you added a clause
like "where divis = 'oldvalue'" then an index would help find the rows
that need updating.  Actually, every index you add *slows down* updates,
since all the indexes must be updated along with the table.
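
[For illustration, a hypothetical sketch of that filtered form -- 'oldvalue' stands in for whatever the rows currently hold, and the index name is invented:]

  -- an index on divis lets the planner locate just the rows to change:
  CREATE INDEX ipacct_divis_idx ON ipacct (divis);
  -- only rows matching the filter are rewritten:
  UPDATE ipacct SET divis = 'unknown' WHERE divis = 'oldvalue';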

            regards, tom lane