Thread: Speed or configuration
At our company we are presently using a commercial database that
generates results from this query:

    select xx1,xx2,xx3 from tableX
    where field1 in
      (select field1 from tableY where
       field2=NNN and field3=NNN2 and field4=NNN4)

tableX has 790,000 rows and an index on field1. tableY has about
175,000 rows and an index that includes field2, field3, field4 (and
2 other fields not being used here). Of course, the order in the
indexes is the logical one.

I have made copies of the tables from the commercial SQL server to
PostgreSQL, and PostgreSQL is consistently faster in things like
count(*) and certain other queries.

But when it comes to the above-mentioned query, PostgreSQL simply
stays there, forever. The postgres backend must be killed in order to
free the client program.

Any suggestions?

Franz J Fortuny
On Sun, 20 Aug 2000, Franz J Fortuny wrote:

> At our company we are presently using a commercial
> database that generates results from this query:
>
>     select xx1,xx2,xx3 from tableX
>     where field1 in
>       (select field1 from tableY where
>        field2=NNN and field3=NNN2 and field4=NNN4)
>
> tableX has 790,000 rows and an index on field1.
> tableY has about 175,000 rows and an index that includes
> field2, field3, field4 (and 2 other fields not being used here).
>
> Of course, the order in the indexes is the logical one.
>
> I have made copies of the tables from the commercial SQL server
> to PostgreSQL, and PostgreSQL is consistently faster in things
> like count(*) and certain other queries.
>
> But when it comes to the above-mentioned query, PostgreSQL simply
> stays there, forever. The postgres backend must be killed in order
> to free the client program.
>
> Any suggestions?

First questions I have: what version of PostgreSQL? What hardware is
the commercial SQL server running on? And PostgreSQL? Memory, CPU,
hard drives, etc.?

I run something similar on a 7.0.2 machine with a table consisting of
~9 million rows and another of ~100k, and results come back in <1 min
... in fact, the exact query I've tried looks something like:

    SELECT url_id,intag FROM ndict
     WHERE word_id=-903076670
       AND url_id IN (
           SELECT distinct url_id FROM ndict
            WHERE word_id=-70722333
              AND url_id IN (
                  SELECT distinct url_id FROM ndict,url
                   WHERE ndict.word_id=572517542
                     AND url.rec_id=ndict.url_id
                     AND ((url.url || '') LIKE 'http://www.postgresql.org/%')));

with subsequent run results looking like:

    0.000u 0.014s 0:24.12 0.0%      768+1116k 0+0io 0pf+0w
    %time psql -h db -o tmpfile udmsearch < query
    0.006u 0.006s 0:08.56 0.0%      0+0k 1+0io 0pf+0w
    %time psql -h db -o tmpfile udmsearch < query
    0.005u 0.000s 0:09.10 0.0%      0+0k 0+0io 0pf+0w
    %time psql -h db -o tmpfile udmsearch < query
    0.000u 0.012s 0:11.07 0.0%      300+540k 0+0io 0pf+0w
"Franz J Fortuny" <ffortuny@ivsol.com> writes: > [ this query is slow: ] > select xx1,xx2,xx3 from tableX > where field1 in > (select field1 from tableY where > field2=NNN and field3=NNN2 and field4=NNN4) What version of Postgres are you using, and what does EXPLAIN show as the query plan for this query? How many tableY rows is the sub- query likely to produce, and how many matches do you expect to get from tableX? The Postgres planner is not currently very smart about optimizing sub-SELECTs. We hope to do something about that by 7.2 or so, but in the meantime you should look at transforming the query to some other form. You might find that EXISTS will help: select xx1,xx2,xx3 from tableX where exists (select 1 from tableYwhere field1=tableX.field1 and field2=NNN and field3=NNN3 and field4=NNN4) However this is unlikely to help much unless you change the index structure for tableY. Perhaps you could write it as a join: select xx1,xx2,xx3 from tableX, tableY where tableX.field1=tableY.field1and field2=NNN and field3=NNN3 and field4=NNN4 although this won't give quite the same results if there can be multiple matching rows in tableY for a tableX row. (DISTINCT might help if so.) regards, tom lane
(It won't really be forever, just probably a really long time.) You
can usually get around it by rewriting the query to use EXISTS rather
than IN.

Stephan Szabo
sszabo@bigpanda.com

On Sun, 20 Aug 2000, Franz J Fortuny wrote:

> At our company we are presently using a commercial
> database that generates results from this query:
>
>     select xx1,xx2,xx3 from tableX
>     where field1 in
>       (select field1 from tableY where
>        field2=NNN and field3=NNN2 and field4=NNN4)
>
> tableX has 790,000 rows and an index on field1.
> tableY has about 175,000 rows and an index that includes
> field2, field3, field4 (and 2 other fields not being used here).
>
> Of course, the order in the indexes is the logical one.
>
> I have made copies of the tables from the commercial SQL server
> to PostgreSQL, and PostgreSQL is consistently faster in things
> like count(*) and certain other queries.
>
> But when it comes to the above-mentioned query, PostgreSQL simply
> stays there, forever. The postgres backend must be killed in order
> to free the client program.
"What version of Postgres are you using, and what does EXPLAIN show as the query plan for this query? How many tableY rows is the sub- query likely to produce, and how many matches do you expect to get from tableX?" Version: postgresql-7.0.2-2.i386.rpm Explain: Scan table, scan table. (Plus the costs, etc.) About 9,000 matches from tableX are expected, and a lot less from tableY (about 2000, at the most). The index structure of tableY includes 5 fields. Most of the time, we need only the first and second fields. Less often, up to the third field and a lot less often the other 2 left fields (in the "where" clauses). However, field1 of tableX references field1 of tableY. If I use a program to get the results, they come out incredibly fast (faster in postgreSQL than with the commercial program) (A select to get a first record set from tableY, then navigate the rows, executing a select for tableX for each row...) These the structures of the tables: create table tableX (col0 integer not null primary key,col1 integer not null,col2 integer not null,col3 char(20),col4 char(8),col5 char(8),unique(col1,col2,col3,col4,col5) ); create table tableY ( col0 integer not null references tableX(col0), col1 integer, col2 integer, col3 float, col4 float ); The slow query is: select col1,col2,sum(col3),sum(col4) from tableY where col0 in (select col0 from tableX where col1=:col1 and col2=:col2) group by col1,col2; Explain plan shows ALWAYS scan table, scan table, scan table. tableY contains about 900,000 rows, and tableX about 65,000. Any more light about the slowness? Franz J Fortuny
"First question(s) I have is what version of PostgreSQL? what hardware is the commercial SQL server running on? PostgreSQL? memory, cpu, hard drives, etc?" Version: postgresql-7.0.2-2.i386.rpm Hardware: Same Hardware for all SQL Servers (same machine, of course, one SQL Server is idle while the other one is active). Memory: 512 MB RAM, 16GB Hard disk. Pentium II (old) I am glad to hear that it's possible to get such good results. I always thought there would be a way. I'll try this query you have suggested... Franz J Fortuny
On Sun, 20 Aug 2000, Franz J Fortuny wrote:

> "What version of Postgres are you using, and what does EXPLAIN show
> as the query plan for this query? How many tableY rows is the
> subquery likely to produce, and how many matches do you expect to
> get from tableX?"
>
> Version: postgresql-7.0.2-2.i386.rpm
> Explain: Scan table, scan table. (Plus the costs, etc.)

Use cut-n-paste please, and send us the results of the EXPLAIN ...
stuff like the cost estimates and whatnot tell us *so* much ...
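The request above would mean running something like the following in
psql and pasting everything it prints, cost estimates included (the
literal values 1 and 2 stand in for the :col1/:col2 parameters):

    EXPLAIN select col1,col2,sum(col3),sum(col4) from tableY
    where col0 in
      (select col0 from tableX where col1=1 and col2=2)
    group by col1,col2;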
Hi all,

In the database I'm designing, I want to have audit tables that keep
a log of all inserts, updates and deletes that occur on any table.
E.g., if I had a table Info:

    create table Info (
        info_id   SERIAL,
        some_data text
    );

I would also have a corresponding audit table:

    create table AudInfo (
        aud_key_id         SERIAL,
        info_id            int4,
        some_data          text,
        aud_operation_type char(1),
        aud_log_time       timestamp default now()
    );

Now I tried creating a trigger on Info, so that whenever an insert
occurs, the record is copied to the audit table:

    create function tp_info () returns opaque as '
    begin
        -- insert into audit table
        insert into AudInfo (info_id, some_data, aud_operation_type)
            values (new.info_id, new.some_data, ''i'');
        return new;
    end;
    ' language 'plpgsql';

    create trigger tp_info before insert on Info
        for each row execute procedure tp_info();

This doesn't work, however. A parse error occurs in the first line. I
suspect that I cannot insert a record into another table in a
trigger. I'm not sure why, though. Has anybody else done similar
operations within a trigger procedure? Or does anyone know of a
workaround?

Cheers,
Keith.
Keith Wong <keith@e-magine.com.au> writes:
> create function tp_info () returns opaque as '
> begin
>     -- insert into audit table
>     insert into AudInfo (info_id, some_data, aud_operation_type)
>         values (new.info_id, new.some_data, ''i'');
>     return new;
> end;
> ' language 'plpgsql';
> create trigger tp_info before insert on Info
>     for each row execute procedure tp_info();
> This doesn't work however. A parse error occurs in the first line.
> I suspect that I cannot insert a record into another table in a
> trigger.

You certainly can --- I have apps that do it all the time. I suspect
maybe the plpgsql parser gets confused by SQL-style comments? Your
example seems to work fine if I leave out the comment.

			regards, tom lane
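For reference, a sketch of the same function with only the SQL-style
comment removed, i.e. the variant reported to work above (otherwise
unchanged, 7.0-era opaque return type retained):

    create function tp_info () returns opaque as '
    begin
        insert into AudInfo (info_id, some_data, aud_operation_type)
            values (new.info_id, new.some_data, ''i'');
        return new;
    end;
    ' language 'plpgsql';

    create trigger tp_info before insert on Info
        for each row execute procedure tp_info();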