Thread: Speed or configuration
At our company we are presently using a commercial database that
generates results from this query:

    select xx1,xx2,xx3 from tableX
    where field1 in
      (select field1 from tableY where
       field2=NNN and field3=NNN2 and field4=NNN4)

tableX has 790,000 rows and an index on field1. tableY has about
175,000 rows and an index that includes field2, field3, field4 (and
2 other fields not being used here). Of course, the order in the
indexes is the logical one.

I have made copies of the tables from the commercial SQL server to
PostgreSQL, and PostgreSQL is consistently faster in things like
count(*) and certain other queries.

But when it comes to the above-mentioned query, PostgreSQL simply
stays there, forever. The postgres backend must be killed in order to
free the client program.

Any suggestions?

Franz J Fortuny
On Sun, 20 Aug 2000, Franz J Fortuny wrote:

> At our company we are presently using a commercial
> database that generates results from this query:
>
>     select xx1,xx2,xx3 from tableX
>     where field1 in
>       (select field1 from tableY where
>        field2=NNN and field3=NNN2 and field4=NNN4)
>
> tableX has 790,000 rows and an index on field1.
> tableY has about 175,000 rows and an index that includes
> field2, field3, field4 (and 2 other fields not being used here).
>
> Of course, the order in the indexes is the logical one.
>
> I have made copies of the tables from the commercial SQL server
> to PostgreSQL, and PostgreSQL is consistently faster in things
> like count(*) and certain other queries.
>
> But when it comes to the above-mentioned query, PostgreSQL simply
> stays there, forever. The postgres backend must be killed in order
> to free the client program.
>
> Any suggestions?

First questions I have: what version of PostgreSQL? What hardware is
the commercial SQL server running on? And PostgreSQL? Memory, CPU,
hard drives, etc.?

I run something similar on a 7.0.2 machine with a table consisting of
~9 million rows and another of ~100k, and results come back in <1 min
... in fact, the exact query I've tried looks something like:

    SELECT url_id,intag FROM ndict
     WHERE word_id=-903076670
       AND url_id IN (
           SELECT distinct url_id FROM ndict
            WHERE word_id=-70722333
              AND url_id IN (
                  SELECT distinct url_id FROM ndict,url
                   WHERE ndict.word_id=572517542
                     AND url.rec_id=ndict.url_id
                     AND ((url.url || '') LIKE 'http://www.postgresql.org/%')));

with subsequent run results looking like:

    0.000u 0.014s 0:24.12 0.0%      768+1116k 0+0io 0pf+0w
    %time psql -h db -o tmpfile udmsearch < query
    0.006u 0.006s 0:08.56 0.0%      0+0k 1+0io 0pf+0w
    %time psql -h db -o tmpfile udmsearch < query
    0.005u 0.000s 0:09.10 0.0%      0+0k 0+0io 0pf+0w
    %time psql -h db -o tmpfile udmsearch < query
    0.000u 0.012s 0:11.07 0.0%      300+540k 0+0io 0pf+0w
"Franz J Fortuny" <ffortuny@ivsol.com> writes: > [ this query is slow: ] > select xx1,xx2,xx3 from tableX > where field1 in > (select field1 from tableY where > field2=NNN and field3=NNN2 and field4=NNN4) What version of Postgres are you using, and what does EXPLAIN show as the query plan for this query? How many tableY rows is the sub- query likely to produce, and how many matches do you expect to get from tableX? The Postgres planner is not currently very smart about optimizing sub-SELECTs. We hope to do something about that by 7.2 or so, but in the meantime you should look at transforming the query to some other form. You might find that EXISTS will help: select xx1,xx2,xx3 from tableX where exists (select 1 from tableYwhere field1=tableX.field1 and field2=NNN and field3=NNN3 and field4=NNN4) However this is unlikely to help much unless you change the index structure for tableY. Perhaps you could write it as a join: select xx1,xx2,xx3 from tableX, tableY where tableX.field1=tableY.field1and field2=NNN and field3=NNN3 and field4=NNN4 although this won't give quite the same results if there can be multiple matching rows in tableY for a tableX row. (DISTINCT might help if so.) regards, tom lane
(It won't really be forever, just probably a really long time.) You
can usually get around it by rewriting the query to use EXISTS rather
than IN.

Stephan Szabo
sszabo@bigpanda.com

On Sun, 20 Aug 2000, Franz J Fortuny wrote:

> At our company we are presently using a commercial
> database that generates results from this query:
>
>     select xx1,xx2,xx3 from tableX
>     where field1 in
>       (select field1 from tableY where
>        field2=NNN and field3=NNN2 and field4=NNN4)
>
> tableX has 790,000 rows and an index on field1.
> tableY has about 175,000 rows and an index that includes
> field2, field3, field4 (and 2 other fields not being used here).
>
> Of course, the order in the indexes is the logical one.
>
> I have made copies of the tables from the commercial SQL server
> to PostgreSQL, and PostgreSQL is consistently faster in things
> like count(*) and certain other queries.
>
> But when it comes to the above-mentioned query, PostgreSQL simply
> stays there, forever. The postgres backend must be killed in order
> to free the client program.
"What version of Postgres are you using, and what does EXPLAIN show as the query plan for this query? How many tableY rows is the sub- query likely to produce, and how many matches do you expect to get from tableX?" Version: postgresql-7.0.2-2.i386.rpm Explain: Scan table, scan table. (Plus the costs, etc.) About 9,000 matches from tableX are expected, and a lot less from tableY (about 2000, at the most). The index structure of tableY includes 5 fields. Most of the time, we need only the first and second fields. Less often, up to the third field and a lot less often the other 2 left fields (in the "where" clauses). However, field1 of tableX references field1 of tableY. If I use a program to get the results, they come out incredibly fast (faster in postgreSQL than with the commercial program) (A select to get a first record set from tableY, then navigate the rows, executing a select for tableX for each row...) These the structures of the tables: create table tableX (col0 integer not null primary key,col1 integer not null,col2 integer not null,col3 char(20),col4 char(8),col5 char(8),unique(col1,col2,col3,col4,col5) ); create table tableY ( col0 integer not null references tableX(col0), col1 integer, col2 integer, col3 float, col4 float ); The slow query is: select col1,col2,sum(col3),sum(col4) from tableY where col0 in (select col0 from tableX where col1=:col1 and col2=:col2) group by col1,col2; Explain plan shows ALWAYS scan table, scan table, scan table. tableY contains about 900,000 rows, and tableX about 65,000. Any more light about the slowness? Franz J Fortuny
"First question(s) I have is what version of PostgreSQL? what hardware is the commercial SQL server running on? PostgreSQL? memory, cpu, hard drives, etc?" Version: postgresql-7.0.2-2.i386.rpm Hardware: Same Hardware for all SQL Servers (same machine, of course, one SQL Server is idle while the other one is active). Memory: 512 MB RAM, 16GB Hard disk. Pentium II (old) I am glad to hear that it's possible to get such good results. I always thought there would be a way. I'll try this query you have suggested... Franz J Fortuny
On Sun, 20 Aug 2000, Franz J Fortuny wrote:

> "What version of Postgres are you using, and what does EXPLAIN show
> as the query plan for this query? How many tableY rows is the
> subquery likely to produce, and how many matches do you expect to
> get from tableX?"
>
> Version: postgresql-7.0.2-2.i386.rpm
> Explain: Scan table, scan table. (Plus the costs, etc.)

Use cut-n-paste please, and send us the results of the EXPLAIN ...
stuff like the cost estimates and whatnot tell us *so* much ...
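The request above would mean running something like the following in
psql and pasting everything it prints, cost estimates included (the
literal values 1 and 2 stand in for the :col1/:col2 parameters):

    EXPLAIN select col1,col2,sum(col3),sum(col4) from tableY
    where col0 in
      (select col0 from tableX where col1=1 and col2=2)
    group by col1,col2;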
Hi all,

In the database I'm designing, I want to have audit tables that keep
a log of all inserts, updates and deletes that occur on any table.
E.g., if I had a table Info:

    create table Info (
        info_id   SERIAL,
        some_data text
    );

I would also have a corresponding audit table:

    create table AudInfo (
        aud_key_id         SERIAL,
        info_id            int4,
        some_data          text,
        aud_operation_type char(1),
        aud_log_time       timestamp default now()
    );

Now I tried creating a trigger on Info, so that whenever an insert
occurs, the record is copied to the audit table:

    create function tp_info () returns opaque as '
    begin
        -- insert into audit table
        insert into AudInfo (info_id, some_data, aud_operation_type)
            values (new.info_id, new.some_data, ''i'');
        return new;
    end;
    ' language 'plpgsql';

    create trigger tp_info before insert on Info
        for each row execute procedure tp_info();

This doesn't work, however. A parse error occurs in the first line. I
suspect that I cannot insert a record into another table in a
trigger. I'm not sure why, though. Has anybody else done similar
operations within a trigger procedure? Or does anyone know of a
workaround?

Cheers,
Keith.
Keith Wong <keith@e-magine.com.au> writes:
> create function tp_info () returns opaque as '
> begin
>     -- insert into audit table
>     insert into AudInfo (info_id, some_data, aud_operation_type)
>         values (new.info_id, new.some_data, ''i'');
>     return new;
> end;
> ' language 'plpgsql';
> create trigger tp_info before insert on Info
>     for each row execute procedure tp_info();
> This doesn't work however. A parse error occurs in the first line.
> I suspect that I cannot insert a record into another table in a
> trigger.

You certainly can --- I have apps that do it all the time. I suspect
maybe the plpgsql parser gets confused by SQL-style comments? Your
example seems to work fine if I leave out the comment.

			regards, tom lane
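For reference, a sketch of the same function with only the SQL-style
comment removed, i.e. the variant reported to work above (otherwise
unchanged, 7.0-era opaque return type retained):

    create function tp_info () returns opaque as '
    begin
        insert into AudInfo (info_id, some_data, aud_operation_type)
            values (new.info_id, new.some_data, ''i'');
        return new;
    end;
    ' language 'plpgsql';

    create trigger tp_info before insert on Info
        for each row execute procedure tp_info();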