Thread: optimal sql
Hi, I am running postgresql 7.2.3 on a test server (with potential of becoming my production server). On the server I have a perl script, that is grabbing some data from a inventory database (local) - with some subselects. The query is like this: <query> my $sth = $ppdb->prepare(" select partno, create_date, mfg, condition, gescode, qty, cmup,(SELECT partno_main FROM partno_lookup where partno_lookup.partno_alias ilike (?|| inventory.partno ||?) and mfg ilike ? limit 1) as partno_main, (SELECT subcat FROM partno_lookup where partno_lookup.partno_alias ilike (?|| inventory.partno ||?) and mfg ilike ? limit 1) as subcat, (SELECT key_searchFROM partno_lookup where partno_lookup.partno_alias ilike (?|| inventory.partno ||?) and mfg ilike ? limit 1) as key_search, (SELECT text_descFROM descriptions where descriptions.partno=(SELECT partno_main FROM partno_lookup where partno_lookup.partno_alias ilike (?|| inventory.partno||?) and mfg ilike ? limit 1) limit 1) as descri from inventory where mfg ilike ? and ? < create_date $refurbed order by key_search, subcat, partno_main, status DESC "); </query> It takes quite a while for the query to get processed - and the script to return my values. The inventory table has approx. 23000 records - and the partno_lookup has approx. 1100. Is there anyway I can optimize the sql - og perhaps optimize my postgresql db settings ? ( I am running my postgresql on FreeBSD, on a fairly adequite machine with 1GB RAM) I look forward to any pointers or hints you might have. Thanks. /mich -- Best Regards,Michael Landin Hostbaek FreeBSDCluster.org - an International Community */ PGP-key available upon request /*
On Wed, 22 Jan 2003, Michael Hostbaek wrote: > Hi, I would suggest looking at the problem in three directions: a) PostgreSQL system wise b) PostgreSQL sql wise c) FreeBSD wise. For a) do all the necessary tuning on PostgreSQL. With 1GB of Mem, you could set a value of shared_buffers to 100000. Also check the fsync setting. Minimising logging may be a good idea. Read the docs on the site. For b) do explain analyze to be sure you have the right index usage, or create indexes where appropriate. Check the statitistics of your tables, distributions, counts etc... For c) check all kern.ipc.shm* and kern.ipc.sem* kernel variables. (you will need to set some of those in order to get the desired shared_buffers in a)) Rebuild a custom kernel fitting your needs. Check http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/config-tuning.html Also do man 7 tuning. And, check http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/index.html (Look at DMA access in your kernel CONFIG, consider turning on IDE write caching, etc....). Also during your perl script, a good idea is to have iostat 3 , vmstat 3, running. This will give you hints of where your system starves. If for instance your system cache is small, and CPU usage is small and you have a lot of IO, then increase shared_buffers, and tune your disks. (also do man 8 tunefs) IF you have nearly ~ 100% CPU usage, then the system may look healthier but your query not. > > I am running postgresql 7.2.3 on a test server (with potential of > becoming my production server). > > On the server I have a perl script, that is grabbing some data from a > inventory database (local) - with some subselects. > The query is like this: > > <query> > my $sth = $ppdb->prepare(" > select partno, create_date, mfg, condition, gescode, qty, > cmup,(SELECT partno_main FROM partno_lookup where > partno_lookup.partno_alias ilike (?|| inventory.partno ||?) and mfg > ilike ? limit 1) > as partno_main, (SELECT subcat FROM partno_lookup where > partno_lookup.partno_alias ilike > (?|| inventory.partno ||?) and mfg ilike ? limit 1) as subcat, > (SELECT key_search FROM partno_lookup where > partno_lookup.partno_alias ilike (?|| inventory.partno ||?) and > mfg ilike ? limit 1) as key_search, > (SELECT text_desc FROM descriptions where > descriptions.partno=(SELECT partno_main FROM partno_lookup > where partno_lookup.partno_alias ilike (?|| inventory.partno ||?) > and mfg ilike ? limit 1) > limit 1) as descri from inventory where mfg ilike ? and ? < > create_date $refurbed order by key_search, > subcat, partno_main, status DESC "); > </query> > > It takes quite a while for the query to get processed - and the script > to return my values. > The inventory table has approx. 23000 records - and the partno_lookup > has approx. 1100. > > Is there anyway I can optimize the sql - og perhaps optimize my > postgresql db settings ? ( I am running my postgresql on FreeBSD, on a > fairly adequite machine with 1GB RAM) > > I look forward to any pointers or hints you might have. > > Thanks. > > /mich > > -- > Best Regards, > Michael Landin Hostbaek > FreeBSDCluster.org - an International Community > > */ PGP-key available upon request /* > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > ================================================================== Achilleus Mantzios S/W Engineer IT dept Dynacom Tankers Mngmt Nikis 4, Glyfada Athens 16610 Greece tel: +30-10-8981112 fax: +30-10-8981877 email: achill@matrix.gatewaynet.com mantzios@softlab.ece.ntua.gr
Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > For a) do all the necessary tuning on PostgreSQL. > With 1GB of Mem, you could set a value of shared_buffers to 100000. Perhaps just a type, but that is way to much! It would mean about 800 Mb shared buffers! I would rather suggest a value between 1000 and 10000. See recent descussions on -performance and -hackers mailing lists. Best Regards, Michael Paesold
Tomasz Myrta (jasiek) writes: > > 3. Explain analyze would be helpful like in most performance cases... > The same with SQL query instead of Perl script. Explain analyze: <explain> NOTICE: QUERY PLAN: Limit (cost=27.55..27.55 rows=1 width=183) (actual time=35364.89..35365.04 rows=10 loops=1) -> Sort (cost=27.55..27.55 rows=1 width=183) (actual time=35364.87..35364.92 rows=11 loops=1) -> Group (cost=27.51..27.54 rows=1 width=183) (actual time=35350.49..35359.96 rows=411 loops=1) -> Sort (cost=27.51..27.51 rows=1 width=183) (actual time=35350.43..35352.52 rows=411 loops=1) -> Seq Scan on inventory (cost=0.00..27.50 rows=1 width=183) (actual time=168.52..35342.92 rows=411 loops=1) SubPlan -> Limit (cost=0.00..30.00 rows=1 width=48) (actual time=4.99..6.14 rows=0 loops=411) -> Seq Scan on partno_lookup (cost=0.00..30.00 rows=1 width=48) (actual time=4.96..6.11 rows=1 loops=411) -> Limit (cost=0.00..30.00 rows=1 width=93) (actual time=4.97..6.13 rows=0 loops=411) -> Seq Scan on partno_lookup (cost=0.00..30.00 rows=1 width=93) (actual time=4.95..6.10 rows=1 loops=411) -> Limit (cost=0.00..4.50 rows=1 width=32) (actual time=57.94..73.46 rows=0 loops=411) InitPlan -> Limit (cost=0.00..30.00 rows=1 width=48) (actual time=5.00..6.16 rows=0 loops=411) -> Seq Scan on partno_lookup (cost=0.00..30.00 rows=1 width=48) (actual time=4.98..6.13 rows=1 loops=411) -> Seq Scan on descriptions (cost=0.00..22.50 rows=5 width=32) (actual time=57.91..73.43 rows=1 loops=411) Total runtime: 35365.50 msec EXPLAIN </explain> <real query> explain analyze select partno, create_date, mfg, condition, gescode, qty, cmup,(SELECT partno_main FROM partno_lookup where partno_lookup.partno_alias ilike '%'||inventory.partno||'%' and mfg ilike 'CISCO' limit 1) as partno_main, (SELECT subcat FROM partno_lookup where partno_lookup.partno_alias ilike '%'||inventory.partno||'%' and mfg ilike 'CISCO' limit 1) as subcat, (SELECT text_desc FROM descriptions where descriptions.partno=(SELECT partno_main FROM partno_lookup where partno_lookup.partno_alias ilike '%'||inventory.partno||'%' and mfg ilike 'CISCO' limit 1) limit 1) as descri from inventory where mfg ilike 'CISCO' and '2003-01-15' < create_date and condition not like 'REFURB' group by partno_main, partno, create_date, mfg, condition, gescode, qty, cmup, subcat, descri, status order by subcat, partno_main, status DESC limit 10; </real query> Here is a sample of how a partno_lookup record looks like: ppdb=> select * from partno_lookup where partno_main = 'WIC-2T';partno_main | partno_alias | mfg | subcat | key_search -------------+---------------------------------------------+-------+-------------+------------WIC-2T | WIC2TB,WIC-2T,WIC-2T=,WIC2T,WIC2T=,WIC2TREF| CISCO | WIC MODULES | A (1 row) Any help very much appreciated. /mich -- Best Regards,Michael Landin Hostbaek FreeBSDCluster.org - an International Community */ PGP-key available upon request /*
Michael Hostbaek wrote: >Tomasz Myrta (jasiek) writes: > >>3. Explain analyze would be helpful like in most performance cases... >>The same with SQL query instead of Perl script. > > >Explain analyze: > > >NOTICE: QUERY PLAN: > >Limit (cost=27.55..27.55 rows=1 width=183) (actual >time=35364.89..35365.04 rows=10 loops=1) > -> Sort (cost=27.55..27.55 rows=1 width=183) (actual >time=35364.87..35364.92 rows=11 loops=1) > -> Group (cost=27.51..27.54 rows=1 width=183) (actual >time=35350.49..35359.96 rows=411 loops=1) > -> Sort (cost=27.51..27.51 rows=1 width=183) (actual >time=35350.43..35352.52 rows=411 loops=1) > -> Seq Scan on inventory (cost=0.00..27.50 rows=1 >width=183) (actual time=168.52..35342.92 rows=411 loops=1) > SubPlan > -> Limit (cost=0.00..30.00 rows=1 >width=48) (actual time=4.99..6.14 rows=0 loops=411) > -> Seq Scan on partno_lookup >(cost=0.00..30.00 rows=1 width=48) (actual time=4.96..6.11 rows=1 >loops=411) > -> Limit (cost=0.00..30.00 rows=1 >width=93) (actual time=4.97..6.13 rows=0 loops=411) > -> Seq Scan on partno_lookup >(cost=0.00..30.00 rows=1 width=93) (actual time=4.95..6.10 rows=1 >loops=411) > -> Limit (cost=0.00..4.50 rows=1 width=32) >(actual time=57.94..73.46 rows=0 loops=411) > InitPlan > -> Limit (cost=0.00..30.00 rows=1 >width=48) (actual time=5.00..6.16 rows=0 loops=411) > -> Seq Scan on partno_lookup >(cost=0.00..30.00 rows=1 width=48) (actual time=4.98..6.13 rows=1 >loops=411) > -> Seq Scan on descriptions >(cost=0.00..22.50 rows=5 width=32) (actual time=57.91..73.43 rows=1 >loops=411) >Total runtime: 35365.50 msec > >EXPLAIN > > > >explain analyze select partno, create_date, mfg, condition, gescode, >qty, cmup,(SELECT partno_main FROM >partno_lookup where partno_lookup.partno_alias ilike >'%'||inventory.partno||'%' and mfg ilike 'CISCO' limit 1) >as partno_main, (SELECT subcat FROM partno_lookup where >partno_lookup.partno_alias ilike >'%'||inventory.partno||'%' and mfg ilike 'CISCO' limit 1) as subcat, >(SELECT text_desc FROM descriptions where descriptions.partno=(SELECT >partno_main FROM partno_lookup >where partno_lookup.partno_alias ilike '%'||inventory.partno||'%' and >mfg ilike 'CISCO' limit 1) limit 1) as >descri from inventory where mfg ilike 'CISCO' and '2003-01-15' < >create_date and condition not like 'REFURB' >group by partno_main, partno, create_date, mfg, condition, gescode, qty, >cmup, subcat, descri, status order by >subcat, partno_main, status DESC limit 10; > > >Here is a sample of how a partno_lookup record looks like: Main problem of your query is this: Seq Scan on inventory (cost=0.00..27.50 rows=1 width=183) (actual time=168.52..35342.92 rows=411 loops=1) Do you have to use "ilike" condition in all cases? Database won't use index on this table at all, which compared to thousands of records isn't good. Next problem - your table isn't too normalized... I don't know, how much have you done to your database, but I think, you should reorganize it. Example: Create table manufacturers ( mfgid integer, name varchar (for example "Cisco") ) In table inventory change field mfg into mfgid. In table partno_aliases change field mfg into mfgid. Your query would have something like this: select ... from manufacturers M join inventory I using (mfgid) join partno_aliases PA using (mfgid) where M.name ilike 'Cisco' and ... After this create index on inventory(mfgid,createdate) If you don't want to change anything, create at least index on inventory(createdate). This will speed up queries with recent products - for not too old createdate. Regards, Tomasz Myrta
Michael Hostbaek wrote: >Hi, > >I am running postgresql 7.2.3 on a test server (with potential of >becoming my production server). > >On the server I have a perl script, that is grabbing some data from a >inventory database (local) - with some subselects. >The query is like this: > > >my $sth = $ppdb->prepare(" > select partno, create_date, mfg, condition, gescode, qty, >cmup,(SELECT partno_main FROM partno_lookup where > partno_lookup.partno_alias ilike (?|| inventory.partno ||?) and mfg >ilike ? limit 1) > as partno_main, (SELECT subcat FROM partno_lookup where >partno_lookup.partno_alias ilike > (?|| inventory.partno ||?) and mfg ilike ? limit 1) as subcat, > (SELECT key_search FROM partno_lookup where >partno_lookup.partno_alias ilike (?|| inventory.partno ||?) and > mfg ilike ? limit 1) as key_search, > (SELECT text_desc FROM descriptions where >descriptions.partno=(SELECT partno_main FROM partno_lookup > where partno_lookup.partno_alias ilike (?|| inventory.partno ||?) >and mfg ilike ? limit 1) > limit 1) as descri from inventory where mfg ilike ? and ? < >create_date $refurbed order by key_search, > subcat, partno_main, status DESC "); 1. Probably your query can't use index on table partno_lookup.partno_alias. Consider creating table aliases which contains all possible parts aliases. You can change then "ilike" into "=" which will use indexes. 2. You don't need subselects in your query. You can change them into ordinarytable joins and use "group by" or "distincton". In your case selecting from partno_lookup is executed several times per one row. 3. Explain analyze would be helpful like in most performance cases... The same with SQL query instead of Perl script. 4. This is rather a sql problem, than hardware/configuration one. Regards, Tomasz Myrta
On Wed, 22 Jan 2003, Michael Paesold wrote: > Achilleus Mantzios <achill@matrix.gatewaynet.com> wrote: > > > For a) do all the necessary tuning on PostgreSQL. > > With 1GB of Mem, you could set a value of shared_buffers to 100000. > > Perhaps just a type, but that is way to much! It would mean about 800 Mb > shared buffers! I would rather suggest a value between 1000 and 10000. See > recent descussions on -performance and -hackers mailing lists. Personally i found only performance improvement when increasing shared_buffers. (but then again i speak for me and my queries). The 100,000 value was certainly not a typo (provided he doesnt run X11,KDE, mozilla, etc... on his server) but maybe too high. Some people say 25% of the total Mem is a good rule of thumb, but testing for his specific query must be made. > > Best Regards, > Michael Paesold > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > ================================================================== Achilleus Mantzios S/W Engineer IT dept Dynacom Tankers Mngmt Nikis 4, Glyfada Athens 16610 Greece tel: +30-10-8981112 fax: +30-10-8981877 email: achill@matrix.gatewaynet.com mantzios@softlab.ece.ntua.gr