Re: Success: Finished porting application to postgreSQL - Mailing list pgsql-general
From: Ralph Graulich
Subject: Re: Success: Finished porting application to postgreSQL
Date:
Msg-id: Pine.LNX.4.21.0208190936420.479-100000@shauny.shauny.de
In response to: Re: Success: Finished porting application to postgreSQL (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:
    Re: Success: Finished porting application to postgreSQL
    Re: Success: Finished porting application to postgreSQL
    Re: Success: Finished porting application to postgreSQL
List: pgsql-general
Hi Tom,

this is going to be a rather lengthy posting, as there is a lot of information to cover. I hope it's interesting nonetheless.

> Cool. We keep hearing about how MySQL is faster than Postgres, so it's
> always nice to see hard evidence that it ain't necessarily so. Do you
> have any thoughts about where the speedup came from --- for example,
> were you able to rewrite queries into more efficient forms using PG's
> extra features?

I have to say that my daily job is administering a large, distributed Oracle database, so I am experienced in taking advantage of advanced database techniques that go beyond writing a simple "SELECT city FROM address WHERE name='Smith'". *g*

I certainly do not want to start a quarrel about the pros and cons of mySQL compared to postgreSQL. The ISP I worked for uses mySQL for large projects and it does pretty well. So for me, being limited in both budget (which means hardware) and time, my first choice was mySQL. It did rather well for the last three years. But with more and more people using my application, the server got slower every month, and coding around the lack of certain SQL features made things even worse.

The main advantages I made heavy use of are:

(1) As I keep a complete version history of every change made in the database, with the possibility of stepping back to older versions, there is a higher UPDATE/INSERT-to-SELECT ratio than in the usual web database.

(2) This versioning led to rather "expensive" SQL even for simple tasks. The basics, for example:

a) If I step back in the history to 14-MAR-2002 for a specific entry, I have to check what the latest version for that day was, like:

    AND vno=(SELECT MAX(t1.vno) FROM table1 t1
             WHERE t.id=t1.id AND t1.date='2002-03-14')

- for all the tables in a complex join. (A complete example follows after this list.)

b) To show a change history, I have to check whether there still exists an active entry for an old change date (maybe the entry was "deleted" completely). If so, show the newest version, but use the old dates. Then I check whether a version 1 exists for that day (which means the entry is marked "NEW") or whether the lowest version for that entry on that day is larger than 1 (marked "CHANGED").

c) Using lots of tablename.active='Y' conditions everywhere, so that the frontend user only sees the current versions. This is where the partitioned indexes - as I call them - come into play:

    CREATE INDEX ix_dam_dam_ctr_ba ON dam(ctr, ba) WHERE active='Y';

This simple feature saves me lots of index scan time, as less than 10% of the data consists of current versions. (See the lookup example after this list.)

(3) Encapsulating most of the so-called business logic in pl/pgsql procedures saves me from sending dozens of large SQL statements from PHP to postgreSQL over and over again. I count an SQL statement as large if it is more than 4 KByte of raw ASCII. (A minimal sketch follows after this list.)

(4) Being able to use subqueries saved me from using temp tables, which can be quite slow if there are a couple of them and they are quite large. An essential one in my case was about 120 MB and took more than 35 seconds to create. The same problem solved with subqueries takes about 600 to 800 ms on average.

(5) Doing lots of DESCending sorts, for which postgres uses backward index scans, accounts for another speedup compared to mySQL.

(6) Easier logic for updates and inserts, as I can count on transactions the way I am used to from e.g. Oracle. Meanwhile mySQL supports transactions too, but it still lacks easy-to-use stored procedures (see above).
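To make (2a) concrete, a complete "as of" lookup could look like the following sketch (table1, id, vno and date are simplified placeholder names, not my real schema):

    -- fetch entry 27 as it looked after the last change on 14-MAR-2002
    SELECT t.*
      FROM table1 t
     WHERE t.id = 27
       AND t.vno = (SELECT MAX(t1.vno)
                      FROM table1 t1
                     WHERE t1.id = t.id
                       AND t1.date = '2002-03-14');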
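For (2c), the important detail is that a query has to repeat the predicate of the partial index, otherwise the planner will not consider it (the ctr/ba values here are just placeholders):

    -- active='Y' must appear in the WHERE clause so the planner
    -- can match it against ix_dam_dam_ctr_ba
    SELECT *
      FROM dam
     WHERE ctr = 'XX'
       AND ba = 42
       AND active = 'Y';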
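And a minimal sketch of what (3) means - the real procedures are of course much larger, and the names here are made up:

    -- returns the next free version number for an entry
    CREATE FUNCTION next_vno(integer) RETURNS integer AS '
    DECLARE
        p_id ALIAS FOR $1;
        v_vno integer;
    BEGIN
        SELECT vno INTO v_vno
          FROM table1
         WHERE id = p_id
         ORDER BY vno DESC   -- backward index scan, see (5)
         LIMIT 1;
        IF NOT FOUND THEN
            RETURN 1;        -- first version of a new entry
        END IF;
        RETURN v_vno + 1;
    END;
    ' LANGUAGE 'plpgsql';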
(7) The possibility to use triggers, which I use quite a lot in places where they save time for rare but complex operations or enhance user comfort:

a) keeping subtotals in separate tables - instead of doing SUM(), AVG(), MIN(), MAX() and GROUP BY over and over again for each detail page. Easy and convenient.

b) maintaining a search index: this is a little hack, as I need German sort order via the locale setting, which prevents LIKE comparisons from using indexes. So instead of writing:

    SELECT t1.name, t1.id FROM table1 t1 WHERE t1.name LIKE 'String%';

I do:

    SELECT t1.name, t1.id FROM table1 t1, table2 t2
     WHERE t1.id=t2.id AND t2.searchkey='String' AND t2.strlen=6;

where table2 has the entries:

    id  strlen  searchkey
    27       1  S
    27       2  St
    27       3  Str
    27       4  Stri
    27       5  Strin
    27       6  String
    27       7  String
    27       8  String w
    27       9  String wi

and so on. This led to a large helper table, but the lookup is magnitudes faster than a full table scan on table1, as I have partitionally indexed again:

    CREATE INDEX ix_len1 ON table2 (searchkey) WHERE strlen=1;
    CREATE INDEX ix_len2 ON table2 (searchkey) WHERE strlen=2;

and so on. This means only a very small amount of data has to be scanned. (A sketch of the trigger that keeps table2 in sync follows below.)

(8) My application seems to take full advantage of the optimizer using nested loops or merge joins depending on certain search criteria (search this day, the last three days, last week, last month etc.) - with the possibility to turn off certain plan types explicitly (see below).

(9) For historical reasons I have to check four main tables when doing a "complete search" on certain criteria, which also means an entry can show up more than once from different sources - which is completely legal in this case. But for the frontend user's search, I have to present each unique entry only once. This led to heavy use of UNION (not UNION ALL, see above, though more expensive) across four, eight, or sixteen tables. The following advantages combine here:

a) use of small partitioned indexes
b) use of precalculated subtotals in the WHERE clause
c) fast execution of MAX/MIN checks with the SORT-LIMIT-1 workaround (sketched below)
d) a trigger that stores search results with a hash code in a separate table and throws them out when one or more of the base tables change
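The trigger for (7b) looks roughly like this - a simplified sketch with placeholder names, not the production code:

    -- rebuild all search key prefixes for a row when its name changes
    CREATE FUNCTION maintain_searchkeys() RETURNS trigger AS '
    DECLARE
        i integer;
    BEGIN
        DELETE FROM table2 WHERE id = NEW.id;
        FOR i IN 1 .. length(NEW.name) LOOP
            INSERT INTO table2 (id, strlen, searchkey)
                 VALUES (NEW.id, i, substr(NEW.name, 1, i));
        END LOOP;
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER trg_searchkeys AFTER INSERT OR UPDATE ON table1
        FOR EACH ROW EXECUTE PROCEDURE maintain_searchkeys();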
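Turning off a plan type for (8) is just a session-local setting, for example:

    SET enable_nestloop = off;   -- discourage nested loop joins
    -- run or EXPLAIN the query here ...
    SET enable_nestloop = on;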
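And for those who have not seen the SORT-LIMIT-1 workaround from (9c): an aggregate like MAX() does not use an index in postgreSQL, but the equivalent ORDER BY ... DESC LIMIT 1 does:

    -- instead of: SELECT MAX(vno) FROM table1 WHERE id = 27;
    SELECT vno FROM table1
     WHERE id = 27
     ORDER BY vno DESC
     LIMIT 1;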
Conclusion:

* postgreSQL and its features gave me the power to implement the transition from mySQL to postgreSQL with ease and to speed up my application tenfold.
* postgreSQL made it easy to encapsulate business logic in the database.
* postgreSQL scales better with concurrent complex updates and simple selects in a multi-user web environment.
* postgreSQL is more complex than mySQL, which means more learning for the basic user, but it gives any advanced user the power they need.
* postgreSQL feels like a real database, whereas mySQL is an easy data storage container (each may have its advantages in a specific environment).

For the future (the personal I-want-to's):

* I want to look into defining my own datatypes and objects, which I think can make things even easier for me.
* I want to check how to overload operators, if possible, for doing my own math.
* I want to look into using C to write my own algorithms for further performance improvements, though I still lack tons of C experience ;-)
* I want to see better documentation for postgreSQL, especially for pl/pgSQL. I'd love to contribute to the documentation, but as I am not a native speaker of English, I'm afraid I can't. A place with lots of pl/pgSQL snippets would be nice, though (I know of the cookbook already).

I hope this little insight was interesting to read. If there are any questions on specific topics left, feel free to ask. I'll be glad to answer all of them.

Kind regards

... Ralph ...