Re: Success: Finished porting application to postgreSQL - Mailing list pgsql-general

From Ralph Graulich
Subject Re: Success: Finished porting application to postgreSQL
Date
Msg-id Pine.LNX.4.21.0208190936420.479-100000@shauny.shauny.de
In response to Re: Success: Finished porting application to postgreSQL  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi Tom,


This will be a somewhat lengthy posting, as there is a lot of information to cover. I hope it's interesting nonetheless.

> Cool.  We keep hearing about how MySQL is faster than Postgres, so it's
> always nice to see hard evidence that it ain't necessarily so.  Do you
> have any thoughts about where the speedup came from --- for example,
> were you able to rewrite queries into more efficient forms using PG's
> extra features?

I have to say that my daily job is administering a large, distributed Oracle database, so I am used to taking advantage of advanced database techniques that go beyond writing a simple "SELECT city FROM address WHERE name='Smith'". *g*

I certainly do not want to start a quarrel about the pros and cons of MySQL compared to PostgreSQL. The ISP I worked for uses MySQL for large projects and it does pretty well. So for me, being limited in both budget (which means hardware) and time, my first choice was MySQL. It did rather well for the last three years. But with more and more people using my application, the server got slower every month, and coding around the lack of certain SQL features made things even worse.

The main advantages I made heavy use of are:

(1) As I keep a complete version history of every change made in the database, with the possibility of stepping back to older versions, there is a higher UPDATE/INSERT-to-SELECT ratio than in the usual web database.

(2) This versioning led to rather "expensive" SQL statements even for simple tasks. The basics, for example (a small combined sketch follows after this list):

    a) If I step back in the history to 14-MAR-2002 for a specific entry, I have to find the latest version for that day (like: AND vno=(SELECT max(t1.vno) FROM table1 t1 WHERE t.id=t1.id AND t1.date='2002-03-14')) - for all the tables in a complex join.

    b) To show a change history, I have to check whether an active entry still exists for an old change date (the entry may have been "deleted" completely). If so, get the newest version for display, but with the old dates. Then check whether a version 1 exists for that day (which means marking the entry "NEW") or whether the lowest version for this entry on that day is greater than 1 (mark it "CHANGED").

    c) Using lots of tablename.active='Y' conditions everywhere so that the frontend user only sees the current versions. This is where the partial indexes - "partitioned indexes", as I call them - come into play:
    CREATE INDEX ix_dam_dam_ctr_ba ON dam(ctr, ba) WHERE active='Y';
    This simple feature saves me lots of index scan time, as less than 10% of the data consists of current versions.
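
To make (a) to (c) concrete, here is a minimal sketch; the table layout, names and values are made up for illustration:

    -- hypothetical layout: one row per version of an entry
    -- table1 (id, vno, date, name, active)
    CREATE INDEX ix_t1_active ON table1 (id) WHERE active='Y';

    -- current version only (can use the small partial index):
    SELECT t.name FROM table1 t WHERE t.id=27 AND t.active='Y';

    -- state as of 14-MAR-2002: the highest version number for that day
    SELECT t.name
      FROM table1 t
     WHERE t.id=27
       AND t.vno=(SELECT max(t1.vno) FROM table1 t1
                   WHERE t1.id=t.id AND t1.date='2002-03-14');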

(3) Encapsulating most of the so-called business logic in PL/pgSQL functions saved me from sending dozens of large SQL statements from PHP to PostgreSQL over and over again. I count an SQL statement as large if it is more than 4 KB of raw ASCII.
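
As an illustration only - the function, its names and its logic are made up here, reusing the hypothetical table1 from above - such a function might look like this:

    CREATE FUNCTION new_version(integer, text) RETURNS integer AS '
    BEGIN
        -- deactivate the current version and append a new one
        UPDATE table1 SET active = ''N'' WHERE id = $1 AND active = ''Y'';
        INSERT INTO table1 (id, vno, date, name, active)
          SELECT $1, coalesce(max(vno), 0) + 1, current_date, $2, ''Y''
            FROM table1 WHERE id = $1;
        RETURN 1;
    END;
    ' LANGUAGE 'plpgsql';

    SELECT new_version(27, 'String window');  -- one round trip instead of three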

(4) Being able to use subqueries saved me from using temp tables, which can be quite slow if there are a couple of them and they are quite large. An essential one in my case was about 120 MB and took more than 35 seconds to create. The same problem solved with subqueries takes about 600 to 800 ms on average.
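
The pattern, with hypothetical tables, is roughly:

    -- before: materialize an intermediate result, then join against it
    --   CREATE TEMP TABLE recent AS
    --     SELECT id FROM table1 WHERE date > '2002-03-01';
    --   SELECT * FROM detail WHERE id IN (SELECT id FROM recent);

    -- after: one statement, no temp table
    SELECT d.*
      FROM detail d
     WHERE EXISTS (SELECT 1 FROM table1 t1
                    WHERE t1.id = d.id AND t1.date > '2002-03-01');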

(5) Doing lots of DESCending sorts, which PostgreSQL can satisfy with backward index scans, accounts for another speedup compared to MySQL.
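
For example, a plain ascending index can be walked backwards to satisfy a descending ORDER BY without a separate sort step (hypothetical index):

    CREATE INDEX ix_t1_date ON table1 (date);
    SELECT * FROM table1 ORDER BY date DESC LIMIT 20;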

(6) Simpler logic for updates and inserts, as I can count on transactions the way I am used to from e.g. Oracle. - Meanwhile MySQL supports transactions too, but it still lacks easy use of stored procedures (see above).

(7) The possibility to use triggers, which I use quite a lot in places where they save time for rare but complex operations or enhance user comfort:

    a) keeping subtotals in separate tables - instead of doing SUM(), AVG(), MIN(), MAX() and GROUP BY over and over again for each detail page. Easy and convenient.

    b) maintaining a search index: This is a little hack, as I need German sort order via the locale setting, which prevents LIKE comparisons from using indexes. So instead of writing:

    SELECT t1.name, t1.id FROM table1 t1 WHERE t1.name LIKE 'String%';

    I do:

    SELECT t1.name, t1.id FROM table1 t1, table2 t2 WHERE t1.id=t2.id
AND t2.searchkey='String' AND t2.strlen=6;

    where table2 has the entries:

    id    strlen    searchkey
    27    1         S
    27    2         St
    27    3         Str
    27    4         Stri
    27    5         Strin
    27    6         String
    27    7         String      (plus the trailing space)
    27    8         String w
    27    9         String wi

    and so on. This led to a large helper table, but the lookup is orders of magnitude faster than a full table scan on table1, as I have used partial indexes again:

    CREATE INDEX ix_len1 ON table2 (searchkey) WHERE strlen=1;
    CREATE INDEX ix_len2 ON table2 (searchkey) WHERE strlen=2;

    and so on. This means only a very small amount of data has to be scanned. (A sketch of a trigger that maintains table2 follows below.)
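
Such a maintenance trigger might look roughly like this - a sketch only, assuming table1 has the columns id and name used in the examples above:

    -- trigger functions return OPAQUE here (newer versions use TRIGGER)
    CREATE FUNCTION maintain_searchkeys() RETURNS opaque AS '
    BEGIN
        -- rebuild all prefix rows for the changed entry
        DELETE FROM table2 WHERE id = NEW.id;
        FOR len IN 1 .. length(NEW.name) LOOP
            INSERT INTO table2 (id, strlen, searchkey)
              VALUES (NEW.id, len, substr(NEW.name, 1, len));
        END LOOP;
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER trg_searchkeys AFTER INSERT OR UPDATE ON table1
        FOR EACH ROW EXECUTE PROCEDURE maintain_searchkeys();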

(8) My application seems to take full advantage of the optimizer using nested loops or merge joins depending on certain search criteria (search this day, the last three days, last week, last month etc.) - with the possibility to turn off certain plan types explicitly.
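
For example, the planner can be steered per session with the enable_* run-time settings:

    SET enable_nestloop = off;   -- discourage nested-loop plans
    -- ... run the query in question, check it with EXPLAIN ...
    SET enable_nestloop = on;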

(9) For historical reasons I have to check four main tables if I do a "complete search" on certain criteria, which also means an entry can show up more than once from different sources - which is completely legal in this case. But for the frontend user's search, I have to present each unique entry only once. This led to heavy use of UNION (not UNION ALL, see above, though more expensive) over four, eight, or sixteen tables. The following advantages are combined here:

    a) use of small partial indexes
    b) use of precalculated subtotals in the WHERE clause
    c) fast execution of MAX/MIN checks with the SORT-LIMIT-1 workaround (see the sketch below)
    d) a trigger to store search results with a hash code in a separate table and to throw them out if one or more of the base tables change
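
The workaround in c) replaces the aggregate, which forces a full scan, with an ORDER BY/LIMIT form the optimizer can answer from an index (again using the hypothetical table1):

    CREATE INDEX ix_t1_id_vno ON table1 (id, vno);

    -- instead of: SELECT max(vno) FROM table1 WHERE id=27;
    SELECT vno FROM table1 WHERE id=27 ORDER BY vno DESC LIMIT 1;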



Conclusion:

* PostgreSQL and its features gave me the power to implement the transition from MySQL to PostgreSQL easily and to speed up my application tenfold.

* PostgreSQL made it easy to encapsulate business logic in the database.

* PostgreSQL scales better with concurrent complex updates and simple selects in a multi-user web environment.

* PostgreSQL is more complex than MySQL, which means more learning for the basic user, but it gives the needed power to any advanced user.

* PostgreSQL feels like a real database, whereas MySQL is an easy data storage container (each may have its advantages in a specific environment).



For the future (the personal I-want-to's):

* I want to look into defining my own datatypes and objects, which I think can make things even easier for me.

* I want to check how to overload operators, if possible, for doing my own math (a small sketch of the syntax follows after this list).

* I want to look into using C to write my own algorithms for further performance improvements, though I still lack tons of C experience ;-)

* I want to see better documentation for PostgreSQL, especially for PL/pgSQL. I'd love to contribute to the documentation, but as I am not a native English speaker, I'm afraid I can't. A place with lots of PL/pgSQL snippets would be nice, though (I know about the cookbook already).
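
On operator overloading, the starting point would be CREATE OPERATOR; a tiny made-up example for an approximate-equality operator on numeric:

    CREATE FUNCTION approx_eq(numeric, numeric) RETURNS boolean AS '
        SELECT abs($1 - $2) < 0.001;
    ' LANGUAGE 'sql';

    CREATE OPERATOR ~== (
        leftarg = numeric,
        rightarg = numeric,
        procedure = approx_eq
    );

    SELECT 1.0001 ~== 1.0002;   -- true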


I hope this little insight was interesting to read. If there are any
questions on specific topics left, feel free to ask. I'll be glad to
answer all of them.


Kind regards
... Ralph ...


