Thread: Comparative performance
I'm converting a relatively small database (2 MB) from MySQL to PostgreSQL. It is used to generate web pages using PHP. Although the actual website runs under Linux, the development is done under XP. I've completed most of the data conversion and rewrite of the PHP scripts, so now I'm comparing relative performance.

It appears that PostgreSQL is two to three times slower than MySQL. For example, some pages that have some 30,000 characters (when saved as HTML) take 1 to 1 1/2 seconds with MySQL but 3 to 4 seconds with PostgreSQL. I had read that the former was generally faster than the latter, particularly for simple web applications, but I was hoping that Postgres' performance would not be that noticeably slower.

I'm trying to determine if the difference can be attributed to anything that I've done or missed. I've run VACUUM ANALYZE on the two main tables and I'm looking at the results of EXPLAIN on the query that drives the retrieval of probably 80% of the data for the pages in question.

Before I post the EXPLAIN and the table schema I'd appreciate confirmation that this list is the appropriate forum. I'm a relative newcomer to PostgreSQL (but not to relational databases), so I'm not sure if this belongs in the novice or general lists.

Joe
On Wed, 28 Sep 2005, Joe wrote:

> I'm converting a relatively small database (2 MB) from MySQL to PostgreSQL. It
> is used to generate web pages using PHP. Although the actual website runs under
> Linux, the development is done under XP. I've completed most of the data
> conversion and rewrite of the PHP scripts, so now I'm comparing relative
> performance.
>
> It appears that PostgreSQL is two to three times slower than MySQL. For
> example, some pages that have some 30,000 characters (when saved as HTML) take 1
> to 1 1/2 seconds with MySQL but 3 to 4 seconds with PostgreSQL. I had read that
> the former was generally faster than the latter, particularly for simple web
> applications but I was hoping that Postgres' performance would not be that
> noticeably slower.

Are you comparing PostgreSQL on XP to MySQL on XP, or PostgreSQL on Linux to MySQL on Linux? Our performance on XP is not great. Also, which version of PostgreSQL are you using?

> I'm trying to determine if the difference can be attributed to anything that
> I've done or missed. I've run VACUUM ANALYZE on the two main tables and I'm
> looking at the results of EXPLAIN on the query that drives the retrieval of
> probably 80% of the data for the pages in question.

Good.

> Before I post the EXPLAIN and the table schema I'd appreciate confirmation that
> this list is the appropriate forum. I'm a relative newcomer to PostgreSQL (but
> not to relational databases), so I'm not sure if this belongs in the novice or
> general lists.

You can post the results of EXPLAIN ANALYZE here. Please include the schema definitions and the query string(s) themselves.

Thanks,

Gavin
On Wed, 28 Sep 2005, Joe wrote:

> Before I post the EXPLAIN and the table schema I'd appreciate
> confirmation that this list is the appropriate forum.

It is, and useful things to show are:

* the slow query
* EXPLAIN ANALYZE of the query
* the output of \d for each table involved in the query
* the output of SHOW ALL;
* the amount of memory the machine has

The settings that are most important to tune in postgresql.conf for performance are, in my opinion, shared_buffers, effective_cache_size and (to a lesser extent) work_mem.

--
/Dennis Björklund
> > It appears that PostgreSQL is two to three times slower than MySQL.
> > For example, some pages that have some 30,000 characters (when saved
> > as HTML) take 1 to 1 1/2 seconds with MySQL but 3 to 4 seconds with
> > PostgreSQL. I had read that the former was generally faster than the
> > latter, particularly for simple web applications but I was hoping that
> > Postgres' performance would not be that noticeably slower.
>
> Are you comparing PostgreSQL on XP to MySQL on XP or
> PostgreSQL on Linux to MySQL on Linux? Our performance on XP
> is not great. Also, which version of PostgreSQL are you using?

That actually depends a lot on *how* you use it. I've seen pg-on-windows deployments that come within a few percent of the linux performance. I've also seen those that are absolutely horrible by comparison.

One sure way to kill the performance is to do a lot of small connections. Using persistent connections is even more important on Windows than it is on Unix. It could easily explain a difference like this.

//Magnus
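As a minimal sketch of the persistent-connection point above (untested; the connection string, database name and query are purely illustrative):

// pg_connect() sets up a brand-new backend for every page hit, which is
// comparatively expensive, especially on Windows:
//   $db = pg_connect('host=localhost dbname=mydb user=web');
//
// pg_pconnect() keeps the connection open across requests within the same
// Apache/PHP process, so later pages skip the connection cost:
$db = pg_pconnect('host=localhost dbname=mydb user=web');

$res = pg_query($db, 'SELECT entry_id, title FROM entry ORDER BY title LIMIT 10');
while ($row = pg_fetch_assoc($res)) {
    echo htmlspecialchars($row['title']), "<br>\n";
}

The call signature is otherwise identical to pg_connect(), so switching an existing page over is usually a one-line change.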
> It appears that PostgreSQL is two to three times slower than MySQL. For
> example, some pages that have some 30,000 characters (when saved as
> HTML) take 1 to 1 1/2 seconds with MySQL but 3 to 4 seconds with
> PostgreSQL. I had read that the former was generally faster than the
> latter, particularly for simple web applications but I was hoping that
> Postgres' performance would not be that noticeably slower.

From my experience, the postgres libraries in PHP are a piece of crap, and add a lot of overhead even for small queries. For instance, a simple query like "SELECT * FROM table WHERE primary_key_id=1234" can take the following time, on my laptop, with data in the filesystem cache of course:

EXPLAIN ANALYZE      <0.1 ms
python + psycopg 2    0.1 ms  (damn fast)
php + mysql           0.3 ms
php + postgres        1-2 ms  (damn slow)

So, if your pages are designed in The PHP Way (ie. a large number of small queries), I might suggest using a language with a decent postgres interface (python, among others), or rewriting your bunches of small queries as Stored Procedures or Joins, which will provide large speedups. Doing >50 queries on a page is always a bad idea, but it's tolerable in php-mysql, not in php-postgres.

If it's only one large query, there is a problem, as postgres is usually a lot smarter about query optimization.

If you use the usual mysql techniques (like storing a page counter in a row in a table, or storing sessions in a table), beware: these are no-nos for postgres. These things should NOT be done with a database anyway; try memcached, for instance.
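To put rough numbers on that per-call overhead, a loop of tiny queries can be timed against a single set-based query from PHP. This is only an untested sketch; the table, column and connection details are invented:

$db = pg_pconnect('dbname=test');            // illustrative connection string
$item_ids = array(101, 102, 103, 104, 105);  // ids the page needs to show

function ms_since($t0) {
    list($u, $s) = explode(' ', microtime());
    return (($u + $s) - $t0) * 1000.0;
}

// "The PHP Way": one pg_query() per id -- the driver overhead is paid each time.
list($u, $s) = explode(' ', microtime());
$t0 = $u + $s;
$rows = array();
foreach ($item_ids as $id) {
    $r = pg_query($db, 'SELECT * FROM item WHERE item_id = ' . (int)$id);
    $rows[] = pg_fetch_assoc($r);
}
printf("loop of small queries: %.2f ms\n", ms_since($t0));

// One round trip instead: let the server hand back all the rows at once.
list($u, $s) = explode(' ', microtime());
$t0 = $u + $s;
$r = pg_query($db, 'SELECT * FROM item WHERE item_id IN ('
                 . implode(',', array_map('intval', $item_ids)) . ')');
printf("single IN() query:     %.2f ms\n", ms_since($t0));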
Magnus Hagander wrote: > That actually depends a lot on *how* you use it. I've seen pg-on-windows > deployments that come within a few percent of the linux performance. > I've also seen those that are absolutely horrible compared. > > One sure way to kill the performance is to do a lot of small > connections. Using persistent connection is even more important on > Windows than it is on Unix. It could easily explain a difference like > this. I just tried using pg_pconnect() and I didn't notice any significant improvement. What bothers me most is that with Postgres I tend to see jerky behavior on almost every page: the upper 1/2 or 2/3 of the page is displayed first and you can see a blank bottom (or you can see a half-filled completion bar). With MySQL each page is generally displayed in one swoop. Joe
On Thu, Sep 29, 2005 at 08:16:11AM -0400, Joe wrote: > I just tried using pg_pconnect() and I didn't notice any significant > improvement. PHP persistent connections are not really persistent -- or so I've been told. Anyhow, what was discussed here was pg_query, not pg_connect. You really want to reduce the number of pg_query() calls in any case; you haven't told us how many there are yet, but it sounds like there are a lot of them. > What bothers me most is that with Postgres I tend to see jerky behavior on > almost every page: the upper 1/2 or 2/3 of the page is displayed first and > you can see a blank bottom (or you can see a half-filled completion bar). > With MySQL each page is generally displayed in one swoop. This might just be your TCP/IP stack finding out that the rest of the page isn't likely to come anytime soon, and start sending it out... or something else. I wouldn't put too much weight on it, it's likely to go away as soon as you fix the rest of the problem. /* Steinar */ -- Homepage: http://www.sesse.net/
On Thu, 29 Sep 2005, Joe wrote: > Magnus Hagander wrote: > > That actually depends a lot on *how* you use it. I've seen pg-on-windows > > deployments that come within a few percent of the linux performance. > > I've also seen those that are absolutely horrible compared. > > > > One sure way to kill the performance is to do a lot of small > > connections. Using persistent connection is even more important on > > Windows than it is on Unix. It could easily explain a difference like > > this. > > I just tried using pg_pconnect() and I didn't notice any significant > improvement. What bothers me most is that with Postgres I tend to see jerky > behavior on almost every page: the upper 1/2 or 2/3 of the page is displayed > first and you can see a blank bottom (or you can see a half-filled completion > bar). With MySQL each page is generally displayed in one swoop. Please post the table definitions, queries and explain analyze results so we can tell you why the performance is poor. Gavin
PFC wrote: > From my experience, the postgres libraries in PHP are a piece of > crap, and add a lot of overhead even from small queries. > For instance, a simple query like "SELECT * FROM table WHERE > primary_key_id=1234" can take the following time, on my laptop, with > data in the filesystem cache of course : > > EXPLAIN ANALYZE <0.1 ms > python + psycopg 2 0.1 ms (damn fast) > php + mysql 0.3 ms > php + postgres 1-2 ms (damn slow) As a Trac user I was considering moving to Python, so it's good to know that, but the rewrite is a longer term project. > So, if your pages are designed in The PHP Way (ie. a large number > of small queries), I might suggest using a language with a decent > postgres interface (python, among others), or rewriting your bunches of > small queries as Stored Procedures or Joins, which will provide large > speedups. Doing >50 queries on a page is always a bad idea, but it's > tolerable in php-mysql, not in php-postgres. The pages do use a number of queries to collect all the data for display but nowhere near 50. I'd say it's probably less than a dozen. As an aside, one of my tasks (before the conversion) was to analyze the queries and see where they could be tweaked for performance, but with MySQL that was never a top priority. The schema is fairly simple having two main tables: topic and entry (sort of like account and transaction in an accounting scenario). There are two additional tables that perhaps could be merged into the entry table (and that would reduce the number of queries) but I do not want to make major changes to the schema (and the app) for the PostgreSQL conversion. Joe
Gavin Sherry wrote:
> Please post the table definitions, queries and explain analyze results so
> we can tell you why the performance is poor.

I did try to post that last night but apparently my reply didn't make it to the list. Here it is again:

Matthew Nuzum wrote:
> This is the right list. Post detail and I'm sure you'll get some suggestions.

Thanks, Matthew (and Chris and Gavin). The main table used in the query is defined as follows:

CREATE TABLE entry (
  entry_id serial PRIMARY KEY,
  title VARCHAR(128) NOT NULL,
  subtitle VARCHAR(128),
  subject_type SMALLINT,
  subject_id INTEGER REFERENCES topic,
  actor_type SMALLINT,
  actor_id INTEGER REFERENCES topic,
  actor VARCHAR(64),
  actor_role VARCHAR(64),
  rel_entry_id INTEGER,
  rel_entry VARCHAR(64),
  description VARCHAR(255),
  quote text,
  url VARCHAR(255),
  entry_date CHAR(10),
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;
CREATE INDEX entry_actor_id ON entry (actor_id);
CREATE INDEX entry_subject_id ON entry (subject_id);

It has 3422 rows at this time.

The query for one of the pages is the following:

SELECT entry_id, subject_type AS type, subject_type, subject_id, actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, description, url, quote AS main_quote, NULL AS rel_quote,
  substring(entry_date from 8) AS dom, substring(entry_date from 1) AS date_ymd, substring(entry_date from 1 for 7) AS date_ym, substring(entry_date from 1 for 4) AS date_y, created, updated
FROM entry
WHERE subject_id = 1079
UNION
SELECT entry_id, actor_type AS type, subject_type, subject_id, actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, description, url, quote AS main_quote, NULL AS rel_quote,
  substring(entry_date from 8) AS dom, substring(entry_date from 1) AS date_ymd, substring(entry_date from 1 for 7) AS date_ym, substring(entry_date from 1 for 4) AS date_y, created, updated
FROM entry
WHERE actor_id = 1079
ORDER BY type, title, subtitle;

The output of EXPLAIN ANALYZE is:

 Sort  (cost=158.98..159.14 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
   Sort Key: "type", title, subtitle
   ->  Unique  (cost=153.57..157.14 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
         ->  Sort  (cost=153.57..153.73 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
               Sort Key: entry_id, "type", subject_type, subject_id, actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, description, url, main_quote, rel_quote, dom, date_ymd, date_ym, date_y, created, updated
               ->  Append  (cost=0.00..151.73 rows=62 width=568) (actual time=0.000..16.000 rows=59 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17.21 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                           ->  Index Scan using entry_subject_id on entry  (cost=0.00..17.17 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                                 Index Cond: (subject_id = 1079)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..134.52 rows=58 width=568) (actual time=0.000..16.000 rows=56 loops=1)
                           ->  Seq Scan on entry  (cost=0.00..133.94 rows=58 width=568) (actual time=0.000..16.000 rows=56 loops=1)
                                 Filter: (actor_id = 1079)
 Total runtime: 16.000 ms
(13 rows)

What I don't quite understand is why it's doing a sequential scan on actor_id instead of using the entry_actor_id index. Note that actor_id has 928 non-null values (27%), whereas subject_id has 3089 non-null values (90%).
Note that the entry_date column was originally a MySQL date but it had partial dates, i.e., some days and months are set to zero. Eventually I hope to define a PostgreSQL datatype for it and to simplify the substring retrievals. However, I don't think the extra computational time should affect the overall runtime significantly. Gavin, I'm using PostgreSQL 8.0.3, Apache 1.3.28, PHP 4.3.4, MySQL 4.0.16 and I'm comparing both databases on XP (on a Pentium 4, 1.6 GHz, 256 MB RAM). Thanks for any feedback. Joe
Joe wrote: > > > The pages do use a number of queries to collect all the data for display > but nowhere near 50. I'd say it's probably less than a dozen. > > The schema is fairly simple having two main tables: topic and entry > (sort of like account and transaction in an accounting scenario). There > are two additional tables that perhaps could be merged into the entry > table Hm, if you only have 4 tables, why do you need 12 queries? To reduce queries, join them in the query; no need to merge them physically. If you have only two main tables, I'd bet you only need 1-2 queries for the whole page. Regards, Andreas
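As a rough illustration of that point (a sketch only, using column names from the schemas posted later in this thread and an invented connection string), a topic and its entries can come back in one round trip instead of two:

$db  = pg_pconnect('dbname=mydb');
$sql = "SELECT t.topic_name, t.list_name, e.entry_id, e.title, e.subtitle
        FROM topic t
        LEFT JOIN entry e ON (e.subject_id = t.topic_id OR e.actor_id = t.topic_id)
        WHERE t.topic_id = 1079
        ORDER BY e.title, e.subtitle";
$res = pg_query($db, $sql);
while ($row = pg_fetch_assoc($res)) {
    // one row per entry, with the topic columns repeated on each row
}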
> I just tried using pg_pconnect() and I didn't notice any significant > improvement. What bothers me most is that with Postgres I tend to see > jerky behavior on almost every page: the upper 1/2 or 2/3 of the page > is displayed first and you can see a blank bottom (or you can see a > half-filled completion bar). With MySQL each page is generally > displayed in one swoop. Persistent connections are useful when your page is fast and the connection time is an important part of your page time. It is mandatory if you want to serve more than 20-50 hits/s without causing unnecessary load on the database. This is not your case, which is why you don't notice any improvement...
> Total runtime: 16.000 ms Even though this query isn't that optimized, it's still only 16 milliseconds. Why does it take this long for PHP to get the results ? Can you try pg_query'ing this exact same query, FROM PHP, and timing it with getmicrotime() ? You can even do an EXPLAIN ANALYZE from pg_query and display the results in your webpage, to check how long the query takes on the server. You can also try it on a Linux box. This smells like a TCP communication problem.
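A sketch of that instrumentation (untested; the cut-down query and connection string below stand in for the real ones):

$db = pg_pconnect('dbname=mydb');   // illustrative connection string
// Stand-in for the full UNION query posted earlier.
$sql = 'SELECT entry_id, title, subtitle FROM entry WHERE subject_id = 1079 ORDER BY title, subtitle';

list($u, $s) = explode(' ', microtime());
$t0 = $u + $s;
$res = pg_query($db, $sql);
list($u, $s) = explode(' ', microtime());
printf("pg_query: %.1f ms<br>\n", (($u + $s) - $t0) * 1000.0);

// Show the server-side plan on the page as well.
echo '<pre>';
$plan = pg_query($db, 'EXPLAIN ANALYZE ' . $sql);
while ($row = pg_fetch_row($plan)) {
    echo htmlspecialchars($row[0]), "\n";
}
echo '</pre>';

The difference between the wall-clock time printed here and the "Total runtime" reported by EXPLAIN ANALYZE is what the PHP driver and the client/server round trip are costing.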
Andreas Pflug wrote: > Hm, if you only have 4 tables, why do you need 12 queries? > To reduce queries, join them in the query; no need to merge them > physically. If you have only two main tables, I'd bet you only need 1-2 > queries for the whole page. There are more than four tables and the queries are not functionally overlapping. As an example, allow me to refer to the page www.freedomcircle.com/topic.php/Economists. The top row of navigation buttons (Life, Liberty, etc.) is created from a query of the 'topic' table. It could've been hard-coded as a PHP array, but with less flexibility. The alphabetical links are from a SELECT DISTINCT substring from topic. It could've been generated by a PHP for loop (originally implemented that way) but again with less flexibility. The listing of economists is another SELECT from topic. The subheadings (Articles, Books) come from a SELECT of an entry_type table --which currently has 70 rows-- and is read into a PHP array since we don't know what headings will be used in a given page. The detail of the entries comes from that query that I posted earlier, but there are three additional queries that are used for specialized entry types (relationships between topics --e.g., Prof. Williams teaches at George Mason, events, and multi-author or multi-subject articles and books). And there's yet another table for the specific book information. Once the data is retrieved it's sorted internally with PHP, at the heading level, before display. Maybe there is some way to merge all the queries (some already fairly complex) that fetch the data for the entries box but I believe it would be a monstrosity with over 100 lines of SQL. Thanks, Joe
PFC wrote: > Even though this query isn't that optimized, it's still only 16 > milliseconds. > Why does it take this long for PHP to get the results ? > > Can you try pg_query'ing this exact same query, FROM PHP, and timing > it with getmicrotime() ? Thanks, that's what I was looking for. It's microtime(), BTW. It'll take me some time to instrument it, but that way I can pinpoint what is really slow. > You can even do an EXPLAIN ANALYZE from pg_query and display the > results in your webpage, to check how long the query takes on the server. > > You can also try it on a Linux box. My current host only supports MySQL. I contacted hub.org to see if they could assist in this transition but I haven't heard back. > This smells like a TCP communication problem. I'm puzzled by that remark. How much does TCP get into the picture in a local Windows client/server environment? Joe
> > This smells like a TCP communication problem. > > I'm puzzled by that remark. How much does TCP get into the > picture in a local Windows client/server environment? Windows has no Unix Domain Sockets (no surprise there), so TCP connections over the loopback interface are used to connect to the server. //Magnus
PFC wrote:
> Even though this query isn't that optimized, it's still only 16
> milliseconds.
> Why does it take this long for PHP to get the results ?
>
> Can you try pg_query'ing this exact same query, FROM PHP, and timing
> it with getmicrotime() ?

That query took about 27 msec in actual PHP execution time. It turns out the real culprit is the following query, which interestingly enough retrieves zero rows in the case of the Economists page that I've been using for testing, yet it uses up about 1370 msec in actual runtime:

SELECT topic_id1, topic_id2, topic_name, categ_id, list_name, t.title, url, page_type, rel_type, inverse_id, r.description AS rel_descrip, r.created, r.updated
FROM relationship r, topic t, entry_type e
WHERE ((topic_id1 = topic_id AND topic_id2 = 1252) OR (topic_id2 = topic_id AND topic_id1 = 1252))
  AND rel_type = type_id AND e.class_id = 2
ORDER BY rel_type, list_name;

The EXPLAIN ANALYZE output, after I ran VACUUM ANALYZE on the three tables, is:

 Sort  (cost=4035.55..4035.56 rows=1 width=131) (actual time=2110.000..2110.000 rows=0 loops=1)
   Sort Key: r.rel_type, t.list_name
   ->  Nested Loop  (cost=36.06..4035.54 rows=1 width=131) (actual time=2110.000..2110.000 rows=0 loops=1)
         Join Filter: ((("inner".topic_id1 = "outer".topic_id) AND ("inner".topic_id2 = 1252)) OR (("inner".topic_id2 = "outer".topic_id) AND ("inner".topic_id1 = 1252)))
         ->  Seq Scan on topic t  (cost=0.00..38.34 rows=1234 width=90) (actual time=0.000..15.000 rows=1234 loops=1)
         ->  Materialize  (cost=36.06..37.13 rows=107 width=45) (actual time=0.000..0.509 rows=466 loops=1234)
               ->  Merge Join  (cost=30.31..35.96 rows=107 width=45) (actual time=0.000..0.000 rows=466 loops=1)
                     Merge Cond: ("outer".type_id = "inner".rel_type)
                     ->  Index Scan using entry_type_pkey on entry_type e  (cost=0.00..3.94 rows=16 width=4) (actual time=0.000..0.000 rows=15 loops=1)
                           Filter: (class_id = 2)
                     ->  Sort  (cost=30.31..31.48 rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
                           Sort Key: r.rel_type
                           ->  Seq Scan on relationship r  (cost=0.00..9.66 rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
 Total runtime: 2110.000 ms
(14 rows)

The tables are as follows:

CREATE TABLE entry_type (
  type_id SMALLINT NOT NULL PRIMARY KEY,
  title VARCHAR(32) NOT NULL,
  rel_title VARCHAR(32),
  class_id SMALLINT NOT NULL DEFAULT 1,
  inverse_id SMALLINT,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE topic (
  topic_id serial PRIMARY KEY,
  topic_name VARCHAR(48) NOT NULL UNIQUE,
  categ_id SMALLINT NOT NULL,
  parent_entity INTEGER,
  parent_concept INTEGER,
  crossref_id INTEGER,
  list_name VARCHAR(80) NOT NULL,
  title VARCHAR(80),
  description VARCHAR(255),
  url VARCHAR(64),
  page_type SMALLINT NOT NULL,
  dark_ind BOOLEAN NOT NULL DEFAULT FALSE,
  ad_code INTEGER,
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE relationship (
  topic_id1 INTEGER NOT NULL REFERENCES topic,
  topic_id2 INTEGER NOT NULL REFERENCES topic,
  rel_type INTEGER NOT NULL,
  description VARCHAR(255),
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (topic_id1, topic_id2, rel_type))
WITHOUT OIDS;

I'm thinking that perhaps I need to set up another index with topic_id2 first and topic_id1 second. In addition, an index on entry_type.class_id may improve things. Another possibility would be to rewrite the query as a UNION.
Of course, this doesn't explain how MySQL manages to execute the query in about 9 msec. The only minor differences in the schema are: entry_type.title and rel_title are char(32) in MySQL, entry_type.class_id is a tinyint, and topic.categ_id, page_type and dark_ind are also tinyints. MySQL also doesn't have the REFERENCES.

A couple of interesting side notes from my testing. The first is that pg_connect() took about 39 msec but mysql_connect() took only 4 msec; however, pg_pconnect() took 0.14 msec while mysql_pconnect() took 0.99 msec (all tests were repeated five times and the quoted results are averages). The second is that PostgreSQL's performance appears to be much more consistent in certain queries. For example, the query that retrieves the list of subtopics (the names and descriptions of economists) took 17 msec in PG, with a low of 15 (three times) and a high of 21, whereas MySQL took 60 msec on average but had a low of 22 and a high of 102 msec.

Joe
It's more understandable if the table names are in front of the column names:

SELECT relationship.topic_id1, relationship.topic_id2, topic.topic_name, topic.categ_id, topic.list_name, topic.title, topic.url, topic.page_type, relationship.rel_type, entry_type.inverse_id, relationship.description AS rel_descrip, relationship.created, relationship.updated
FROM relationship, topic, entry_type
WHERE ((relationship.topic_id1 = topic.topic_id AND relationship.topic_id2 = 1252)
    OR (relationship.topic_id2 = topic.topic_id AND relationship.topic_id1 = 1252))
  AND relationship.rel_type = entry_type.type_id
  AND entry_type.class_id = 2
ORDER BY rel_type, list_name;

I see a few problems in your schema:

- topic_id1 and topic_id2 play the same role; there is no constraint to determine which is which, hence it is possible to define the same relation twice.
- as you search on two columns with OR, you need UNION to use indexes.
- lack of indexes.
- I don't understand why the planner doesn't pick up hash joins...
- if you use a version before 8, type mismatch will prevent use of the indexes.

I'd suggest rewriting the query like this:

SELECT topic.*, foo.*
FROM topic,
  (SELECT topic_id2 as fetch_id, topic_id1, topic_id2, rel_type, description as rel_descrip, created, updated
   FROM relationship
   WHERE rel_type IN (SELECT type_id FROM entry_type WHERE class_id = 2)
     AND topic_id1 = 1252
   UNION
   SELECT topic_id1 as fetch_id, topic_id1, topic_id2, rel_type, description as rel_descrip, created, updated
   FROM relationship
   WHERE rel_type IN (SELECT type_id FROM entry_type WHERE class_id = 2)
     AND topic_id2 = 1252) AS foo
WHERE topic.topic_id = foo.fetch_id

CREATE INDEX'es ON:

entry_type( class_id )

relationship( topic_id1, rel_type, topic_id2 )   which becomes your new PRIMARY KEY
relationship( topic_id2, rel_type, topic_id1 )

> Of course, this doesn't explain how MySQL manages to execute the query
> in about 9 msec. The only minor differences in the schema are:
> entry_type.title and rel_title are char(32) in MySQL,
> entry_type.class_id is a tinyint, and topic.categ_id, page_type and
> dark_ind are also tinyints. MySQL also doesn't have the REFERENCES.

Can you post the result from MySQL EXPLAIN?

You might be interested in the following code. Just replace mysql_ by pg_, it's quite useful.

$global_queries_log = array();

function _getmicrotime()
{
    list($u,$s) = explode(' ', microtime());
    return $u + $s;
}

/*  Formats query, with given arguments, escaping all strings as needed.
    db_quote_query( 'UPDATE junk SET a=%s WHERE b=%s', array( 1, "po'po" ) )
    =>  UPDATE junk SET a='1' WHERE b='po\'po'
*/
function db_quote_query( $sql, $params=false )
{
    // if no params, send query raw
    if( !$params )
        return $sql;

    // quote params
    foreach( $params as $key => $val )
    {
        if( is_array( $val ))
            $val = implode( ',', $val );
        $params[$key] = "'".mysql_real_escape_string( $val )."'";
    }
    return vsprintf( $sql, $params );
}

/*  Formats query, with given arguments, escaping all strings as needed.
    Runs the query, logging its execution time.
    Returns the query result, or dies with an error.
*/
function db_query( $sql, $params=false )
{
    // it's already a query result
    if( is_resource( $sql ))
        return $sql;

    $sql = db_quote_query( $sql, $params );

    $t = _getmicrotime();
    $r = mysql_query( $sql );
    if( !$r )
    {
        echo "<div class=bigerror><b>MySQL error:</b><br>".mysql_error()."<br><br><b>Query:</b><br>".$sql."<br><br><b>Traceback:</b><pre>";
        foreach( debug_backtrace() as $t )
            xdump( $t );
        echo "</pre></div>";
        die();
    }

    global $global_queries_log;
    $global_queries_log[] = array( _getmicrotime()-$t, $sql );
    return $r;
}

At the end of your page, display the contents of $global_queries_log.
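For PostgreSQL, the swap PFC mentions amounts to replacing the mysql_ calls with their pg_ equivalents. One possible (untested) adaptation:

$global_queries_log = array();

function _getmicrotime()
{
    list($u, $s) = explode(' ', microtime());
    return $u + $s;
}

function db_quote_query( $sql, $params=false )
{
    if( !$params )
        return $sql;
    foreach( $params as $key => $val )
    {
        if( is_array( $val ))
            $val = implode( ',', $val );
        $params[$key] = "'".pg_escape_string( $val )."'";
    }
    return vsprintf( $sql, $params );
}

function db_query( $sql, $params=false )
{
    global $global_queries_log;
    if( is_resource( $sql ))        // already a result resource
        return $sql;
    $sql = db_quote_query( $sql, $params );
    $t = _getmicrotime();
    $r = pg_query( $sql );          // uses the most recently opened connection
    if( !$r )
    {
        echo "<div class=bigerror><b>PostgreSQL error:</b><br>".pg_last_error().
             "<br><br><b>Query:</b><br>".htmlspecialchars( $sql )."</div>";
        die();
    }
    $global_queries_log[] = array( _getmicrotime()-$t, $sql );
    return $r;
}

// At the end of the page, dump the log:
foreach( $global_queries_log as $q )
    printf( "%.1f ms : %s<br>\n", $q[0]*1000, htmlspecialchars( $q[1] ) );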
On Thu, Sep 29, 2005 at 08:44:16AM -0400, Joe wrote:
> CREATE TABLE entry (
>   entry_id serial PRIMARY KEY,
>   title VARCHAR(128) NOT NULL,
>   subtitle VARCHAR(128),
>   subject_type SMALLINT,
>   subject_id INTEGER REFERENCES topic,
>   actor_type SMALLINT,
>   actor_id INTEGER REFERENCES topic,
>   actor VARCHAR(64),
>   actor_role VARCHAR(64),
>   rel_entry_id INTEGER,
>   rel_entry VARCHAR(64),
>   description VARCHAR(255),
>   quote text,
>   url VARCHAR(255),
>   entry_date CHAR(10),
>   created DATE NOT NULL DEFAULT CURRENT_DATE,
>   updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
> WITHOUT OIDS;
> CREATE INDEX entry_actor_id ON entry (actor_id);
> CREATE INDEX entry_subject_id ON entry (subject_id);

A few tips...

Fields in PostgreSQL have alignment requirements, so the smallints aren't saving you anything right now. If you put both of them together though, you'll save 4 bytes on most hardware. You'll also get some minor gains from putting all the variable-length fields at the end, as well as nullable fields. If you search the archives for 'field order' you should be able to find some useful info.

Make sure these indexes exist if you'll be updating or inserting into entry:

CREATE INDEX topic__subject_id ON topic(subject_id);
CREATE INDEX topic__actor_id ON topic(actor_id);

Also, the fact that subject and actor both point to topic along with subject_type and actor_type make me suspect that your design is de-normalized. Of course there's no way to know without more info.

FWIW, I usually use timestamptz for both created and updated fields.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf        cell: 512-569-9461
On Thu, Sep 29, 2005 at 04:39:36PM -0400, Joe wrote: > Andreas Pflug wrote: > >Hm, if you only have 4 tables, why do you need 12 queries? > >To reduce queries, join them in the query; no need to merge them > >physically. If you have only two main tables, I'd bet you only need 1-2 > >queries for the whole page. > > There are more than four tables and the queries are not functionally > overlapping. As an example, allow me to refer to the page > www.freedomcircle.com/topic.php/Economists. > > The top row of navigation buttons (Life, Liberty, etc.) is created from a > query of the 'topic' table. It could've been hard-coded as a PHP array, > but with less flexibility. The alphabetical links are from a SELECT > DISTINCT substring from topic. It could've been generated by a PHP for > loop (originally implemented that way) but again with less flexibility. > The listing of economists is another SELECT from topic. The subheadings > (Articles, Books) come from a SELECT of an entry_type table --which > currently has 70 rows-- and is read into a PHP array since we don't know > what headings will be used in a given page. The detail of the entries I suspect this might be something better done in a join. > comes from that query that I posted earlier, but there are three additional > queries that are used for specialized entry types (relationships between > topics --e.g., Prof. Williams teaches at George Mason, events, and > multi-author or multi-subject articles and books). And there's yet another Likewise... > table for the specific book information. Once the data is retrieved it's > sorted internally with PHP, at the heading level, before display. It's often better to let the database sort and/or aggregate data. > Maybe there is some way to merge all the queries (some already fairly > complex) that fetch the data for the entries box but I believe it would be > a monstrosity with over 100 lines of SQL. Also, just because no one else has mentioned it, remember that it's very easy to get MySQL into a mode where you have no data integrity. If that's the case it's going to be faster than PostgreSQL (though I'm not sure how much that affects the performance of SELECTs). -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Hi Jim, Jim C. Nasby wrote: > Also, just because no one else has mentioned it, remember that it's very > easy to get MySQL into a mode where you have no data integrity. If > that's the case it's going to be faster than PostgreSQL (though I'm not > sure how much that affects the performance of SELECTs). Yes indeed. When I added the REFERENCES to the schema and reran the conversion scripts, aside from having to reorder the table creation and loading (they used to be in alphabetical order), I also found a few referential integrity errors in the MySQL data. Joe
PFC wrote:
> - if you use a version before 8, type mismatch will prevent use of the
> indexes.

I'm using 8.0.3, but the type mismatch between relationship.rel_type and entry_type.type_id was unintended. The current databases use SMALLINT for both. The PostgreSQL schema was derived from an export script stored in Subversion, apparently before the column datatypes were changed.

> CREATE INDEX'es ON
> entry_type( class_id )
>
> relationship( topic_id1, rel_type, topic_id2 ) which becomes your
> new PRIMARY KEY
> relationship( topic_id2, rel_type, topic_id1 )

Creating the second relationship index was sufficient to modify the query plan and cut the runtime down to zero:

 Sort  (cost=75.94..75.95 rows=2 width=381) (actual time=0.000..0.000 rows=0 loops=1)
   Sort Key: r.rel_type, t.list_name
   ->  Nested Loop  (cost=16.00..75.93 rows=2 width=381) (actual time=0.000..0.000 rows=0 loops=1)
         Join Filter: ((("outer".topic_id1 = "inner".topic_id) AND ("outer".topic_id2 = 1252)) OR (("outer".topic_id2 = "inner".topic_id) AND ("outer".topic_id1 = 1252)))
         ->  Nested Loop  (cost=16.00..35.11 rows=1 width=169) (actual time=0.000..0.000 rows=0 loops=1)
               Join Filter: ("inner".rel_type = "outer".type_id)
               ->  Seq Scan on entry_type e  (cost=0.00..18.75 rows=4 width=4) (actual time=0.000..0.000 rows=15 loops=1)
                     Filter: (class_id = 2)
               ->  Materialize  (cost=16.00..16.04 rows=4 width=167) (actual time=0.000..0.000 rows=0 loops=15)
                     ->  Seq Scan on relationship r  (cost=0.00..16.00 rows=4 width=167) (actual time=0.000..0.000 rows=0 loops=1)
                           Filter: ((topic_id2 = 1252) OR (topic_id1 = 1252))
         ->  Seq Scan on topic t  (cost=0.00..30.94 rows=494 width=216) (never executed)
 Total runtime: 0.000 ms
(13 rows)

The overall execution time for the Economists page with PostgreSQL is within 4% of the MySQL time, so for the time being I'll leave the query in its current form.

Thanks for your help.

Joe
Jim C. Nasby wrote: > Make sure these indexes exist if you'll be updating or inserting into > entry: > > CREATE INDEX topic__subject_id ON topic(subject_id); > CREATE INDEX topic__actor_id ON topic(actor_id); Actually, topic's primary key is topic_id. > Also, the fact that subject and actor both point to topic along with > subject_type and actor_type make me suspect that your design is > de-normalized. Of course there's no way to know without more info. Yes, the design is denormalized. The reason is that a book or article is usually by a single author (an "actor" topic) and it will be listed under one main topic (a "subject" topic). There's a topic_entry table where additional actors and subjects can be added. It's somewhat ironic because I used to teach and/or preach normalization and the "goodness" of a 3NF+ design (also about having the database do aggregation and sorting as you mentioned in your other email). > FWIW, I usually use timestamptz for both created and updated fields. IIRC 'created' ended up as a DATE because MySQL 4 has a restriction about a single TIMESTAMP column per table taking the default value of current_timestamp. Joe
On Tue, Oct 04, 2005 at 05:11:19PM -0400, Joe wrote:
> Hi Jim,
>
> Jim C. Nasby wrote:
> > Also, just because no one else has mentioned it, remember that it's very
> > easy to get MySQL into a mode where you have no data integrity. If
> > that's the case it's going to be faster than PostgreSQL (though I'm not
> > sure how much that affects the performance of SELECTs).
>
> Yes indeed. When I added the REFERENCES to the schema and reran the
> conversion scripts, aside from having to reorder the table creation and
> loading (they used to be in alphabetical order), I also found a few
> referential integrity errors in the MySQL data.

Data integrity != referential integrity. :) It's very easy to accidentally get MyISAM tables in MySQL, which means you are nowhere near ACID, which also means you can't get anything close to an apples-to-apples comparison to PostgreSQL.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com     work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf        cell: 512-569-9461
PostgreSQL uses MVCC to ensure data integrity. The server must choose the right version of a tuple according to the transaction ID of the statement. Even for a SELECT (of PostgreSQL's ACID features, I think C and I apply here), it must do some extra work.

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Joe
Sent: Tuesday, October 04, 2005 18:11
To: Jim C. Nasby
CC: Andreas Pflug; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Comparative performance

Hi Jim,

Jim C. Nasby wrote:
> Also, just because no one else has mentioned it, remember that it's very
> easy to get MySQL into a mode where you have no data integrity. If
> that's the case it's going to be faster than PostgreSQL (though I'm not
> sure how much that affects the performance of SELECTs).

Yes indeed. When I added the REFERENCES to the schema and reran the conversion scripts, aside from having to reorder the table creation and loading (they used to be in alphabetical order), I also found a few referential integrity errors in the MySQL data.

Joe