Thread: Abnormal performance difference between Postgres and MySQL
Hello,
I am doing a performance comparison between running Jena with MySQL and Postgres. I used the 8.3-community version of Postgres and MySQL 5.0.67. I have run several queries against both MySQL and Postgres and all of them took a similar amount of time to execute except one. For the following query to a table having 10,003,728 rows, MySQL takes 0.11 seconds to return results whereas Postgres takes like 1 hour and 20 minutes!
Query:
select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND A2.GraphID=1;
Table:
Table "public.jena_g1t1_stmt"
 Column  |          Type          | Modifiers
---------+------------------------+-----------
subj | character varying(250) | not null
prop | character varying(250) | not null
obj | character varying(250) | not null
graphid | integer |
Indexes:
"jena_g1t1_stmt_ixo" btree (obj)
"jena_g1t1_stmt_ixsp" btree (subj, prop)
Machine: SunOS 5.10 Generic_127111-11 sun4u sparc SUNW, Sun-Fire-880
Memory: 4 GB
Number of physical processors: 2
I tried to re-arrange the query but each time the amount of time needed is the same. Can anyone help me find out why Postgres is taking so much time?
I can provide any other information needed and also the data if anyone wants.
Thanks and regards,
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
Farhan Husain <russoue@gmail.com> writes:

> I can provide any other information needed and also the data if anyone
> wants.

What did the query plans look like in both databases?

In Postgres you can get the query plan with

EXPLAIN ANALYZE select ...

You can leave out the ANALYZE if you can't wait until the query completes
but it will have much less information to diagnose any problems.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
On Mon, Feb 23, 2009 at 4:16 PM, Farhan Husain <russoue@gmail.com> wrote:
> Hello,
>
> I am doing a performance comparison between running Jena with MySQL and
> Postgres. I used the 8.3-community version of Postgres and MySQL 5.0.67. I
> have run several queries against both MySQL and Postgres and all of them
> took a similar amount of time to execute except one. For the following query
> to a table having 10,003,728 rows, MySQL takes 0.11 seconds to return
> results whereas Postgres takes like 1 hour and 20 minutes!
>
> Query:
>
> select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1,
> jena_g1t1_stmt A2 Where
> A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND
> A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'
> AND A0.GraphID=1 AND A0.Subj=A1.Subj AND
> A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND
> A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND
> A1.GraphID=1 AND A0.Subj=A2.Subj AND
> A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND
> A2.GraphID=1;
>
> Table:
>
> Table "public.jena_g1t1_stmt"
>  Column  |          Type          | Modifiers
> ---------+------------------------+-----------
> subj | character varying(250) | not null
> prop | character varying(250) | not null
> obj | character varying(250) | not null
> graphid | integer |
> Indexes:
> "jena_g1t1_stmt_ixo" btree (obj)
> "jena_g1t1_stmt_ixsp" btree (subj, prop)
>
> Machine: SunOS 5.10 Generic_127111-11 sun4u sparc SUNW, Sun-Fire-880
> Memory: 4 GB
> Number of physical processors: 2
>
> I tried to re-arrange the query but each time the amount of time needed is
> the same. Can anyone help me find out why Postgres is taking so much time?
>
> I can provide any other information needed and also the data if anyone
> wants.

What is the locale of your database? I.e.:

# show lc_collate ;
 lc_collate
-------------
 en_US.UTF-8
(1 row)

If it's not C then string compares are probably going to need special
indexes to work the way you expect them to (varchar pattern ops). Look
here for more information:

http://www.postgresql.org/docs/8.3/static/indexes-opclass.html
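For illustration, the kind of index Scott means would be declared like this (the index name is invented here):

CREATE INDEX jena_g1t1_stmt_subj_pattern ON jena_g1t1_stmt (subj varchar_pattern_ops);

As comes up later in the thread, such an index only helps pattern matches (LIKE, regexp), not the plain equality comparisons this query uses.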
On Mon, Feb 23, 2009 at 5:27 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Mon, Feb 23, 2009 at 4:16 PM, Farhan Husain <russoue@gmail.com> wrote:
> Hello,
>
> I am doing a performance comparison between running Jena with MySQL and
> Postgres. I used the 8.3-community version of Postgres and MySQL 5.0.67. I
> have run several queries against both MySQL and Postgres and all of them took a
> similar amount of time to execute except one. For the following query to a
> table having 10,003,728 rows, MySQL takes 0.11 seconds to return results
> whereas Postgres takes like 1 hour and 20 minutes!
>
> Query:
>
> select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1,
> jena_g1t1_stmt A2 Where
> A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND
> A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'
> AND A0.GraphID=1 AND A0.Subj=A1.Subj AND
> A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND
> A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND
> A1.GraphID=1 AND A0.Subj=A2.Subj AND
> A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND
> A2.GraphID=1;
>
> Table:
>
> Table "public.jena_g1t1_stmt"
> Column | Type | Modifiers
> ---------+------------------------+-----------
> subj | character varying(250) | not null
> prop | character varying(250) | not null
> obj | character varying(250) | not null
> graphid | integer |
> Indexes:
> "jena_g1t1_stmt_ixo" btree (obj)
> "jena_g1t1_stmt_ixsp" btree (subj, prop)
>
> Machine: SunOS 5.10 Generic_127111-11 sun4u sparc SUNW, Sun-Fire-880
> Memory: 4 GB
> Number of physical processors: 2
>
> I tried to re-arrange the query but each time the amount of time needed is
> the same. Can anyone help me find the answer to why Postgres is taking so
> much time?
>
> I can provide any other information needed and also the data if anyone
> wants.
What is the locale of your database? I.e.:

# show lc_collate ;
lc_collate
-------------
en_US.UTF-8
(1 row)
If it's not C then string compares are probably going to need special
indexes to work the way you expect them to (varchar pattern ops). Look
here for more information:
http://www.postgresql.org/docs/8.3/static/indexes-opclass.html
Here it is:
ingentadb=# show lc_collate;
lc_collate
-----------------
en_US.ISO8859-1
(1 row)
Do you think this is the source of the problem?
Thanks,
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Tue, Feb 24, 2009 at 12:27 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> If it's not C then string compares are probably going to need special
> indexes to work the way you expect them to (varchar pattern ops). Look
> here for more information:
>
> http://www.postgresql.org/docs/8.3/static/indexes-opclass.html

It's only relevant for pattern matching (eg LIKE or regexp). AFAICS,
the OP only uses plain equals in his query.

An EXPLAIN ANALYZE output would be nice to diagnose the problem.

--
Guillaume
On Mon, Feb 23, 2009 at 5:27 PM, Gregory Stark <stark@enterprisedb.com> wrote:
Farhan Husain <russoue@gmail.com> writes:
> I can provide any other information needed and also the data if anyone
> wants.
What did the query plans look like in both databases?

In Postgres you can get the query plan with
EXPLAIN ANALYZE select ...
You can leave out the ANALYZE if you can't wait until the query completes
but it will have much less information to diagnose any problems.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
I am doing the EXPLAIN ANALYZE now; it will take about 1 hour and 20 minutes again. I will get back to you once it is finished. Do you know how to get the query plan in MySQL?
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Mon, Feb 23, 2009 at 4:33 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> On Tue, Feb 24, 2009 at 12:27 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> If it's not C then string compares are probably going to need special
>> indexes to work the way you expect them to (varchar pattern ops). Look
>> here for more information:
>>
>> http://www.postgresql.org/docs/8.3/static/indexes-opclass.html
>
> It's only relevant for pattern matching (eg LIKE or regexp). AFAICS,
> the OP only uses plain equals in his query.

True, I had a bit of a headache trying to read that unindented query.
(OP, here's a hint: if you want people to read your queries / code,
indent them in some way that makes them fairly readable.)

Note that varchar_pattern_ops indexes can't be used for straight equal
compares either.
On Mon, Feb 23, 2009 at 4:35 PM, Farhan Husain <russoue@gmail.com> wrote:
> On Mon, Feb 23, 2009 at 5:27 PM, Gregory Stark <stark@enterprisedb.com> wrote:
>> Farhan Husain <russoue@gmail.com> writes:
>>
>> > I can provide any other information needed and also the data if anyone
>> > wants.
>>
>> What did the query plans look like in both databases?
>>
>> In Postgres you can get the query plan with
>>
>> EXPLAIN ANALYZE select ...
>>
>> You can leave out the ANALYZE if you can't wait until the query completes
>> but it will have much less information to diagnose any problems.
>
> I am doing the EXPLAIN ANALYZE now; it will take about 1 hour and 20 minutes
> again. I will get back to you once it is finished. Do you know how to get
> the query plan in MySQL?

Explain works in mysql. It just doesn't tell you a whole lot, because
the query planner's dumb as a brick. Note that often that stupid query
planner makes queries run really fast. When it doesn't, there's not a
lot of tuning you can do to fix it.

What does plain explain on pgsql tell you? Please attach its output
back to the list for us to peruse.
On Mon, Feb 23, 2009 at 5:27 PM, Gregory Stark <stark@enterprisedb.com> wrote:
Farhan Husain <russoue@gmail.com> writes:
> I can provide any other information needed and also the data if anyone
> wants.
What did the query plans look like in both databases?

In Postgres you can get the query plan with
EXPLAIN ANALYZE select ...
You can leave out the ANALYZE if you can't wait until the query completes
but it will have much less information to diagnose any problems.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Here is the output:
ingentadb=# EXPLAIN ANALYZE select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND A2.GraphID=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=652089.37..665004.47 rows=733195 width=134) (actual time=5410683.129..5410690.033 rows=30 loops=1)
Merge Cond: ((a0.subj)::text = (a1.subj)::text)
-> Sort (cost=86716.91..86796.78 rows=31949 width=208) (actual time=76.395..76.423 rows=30 loops=1)
Sort Key: a0.subj
Sort Method: quicksort Memory: 24kB
-> Nested Loop (cost=0.00..84326.57 rows=31949 width=208) (actual time=4.146..65.409 rows=30 loops=1)
-> Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0 (cost=0.00..5428.34 rows=487 width=74) (actual time=1.980..2.142 rows=30 loops=1)
Index Cond: ((obj)::text = 'Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
Filter: (((prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid = 1))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2 (cost=0.00..161.37 rows=51 width=134) (actual time=2.101..2.104 rows=1 loops=30)
Index Cond: (((a2.subj)::text = (a0.subj)::text) AND ((a2.prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
Filter: (a2.graphid = 1)
-> Sort (cost=565372.46..568084.16 rows=1084680 width=74) (actual time=5410606.604..5410606.628 rows=31 loops=1)
Sort Key: a1.subj
Sort Method: quicksort Memory: 489474kB
-> Seq Scan on jena_g1t1_stmt a1 (cost=0.00..456639.59 rows=1084680 width=74) (actual time=0.043..44005.780 rows=3192000 loops=1)
Filter: ((graphid = 1) AND ((prop)::text = 'Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text) AND ((obj)::text = 'Uv::http://metastore.ingenta.com/ns/structure/Article'::text))
Total runtime: 5410691.012 ms
(18 rows)
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Mon, Feb 23, 2009 at 6:24 PM, Farhan Husain <russoue@gmail.com> wrote:

This sort here:

> -> Sort (cost=565372.46..568084.16 rows=1084680 width=74) (actual
> time=5410606.604..5410606.628 rows=31 loops=1)
>       Sort Key: a1.subj
>       Sort Method: quicksort Memory: 489474kB
>       -> Seq Scan on jena_g1t1_stmt a1 (cost=0.00..456639.59
> rows=1084680 width=74) (actual time=0.043..44005.780 rows=3192000 loops=1)

Seems to be the problem. There are a few things that seem odd: the
first is that it estimates it will return 1M rows, but returns only 31.
The other is that sorting 31 rows is taking 5410606 milliseconds.

My first guess would be to crank up the statistics on a1.subj to a few
hundred (going up to a thousand if necessary), re-analyzing, and seeing
if the query plan changes.

I'm not expert enough on explain analyze to offer any more.
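For illustration, cranking up the statistics on that column would look something like this (the target of 500 is an arbitrary value in the range Scott suggests):

ALTER TABLE jena_g1t1_stmt ALTER COLUMN subj SET STATISTICS 500;
ANALYZE jena_g1t1_stmt;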
Scott Marlowe <scott.marlowe@gmail.com> writes:
> On Mon, Feb 23, 2009 at 6:24 PM, Farhan Husain <russoue@gmail.com> wrote:
> This sort here:
>> -> Sort (cost=565372.46..568084.16 rows=1084680 width=74) (actual
>> time=5410606.604..5410606.628 rows=31 loops=1)
>>       Sort Key: a1.subj
>>       Sort Method: quicksort Memory: 489474kB
>>       -> Seq Scan on jena_g1t1_stmt a1 (cost=0.00..456639.59
>> rows=1084680 width=74) (actual time=0.043..44005.780 rows=3192000 loops=1)

> Seems to be the problem. There are a few things that seem odd, the
> first is that it estimates it will return 1M rows, but returns only 31.
> The other is that sorting 31 rows is taking 5410606 milliseconds.

Uh, no, it's sorting 3192000 rows --- look at the input scan.
Evidently only the first 31 of those rows are getting fetched out of
the sort, though.

                        regards, tom lane
Farhan Husain <russoue@gmail.com> writes:
> Here is the output:

I see a couple of things going on here:

* The planner is choosing to use sort-and-mergejoin for the second
join. This requires sorting all of jena_g1t1_stmt. If it had
accurately estimated the output size of the first join (ie 30 rows not
30K rows) it'd likely have gone for a nestloop join instead, assuming
that you have an index on jena_g1t1_stmt.subj. You need to try to
reduce the 1000X error factor in that estimate. I'm not sure how much
it will help to increase the stats targets on the input tables, but
that's the first thing to try.

* Considering that the sort is all in memory, 5400 seconds seems like
a heck of a long time even for sorting 3M rows. Since we already found
out you're using a non-C locale, the sort comparisons are ultimately
strcoll() calls, and I'm betting that you are on a platform where
strcoll() is really really slow. Another possibility is that you don't
really have 500MB of memory to spare, and the sort workspace is
getting swapped in and out (read thrashed...). Setting work_mem higher
than you can afford is a bad idea.

In comparison to mysql, I think that their planner will use an indexed
nestloop if it possibly can, which makes it look great on this type of
query (but not so hot if the query actually does need to join a lot of
rows).

                        regards, tom lane
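One session-local way to test Tom's nestloop point, sketched here purely as a diagnostic (enable_mergejoin is a standard planner setting, and nothing changes permanently):

SET enable_mergejoin = off;
EXPLAIN ANALYZE select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where ... ;  -- same WHERE clause as the query above
RESET enable_mergejoin;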
On Mon, Feb 23, 2009 at 5:27 PM, Gregory Stark <stark@enterprisedb.com> wrote:
Farhan Husain <russoue@gmail.com> writes:
> I can provide any other information needed and also the data if anyone
> wants.
What did the query plans look like in both databases?

In Postgres you can get the query plan with
EXPLAIN ANALYZE select ...
You can leave out the ANALYZE if you can't wait until the query completes
but it will have much less information to diagnose any problems.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Here is what I got from MySQL:
mysql> explain Select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf:' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1:' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type:' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article:' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage:' AND A2.GraphID=1;
+----+-------------+-------+------+------------------------------------+-------------------+---------+-------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------------------+-------------------+---------+-------------------------+------+-------------+
| 1 | SIMPLE | A0 | ref | jena_g1t1_stmtXSP,jena_g1t1_stmtXO | jena_g1t1_stmtXO | 102 | const | 628 | Using where |
| 1 | SIMPLE | A1 | ref | jena_g1t1_stmtXSP,jena_g1t1_stmtXO | jena_g1t1_stmtXSP | 204 | ingentadb.A0.Subj,const | 1 | Using where |
| 1 | SIMPLE | A2 | ref | jena_g1t1_stmtXSP | jena_g1t1_stmtXSP | 204 | ingentadb.A0.Subj,const | 1 | Using where |
+----+-------------+-------+------+------------------------------------+-------------------+---------+-------------------------+------+-------------+
3 rows in set (0.00 sec)
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
> I am doing a performance comparison between running Jena with MySQL and
> Postgres. I used the 8.3-community version of Postgres and MySQL 5.0.67. I
> have run several queries against both MySQL and Postgres and all of them
> took a similar amount of time to execute except one. For the following query
> to a table having 10,003,728 rows, MySQL takes 0.11 seconds to return
> results whereas Postgres takes like 1 hour and 20 minutes!
>
> Query:
>
> select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1,
> jena_g1t1_stmt A2 Where
> A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND
> A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'
> AND A0.GraphID=1 AND A0.Subj=A1.Subj AND
> A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND
> A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND
> A1.GraphID=1 AND A0.Subj=A2.Subj AND
> A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND
> A2.GraphID=1;
>
> Table:
>
> Table "public.jena_g1t1_stmt"
>  Column  |          Type          | Modifiers
> ---------+------------------------+-----------
> subj | character varying(250) | not null
> prop | character varying(250) | not null
> obj | character varying(250) | not null
> graphid | integer |
> Indexes:
> "jena_g1t1_stmt_ixo" btree (obj)
> "jena_g1t1_stmt_ixsp" btree (subj, prop)

Isn't it missing an index on the column prop?

select ... where A0.Prop='foo' and ...

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare
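For illustration, the index Claus is asking about would be declared like this (the index name is invented); the existing jena_g1t1_stmt_ixsp btree leads with subj, so a lookup on prop alone generally cannot use it directly:

CREATE INDEX jena_g1t1_stmt_ixp ON jena_g1t1_stmt (prop);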
The result set should have 31 rows; that is correct.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Mon, Feb 23, 2009 at 7:53 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Mon, Feb 23, 2009 at 6:24 PM, Farhan Husain <russoue@gmail.com> wrote:
This sort here:
> -> Sort (cost=565372.46..568084.16 rows=1084680 width=74) (actual
> time=5410606.604..5410606.628 rows=31 loops=1)
> Sort Key: a1.subj
> Sort Method: quicksort Memory: 489474kB
> -> Seq Scan on jena_g1t1_stmt a1 (cost=0.00..456639.59
> rows=1084680 width=74) (actual time=0.043..44005.780 rows=3192000 loops=1)
Seems to be the problem. There are a few things that seem odd: the
first is that it estimates it will return 1M rows, but returns only 31.
The other is that sorting 31 rows is taking 5410606 milliseconds.
My first guess would be to crank up the statistics on a1.subj to a few
hundred (going up to a thousand if necessary), re-analyzing, and seeing
if the query plan changes.
I'm not expert enough on explain analyze to offer any more.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Tue, Feb 24, 2009 at 1:28 AM, Claus Guttesen <kometen@gmail.com> wrote:
> I am doing a performance comparison between running Jena with MySQL and
> Postgres. I used the 8.3-community version of Postgres and MySQL 5.0.67. I
> have run several queries against both MySQL and Postgres and all of them took a
> similar amount of time to execute except one. For the following query to a
> table having 10,003,728 rows, MySQL takes 0.11 seconds to return results
> whereas Postgres takes like 1 hour and 20 minutes!
>
> Query:
>
> select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1,
> jena_g1t1_stmt A2 Where
> A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND
> A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'
> AND A0.GraphID=1 AND A0.Subj=A1.Subj AND
> A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND
> A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND
> A1.GraphID=1 AND A0.Subj=A2.Subj AND
> A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND
> A2.GraphID=1;
>
> Table:
>
> Table "public.jena_g1t1_stmt"
> Column | Type | Modifiers
> ---------+------------------------+-----------
> subj | character varying(250) | not null
> prop | character varying(250) | not null
> obj | character varying(250) | not null
> graphid | integer |
> Indexes:
> "jena_g1t1_stmt_ixo" btree (obj)
> "jena_g1t1_stmt_ixsp" btree (subj, prop)
Isn't it missing an index on the column prop?

select ... where A0.Prop='foo' and ...
--
regards
Claus
When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.
Shakespeare
Can you please elaborate a bit?
Thanks,
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
>> > Query:
>> >
>> > select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1,
>> > jena_g1t1_stmt A2 Where
>> > A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND
>> > A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'
>> > AND A0.GraphID=1 AND A0.Subj=A1.Subj AND
>> > A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND
>> > A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND
>> > A1.GraphID=1 AND A0.Subj=A2.Subj AND
>> > A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND
>> > A2.GraphID=1;
>> >
>> > Table:
>> >
>> > Table "public.jena_g1t1_stmt"
>> >  Column  |          Type          | Modifiers
>> > ---------+------------------------+-----------
>> > subj | character varying(250) | not null
>> > prop | character varying(250) | not null
>> > obj | character varying(250) | not null
>> > graphid | integer |
>> > Indexes:
>> > "jena_g1t1_stmt_ixo" btree (obj)
>> > "jena_g1t1_stmt_ixsp" btree (subj, prop)
>>
>> Isn't it missing an index on the column prop?
>>
>> select ... where A0.Prop='foo' and ...
>
> Can you please elaborate a bit?

I thought that A0.Prop would ignore the composite index created on the
columns subj and prop but this does not seem to be the case.

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare
>> Can you please elaborate a bit?
>
> I thought that A0.Prop would ignore the composite index created on the
> columns subj and prop but this does not seem to be the case.

Yeah, I think you're barking up the wrong tree here. I think Tom had
the correct diagnosis - what do you get from "show work_mem"?

What kind of machine are you running this on? If it's a UNIX-ish
machine, what do you get from "free -m" and "uname -a"?

...Robert
On Wed, Feb 25, 2009 at 12:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
You still haven't answered the work_mem question, and you probably
want to copy the list, rather than just sending this to me.
...Robert
On Wed, Feb 25, 2009 at 1:34 PM, Farhan Husain <russoue@gmail.com> wrote:
>
>
> On Tue, Feb 24, 2009 at 8:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> >> Can you please elaborate a bit?
>> >
>> > I thought that A0.Prop would ignore the composite index created on the
>> > columns subj and prop but this does not seem to be the case.
>>
>> Yeah, I think you're barking up the wrong tree here. I think Tom had
>> the correct diagnosis - what do you get from "show work_mem"?
>>
>> What kind of machine are you running this on? If it's a UNIX-ish
>> machine, what do you get from "free -m"and "uname -a"?
>>
>> ...Robert
>
> Here is the machine info:
>
> Machine: SunOS 5.10 Generic_127111-11 sun4u sparc SUNW, Sun-Fire-880
> Memory: 4 GB
> Number of physical processors: 2
>
>
> --
> Mohammad Farhan Husain
> Research Assistant
> Department of Computer Science
> Erik Jonsson School of Engineering and Computer Science
> University of Texas at Dallas
>
Did you mean the work_mem field in the config file?
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
Just start up psql and type:

show work_mem;

(You could look in the config file too I suppose.)

...Robert

On Wed, Feb 25, 2009 at 1:53 PM, Farhan Husain <russoue@gmail.com> wrote:
> On Wed, Feb 25, 2009 at 12:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> You still haven't answered the work_mem question, and you probably
>> want to copy the list, rather than just sending this to me.
>>
>> ...Robert
>
> Did you mean the work_mem field in the config file?
On Wed, Feb 25, 2009 at 12:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Just start up psql and type:
show work_mem;
(You could look in the config file too I suppose.)
...Robert
On Wed, Feb 25, 2009 at 1:53 PM, Farhan Husain <russoue@gmail.com> wrote:
>
>
> On Wed, Feb 25, 2009 at 12:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> You still haven't answered the work_mem question, and you probably
>> want to copy the list, rather than just sending this to me.
>>
>> ...Robert
>>
>> On Wed, Feb 25, 2009 at 1:34 PM, Farhan Husain <russoue@gmail.com> wrote:
>> >
>> >
>> > On Tue, Feb 24, 2009 at 8:21 PM, Robert Haas <robertmhaas@gmail.com>
>> > wrote:
>> >>
>> >> >> Can you please elaborate a bit?
>> >> >
>> >> > I thought that A0.Prop would ignore the composite index created on
>> >> > the
>> >> > columns subj and prop but this does not seem to be the case.
>> >>
>> >> Yeah, I think you're barking up the wrong tree here. I think Tom had
>> >> the correct diagnosis - what do you get from "show work_mem"?
>> >>
>> >> What kind of machine are you running this on? If it's a UNIX-ish
>> >> machine, what do you get from "free -m"and "uname -a"?
>> >>
>> >> ...Robert
>> >
>> > Here is the machine info:
>> >
>> > Machine: SunOS 5.10 Generic_127111-11 sun4u sparc SUNW, Sun-Fire-880
>> > Memory: 4 GB
>> > Number of physical processors: 2
>> >
>> >
>> > --
>> > Mohammad Farhan Husain
>> > Research Assistant
>> > Department of Computer Science
>> > Erik Jonsson School of Engineering and Computer Science
>> > University of Texas at Dallas
>> >
>
> Did you mean the work_mem field in the config file?
>
>
> --
> Mohammad Farhan Husain
> Research Assistant
> Department of Computer Science
> Erik Jonsson School of Engineering and Computer Science
> University of Texas at Dallas
>
I did it; it does not show anything. Here is what I have got from the config file:
#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------
# - Memory -
shared_buffers = 32MB # min 128kB or max_connections*16kB
# (change requires restart)
temp_buffers = 1024MB # min 800kB
#max_prepared_transactions = 5 # can be 0 or more
# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 1792MB # min 64kB
#maintenance_work_mem = 16MB # min 1MB
#max_stack_depth = 32MB # min 100kB
# - Free Space Map -
max_fsm_pages = 204800 # min max_fsm_relations*16, 6 bytes each
# (change requires restart)
#max_fsm_relations = 1000 # min 100, ~70 bytes each
# (change requires restart)
# - Kernel Resource Usage -
#max_files_per_process = 1000 # min 25
# (change requires restart)
#shared_preload_libraries = '' # (change requires restart)
# - Cost-Based Vacuum Delay -
#vacuum_cost_delay = 0 # 0-1000 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 1-10000 credits
# - Background Writer -
#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0 # 0-10.0 multiplier on buffers scanned/round
Please note that this (1792MB) is the highest that I could set for work_mem.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
> Please note that this (1792MB) is the highest that I could set for work_mem.

Yeah, that's almost certainly part of your problem. You need to make
that number MUCH smaller. You probably want a value like 1MB or 5MB,
or maybe, if you really have a lot of memory, 20MB. That's insanely
high.

...Robert
On Wed, Feb 25, 2009 at 12:05 PM, Farhan Husain <russoue@gmail.com> wrote:
> On Wed, Feb 25, 2009 at 12:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Just start up psql and type:
>>
>> show work_mem;
>
> I did it; it does not show anything.

Did you remember the ; symbol?

> Here is what I have got from the config file:
>
> shared_buffers = 32MB # min 128kB or max_connections*16kB
> # (change requires restart)

Assuming your machine has more than a few hundred megs of ram in it,
this is kinda small. Generally setting it to 10 to 25% of total memory
is about right.

> work_mem = 1792MB # min 64kB

That's crazy high. work_mem is the amount of memory a single sort can
use. Each query can have multiple sorts. So, if you have 10 users
running 10 sorts, you could use 100*1792MB of memory. Look at setting
it in the 1 to 16MB range. If testing shows a few queries can do
better with more work_mem, then look at setting it higher for just
that one connection.
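The per-connection override Scott mentions is just a session-level SET; a sketch, with an illustrative value:

SET work_mem = '64MB';
-- run the one sort-heavy query here, then
RESET work_mem;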
On Wed, Feb 25, 2009 at 1:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Please note that this (1792MB) is the highest that I could set for work_mem.
Yeah, that's almost certainly part of your problem.
You need to make that number MUCH smaller. You probably want a value
like 1MB or 5MB, or maybe, if you really have a lot of memory, 20MB.
That's insanely high.
...Robert
Initially, it was the default value (32MB). Later I played with that value thinking that it might improve the performance. But all the values resulted in the same amount of time.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Wed, Feb 25, 2009 at 3:44 PM, Farhan Husain <russoue@gmail.com> wrote:
> Initially, it was the default value (32MB). Later I played with that value
> thinking that it might improve the performance. But all the values resulted
> in the same amount of time.

Well, if you set it back to what we consider to be a reasonable value,
rerun EXPLAIN ANALYZE, and post that plan, it might help us tell you
what to do next.

...Robert
On Wed, Feb 25, 2009 at 3:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 25, 2009 at 3:44 PM, Farhan Husain <russoue@gmail.com> wrote:
> Initially, it was the default value (32MB). Later I played with that value
> thinking that it might improve the performance. But all the values resulted
> in same amount of time.
Well, if you set it back to what we consider to be a reasonable value,
rerun EXPLAIN ANALYZE, and post that plan, it might help us tell you
what to do next.
...Robert
Right now I am running the query again with 32MB work_mem. It is taking a long time as before. However, I have kept the following values unchanged:
shared_buffers = 32MB # min 128kB or max_connections*16kB
temp_buffers = 1024MB # min 800kB
Do you think I should change them to something else?
Thanks,
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Wed, Feb 25, 2009 at 2:32 PM, Farhan Husain <russoue@gmail.com> wrote:
> Right now I am running the query again with 32MB work_mem. It is taking a
> long time as before. However, I have kept the following values unchanged:
>
> shared_buffers = 32MB # min 128kB or max_connections*16kB

That's REALLY small for pgsql. Assuming your machine has at least 1G
of ram, I'd set it to 128M to 256M as a minimum.
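A sketch of that change as a postgresql.conf line (256MB is the upper end of Scott's suggestion; shared_buffers requires a server restart to take effect):

shared_buffers = 256MB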
On Wed, Feb 25, 2009 at 4:32 PM, Farhan Husain <russoue@gmail.com> wrote:
> Right now I am running the query again with 32MB work_mem. It is taking a
> long time as before. However, I have kept the following values unchanged:
>
> shared_buffers = 32MB # min 128kB or max_connections*16kB
> temp_buffers = 1024MB # min 800kB
>
> Do you think I should change them to something else?

It would probably be good to change them, but I don't think it's going
to fix the problem you're having with this query.

...Robert
On Wed, Feb 25, 2009 at 3:35 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Wed, Feb 25, 2009 at 2:32 PM, Farhan Husain <russoue@gmail.com> wrote:
>
> On Wed, Feb 25, 2009 at 3:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Feb 25, 2009 at 3:44 PM, Farhan Husain <russoue@gmail.com> wrote:
>> > Initially, it was the default value (32MB). Later I played with that
>> > value
>> > thinking that it might improve the performance. But all the values
>> > resulted
>> > in same amount of time.
>>
>> Well, if you set it back to what we consider to be a reasonable value,
>> rerun EXPLAIN ANALYZE, and post that plan, it might help us tell you
>> what to do next.
>>
>> ...Robert
>
> Right now I am running the query again with 32MB work_mem. It is taking a
> long time as before. However, I have kept the following values unchanged:
>
> shared_buffers = 32MB # min 128kB or max_connections*16kB
That's REALLY small for pgsql. Assuming your machine has at least 1G
of ram, I'd set it to 128M to 256M as a minimum.
As I wrote in a previous email, I had the value set to 1792MB (the highest I could set) and had the same execution time. This value is not helping me to bring down the execution time.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
What is random_page_cost set to? You could try to lower it to 1.5 if
set higher.

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare
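As a quick session-level test of Claus's suggestion before re-running the EXPLAIN ANALYZE (the 8.3 default for this setting is 4.0):

SET random_page_cost = 1.5;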
>> > shared_buffers = 32MB # min 128kB or
>> > max_connections*16kB
>>
>> That's REALLY small for pgsql. Assuming your machine has at least 1G
>> of ram, I'd set it to 128M to 256M as a minimum.
>
> As I wrote in a previous email, I had the value set to 1792MB (the highest I
> could set) and had the same execution time. This value is not helping me to
> bring down the execution time.

No, you increased work_mem, not shared_buffers. You might want to go
and read the documentation:

http://www.postgresql.org/docs/current/interactive/runtime-config-resource.html

But at any rate, the large work_mem was producing a very strange plan.
It may help to see what the system does without that setting. But
changing shared_buffers will not change the plan, so let's not worry
about that right now.

...Robert
It was only after I got this high execution time that I started to look into the configuration file and change those values. I tried several combinations in which all those values were higher than the default values. I got no improvement in runtime. The machine Postgres is running on has 4 GB of RAM.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Wed, Feb 25, 2009 at 3:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > shared_buffers = 32MB # min 128kB or
>> > max_connections*16kB
>>
>> That's REALLY small for pgsql. Assuming your machine has at least 1G
>> of ram, I'd set it to 128M to 256M as a minimum.
>
> As I wrote in a previous email, I had the value set to 1792MB (the highest I
> could set) and had the same execution time. This value is not helping me to
> bring down the execution time.
No, you increased work_mem, not shared_buffers. You might want to go
and read the documentation:
http://www.postgresql.org/docs/current/interactive/runtime-config-resource.html
But at any rate, the large work_mem was producing a very strange plan.
It may help to see what the system does without that setting. But
changing shared_buffers will not change the plan, so let's not worry
about that right now.
...Robert
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
On Wed, Feb 25, 2009 at 2:38 PM, Farhan Husain <russoue@gmail.com> wrote:
> On Wed, Feb 25, 2009 at 3:35 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Wed, Feb 25, 2009 at 2:32 PM, Farhan Husain <russoue@gmail.com> wrote:
>> > Right now I am running the query again with 32MB work_mem. It is taking a
>> > long time as before. However, I have kept the following values unchanged:
>> >
>> > shared_buffers = 32MB # min 128kB or max_connections*16kB
>>
>> That's REALLY small for pgsql. Assuming your machine has at least 1G
>> of ram, I'd set it to 128M to 256M as a minimum.
>
> As I wrote in a previous email, I had the value set to 1792MB (the highest I
> could set) and had the same execution time. This value is not helping me to
> bring down the execution time.

No, that was work_mem. This is shared_buffers.
On Wed, Feb 25, 2009 at 3:55 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Wed, Feb 25, 2009 at 2:38 PM, Farhan Husain <russoue@gmail.com> wrote:
>
>
> On Wed, Feb 25, 2009 at 3:35 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Wed, Feb 25, 2009 at 2:32 PM, Farhan Husain <russoue@gmail.com> wrote:
>> >
>> > On Wed, Feb 25, 2009 at 3:30 PM, Robert Haas <robertmhaas@gmail.com>
>> > wrote:
>> >>
>> >> On Wed, Feb 25, 2009 at 3:44 PM, Farhan Husain <russoue@gmail.com>
>> >> wrote:
>> >> > Initially, it was the default value (32MB). Later I played with that
>> >> > value
>> >> > thinking that it might improve the performance. But all the values
>> >> > resulted
>> >> > in same amount of time.
>> >>
>> >> Well, if you set it back to what we consider to be a reasonable value,
>> >> rerun EXPLAIN ANALYZE, and post that plan, it might help us tell you
>> >> what to do next.
>> >>
>> >> ...Robert
>> >
>> > Right now I am running the query again with 32MB work_mem. It is taking
>> > a
>> > long time as before. However, I have kept the following values
>> > unchanged:
>> >
>> > shared_buffers = 32MB # min 128kB or
>> > max_connections*16kB
>>
>> That's REALLY small for pgsql. Assuming your machine has at least 1G
>> of ram, I'd set it to 128M to 256M as a minimum.
>
> As I wrote in a previous email, I had the value set to 1792MB (the highest I
> could set) and had the same execution time. This value is not helping me to
> bring down the execution time.
No, that was work_mem. This is shared_buffers.

Oh, sorry for the confusion. I will change shared_buffers once the current run finishes.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
Wed, 25 Feb 2009 15:43:49 -0600 -n
Farhan Husain <russoue@gmail.com> wrote:

OK, you have two options:

1. Learn to read carefully, and differentiate between the work_mem and
shared_buffers options. Lower work_mem and raise shared_buffers as
others wrote.

2. Leave Postgresql alone and go for Oracle or Microsoft SQL...

Rgds,
Akos

> It was only after I got this high execution time that I started to
> look into the configuration file and change those values. I tried
> several combinations in which all those values were higher than the
> default values. I got no improvement in runtime. The machine postgres
> is running on has 4 GB of RAM.

--
Regards,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353 |Mobil:+36209278894 =-
I am trying to find the reason for the problem, so switching to Oracle or something else is not the solution. I tried several combinations of those parameters before posting the problem here. I have read http://www.postgresql.org/docs/current/interactive/runtime-config-resource.html before and I think I understood what it said.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
2009/2/25 Akos Gabriel <akos.gabriel@i-logic.hu>
Wed, 25 Feb 2009 15:43:49 -0600 -n
Farhan Husain <russoue@gmail.com> wrote:
OK, you have two options:
1. Learn to read carefully, and differentiate between the work_mem and
shared_buffers options. Lower work_mem and raise shared_buffers as
others wrote.
2. Leave Postgresql alone and go for Oracle or Microsoft SQL...
Rgds,
Akos
> It was only after I got this high execution time that I started to
> look into the configuration file and change those values. I tried
> several combinations in which all those values were higher than the
> default values. I got no improvement in runtime. The machine postgres
> is running on has 4 GB of RAM.
>
> On Wed, Feb 25, 2009 at 3:40 PM, Robert Haas <robertmhaas@gmail.com>
> wrote:
>
> > >> > shared_buffers = 32MB # min 128kB or
> > >> > max_connections*16kB
> > >>
> > >> That's REALLY small for pgsql. Assuming your machine has at
> > >> least 1G of ram, I'd set it to 128M to 256M as a minimum.
> > >
> > > As I wrote in a previous email, I had the value set to 1792MB (the
> > highest I
> > > could set) and had the same execution time. This value is not
> > > helping me
> > to
> > > bring down the execution time.
> >
> > No, you increased work_mem, not shared_buffers. You might want to
> > go and read the documentation:
> >
> >
> > http://www.postgresql.org/docs/current/interactive/runtime-config-resource.html
> >
> > But at any rate, the large work_mem was producing a very strange
> > plan. It may help to see what the system does without that
> > setting. But changing shared_buffers will not change the plan, so
> > let's not worry about that right now.
> >
> > ...Robert
> >
>
>
>
--
Regards,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353 |Mobil:+36209278894 =-
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
>>> Farhan Husain <russoue@gmail.com> wrote:
> The machine postgres is running on has 4 GB of RAM.

In addition to the other suggestions, you should be sure that
effective_cache_size is set to a reasonable value, which would
probably be somewhere in the neighborhood of '3GB'. This doesn't
affect actual RAM allocation, but gives the optimizer a rough idea how
much data is going to be kept in cache, between both the PostgreSQL
shared_memory setting and the OS cache. It can make better choices
with more accurate information.

-Kevin
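A sketch of Kevin's suggestion as a postgresql.conf line (it can also be SET for a single session to experiment; a config reload is enough, no restart needed):

effective_cache_size = 3GB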
On Wed, Feb 25, 2009 at 4:10 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>>> Farhan Husain <russoue@gmail.com> wrote:
> The machine postgres is running on has 4 GB of RAM.
In addition to the other suggestions, you should be sure that
effective_cache_size is set to a reasonable value, which would
probably be somewhere in the neighborhood of '3GB'. This doesn't
affect actual RAM allocation, but gives the optimizer a rough idea how
much data is going to be kept in cache, between both the PostgreSQL
shared_memory setting and the OS cache. It can make better choices
with more accurate information.
-Kevin
Here is the latest output:
ingentadb=# EXPLAIN ANALYZE select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND A2.GraphID=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=799852.37..812767.47 rows=733195 width=134) (actual time=5941553.710..5941569.192 rows=30 loops=1)
Merge Cond: ((a0.subj)::text = (a1.subj)::text)
-> Sort (cost=89884.41..89964.28 rows=31949 width=208) (actual time=243.711..243.731 rows=30 loops=1)
Sort Key: a0.subj
Sort Method: quicksort Memory: 24kB
-> Nested Loop (cost=0.00..84326.57 rows=31949 width=208) (actual time=171.255..232.765 rows=30 loops=1)
-> Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0 (cost=0.00..5428.34 rows=487 width=74) (actual time=96.735..97.070 rows=30 loops=1)
Index Cond: ((obj)::text = 'Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
Filter: (((prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid = 1))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2 (cost=0.00..161.37 rows=51 width=134) (actual time=4.513..4.518 rows=1 loops=30)
Index Cond: (((a2.subj)::text = (a0.subj)::text) AND ((a2.prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
Filter: (a2.graphid = 1)
-> Materialize (cost=709967.96..723526.46 rows=1084680 width=74) (actual time=5941309.876..5941318.552 rows=31 loops=1)
-> Sort (cost=709967.96..712679.66 rows=1084680 width=74) (actual time=5941309.858..5941318.488 rows=31 loops=1)
Sort Key: a1.subj
Sort Method: external merge Disk: 282480kB
-> Seq Scan on jena_g1t1_stmt a1 (cost=0.00..456639.59 rows=1084680 width=74) (actual time=0.054..44604.597 rows=3192000 loops=1)
Filter: ((graphid = 1) AND ((prop)::text = 'Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text) AND ((obj)::text = 'Uv::http://metastore.ingenta.com/ns/structure/Article'::text))
Total runtime: 5941585.248 ms
(19 rows)
ingentadb=# show work_mem;
work_mem
----------
1MB
(1 row)
ingentadb=# show shared_buffers;
shared_buffers
----------------
32MB
(1 row)
ingentadb=# show temp_buffers;
temp_buffers
--------------
131072
(1 row)
The execution time has not improved. I am going to increase shared_buffers now, keeping work_mem the same.
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
>>> Farhan Husain <russoue@gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> >>> Farhan Husain <russoue@gmail.com> wrote:
>> > The machine postgres is running on has 4 GB of RAM.
>>
>> In addition to the other suggestions, you should be sure that
>> effective_cache_size is set to a reasonable value, which would
>> probably be somewhere in the neighborhood of '3GB'.

> The execution time has not improved. I am going to increase
> shared_buffers now, keeping work_mem the same.

Increasing shared_buffers is good, but it has been mentioned that this
will not change the plan, which currently scans and sorts the whole
table for a1. Nobody will be surprised when you report minimal change,
if any. If you have not changed effective_cache_size (be sure not to
confuse this with any of the other configuration values) it will think
you only have 128MB of cache, which will be off by a factor of about
24 from reality.

Also, I'm going to respectfully differ with some of the other posts on
the best setting for work_mem. Most benchmarks I've run and can
remember seeing posted found best performance for this at somewhere
between 16MB and 32MB. You do have to be careful if you have a large
number of concurrent queries, however, and keep it lower. In most such
cases, though, you're better off using a connection pool to limit
concurrent queries instead.

-Kevin
I will second Kevin’s suggestion. Unless you think you will have more than a few dozen concurrent queries, start with work_mem around 32MB.
For the query here, a very large work_mem might help it hash join depending on the data... But that’s not the real problem here.
The real problem is that it does a huge scan of all of the a1 table, and sorts it. It's pretty clear that this table has incorrect statistics. The planner thinks it will get about 1 million rows back from the scan, but the scan actually returns about 3.2 million.
Crank up the statistics target on that table from the default to at least 100, perhaps even 1000. This is a large table, and the default statistics target of 10 is not good for large tables with skewed column data. The columns to try increasing the target on are the ones filtered on in the explain: graphid, prop, and obj. Then run vacuum analyze on that table (a1). The planner should then have better stats and will likely be able to use a better plan for the join.
The other tables involved in the join also seem to have bad statistics. You might just take the easiest solution and change the global statistics target and vacuum analyze the tables involved:
set default_statistics_target = 50;
vacuum analyze jena_g1t1_stmt ;
(test the query)
Repeat for several values of the default statistics target. You can run “explain” before running the actual query, to see if the plan changed. If it has not changed, the runtime is not likely to change either.
The max value for the statistics target is 1000, which makes analyzing and query planning slower but more accurate. In most cases, dramatic differences show up between the default of 10 and values of 25 or 50. Sometimes you have to go into the hundreds, and it is safer to do that on a per-column basis once you get to larger values.
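A per-column change would look something like this (a sketch using the columns named above, with an illustrative target of 200; run ANALYZE afterwards so the new targets take effect):

ALTER TABLE jena_g1t1_stmt ALTER COLUMN graphid SET STATISTICS 200;
ALTER TABLE jena_g1t1_stmt ALTER COLUMN prop SET STATISTICS 200;
ALTER TABLE jena_g1t1_stmt ALTER COLUMN obj SET STATISTICS 200;
ANALYZE jena_g1t1_stmt;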
For larger databases, I recommend increasing the default to somewhere between 20 and 40 and re-analyzing all the tables.
On 2/25/09 3:11 PM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>>> Farhan Husain <russoue@gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov wrote:
>> >>> Farhan Husain <russoue@gmail.com> wrote:
>> > The machine postgres is running on has 4 GB of RAM.
>>
>> In addition to the other suggestions, you should be sure that
>> effective_cache_size is set to a reasonable value, which would
>> probably be somewhere in the neighborhood of '3GB'.
> The execution time has not improved. I am going to increase the
> shared_buffers now keeping the work_mem same.
Increasing shared_buffers is good, but it has been mentioned that this
will not change the plan, which currently scans and sorts the whole
table for a1. Nobody will be surprised when you report minimal
change, if any. If you have not changed effective_cache_size (be sure
not to confuse this with any of the other configuration values) it
will think you only have 128MB of cache, which will be off by a factor
of about 24 from reality.
Also, I'm going to respectfully differ with some of the other posts on
the best setting for work_mem. Most benchmarks I've run and can
remember seeing posted found best performance for this at somewhere
between 16MB and 32MB. You do have to be careful if you have a large
number of concurrent queries, however, and keep it lower. In most
such cases, though, you're better off using a connection pool to limit
concurrent queries instead.
-Kevin
> Here is the latest output:
> [EXPLAIN ANALYZE output snipped; the full plan is quoted above]

Can you do this:

select * from pg_statistic where starelid = 'jena_g1t1_stmt'::regclass;

Thanks,

...Robert
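(For a more readable view of the same statistics, the pg_stats system view can be queried instead:

select * from pg_stats where tablename = 'jena_g1t1_stmt';
)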
> The execution time has not improved. I am going to increase the
> shared_buffers now, keeping work_mem the same.

Have you performed a vacuum analyze?

--
regards
Claus

When lenity and cruelty play for a kingdom, the gentler gamester is
the soonest winner.

Shakespeare
On Thu, 26 Feb 2009 09:00:07 +0100, Claus Guttesen <kometen@gmail.com> wrote:
> > The execution time has not improved. I am going to increase the
> > shared_buffers now, keeping work_mem the same.
>
> Have you performed a vacuum analyze?

and reindex

--
Best regards,
Gábriel Ákos
-=E-Mail :akos.gabriel@i-logic.hu|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353 |Mobil:+36209278894 =-
On Wed, Feb 25, 2009 at 4:10 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>>> Farhan Husain <russoue@gmail.com> wrote:
> The machine postgres is running on has 4 GB of RAM.

In addition to the other suggestions, you should be sure that
effective_cache_size is set to a reasonable value, which would
probably be somewhere in the neighborhood of '3GB'. This doesn't
affect actual RAM allocation, but gives the optimizer a rough idea of how
much data is going to be kept in cache between the PostgreSQL
shared_buffers setting and the OS cache. It can make better choices
with more accurate information.
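(As a sketch of what that might look like, with the value assumed from the 4 GB of RAM mentioned earlier: put

effective_cache_size = 3GB

in postgresql.conf and reload; since it is only a planner hint, it can also be tried per session with SET effective_cache_size = '3GB'.)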
-Kevin
I reran the query with new values of work_mem, effective_cache_size and shared_buffers. The runtime has not improved. Here is the output:
ingentadb=# show work_mem;
work_mem
----------
16MB
(1 row)
ingentadb=# show shared_buffers;
shared_buffers
----------------
64MB
(1 row)
ingentadb=# show effective_cache_size;
effective_cache_size
----------------------
2GB
(1 row)
ingentadb=# EXPLAIN ANALYZE select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND A2.GraphID=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=698313.99..711229.09 rows=733195 width=134) (actual time=7659407.195..7659418.630 rows=30 loops=1)
Merge Cond: ((a0.subj)::text = (a1.subj)::text)
-> Sort (cost=84743.03..84822.90 rows=31949 width=208) (actual time=77.269..77.300 rows=30 loops=1)
Sort Key: a0.subj
Sort Method: quicksort Memory: 24kB
-> Nested Loop (cost=0.00..82352.69 rows=31949 width=208) (actual time=4.821..66.390 rows=30 loops=1)
-> Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0 (cost=0.00..5428.34 rows=487 width=74) (actual time=2.334..2.675 rows=30 loops=1)
Index Cond: ((obj)::text = 'Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
Filter: (((prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid = 1))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2 (cost=0.00..157.32 rows=51 width=134) (actual time=2.114..2.119 rows=1 loops=30)
Index Cond: (((a2.subj)::text = (a0.subj)::text) AND ((a2.prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
Filter: (a2.graphid = 1)
-> Materialize (cost=613570.96..627129.46 rows=1084680 width=74) (actual time=7659329.799..7659334.251 rows=31 loops=1)
-> Sort (cost=613570.96..616282.66 rows=1084680 width=74) (actual time=7659329.781..7659334.185 rows=31 loops=1)
Sort Key: a1.subj
Sort Method: external merge Disk: 282480kB
-> Seq Scan on jena_g1t1_stmt a1 (cost=0.00..456639.59 rows=1084680 width=74) (actual time=0.042..46465.020 rows=3192000 loops=1)
Filter: ((graphid = 1) AND ((prop)::text = 'Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text) AND ((obj)::text = 'Uv::http://metastore.ingenta.com/ns/structure/Article'::text))
Total runtime: 7659420.128 ms
(19 rows)
I will try out other suggestions posted yesterday now.
Thanks,
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
Thanks a lot Scott! I think that was the problem. I just changed the default statistics target to 50 and ran explain. The plan changed, and I ran explain analyze. Now it takes a fraction of a second!
On Wed, Feb 25, 2009 at 6:07 PM, Scott Carey <scott@richrelevance.com> wrote:
> I will second Kevin's suggestion. Unless you think you will have more
> than a few dozen concurrent queries, start with work_mem around 32MB.
> [rest of quoted message snipped]
Thanks to all of you who wanted to help me. I would be happy if someone would do me one last favor. I want to know how these query plans are generated and how the parameters you suggested changing affect them. If there is any article, paper, or book on this, please give me the name or URL.
Here is the output of my latest tasks:
ingentadb=# set default_statistics_target=50;
SET
ingentadb=# show default_statistics_target;
default_statistics_target
---------------------------
50
(1 row)
ingentadb=# vacuum analyze jena_g1t1_stmt;
VACUUM
ingentadb=# EXPLAIN select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND A2.GraphID=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..37838.46 rows=7568 width=134)
-> Nested Loop (cost=0.00..7485.09 rows=495 width=148)
-> Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0 (cost=0.00..1160.62 rows=97 width=74)
Index Cond: ((obj)::text = 'Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
Filter: (((prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid = 1))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a1 (cost=0.00..65.15 rows=4 width=74)
Index Cond: (((a1.subj)::text = (a0.subj)::text) AND ((a1.prop)::text = 'Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text))
Filter: ((a1.graphid = 1) AND ((a1.obj)::text = 'Uv::http://metastore.ingenta.com/ns/structure/Article'::text))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2 (cost=0.00..61.17 rows=12 width=134)
Index Cond: (((a2.subj)::text = (a0.subj)::text) AND ((a2.prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
Filter: (a2.graphid = 1)
(11 rows)
ingentadb=# EXPLAIN ANALYZE select A0.Subj, A2.Obj From jena_g1t1_stmt A0, jena_g1t1_stmt A1, jena_g1t1_stmt A2 Where A0.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf' AND A0.Obj='Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1' AND A0.GraphID=1 AND A0.Subj=A1.Subj AND A1.Prop='Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type' AND A1.Obj='Uv::http://metastore.ingenta.com/ns/structure/Article' AND A1.GraphID=1 AND A0.Subj=A2.Subj AND A2.Prop='Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage' AND A2.GraphID=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..37838.46 rows=7568 width=134) (actual time=6.535..126.791 rows=30 loops=1)
-> Nested Loop (cost=0.00..7485.09 rows=495 width=148) (actual time=4.404..64.078 rows=30 loops=1)
-> Index Scan using jena_g1t1_stmt_ixo on jena_g1t1_stmt a0 (cost=0.00..1160.62 rows=97 width=74) (actual time=2.127..2.270 rows=30 loops=1)
Index Cond: ((obj)::text = 'Uv::http://www.utdallas.edu/~farhan.husain/IngentaConnect/issue1_1'::text)
Filter: (((prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/isPartOf'::text) AND (graphid = 1))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a1 (cost=0.00..65.15 rows=4 width=74) (actual time=2.054..2.056 rows=1 loops=30)
Index Cond: (((a1.subj)::text = (a0.subj)::text) AND ((a1.prop)::text = 'Uv::http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text))
Filter: ((a1.graphid = 1) AND ((a1.obj)::text = 'Uv::http://metastore.ingenta.com/ns/structure/Article'::text))
-> Index Scan using jena_g1t1_stmt_ixsp on jena_g1t1_stmt a2 (cost=0.00..61.17 rows=12 width=134) (actual time=2.083..2.086 rows=1 loops=30)
Index Cond: (((a2.subj)::text = (a0.subj)::text) AND ((a2.prop)::text = 'Uv::http://prismstandard.org/namespaces/1.2/basic/startingPage'::text))
Filter: (a2.graphid = 1)
Total runtime: 127.065 ms
(12 rows)
Thanks and regards,
--
Mohammad Farhan Husain
Research Assistant
Department of Computer Science
Erik Jonsson School of Engineering and Computer Science
University of Texas at Dallas
>>> Farhan Husain <russoue@gmail.com> wrote:
> Thanks a lot Scott! I think that was the problem. I just changed the
> default statistics target to 50 and ran explain. The plan changed,
> and I ran explain analyze. Now it takes a fraction of a second!

Yeah, the default of 10 has been too low. In 8.4 it is being raised
to 100.

> Thanks to all of you who wanted to help me. I would be happy if
> someone would do me one last favor. I want to know how these query
> plans are generated and how the parameters you suggested changing
> affect them. If there is any article, paper, or book on this, please
> give me the name or URL.

In terms of tuning in general, you might start with these:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

http://www.postgresql.org/docs/8.3/interactive/runtime-config-query.html

To understand the mechanics of the optimizer you might be best off
downloading the source code and reading through the README files and
comments in the source code.

-Kevin
On Thu, Feb 26, 2009 at 12:10 PM, Steve Clark <sclark@netwolves.com> wrote:
> Can this be set in the postgresql.conf file?
> default_statistics_target = 50

Yep. It will take effect after a reload and after the current connection has been reset. If you want to, you can also set a default for a database or a role, fine-tuning as needed.
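For example (the role name here is just a placeholder):

ALTER DATABASE ingentadb SET default_statistics_target = 50;
ALTER ROLE some_role SET default_statistics_target = 50;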
Kevin Grittner wrote:
> [full message snipped; it appears just above]

Hello List,

Can this be set in the postgresql.conf file?
default_statistics_target = 50

Thanks,
Steve