Thread: Increase performance of a UNION query that takes 655.07 msec to run?
From
"Bruno BAGUETTE"
Date:
Hello,

Do you see a way to get better performance with this query, which currently takes 655.07 msec?

levure=> explain analyze SELECT distinct lower(substr(l_name, 1, 1)) AS initiale FROM people
levure-> UNION
levure-> SELECT distinct lower(substr(org_name, 1, 1)) AS initiale FROM organizations
levure-> ORDER BY initiale;

                                      QUERY PLAN
-----------------------------------------------------------------------------------
 Sort  (cost=158.73..158.78 rows=20 width=43) (actual time=650.82..650.89 rows=39 loops=1)
   Sort Key: initiale
   ->  Unique  (cost=157.30..158.30 rows=20 width=43) (actual time=649.55..650.17 rows=39 loops=1)
         ->  Sort  (cost=157.30..157.80 rows=200 width=43) (actual time=649.55..649.67 rows=69 loops=1)
               Sort Key: initiale
               ->  Append  (cost=69.83..149.66 rows=200 width=43) (actual time=198.48..648.51 rows=69 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=69.83..74.83 rows=100 width=38) (actual time=198.48..230.62 rows=37 loops=1)
                           ->  Unique  (cost=69.83..74.83 rows=100 width=38) (actual time=198.46..230.31 rows=37 loops=1)
                                 ->  Sort  (cost=69.83..72.33 rows=1000 width=38) (actual time=198.45..205.99 rows=4093 loops=1)
                                       Sort Key: lower(substr((l_name)::text, 1, 1))
                                       ->  Seq Scan on people  (cost=0.00..20.00 rows=1000 width=38) (actual time=0.19..52.33 rows=4093 loops=1)
                     ->  Subquery Scan "*SELECT* 2"  (cost=69.83..74.83 rows=100 width=43) (actual time=361.82..417.62 rows=32 loops=1)
                           ->  Unique  (cost=69.83..74.83 rows=100 width=43) (actual time=361.79..417.33 rows=32 loops=1)
                                 ->  Sort  (cost=69.83..72.33 rows=1000 width=43) (actual time=361.79..374.81 rows=7074 loops=1)
                                       Sort Key: lower(substr((org_name)::text, 1, 1))
                                       ->  Seq Scan on organizations  (cost=0.00..20.00 rows=1000 width=43) (actual time=0.23..95.47 rows=7074 loops=1)
 Total runtime: 655.07 msec
(17 rows)

I was thinking that an index on lower(substr(l_name, 1, 1)) and another index on lower(substr(org_name, 1, 1)) should give better performance. But when I try to create these two indexes, it seems this is not allowed:

levure=> CREATE INDEX firstchar_lastname_idx ON people(lower(substr(l_name, 1, 1)));
ERROR:  parser: parse error at or near "(" at character 59

Do you have another idea to get better performance?

Thanks in advance :-)

PS: Note that this database is VACUUMed twice per day (and sometimes more).

-------------------------------------
Bruno BAGUETTE - pgsql-ml@baguette.net
Re: Increase performance of a UNION query that takes 655.07 msec to run?
From
Tom Lane
Date:
"Bruno BAGUETTE" <pgsql-ml@baguette.net> writes:
> Do you see a way to get better performances with this query which takes
> currently 655.07 msec to be done.

> levure=> explain analyze SELECT distinct lower(substr(l_name, 1, 1)) AS
> initiale FROM people
> levure-> UNION
> levure-> SELECT distinct lower(substr(org_name, 1, 1)) AS initiale FROM
> organizations
> levure-> ORDER BY initiale;

This is inherently a bit inefficient since the UNION implies a DISTINCT
step, thus partially repeating the DISTINCT work done inside each SELECT.
It would likely be a tad faster to drop the DISTINCTs from the subselects
and rely on UNION to do the filtering.  However, you're still gonna have
a big SORT/UNIQUE step.

As of PG 7.4 you could probably get a performance win by converting the
thing to use GROUP BY instead of DISTINCT or UNION:

select initiale from (
  select lower(substr(l_name,1,1)) as initiale from people
  union all
  select lower(substr(org_name,1,1)) as initiale from organizations
) ss
group by initiale order by initiale;

This should use a HashAggregate to do the unique-ification.  I think
that will be faster than Sort/Unique.

            regards, tom lane
On Fri, 6 Feb 2004, Bruno BAGUETTE wrote:

> I was thinking that a index on lower(substr(l_name, 1, 1)) and another
> index on lower(substr(org_name, 1, 1)) should gives better performances.
> When I've to create theses two indexes, it seems like this is not
> allowed :
>
> levure=> CREATE INDEX firstchar_lastname_idx ON
> people(lower(substr(l_name, 1, 1)));
> ERROR:  parser: parse error at or near "(" at character 59

In 7.4, I believe you would say

  on people((lower(substr(l_name,1,1))))

but I'm not sure that index would really help in practice.

> Do you have another idea to get better performances ?

In addition to what Tom said, the row estimates look suspiciously
default.  You mention vacuuming, but do you ever analyze the tables?
Also, what do you have sort_mem set to?
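[For readers following the archive: spelled out in full, the 7.4 expression-index syntax above would look like the sketch below. The second index name is illustrative, not from the thread; note that the extra parentheses around the expression are required, and the planner will only consider such an index when a query uses the identical expression.]

```sql
-- Sketch, assuming PostgreSQL 7.4+ (7.3's older functional-index form
-- does not accept nested expressions like lower(substr(...))):
CREATE INDEX firstchar_lastname_idx
    ON people ((lower(substr(l_name, 1, 1))));

CREATE INDEX firstchar_orgname_idx          -- hypothetical name
    ON organizations ((lower(substr(org_name, 1, 1))));

-- The query must repeat the indexed expression verbatim:
SELECT DISTINCT lower(substr(l_name, 1, 1)) AS initiale FROM people;
```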
RE : Increase performance of a UNION query that takes 655.07 msec to run?
From
"Bruno BAGUETTE"
Date:
re-Hello,

As suggested by Tom, I've removed the DISTINCTs and tried his query:

levure=> explain analyze select initiale from (
levure(> select lower(substr(l_name,1,1)) as initiale from people
levure(> union all
levure(> select lower(substr(org_name,1,1)) as initiale from organizations
levure(> ) ss
levure-> group by initiale order by initiale;

                                      QUERY PLAN
-------------------------------------------------------------------------
 Group  (cost=1018.48..1074.32 rows=1117 width=17) (actual time=783.47..867.61 rows=39 loops=1)
   ->  Sort  (cost=1018.48..1046.40 rows=11167 width=17) (actual time=782.18..801.68 rows=11167 loops=1)
         Sort Key: initiale
         ->  Subquery Scan ss  (cost=0.00..267.67 rows=11167 width=17) (actual time=0.23..330.31 rows=11167 loops=1)
               ->  Append  (cost=0.00..267.67 rows=11167 width=17) (actual time=0.22..263.69 rows=11167 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..87.93 rows=4093 width=15) (actual time=0.22..79.51 rows=4093 loops=1)
                           ->  Seq Scan on people  (cost=0.00..87.93 rows=4093 width=15) (actual time=0.20..53.82 rows=4093 loops=1)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..179.74 rows=7074 width=17) (actual time=0.24..146.12 rows=7074 loops=1)
                           ->  Seq Scan on organizations  (cost=0.00..179.74 rows=7074 width=17) (actual time=0.23..100.70 rows=7074 loops=1)
 Total runtime: 874.79 msec
(10 rows)

That seems to be 200 msec longer than my first query...
Indeed, I've noticed something strange: if I now rerun my initial query (without the inner DISTINCTs), I get a worse runtime than this morning:

levure=> EXPLAIN ANALYZE SELECT lower(substr(l_name, 1, 1)) AS initiale FROM people
levure-> UNION
levure-> SELECT lower(substr(org_name, 1, 1)) AS initiale FROM organizations
levure-> ORDER BY initiale;

                                      QUERY PLAN
------------------------------------------------------------------------
 Sort  (cost=1130.85..1133.64 rows=1117 width=17) (actual time=802.52..802.58 rows=39 loops=1)
   Sort Key: initiale
   ->  Unique  (cost=1018.48..1074.32 rows=1117 width=17) (actual time=712.04..801.83 rows=39 loops=1)
         ->  Sort  (cost=1018.48..1046.40 rows=11167 width=17) (actual time=712.03..732.63 rows=11167 loops=1)
               Sort Key: initiale
               ->  Append  (cost=0.00..267.67 rows=11167 width=17) (actual time=0.21..263.54 rows=11167 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..87.93 rows=4093 width=15) (actual time=0.20..80.47 rows=4093 loops=1)
                           ->  Seq Scan on people  (cost=0.00..87.93 rows=4093 width=15) (actual time=0.19..54.14 rows=4093 loops=1)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..179.74 rows=7074 width=17) (actual time=0.28..144.82 rows=7074 loops=1)
                           ->  Seq Scan on organizations  (cost=0.00..179.74 rows=7074 width=17) (actual time=0.27..99.06 rows=7074 loops=1)
 Total runtime: 806.47 msec
(11 rows)

I don't understand why this runtime has changed, because no data has been added, updated, or deleted for several weeks (I'm working on a copy of the production database, and this copy is not accessible to the users).

My PostgreSQL version is 7.3.2; I'll have to ask the administrator whether it can be upgraded to 7.4 on the production server.

Thanks in advance for your help.

---------------------------------------
Bruno BAGUETTE - pgsql-ml@baguette.net
RE : Increase performance of a UNION query that takes 655.07 msec to run?
From
"Bruno BAGUETTE"
Date:
> In addition to what Tom said, the row estimates look
> suspiciously default. You mention vacuuming, but do you ever
> analyze the tables?

I run VACUUM FULL ANALYZE with the postgres user on all the PostgreSQL
databases on the server, twice a day, sometimes more.

> Also, what do you have sort_mem set to?

[root@levure data]# cat postgresql.conf | grep sort_mem
sort_mem = 6144                 # min 64, size in KB

Do you think I should increase that value? It's not so easy to do a good
setup of that postgresql.conf file. Is there any tool that suggests
values for it?

Thanks in advance for your tips :-)

---------------------------------------
Bruno BAGUETTE - pgsql-ml@baguette.net
On Fri, 6 Feb 2004, Bruno BAGUETTE wrote:

> > In addition to what Tom said, the row estimates look
> > suspiciously default. You mention vacuuming, but do you ever
> > analyze the tables?
>
> I run VACUUM FULL ANALYZE with the postgres user on all the PostgreSQL
> databases on the server, twice a day, sometimes more.

Weird, because you're getting 1000 estimated on both people and
organizations.  What does pg_class have to say about those two tables?

> > Also, what do you have sort_mem set to?
>
> [root@levure data]# cat postgresql.conf | grep sort_mem
> sort_mem = 6144                 # min 64, size in KB
>
> Do you think I should increase that value ?

Hmm, I'd expect that the sort would fit in that space in general.  If you
want to try different values, you can set sort_mem from psql rather than
changing the configuration file.

----

On my machine the index does actually help, although I needed to lower
random_page_cost a little from its default of 4 to get it to use it
preferentially.  But I'm also getting times about 1/3 of yours (and my
machine is pretty poor), so I think I may not have data that matches
yours very well.
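[For readers following the archive: the two checks suggested above can be sketched as follows. relpages and reltuples are the pg_class columns the planner's estimates come from; the sort_mem value shown is just an illustrative trial, not a recommendation from the thread.]

```sql
-- What the planner thinks the tables look like:
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('people', 'organizations');

-- Try a bigger sort buffer for this session only (7.x setting;
-- later renamed work_mem), then re-run the EXPLAIN ANALYZE:
SET sort_mem = 20480;   -- 20 MB, hypothetical trial value
```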
RE : RE : Increase performance of a UNION query that takes 655.07 msec to run?
From
"Bruno BAGUETTE"
Date:
> On Fri, 6 Feb 2004, Bruno BAGUETTE wrote:
>
> > > In addition to what Tom said, the row estimates look suspiciously
> > > default. You mention vacuuming, but do you ever analyze
> > > the tables?
> >
> > I run VACUUM FULL ANALYZE with the postgres user on all the
> > PostgreSQL databases on the server, twice a day, sometimes more.
>
> Wierd, because you're getting 1000 estimated on both people
> and organizations. What does pg_class have to say about
> those two tables?

I'm sorry, but I think I've misunderstood you. Are you telling me that
running VACUUM FULL ANALYZE is weird? Or do you mean something else?

Finally, I've found another way: I've built a materialized view that
stores the initial (a CHAR(1)) of both people and organizations, with an
index on that column. I get excellent results:

 Unique  (cost=0.00..290.34 rows=1117 width=5) (actual time=0.52..267.38 rows=39 loops=1)
   ->  Index Scan using idx_mview_initials on mview_contacts  (cost=0.00..262.42 rows=11167 width=5) (actual time=0.51..172.15 rows=11167 loops=1)
 Total runtime: 267.81 msec
(3 rows)

So, that's a better runtime :-)

Thanks for your help :-)

-------------------------------------
Bruno BAGUETTE - pgsql-ml@baguette.net
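[For readers following the archive: PostgreSQL of this era (7.3) has no CREATE MATERIALIZED VIEW statement (it only appeared in 9.3), so a "materialized view" here has to be emulated with an ordinary table plus an index. A minimal sketch, reusing the table and index names from the plan above; the exact definition Bruno used is not shown in the thread.]

```sql
-- Build the summary table from both base tables:
CREATE TABLE mview_contacts AS
    SELECT lower(substr(l_name, 1, 1)) AS initiale FROM people
    UNION ALL
    SELECT lower(substr(org_name, 1, 1)) AS initiale FROM organizations;

CREATE INDEX idx_mview_initials ON mview_contacts (initiale);
ANALYZE mview_contacts;

-- To refresh after the base tables change (e.g. from a cron job):
-- TRUNCATE mview_contacts, then re-run the INSERT ... SELECT.
```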
On Sun, 8 Feb 2004, Bruno BAGUETTE wrote:

> > Wierd, because you're getting 1000 estimated on both people
> > and organizations. What does pg_class have to say about
> > those two tables?
>
> I'm sorry but I think that I misunderstand you. Are you telling me that
> running VACUUM FULL ANALYZE is weird ? Or do you mean another thing ?

No, I was saying it's weird that it'd be misestimating to the default
values after a vacuum full analyze.