Thread: Opptimizing projections containing unused columns
I have lot of autogenerated from projection queries in form SELECT source.c1, source.c2, t1.col1, t1.col2, ... FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source LEFT JOIN t2 USING (somecolumn) Main SELECT uses only few columns (source.c1 and source.c2 in this sample) from source projection. Inner table may have up to 500000 rows. I read 8.4 WITH documentation and this describes somewhat that PostgreSQL can discard unused columns automatically but this is not clear. I havent found exact doc about this. I havent way how to determine this using EXPLAIN or similar command. Questions: 1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance ? 2. Will PostgreSQL 8.0+ automatically exclude columns c3 .. c20 while executing query ? 3. Or should I make sql builder much more sophisticated so that it will not generate columns c3 .. c20 when it creates sql ? Andrus.
"Andrus" <kobruleht2@hot.ee> writes: > I have lot of autogenerated from projection queries in form > SELECT source.c1, source.c2, t1.col1, t1.col2, ... > FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source > LEFT JOIN t2 USING (somecolumn) > Main SELECT uses only few columns (source.c1 and source.c2 in this sample) > from source projection. > 1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance Don't worry about it. All modern Postgres versions ignore columns that aren't actually used in the query --- at least for examples as simple as this one. In cases where you intentionally defeat optimization (eg via OFFSET 0 in a sub-select) it's possible that the sub-select will compute all its output columns even though the upper query doesn't use 'em all. The width field in EXPLAIN output offers a good hint as to how many columns the query is actually fetching. regards, tom lane
looks like most ppl nowdays have two simple problems, and try to work against it. Instead they all should focus on getting their data organized properly, and queries writeen for project before they start to code other stuff.
The problems are: trying to outsmart db, still belive that you can catch all data to memory, and work it out in software.
I see it time and time again :/
Tom, > Don't worry about it. All modern Postgres versions ignore columns that > aren't actually used in the query --- at least for examples as simple as > this one. In cases where you intentionally defeat optimization (eg via > OFFSET 0 in a sub-select) it's possible that the sub-select will compute > all its output columns even though the upper query doesn't use 'em all. > > The width field in EXPLAIN output offers a good hint as to how many > columns the query is actually fetching. thank you very much. I don't use OFFSET clause in those queries and have servers starting at 8.0 version. So I hope this works. I have also a related question. Inner query returns about 500000 records, outer query returns about 10000 records. Is it OK to put all filters to outer query WHERE clase? Or should I move as many filter conditions as possible to inner query so that inner query returns 10000 records instead of 500000 records. Is there difference in perfomance if inner query returs large number of rows which are filtered only by outer query ? Andrus.
"Andrus" <kobruleht2@hot.ee> writes: > Is it OK to put all filters to outer query WHERE clase? > Or should I move as many filter conditions as possible to inner query so > that inner query returns 10000 records instead of 500000 records. > Is there difference in perfomance if inner query returs large number of rows > which are filtered only by outer query ? This question is too vague to be answerable --- especially if you want an answer that applies across all 8.x releases. I'd suggest experimenting a bit using EXPLAIN ANALYZE to see what happens in your actual application. regards, tom lane
Tom, > This question is too vague to be answerable --- especially if you want > an answer that applies across all 8.x releases. I'd suggest > experimenting a bit using EXPLAIN ANALYZE to see what happens in your > actual application. Thank you very much. I cannot experiment with application currently since this part is at planning state. I asked this to get right designing direction. OK, I will create applicaton so that inner WHERE clauses contain as much restrictions as it is possible without adding additional tables to inner SELECTs. Hopefully this cannot be slower than moving those restrictions to outer select. A third related issue: Is it OK to use 5 levels of nested queries like SELECT ... FROM ... (SELECT .. FROM ... (SELECT ... FROM .. ... ) p1 ) p2 or is it better to create queries with as few levels of nestings as possible ? In some cases we have selection to use UNIONS, JOINS and CREATE TEMP TABLE x ON COMMIT DROP clauses instead of nested queries but nested queries look more natural to create. Currently we are planning to support servers starting at 8.0 so we are not planning to use WITH statement. Andrus
Andrus,Hi! 1.From the query plan,it makes no difference. 2.PostgreSQL 8.0+ will not exclude columns c3..c20 while executing. 3.I just make a test,From query plan or executing time,it makes no difference. I'm from China.I hope you could see what I mean :) ============================================================ >I have lot of autogenerated from projection queries in form > >SELECT source.c1, source.c2, t1.col1, t1.col2, ... >FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source >LEFT JOIN t2 USING (somecolumn) > >Main SELECT uses only few columns (source.c1 and source.c2 in this sample) >from source projection. >Inner table may have up to 500000 rows. > >I read 8.4 WITH documentation and this describes somewhat that PostgreSQL >can discard unused columns automatically but this is not clear. I havent >found exact doc about this. I havent way how to determine this using EXPLAIN >or similar command. > >Questions: > >1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance >? >2. Will PostgreSQL 8.0+ automatically exclude columns c3 .. c20 while >executing >query ? >3. Or should I make sql builder much more sophisticated so that it will not >generate columns c3 .. c20 when it creates sql ? > >Andrus. > > >-- >Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-general weinna.chen Beijing China
On Wed, Oct 15, 2008 at 09:07:28PM +0300, Andrus wrote: > Tom, > > >This question is too vague to be answerable --- especially if you want > >an answer that applies across all 8.x releases. I'd suggest > >experimenting a bit using EXPLAIN ANALYZE to see what happens in your > >actual application. > > Thank you very much. > I cannot experiment with application currently since this part is at > planning state. > I asked this to get right designing direction. The basic answer is: every version of pg gets smarter. And the current version is really smart. What you really have to watch out for is that the moving of the clause really works. In the presense of outer joins and NULL some things arn't as obvious as they seem... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
not so smart yet ;)
thickbook:~ gj$ psql gj
Timing is on.
psql (8.4devel)
Type "help" for help.
gj=# create table smartarse(id bigserial, v varchar not null , primary key (id, v));
NOTICE: CREATE TABLE will create implicit sequence "smartarse_id_seq" for serial column "smartarse.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "smartarse_pkey" for table "smartarse"
CREATE TABLE
Time: 172,315 ms
gj=# \d+ smartarse
Table "public.smartarse"
Column | Type | Modifiers | Storage | Description
--------+-------------------+--------------------------------------------------------+----------+-------------
id | bigint | not null default nextval('smartarse_id_seq'::regclass) | plain |
v | character varying | not null | extended |
Indexes:
"smartarse_pkey" PRIMARY KEY, btree (id, v)
Has OIDs: no
gj=# insert into smartarse(v) values('one'), ('two'), ('three'), ('four');
INSERT 0 4
Time: 1,500 ms
gj=# explain select count(*) from smartarse where v is null;
QUERY PLAN
----------------------------------------------------------------
Aggregate (cost=21.62..21.63 rows=1 width=0)
-> Seq Scan on smartarse (cost=0.00..21.60 rows=6 width=0)
Filter: (v IS NULL)
(3 rows)
Time: 1,115 ms
but, getting more and more smart - that's for sure :)
> gj=# explain select count(*) from smartarse where v is null; >---------------------------------------------------------------- > Aggregate (cost=21.62..21.63 rows=1 width=0) < -> Seq Scan on smartarse (cost=0.00..21.60 rows=6 width=0) Thread question and sample was about using nested queries. Your sample does not use nested queries but it shows (width=0) that all columns are excluded. So I do'nt understand how this applies to topic. Andrus.
> So I do'nt understand how this applies to topic. I'm sorry, after re-reading yuor reply I understood this. You think thank PostgreSQL should use not null inforamtion from metadata. Maybe it is better to create separate static code analyzer like FxCop, Smokey or Gendarme for C# for PostgreSQL to analyze code. Andrus.
I just generally wanted to show, that even tho it is much smarter from version to version, it still has this little things.
I generally don't put 'is null' or 'is not null' in queries on column that is 'not null' - but it is imo example of those little things, where postgres isn't so smart yet :)