Thread: Opptimizing projections containing unused columns

Opptimizing projections containing unused columns

From
"Andrus"
Date:
I have lot of autogenerated from projection queries in form

SELECT source.c1, source.c2, t1.col1, t1.col2, ...
FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source
LEFT JOIN t2 USING (somecolumn)

Main SELECT uses only few columns (source.c1 and source.c2 in this sample)
from source projection.
Inner table may have up to 500000 rows.

I read 8.4 WITH documentation and this describes somewhat that PostgreSQL
can discard unused columns automatically but this is not clear. I havent
found exact doc about this. I havent way how to determine this using EXPLAIN
or similar command.

Questions:

1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance
?
2. Will PostgreSQL 8.0+ automatically exclude columns c3 .. c20 while
executing
query ?
3. Or should I make sql builder much more sophisticated so that it will not
generate columns c3 .. c20 when it creates sql ?

Andrus.


Re: Opptimizing projections containing unused columns

From
Tom Lane
Date:
"Andrus" <kobruleht2@hot.ee> writes:
> I have lot of autogenerated from projection queries in form
> SELECT source.c1, source.c2, t1.col1, t1.col2, ...
> FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source
> LEFT JOIN t2 USING (somecolumn)
> Main SELECT uses only few columns (source.c1 and source.c2 in this sample)
> from source projection.

> 1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance

Don't worry about it.  All modern Postgres versions ignore columns that
aren't actually used in the query --- at least for examples as simple as
this one.  In cases where you intentionally defeat optimization (eg via
OFFSET 0 in a sub-select) it's possible that the sub-select will compute
all its output columns even though the upper query doesn't use 'em all.

The width field in EXPLAIN output offers a good hint as to how many
columns the query is actually fetching.

            regards, tom lane

Re: Opptimizing projections containing unused columns

From
"Grzegorz Jaśkiewicz"
Date:

looks like most ppl nowdays have two simple problems, and try to work against it. Instead they all should focus on getting their data organized properly, and queries writeen for project before they start to code other stuff.

The problems are: trying to outsmart db, still belive that you can catch all data to memory, and work it out in software. 

I see it time and time again :/

Re: Opptimizing projections containing unused columns

From
"Andrus"
Date:
Tom,

> Don't worry about it.  All modern Postgres versions ignore columns that
> aren't actually used in the query --- at least for examples as simple as
> this one.  In cases where you intentionally defeat optimization (eg via
> OFFSET 0 in a sub-select) it's possible that the sub-select will compute
> all its output columns even though the upper query doesn't use 'em all.
>
> The width field in EXPLAIN output offers a good hint as to how many
> columns the query is actually fetching.

thank you very much.
I don't use OFFSET clause in those queries and have servers starting at 8.0
version.
So I hope this works.

I have also a related question.
Inner query returns about 500000 records, outer query returns about 10000
records.

Is it OK to put all filters to outer query WHERE clase?
Or should I move as many filter conditions as possible to inner query so
that inner query returns 10000 records instead of 500000 records.

Is there difference in perfomance if inner query returs large number of rows
which are filtered only by outer query ?

Andrus.


Re: Opptimizing projections containing unused columns

From
Tom Lane
Date:
"Andrus" <kobruleht2@hot.ee> writes:
> Is it OK to put all filters to outer query WHERE clase?
> Or should I move as many filter conditions as possible to inner query so
> that inner query returns 10000 records instead of 500000 records.

> Is there difference in perfomance if inner query returs large number of rows
> which are filtered only by outer query ?

This question is too vague to be answerable --- especially if you want
an answer that applies across all 8.x releases.  I'd suggest
experimenting a bit using EXPLAIN ANALYZE to see what happens in your
actual application.

            regards, tom lane

Re: Optimizing projections containing unused columns

From
"Andrus"
Date:
Tom,

> This question is too vague to be answerable --- especially if you want
> an answer that applies across all 8.x releases.  I'd suggest
> experimenting a bit using EXPLAIN ANALYZE to see what happens in your
> actual application.

Thank you very much.
I cannot experiment with application currently since this part is at
planning state.
I asked this to get right designing direction.

OK, I will create applicaton so that inner WHERE clauses contain as much
restrictions as it is possible without adding additional tables to
inner SELECTs. Hopefully this cannot be slower than moving those
restrictions to outer select.

A third related issue:

Is it OK to use 5 levels of nested queries like

SELECT ...
  FROM ...
   (SELECT ..       FROM  ...
       (SELECT ... FROM ..
    ...
    )  p1
    ) p2

or is it better to create queries with as few levels of nestings as possible
?
In some cases we have selection to use UNIONS, JOINS and CREATE TEMP TABLE x
ON COMMIT DROP clauses instead of nested queries but
nested queries look more natural to create.

Currently we are planning to support servers starting at 8.0 so we are not
planning to use WITH statement.

Andrus


Re: Opptimizing projections containing unused columns

From
陈伟楠
Date:
Andrus,Hi!

       1.From the query plan,it makes no difference.

       2.PostgreSQL 8.0+ will not exclude columns c3..c20 while executing.

       3.I just make a test,From query plan or executing time,it makes no difference.

   I'm from China.I hope you could see what I mean :)

============================================================

>I have lot of autogenerated from projection queries in form
>
>SELECT source.c1, source.c2, t1.col1, t1.col2, ...
>FROM (SELECT c1, c2, c3, ..... , c20 FROM ... WHERE ... ) source
>LEFT JOIN t2 USING (somecolumn)
>
>Main SELECT uses only few columns (source.c1 and source.c2 in this sample)
>from source projection.
>Inner table may have up to 500000 rows.
>
>I read 8.4 WITH documentation and this describes somewhat that PostgreSQL
>can discard unused columns automatically but this is not clear. I havent
>found exact doc about this. I havent way how to determine this using EXPLAIN
>or similar command.
>
>Questions:
>
>1. Does selecting of unused columns ( c3 ... c20) affect to query prefomance 
>?
>2. Will PostgreSQL 8.0+ automatically exclude columns c3 .. c20 while 
>executing
>query ?
>3. Or should I make sql builder much more sophisticated so that it will not
>generate columns c3 .. c20 when it creates sql ?
>
>Andrus. 
>
>
>-- 
>Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general

weinna.chen

Beijing China

Re: Optimizing projections containing unused columns

From
Martijn van Oosterhout
Date:
On Wed, Oct 15, 2008 at 09:07:28PM +0300, Andrus wrote:
> Tom,
>
> >This question is too vague to be answerable --- especially if you want
> >an answer that applies across all 8.x releases.  I'd suggest
> >experimenting a bit using EXPLAIN ANALYZE to see what happens in your
> >actual application.
>
> Thank you very much.
> I cannot experiment with application currently since this part is at
> planning state.
> I asked this to get right designing direction.

The basic answer is: every version of pg gets smarter. And the current
version is really smart. What you really have to watch out for is that
the moving of the clause really works. In the presense of outer joins
and NULL some things arn't as obvious as they seem...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

Re: Optimizing projections containing unused columns

From
"Grzegorz Jaśkiewicz"
Date:
not so smart yet ;)

thickbook:~ gj$ psql gj
Timing is on.
psql (8.4devel)
Type "help" for help.

gj=# create table smartarse(id bigserial, v varchar not null , primary key (id, v));
NOTICE:  CREATE TABLE will create implicit sequence "smartarse_id_seq" for serial column "smartarse.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "smartarse_pkey" for table "smartarse"
CREATE TABLE
Time: 172,315 ms
gj=# \d+ smartarse
                                           Table "public.smartarse"
 Column |       Type        |                       Modifiers                        | Storage  | Description 
--------+-------------------+--------------------------------------------------------+----------+-------------
 id     | bigint            | not null default nextval('smartarse_id_seq'::regclass) | plain    | 
 v      | character varying | not null                                               | extended | 
Indexes:
    "smartarse_pkey" PRIMARY KEY, btree (id, v)
Has OIDs: no

gj=# insert into smartarse(v) values('one'), ('two'), ('three'), ('four');
INSERT 0 4
Time: 1,500 ms
gj=# explain select count(*) from smartarse where v is null;
                           QUERY PLAN                           
----------------------------------------------------------------
 Aggregate  (cost=21.62..21.63 rows=1 width=0)
   ->  Seq Scan on smartarse  (cost=0.00..21.60 rows=6 width=0)
         Filter: (v IS NULL)
(3 rows)

Time: 1,115 ms


but, getting more and more smart - that's for sure :)

Re: Optimizing projections containing unused columns

From
"Andrus"
Date:
> gj=# explain select count(*) from smartarse where v is null;
>----------------------------------------------------------------
> Aggregate  (cost=21.62..21.63 rows=1 width=0)
<   ->  Seq Scan on smartarse  (cost=0.00..21.60 rows=6 width=0)

Thread question and sample was about using nested queries.
Your sample does not use nested queries but it shows (width=0) that all
columns are excluded.

So I do'nt understand how this applies to topic.

Andrus.


Re: Optimizing projections containing unused columns

From
"Andrus"
Date:
> So I do'nt understand how this applies to topic.

I'm sorry, after re-reading yuor reply I understood this.
You think thank PostgreSQL should use not null inforamtion from metadata.

Maybe it is better to create separate static code analyzer like FxCop,
Smokey or Gendarme for C# for PostgreSQL to analyze code.

Andrus.


Re: Optimizing projections containing unused columns

From
"Grzegorz Jaśkiewicz"
Date:
I just generally wanted to show, that even tho it is much smarter from version to version, it still has this little things.
I generally don't put 'is null' or 'is not null' in queries on column that is 'not null' - but it is imo example of those little things, where postgres isn't so smart yet :)