Re: extremely bad select performance on huge table - Mailing list pgsql-performance

From Igor Neyman
Subject Re: extremely bad select performance on huge table
Msg-id A76B25F2823E954C9E45E32FA49D70ECAB2FDEE5@mail.corp.perceptron.com
In response to Re: extremely bad select performance on huge table  (Björn Wittich <Bjoern_Wittich@gmx.de>)
Responses Re: extremely bad select performance on huge table  (Björn Wittich <Bjoern_Wittich@gmx.de>)
List pgsql-performance

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Björn Wittich
Sent: Tuesday, October 21, 2014 3:32 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] extremely bad select performance on huge table

Hi Tom and Igor,

Thanks for your help. With the reindex, the select query's running time was reduced from 5200 sec to 130 sec. Impressive!

Even a join on this table is now fast.
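For reference, the exact commands aren't quoted in this thread, but based on the index and table names in the EXPLAIN output below, they would be along these lines:

    -- Rebuild the index so scanning it becomes more nearly sequential
    REINDEX INDEX myprimkey;

    -- Mark heap pages all-visible so index-only scans can skip heap fetches
    VACUUM ANALYZE myhugetable;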

Unfortunately, there is now another problem: the table in my example has 500 columns, all of which I want to retrieve with my join.

Example which is fast: "select value from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"

Example which is slow: "select value,c1,c2,c3,...,c10 from smallertable inner join myhugetable ON smallertable.mycolumn = myhugetable.mycolumn"


Why is the number of columns to fetch so costly? What is the db system actually doing when I run this query via pgAdmin? I thought no real data retrieval was involved, so why is the number of additional columns so bad for the join performance?

> Björn Wittich <Bjoern_Wittich@gmx.de> writes:
>> Here is the explain (analyze,buffers) select mycolumn from myhugetable
>> "Index Only Scan using myprimkey on myhugetable  (cost=0.00..8224444.82
>> rows=71768080 width=33) (actual time=16.722..2456300.778 rows=71825999
>> loops=1)"
>> "  Heap Fetches: 356861"
>> "  Buffers: shared hit=71799472 read=613813"
>> "Total runtime: 2503009.611 ms"
> So that works out to about 4 msec per page fetched considering only I/O
> costs (roughly 2,456,301 ms of scan time over the 613,813 pages read from
> disk), which is about as good as you're likely to get if the data is
> sitting on spinning rust.
>
> You could potentially make it faster with a VACUUM (to mark all pages
> all-visible and eliminate the "heap fetches" costs), or a REINDEX
> (so that the index scan becomes more nearly sequential instead of random
> access).  However, unless the data is nearly static those will just be
> temporary fixes: the time will degrade again as you update the table.
>
>> Note: This select is just for testing. My final statement will be a join
>> on this table via the "mycolumn" column.
> In that case it's probably a waste of time to worry about the performance
> of this query as such.  In the first place, a join is not likely to use
> the index at all unless it's fetching a relatively small number of rows,
> and in the second place it seems unlikely that the join query can use
> an IndexOnlyScan on this index --- I imagine that the purpose of the join
> will require fetching additional columns.
>
>             regards, tom lane
>
>

Björn,

I think the timing difference you see between the two queries is caused by delivering the results to the front end (pgAdmin) and displaying all the additional columns that you include in the second query (a much bigger amount of data to pass from the db to the client).
I'm pretty sure that if you run explain analyze on both queries, you'll see the same timing, because it reflects only db time, without what's spent on delivering data to the client.
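For example, a comparison along these lines (using the table and column names from your queries; c1..c3 stand in for the extra columns) shows the server-side cost alone:

    -- The timing reported here excludes shipping rows to the client
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT value, c1, c2, c3
    FROM smallertable
    INNER JOIN myhugetable ON smallertable.mycolumn = myhugetable.mycolumn;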

Regards,
Igor Neyman


