Re: Index Scan Costs versus Sort - Mailing list pgsql-performance

From Charlie Savage
Subject Re: Index Scan Costs versus Sort
Date
Msg-id 437398E3.2020900@interserv.com
Whole thread Raw
In response to Re: Index Scan Costs versus Sort  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Index Scan Costs versus Sort
List pgsql-performance
Hi Tom,

 From pg_stats:

schema = "tiger";
tablename = "completechain";
attname = "tlid";
null_frac = 0;
avg_width = 4;
n_distinct = -1;
most_common_vals = ;
most_common_freqs = ;
correlation = 0.155914;

Note that I have default_statistics_target set to 100.  Here is the
first few values from histogram_bounds:


"{102450,2202250,4571797,6365754,8444936,10541593,12485818,14545727,16745594,18421868,20300549,22498643,24114709,26301001,28280632,30370123,32253657,33943046,35898115,37499478,39469054,41868498,43992143,45907830,47826340,49843926,52051798,54409298,56447416,



The tlid column is a US Census bureau ID assigned to each chain in the
US - where a chain is a road segment, river segment, railroad segment,
etc.  The data is loaded on state-by-state basis, and then a
county-by-county basis.  There is no overall ordering to TLIDs, although
perhaps there is some local ordering at the county level (but from a
quick look at the data I don't see any, and the correlation factor
indicates there isn't any if I am interpreting it correctly).

Any other info that would be helpful to see?


Charlie


Tom Lane wrote:
> Charlie Savage <cfis@interserv.com> writes:
>> 1.  Postgresql estimates the index scan will be 50 times more costly
>> than the seq scan (112870376 vs 2229858) yet in fact it only takes 3
>> times longer to execute (2312426 s vs. 768403 s).  My understanding is
>> that postgresql assumes, via the random_page_cost parameter, that an
>> index scan will take 4 times longer than a sequential scan.  So why is
>> the analyzer estimating it is 50 times slower?
>
> The other factors that are likely to affect this are index correlation
> and effective cache size.  It's fairly remarkable that a full-table
> index scan only takes 3 times longer than a seqscan; you must have both
> a high correlation and a reasonably large cache.  You showed us your
> effective_cache_size setting, but what's the pg_stats entry for
> completechain.tlid contain?  Can you quantify what the physical
> ordering of tlid values is likely to be?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

pgsql-performance by date:

Previous
From: Ashok Agrawal
Date:
Subject: Re: Outer Join performance in PostgreSQL
Next
From: Kelly Burkhart
Date:
Subject: Re: 8.x index insert performance