Home > mailing lists

Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Improving N-Distinct estimation by ANALYZE
Date	January 6, 2006 20:13:16
Msg-id	87vewx2h7j.fsf@stark.xeocode.com Whole thread Raw
In response to	Re: Improving N-Distinct estimation by ANALYZE ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses	Re: Improving N-Distinct estimation by ANALYZE Re: Improving N-Distinct estimation by ANALYZE
List	pgsql-hackers

Tree view

"Jim C. Nasby" <jnasby@pervasive.com> writes:

> Before we start debating merits of proposals based on random reads, can
> someone confirm that the sampling code actually does read randomly? I
> looked at it yesterday; there is a comment that states that blocks to be
> scanned are passed to the analyze function in physical order, and AFAICT
> the function that chooses blocks does so based strictly on applying a
> probability function to block numbers as it increments a counter. It
> seems that any reading is actually sequential and not random, which
> makes all the random_page_cost hand-waving null and void.

Hm. I'm curious just how much that behaves like a sequential scan actually. I
think I'll do some experiments.

Reading 1% (1267 read, 126733 skipped):      7748264us
Reading 2% (2609 read, 125391 skipped):     12672025us
Reading 5% (6502 read, 121498 skipped):     19005678us
Reading 5% (6246 read, 121754 skipped):     18509770us
Reading 10% (12975 read, 115025 skipped):     19305446us
Reading 20% (25716 read, 102284 skipped):     18147151us
Reading 50% (63656 read, 64344 skipped):     18089229us
Reading 100% (128000 read, 0 skipped):         18173003us

These numbers don't make much sense to me. It seems like 5% is about as slow
as reading the whole file which is even worse than I expected. I thought I was
being a bit pessimistic to think reading 5% would be as slow as reading 20% of
the table.

Anyone see anything wrong my my methodology?



--
greg

Attachment

seqtest.c

pgsql-hackers by date:

From: Tom Lane
Date: 06 January 2006, 17:08:11
Subject: Re: Improving N-Distinct estimation by ANALYZE

From: Josh Berkus
Date: 06 January 2006, 22:20:35
Subject: Re: Improving N-Distinct estimation by ANALYZE

Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

Attachment

Previous

Next