Home > mailing lists

Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

From	Nathan Boley
Subject	Re: Collect frequency statistics for arrays
Date	March 1, 2012 21:59:35
Msg-id	CAHetpQSc-fyp6PyvAXGFgWLEDYswsG6ydAJ9bUdrrd-WBepgsw@mail.gmail.com Whole thread Raw
In response to	Re: Collect frequency statistics for arrays (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

[ sorry Tom, reply all this time... ]

> What do you mean by "storing sequences as arrays"?

So, a simple example is, for transcripts ( sequences of DNA that are
turned into proteins ), we store each of the connected components as
an array of the form:

exon_type in [1,6]
splice_type = [1,3]

and then the array elements are

[ exon_type, splice_type, exon_type ]

~ 99% of the elements are of the form [ [1,3], 1, [1,3] ],

so I almost always get a hash or merge join ( correctly ) but for the
rare junction types ( which are usually more interesting as well ) I
correctly get nest loops with an index scan.

> Can you demonstrate
> that the existing stats are relevant at all to the query you're worried
> about?

Well, if we didn't have mcv's and just relied on ndistinct to estimate
the '=' selectivities, either my low selectivity quals would use the
index, or my high selectivity quals would use a table scan, either of
which would be wrong.

I guess I could wipe out the stats and get some real numbers tonight,
but I can't see how the planner would be able to distinguish *without*
mcv's...

Best,
Nathan

pgsql-hackers by date:

From: Tom Lane
Date: 01 March 2012, 21:46:29
Subject: Re: Collect frequency statistics for arrays

From: Marti Raudsepp
Date: 01 March 2012, 23:13:10
Subject: Re: Caching for stable expressions with constant arguments v6

Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

Previous

Next