Re: [PERFORM] Hash Anti Join performance degradation - Mailing list pgsql-hackers

From panam
Subject Re: [PERFORM] Hash Anti Join performance degradation
Date
Msg-id 1306968571518-4446629.post@n5.nabble.com
In response to Re: [PERFORM] Hash Anti Join performance degradation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I'd like to thank you all for getting this analyzed, especially Tom!
Your rigor is pretty impressive; it seems it would otherwise be impossible to
maintain a DBMS.
In the end, I know a lot more about Postgres internals now, and I know that
this idiosyncrasy (from a user's perspective) could happen again. I guess this
is the first time I have actually encountered an unexpected worst-case
scenario like this...
It seems it is up to me now to be a bit more creative with query optimization.
And in the end, it will probably turn out to require an architectural change...
As the only thing to achieve is, in fact, to obtain the last id (currently
still with the constraint that it has to happen in an isolated subquery), I
wonder whether this requirement (obtaining the last id) deserves a special
technique/instrumentation/strategy (lacking a better word here), given that
this data has a full logical ordering (in this case even a total one) and
that the use case is, I guess, quite common.
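
To make this concrete, here is a minimal sketch of the two formulations
(the message/box_id schema and the index here are made up, standing in for
my real schema):

  -- Hypothetical schema: message(id bigint PRIMARY KEY, box_id bigint, ...)
  -- with an index on (box_id, id).

  -- The self anti join way of asking for the last id per box, which is
  -- what ran into the degraded hash anti join:
  SELECT m1.id
  FROM   message m1
  LEFT JOIN message m2
         ON m2.box_id = m1.box_id AND m2.id > m1.id
  WHERE  m2.id IS NULL;

  -- An equivalent that avoids the self join entirely and can simply walk
  -- the (box_id, id) index backwards within each box:
  SELECT DISTINCT ON (box_id) box_id, id
  FROM   message
  ORDER  BY box_id, id DESC;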

Some ideas from an earlier post:

panam wrote:
> 
> ...
> This also made me wonder how the internal plan is carried out. Is the
> engine able to leverage the fact that a part/range of the rows ["/index
> entries"] is totally or partially ordered on disk, e.g. using some kind of
> binary search or even "nearest neighbor"-search in that section (i.e. a
> special "micro-plan" or algorithm)? Or is the speed-up "just" because
> related data is usually "nearby" and most of the standard algorithms work
> best with clustered data?
> If the former is not the case, would that be a potential point for
> improvement? Maybe it would even be more efficient if there were some
> sort of constraint that guarantees "ordered row" sections on disk,
> i.e. preventing the addition of a row whose index value falls between
> two row values of an already ordered/clustered section. In the simplest
> case, it would start with the "first" row and end with the "last" row (as
> of the time of doing the equivalent of "cluster"). So there would be a
> small list saying that rows with ids x through y are guaranteed to be
> ordered on disk (by id, for example), now and for all time.
> 
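
As far as I understand, the closest things that exist today are a one-time
physical rewrite plus the planner's per-column correlation statistic; a
sketch, again against the hypothetical message table and a made-up index name:

  -- CLUSTER rewrites the table in index order once; the ordering is not
  -- maintained for rows inserted afterwards:
  CLUSTER message USING message_box_id_id_idx;
  ANALYZE message;

  -- The planner tracks how well the physical row order matches each
  -- column's logical order as a correlation statistic:
  SELECT attname, correlation
  FROM   pg_stats
  WHERE  tablename = 'message';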
Maybe I am completely off the mark, but what is your conclusion? Too much
effort for small scenarios? Nothing that should be handled at the DB level?
An attempt to fight the laws of thermodynamics with a small technical dodge?

Thanks again
panam

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Re-PERFORM-Hash-Anti-Join-performance-degradation-tp4443803p4446629.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

