Thread: How to get RTREE performance from GIST index?
I have been using Postgres for some years, in particular the RTREE indexes to perform spatial queries on astronomical datasets. I misguidedly got our system manager to install Postgres 8.4 and I find that I can no longer use rtrees - the system gives me a message substituting access method "gist" for obsolete method "rtree" The performance has dropped by at least a factor of 100 (I am not sure how much more, because the relevant bit of my SQL is still running after more than an hour, previously it took a minute or so to do this bit of the script). The relevant bits of SQL I have been using are: CREATE TEMPORARY TABLE cat4p AS SELECT longid, srcid, ra, dec, poserr, BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0), POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox FROM cat4; CREATE INDEX cat4pind ON cat4p USING RTREE(errbox); CREATE TEMPORARY TABLE apair AS SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr FROM avcatpos AS a, cat4p AS c WHERE a.errbox && c.errbox AND gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) < LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr)) AND a.srcid <> c.srcid; It is this latter query, involving the && operator to find where two rectangular boxes overlap, which seems to be taking the huge amount of time. Is there a way of forcing the use of Rtree indexing in v8.4, or any other work-around? Regards -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K.
2009/11/21 Clive Page <clive.page@cantab.net>
I have been using Postgres for some years, in particular the RTREE indexes to perform spatial queries on astronomical datasets. I misguidedly got our system manager to install Postgres 8.4 and I find that I can no longer use rtrees - the system gives me a message
substituting access method "gist" for obsolete method "rtree"
The performance has dropped by at least a factor of 100 (I am not sure how much more, because the relevant bit of my SQL is still running after more than an hour, previously it took a minute or so to do this bit of the script).
The relevant bits of SQL I have been using are:
CREATE TEMPORARY TABLE cat4p AS
SELECT longid, srcid, ra, dec, poserr,
BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0),
POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox
FROM cat4;
CREATE INDEX cat4pind ON cat4p USING RTREE(errbox);
CREATE TEMPORARY TABLE apair AS
SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr
FROM avcatpos AS a, cat4p AS c
WHERE a.errbox && c.errbox AND
gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) <
LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr))
AND a.srcid <> c.srcid;
It is this latter query, involving the && operator to find where two rectangular boxes overlap, which seems to be taking the huge amount of time.
Is there a way of forcing the use of Rtree indexing in v8.4, or any other work-around?
Rtree was reimplemented into GiST as of PostgreSQL version 8.2. There should no advantages of using Rtree, so I'm not sure why you're experiencing problems. Hopefully someone can provide insight into what's causing the slow down.
Thom
Clive Page <clive.page@cantab.net> writes: > Is there a way of forcing the use of Rtree indexing in v8.4, No, because the rtree code is gone entirely. We took it out on the basis of tests showing that the gist implementation performed as well or better. I'm not sure why it's not working for you, but if you can provide a more complete test case, we could look into it. One thing to check into right away is whether the system is even trying to use the index --- what does EXPLAIN show about it? Do you by any chance have EXPLAIN output for the same query on the old system? What was the old PG version, anyway? regards, tom lane
On 22/11/2009 05:40, Tom Lane wrote: > No, because the rtree code is gone entirely. We took it out on the > basis of tests showing that the gist implementation performed as well > or better. I'm not sure why it's not working for you, but if you > can provide a more complete test case, we could look into it. > > One thing to check into right away is whether the system is even > trying to use the index --- what does EXPLAIN show about it? > Do you by any chance have EXPLAIN output for the same query on the > old system? What was the old PG version, anyway? Tom Thanks for your reply. I should have said that I was using v8.1. After I posted my question, I retried with CREATE INDEX ... USING GIST(errbox box_ops) and left it to run overnight. The query using the index, which finds overlaps between rectangular boxes using the && operator, took 10228 seconds, whereas using RTREES in v8.1 it took around 50 seconds. I have several such queries to do, and cannot afford to wait for hours. I discovered the "box_ops" syntax only by reading lots of disparate bits of documentation: it is very unsatisfactory that your indexing options are so very poorly documented. I saw that as well as GIST indexing there is something called GIN indexing but failed to find anything useful about these at all. I tried to use them, but without success. There is no point in having these facilities if they are not documented adequately. I am truly sorry that you made the decision to remove R-trees from Postgres and had no regard for backward compatibility. The availability and high performance of R-trees was one of the main reasons I switched to Postgres and have been using it for the last few years. I realise that if I take the time to experiment and use the EXPLAIN command and play around for a week or two I *might* be able to restore something like the earlier performance, but unfortunately I have a job I want to get done in the next day or two. Fortunately I have a simple work-around: Postgres v8.1 is still installed here, and I'll use it right away. For the longer term, I may have to switch to MySQL, which had R-trees but not implemented very efficiently (the last time I checked). No doubt the new owners of MySQL will have tried hard to get them working properly. I'm truly sorry that you don't take the need for R-tree indexing seriously. I would have thought that geometric queries such as the ones that I've been doing would be more and more important in the real world. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K.
On Sun, Nov 22, 2009 at 10:23:29AM +0000, Clive Page wrote: > Thanks for your reply. I should have said that I was using v8.1. After > I posted my question, I retried with > CREATE INDEX ... USING GIST(errbox box_ops) > and left it to run overnight. The query using the index, which finds > overlaps between rectangular boxes using the && operator, took 10228 > seconds, whereas using RTREES in v8.1 it took around 50 seconds. I have > several such queries to do, and cannot afford to wait for hours. If it really is that much slower, then it's a bug and should be fixed. However, you have not provided not nearly enough information to work out what the problem is. Could you show the EXPLAIN output from 8.1 and 8.4 at least so we have some handle on what your problem is. PostgreSQL is used extensively for geometric queries, see postgis. They abandoned rtree a while back because the GiST rtree support was better, see http://postgis.refractions.net/documentation/manual-1.3/ch03.html#id2570697 You are AFAICR the first person to have a problem is this area, but if you can't take the few minutes needed to run EXPLAIN on before and after then there is zero chance of it being fixed either. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
On 22/11/2009 10:44, Martijn van Oosterhout wrote: > PostgreSQL is used extensively for geometric queries, see postgis. They > abandoned rtree a while back because the GiST rtree support was better, Maybe the support is better, but the performance is obviously not. And when there is a difference between under a minute and 3 hours, then performance matters. At least it does to me and my colleagues. > You are AFAICR the first person to have a problem is this area, but if I find it extremely hard to believe that. All I am doing is finding whether pairs of rectangular boxes overlap or not. That is the most trivial use of R-trees possible. Surely someone thought to time that using GIST? > you can't take the few minutes needed to run EXPLAIN on before and > after then there is zero chance of it being fixed either. Unfortunately it isn't a "few minutes". To re-run in v8.1 I have to reload many tables into a different installation using v8.1: some of the tables have a few million rows and hundreds of columns. Then I have change some scripts to add an EXPLAIN command and log the resulting output (rather than getting the results that I actually want). This will take hours. I will try to do it soon, but cannot do it instantly. I have some data that I want to process first. I agree that this is a bug in Postgres - the bug was removing code that worked perfectly well and upon which some users depended. I simply don't understand why the Rtree code could not have been left in there, for those who found that the new-fangled GIST indexing did not work. Regards -- Clive Page
2009/11/22 Clive Page <cgp@star.le.ac.uk>
Unfortunately it isn't a "few minutes". To re-run in v8.1 I have to reload many tables into a different installation using v8.1: some of the tables have a few million rows and hundreds of columns. Then I have change some scripts to add an EXPLAIN command and log the resulting output (rather than getting the results that I actually want). This will take hours. I will try to do it soon, but cannot do it instantly. I have some data that I want to process first.
I agree that this is a bug in Postgres - the bug was removing code that worked perfectly well and upon which some users depended. I simply don't understand why the Rtree code could not have been left in there, for those who found that the new-fangled GIST indexing did not work.
Hi Clive,
Since this is a performance issue, this should probably have been sent to the pgsql-performance mailing list. But in any case, the vast majority of performance issues require an EXPLAIN output, or preferably with ANALYZE also as there is nothing to help diagnose what the query planner it attempting to do.
Please also see: http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
It's always of interest to developers if there are corner cases that might identify overlooked scenarios, so it's always appreciated if enough information is provided to recreate the conditions to see whether there is a genuine problem in the software itself.
Regards
Thom
On 22 Nov 2009, at 11:55, Clive Page wrote: > On 22/11/2009 10:44, Martijn van Oosterhout wrote: >> PostgreSQL is used extensively for geometric queries, see postgis. They >> abandoned rtree a while back because the GiST rtree support was better, > > Maybe the support is better, but the performance is obviously not. And when there is a difference between under a minuteand 3 hours, then performance matters. At least it does to me and my colleagues. Before getting all worked up about the performance of GiST indexes, did you verify that your tables were analysed? Did the person doing the upgrade tune the database? Was it tuned the same or differently? It may well be that the resources GiST indexes require aren't exactly the same as what RTrees require, so maybe tuning needsto be different or you need to add some hardware (which means you probably were close to the limits before and wouldlikely have to do this in the near future anyway - still an unpleasant surprise of course). A query going from "mere" minutes to several hours usually points to the resource starvation or a particularly poor queryplan. Don't assume GiST indexes are that much slower than RTrees, they wouldn't have replaced them if that were thecase. That is why people are asking for the query plan (EXPLAIN, or preferably EXPLAIN ANALYSE, but that actually performs the3h query), so that we have an idea where your query is going wrong. In almost all cases we see on this ML the problemis not a bug in Postgres; it usually boils down to lack of maintenance, improper tuning or just plain inefficientqueries. >> you can't take the few minutes needed to run EXPLAIN on before and >> after then there is zero chance of it being fixed either. > > Unfortunately it isn't a "few minutes". To re-run in v8.1 I have to reload many tables into a different installation usingv8.1: some of the tables have a few million rows and hundreds of columns. Then I have change some scripts to add anEXPLAIN command and log the resulting output (rather than getting the results that I actually want). This will take hours. I will try to do it soon, but cannot do it instantly. I have some data that I want to process first. You posted a number of queries in your original mail. Don't those expose the problem? Running them in psql connected to eitherdatabase shouldn't take much time at all. We just would like to see where your performance issues are coming from, we don't necessarily need the results of your actualcalculations for those. > I agree that this is a bug in Postgres - the bug was removing code that worked perfectly well and upon which some usersdepended. I simply don't understand why the Rtree code could not have been left in there, for those who found thatthe new-fangled GIST indexing did not work. I'm not one of the developers, but I expect it was duplicating code or unmaintained and there was no evidence that therewere any regressions when replacing RTree with GiST (ISTR that GiST is in fact some form of RTree) and therefore deemeddeprecated. It is not unusual to remove deprecated features between major versions of a software product. I'm speculating here, but I'm quite convinced you would have seen the same regression in the performance of your databaseif RTree would have still been in the database. It's simply not likely that the difference in index is causing yourtrouble. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4b09295211736876095208!
On 22/11/2009 11:52, Thom Brown wrote: > Since this is a performance issue, this should probably have been sent > to the pgsql-performance mailing list. But in any case, the vast > majority of performance issues require an EXPLAIN output, or preferably > with ANALYZE also as there is nothing to help diagnose what the query > planner it attempting to do. You may think it's a performance issue, but it stems from the decision to remove from Postgres an essential facility, that of generating and using R-trees. I'm currently trying to generate test cases, together with EXPLAIN output. Regards -- Clive Page
On 21 Nov 2009, at 23:57, Clive Page wrote: > CREATE TEMPORARY TABLE cat4p AS > SELECT longid, srcid, ra, dec, poserr, > BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0), > POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox > FROM cat4; > CREATE INDEX cat4pind ON cat4p USING RTREE(errbox); ANALYSE cat4p; > CREATE TEMPORARY TABLE apair AS > SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr > FROM avcatpos AS a, cat4p AS c > WHERE a.errbox && c.errbox AND > gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) < > LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr)) > AND a.srcid <> c.srcid; If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update the planner'sstatistics on their contents. If you don't you get the default query plan that's often not efficient. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4b0929e511732016739697!
On Sun, Nov 22, 2009 at 10:55:21AM +0000, Clive Page wrote: >> You are AFAICR the first person to have a problem is this area, but if > > I find it extremely hard to believe that. All I am doing is finding > whether pairs of rectangular boxes overlap or not. That is the most > trivial use of R-trees possible. Surely someone thought to time that > using GIST? Ofcourse, there were benchmarks, which showed the GiST version to be faster or the same. http://www.sai.msu.su/~megera/postgres/gist/code/rtree/README.rtree_gist Not surprising really, since conceptually the GiST version resembles the old rtree code very closely, which is why people are surprised you're seeing a difference. Hence people as suspecting that the problem lies elsewhere. (GiST is basically the extension of rtree of non-geometric types, there really isn't that much difference between the two). > I agree that this is a bug in Postgres - the bug was removing code that > worked perfectly well and upon which some users depended. I simply > don't understand why the Rtree code could not have been left in there, > for those who found that the new-fangled GIST indexing did not work. I think there would be some disagreements about whether the old code was "working well", it was broken enough that people didn't want to maintain it. Looking forward to your explain output. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
On 22/11/2009 12:09, Alban Hertroys wrote: > If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update theplanner's statistics on their contents. If you don't you get the default query plan that's often not efficient. Alban Thanks - I didn't know that. I'll try removing the TEMPORARY tag. Is it documented somewhere that I should have seen? Regards -- Clive Page
On 22/11/2009 12:15, Martijn van Oosterhout wrote: > Looking forward to your explain output. Here it is (I wrapped some of the longer lines as might not have survived the translation to email): Postgres v8.1.0 EXPLAIN SELECT a.longid AS longid, b.longid AS blongid, gcdist(a.ra, a.dec, b.ra, b.dec) AS dist FROM pos AS a, pos AS b WHERE a.errbox && b.errbox AND gcdist(a.ra, a.dec, b.ra, b.dec) < LEAST(0.9*a.dist_nn, 0.9*b.dist_nn, 7.0, 3.0 * (a.poserr + b.poserr) ) AND (a.obsid <> b.obsid OR a.longid = b.longid) ; QUERY PLAN --------------------------------------------------------------- Nested Loop (cost=22.16..1241963555.61 rows=205459449 width=48) Join Filter: ((gcdist("outer".ra, "outer"."dec", "inner".ra, "inner"."dec") < LEAST((0.9::double precision * "outer".dist_nn), (0.9 ::double precision * "inner".dist_nn), 7::double precision, (3::double precision * ("outer".poserr + "inner".poserr)))) AND (("outer". obsid <> "inner".obsid) OR ("outer".longid = "inner".longid))) -> Seq Scan on pos a (cost=0.00..8213.83 rows=351983 width=68) -> Bitmap Heap Scan on pos b (cost=22.16..3469.79 rows=1760 width=68) Recheck Cond: ("outer".errbox && b.errbox) -> Bitmap Index Scan on pos_errbox (cost=0.00..22.16 rows=1760 width=0) Index Cond: ("outer".errbox && b.errbox) (7 rows) Actual timing using v8.1.0: SELECT Time: 71351.102 ms Postgres 8.4.1 EXPLAIN output: --------------------------------------------------------------------------------- Nested Loop (cost=0.00..235836993.78 rows=205459449 width=48) Join Filter: (((a.obsid <> b.obsid) OR (a.longid = b.longid)) AND (gcdist(a.ra, a."dec", b.ra, b."dec") < LEAST((0.9::double precision * a.dist_nn), (0.9::double precision * b.dist_nn), 7::double precision, (3::double precision * (a.poserr + b.poserr))))) -> Seq Scan on pos a (cost=0.00..8032.83 rows=351983 width=68) -> Index Scan using pos_errbox on pos b (cost=0.00..31.27 rows=1760 width=68) Index Cond: (a.errbox && b.errbox) (5 rows) Actual timing using v8.4.1 was 10228 seconds (sorry didn't record the milliseconds). It only worked when I left it running overnight! Regards -- Clive Page
On 22 Nov 2009, at 13:19, Clive Page wrote: > On 22/11/2009 12:09, Alban Hertroys wrote: >> If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update theplanner's statistics on their contents. If you don't you get the default query plan that's often not efficient. > > Alban > > Thanks - I didn't know that. I'll try removing the TEMPORARY tag. > > Is it documented somewhere that I should have seen? It's not just temporary tables, it goes for all tables in fact. The difference is that with normal tables there is time forautovacuum to pick them up as needing maintenance, whereas temporary tables are usually queried immediately after they'recreated so that autovacuum is too late. This specific case for using ANALYSE isn't explicitly documented, it more or less follows from the usage pattern of temporarytables. From the notes on the documentation of the ANALYZE command (http://www.postgresql.org/docs/8.4/interactive/sql-analyze.html): "In the default PostgreSQL configuration, The Autovacuum Daemon takes care of automatic analyzing of tables when they arefirst loaded with data, and as they change throughout regular operation. When autovacuum is disabled, it is a good ideato run ANALYZE periodically, or just after making major changes in the contents of a table." That last line isn't explicit about temporary tables, but the reason for running ANALYZE in both cases is the same. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4b092e5911731012678321!
On 21 Nov 2009, at 23:57, Clive Page wrote: > The relevant bits of SQL I have been using are: > > CREATE TEMPORARY TABLE cat4p AS > SELECT longid, srcid, ra, dec, poserr, > BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0), > POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox > FROM cat4; > CREATE INDEX cat4pind ON cat4p USING RTREE(errbox); Looking closer at this, that errbox calculation looks like its formula wouldn't change between sessions. If you use it frequentlyenough it's a good candidate to put a functional index on or, if your SELECT vs INSERT/UPDATE/DELETE ratio leansto the former, add a column with the value pre-calculated (and indexed of course). You can automate keeping that column up to date by using a few simple BEFORE INSERT and BEFORE UPDATE triggers (they reallyonly need to calculate the box-value and override that column's value). Insert/Update performance will decrease (there'sa function call and an extra calculation after all), but Select performance will probably improve and there's sufficienttime for autovacuum to pick up any changes in the data. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4b09327a11731713516847!
On Sun, Nov 22, 2009 at 12:24:50PM +0000, Clive Page wrote: > On 22/11/2009 12:15, Martijn van Oosterhout wrote: > >> Looking forward to your explain output. > > Here it is (I wrapped some of the longer lines as might not have > survived the translation to email): Ok, very interesting, since this shows that the plan is essentially identical between the two versions. Which kind of rules out problems with statistics and missing ANALYSE. My next thought goes to configuration, in particular work_mem, maintainence_work_mem and shared_buffers. Are they the same between 8.1 and 8.4? Can you give some idea of the density of the rectangle? What would be a typical number of overlapping boxes for this query? Have a nice day, > Postgres v8.1.0 > EXPLAIN SELECT a.longid AS longid, b.longid AS blongid, > gcdist(a.ra, a.dec, b.ra, b.dec) AS dist > FROM pos AS a, pos AS b > WHERE a.errbox && b.errbox > AND gcdist(a.ra, a.dec, b.ra, b.dec) < > LEAST(0.9*a.dist_nn, 0.9*b.dist_nn, 7.0, 3.0 * (a.poserr + b.poserr) ) > AND (a.obsid <> b.obsid OR a.longid = b.longid) ; > QUERY PLAN > --------------------------------------------------------------- > Nested Loop (cost=22.16..1241963555.61 rows=205459449 width=48) > Join Filter: ((gcdist("outer".ra, "outer"."dec", "inner".ra, > "inner"."dec") < > LEAST((0.9::double precision * "outer".dist_nn), (0.9 > ::double precision * "inner".dist_nn), 7::double precision, > (3::double precision * ("outer".poserr + "inner".poserr)))) AND > (("outer". > obsid <> "inner".obsid) OR ("outer".longid = "inner".longid))) > -> Seq Scan on pos a (cost=0.00..8213.83 rows=351983 width=68) > -> Bitmap Heap Scan on pos b (cost=22.16..3469.79 rows=1760 width=68) > Recheck Cond: ("outer".errbox && b.errbox) > -> Bitmap Index Scan on pos_errbox (cost=0.00..22.16 > rows=1760 width=0) > Index Cond: ("outer".errbox && b.errbox) > (7 rows) > Actual timing using v8.1.0: > SELECT > Time: 71351.102 ms > > > Postgres 8.4.1 > EXPLAIN output: > --------------------------------------------------------------------------------- > Nested Loop (cost=0.00..235836993.78 rows=205459449 width=48) > Join Filter: (((a.obsid <> b.obsid) OR (a.longid = b.longid)) AND > (gcdist(a.ra, a."dec", b.ra, b."dec") < LEAST((0.9::double precision > * a.dist_nn), > (0.9::double precision * b.dist_nn), 7::double precision, > (3::double precision * (a.poserr + b.poserr))))) > -> Seq Scan on pos a (cost=0.00..8032.83 rows=351983 width=68) > -> Index Scan using pos_errbox on pos b (cost=0.00..31.27 rows=1760 > width=68) > Index Cond: (a.errbox && b.errbox) > (5 rows) > Actual timing using v8.4.1 was 10228 seconds (sorry didn't record the > milliseconds). > It only worked when I left it running overnight! > > Regards > > -- > Clive Page > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
Thanks to all those who responded to my posting yesterday. I have now tried a simple simulation of joining tables with partly overlapping rectangular boxes using Rtrees (with GIST automatically replacing them in 8.4.1), and this works in 8.1.0 and 8.4.1, with the latter a bit faster. But my original data processing script still hangs (or takes 3 hours, much the same thing as far as getting work done is concerned) at various points when using v8.4.1. The identical script works fine using a 8.1.0 server, which fortunately we still have available. I have now inserted ANALYSE table commands before each SELECT that depends upon an R-tree (GIST) index. This doesn't seem to help. It will obviously take a lot of time and effort to track this down. For the moment I shall stick to using v8.1.0, as there's really no alternative. I still think it a great pity that rather than merely deprecating R-tree indexing or making GIST the default but still allowing R-trees to be used if one really wanted them, you actually removed Rtrees from the code. No doubt some tests show GIST to work and work faster than Rtrees in test cases; clearly from my experience when using complicated real-world data that's not necessarily true. Maybe there's some magic spell that can be used to restore the earlier performance, but I really don't have time at present to do the necessary experimenting. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K.
Thanks to all those who responded to my posting yesterday. I have now tried a simple simulation of joining tables with partly overlapping rectangular boxes using Rtrees (with GIST automatically replacing them in 8.4.1), and this works in 8.1.0 and 8.4.1, with the latter a bit faster. But my original data processing script still hangs (or takes 3 hours, much the same thing as far as getting work done is concerned) at various points when using v8.4.1. The identical script works fine using a 8.1.0 server, which fortunately we still have available. I have now inserted ANALYSE table commands before each SELECT that depends upon an R-tree (GIST) index. This doesn't seem to help. It will obviously take a lot of time and effort to track this down. For the moment I shall stick to using v8.1.0, as there's really no alternative. I still think it a great pity that rather than merely deprecating R-tree indexing or making GIST the default but still allowing R-trees to be used if one really wanted them, you actually removed Rtrees from the code. No doubt some tests show GIST to work and work faster than Rtrees in test cases; clearly from my experience when using complicated real-world data that's not necessarily true. Maybe there's some magic spell that can be used to restore the earlier performance, but I really don't have time at present to do the necessary experimenting. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. -- Clive Page
Alban Hertroys wrote: > "In the default PostgreSQL configuration, The Autovacuum Daemon takes > care of automatic analyzing of tables when they are first loaded with > data, and as they change throughout regular operation. When autovacuum > is disabled, it is a good idea to run ANALYZE periodically, or just > after making major changes in the contents of a table." > > That last line isn't explicit about temporary tables, but the reason > for running ANALYZE in both cases is the same. Actually, autovacuum doesn't process temp tables at all because it cannot get to them; they might live solely in the creating process' private memory area. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
> Actually, autovacuum doesn't process temp tables at all because it > cannot get to them; they might live solely in the creating process' > private memory area. > Does that mean that, in between creating a temporary table and actually using it in a complicate query, it is desirable to run an ANALYZE command on it? I haven't been doing that, because I didn't know. Regards -- Clive Page
On 2 Dec 2009, at 21:12, Clive Page wrote: >> Actually, autovacuum doesn't process temp tables at all because it >> cannot get to them; they might live solely in the creating process' >> private memory area. > > Does that mean that, in between creating a temporary table and actually using it in a complicate query, it is desirableto run an ANALYZE command on it? > > I haven't been doing that, because I didn't know. Only if you created an index on it. If you didn't it doesn't matter as in that case you always get a sequential scan, unlessI'm mistaken. Alban Hertroys -- Screwing up is the best way to attach something to the ceiling. !DSPAM:737,4b16f67511731227681557!