Thread: planner/optimizer question
Hi,

I have a query which I think should be using an index all of the time, but
Postgres only uses the index part of the time. The index
(ticket_crm_map_crm_id_suppid) has the WHERE-clause column (crm_id) listed
first, followed by the selected column (support_person_id). Wouldn't the
most efficient plan be to scan the index regardless of crm_id, because the
only columns needed are in the index?

Below are the table definition, two queries showing the difference in
plans, and the record distribution of ticket_crm_map. I first did a
'vacuum analyze' and am running Postgres 7.4.2.

Thanks,
Brad

athenapost=> \d ticket_crm_map
                    Table "public.ticket_crm_map"
         Column         |            Type             |                  Modifiers
------------------------+-----------------------------+---------------------------------------------
 tcrm_map_id            | integer                     | not null
 ticket_id              | integer                     | not null
 crm_id                 | integer                     | not null
 support_person_id      | integer                     | not null
 escalated_to_person_id | integer                     | not null
 status                 | character varying(50)       | not null default 'Open'::character varying
 close_date             | timestamp without time zone |
 updated_date           | timestamp without time zone |
 updated_by             | character varying(255)      |
 created_date           | timestamp without time zone |
 created_by             | character varying(255)      |
 additional_info        | text                        |
 subject                | character varying(255)      |
Indexes:
    "ticket_crm_map_pkey" primary key, btree (tcrm_map_id)
    "ticket_crm_map_crm_id_key" unique, btree (crm_id, ticket_id)
    "ticket_crm_map_crm_id_suppid" btree (crm_id, support_person_id)
    "ticket_crm_map_status" btree (status)
    "ticket_crm_map_ticket_id" btree (ticket_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (ticket_id) REFERENCES ticket(ticket_id)
    "$2" FOREIGN KEY (crm_id) REFERENCES company_crm(crm_id)
    "$3" FOREIGN KEY (support_person_id) REFERENCES person(person_id)
    "$4" FOREIGN KEY (escalated_to_person_id) REFERENCES person(person_id)
    "$5" FOREIGN KEY (status) REFERENCES ticket_status(status)

athenapost=> explain analyze select distinct support_person_id from ticket_crm_map where crm_id = 7;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1262.99..1265.27 rows=1 width=4) (actual time=15.335..18.245 rows=20 loops=1)
   ->  Sort  (cost=1262.99..1264.13 rows=456 width=4) (actual time=15.332..16.605 rows=2275 loops=1)
         Sort Key: support_person_id
         ->  Index Scan using ticket_crm_map_crm_id_suppid on ticket_crm_map  (cost=0.00..1242.85 rows=456 width=4) (actual time=0.055..11.281 rows=2275 loops=1)
               Index Cond: (crm_id = 7)
 Total runtime: 18.553 ms
(6 rows)

Time: 20.598 ms

athenapost=> explain analyze select distinct support_person_id from ticket_crm_map where crm_id = 1;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual time=659.102..791.517 rows=24 loops=1)
   ->  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual time=659.090..713.285 rows=93889 loops=1)
         Sort Key: support_person_id
         ->  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628 width=4) (actual time=0.027..359.299 rows=93889 loops=1)
               Filter: (crm_id = 1)
 Total runtime: 814.601 ms
(6 rows)

Time: 817.095 ms

athenapost=> select count(*), crm_id from ticket_crm_map group by crm_id;
 count | crm_id
-------+--------
  2554 |     63
   129 |     25
    17 |     24
   110 |     23
    74 |     22
    69 |     21
     2 |     20
    53 |     82
    10 |     17
    16 |     81
 46637 |     16
    14 |     80
     2 |     15
  1062 |     79
    87 |     78
    93 |     77
    60 |     44
   363 |     76
   225 |     10
     4 |     74
    83 |      9
    27 |     73
   182 |      8
  2275 |      7
    15 |     71
   554 |      6
    44 |     70
   631 |      5
    37 |      4
   190 |      3
   112 |      2
 93889 |      1
(32 rows)

Time: 436.697 ms
Hi,

You should try these queries instead:

select support_person_id from ticket_crm_map where crm_id = 7 GROUP BY support_person_id;
select support_person_id from ticket_crm_map where crm_id = 1 GROUP BY support_person_id;

They can use the 'ticket_crm_map_crm_id_suppid' index. Generally, Postgres
can use a k-column index if the columns in your conditions are a prefix of
the index columns. For example:

CREATE INDEX test_idx ON test(col1, col2, col3, col4);

SELECT * FROM test WHERE col1 = 3 AND col2 = 13;  -- This can use the index.

But the next queries cannot use the index:

SELECT * FROM test WHERE col1 = 3 AND col3 = 13;
SELECT * FROM test WHERE col2 = 3;

If you have a problem with seq scans or sorts, you can disable them
globally or locally:

SET enable_seqscan = 0;
SET enable_sort = 0;

Regards,
Antal Attila
brad-pgperf@duttonbros.com writes:
> ... Wouldn't the most efficient plan be to scan the index regardless
> of crm_id because the only columns needed are in the index?

No.  People coming from other databases often have the misconception
that queries can be answered by looking only at an index.  That is never
true in Postgres because row validity info is only stored in the table;
so we must always visit the table entry to make sure the row is still
valid/visible for the current query.

Accordingly, columns added to the index that aren't constrained by the
WHERE clause are not very useful ...

			regards, tom lane
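Tom's point can be sketched as a toy model in a few lines of Python. All
names and structures here are invented for illustration (this is not
Postgres internals): the index stores only (key, heap pointer) pairs, while
the xmin/xmax visibility fields live in the heap tuple header, so every
index hit forces a heap visit before the row can be returned.

```python
# Toy model: why a Postgres index scan must always visit the heap.
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    crm_id: int
    support_person_id: int
    xmin: int                   # transaction that inserted the row
    xmax: Optional[int] = None  # transaction that deleted it, if any

def visible(tup: HeapTuple, snapshot_xid: int) -> bool:
    """Crude MVCC check: inserted before our snapshot, not yet deleted."""
    if tup.xmin > snapshot_xid:
        return False
    return tup.xmax is None or tup.xmax > snapshot_xid

heap = [
    HeapTuple(7, 101, xmin=10),
    HeapTuple(7, 102, xmin=20, xmax=30),  # deleted by transaction 30
    HeapTuple(1, 103, xmin=10),
]

# The btree holds (key, tid) pairs -- including one for the deleted row,
# because the index itself carries no visibility information.
index = sorted((t.crm_id, tid) for tid, t in enumerate(heap))

def index_scan(key: int, snapshot_xid: int) -> list:
    out = []
    for k, tid in index:
        if k != key:
            continue
        tup = heap[tid]              # the unavoidable heap visit
        if visible(tup, snapshot_xid):
            out.append(tup.support_person_id)
    return out

print(index_scan(7, snapshot_xid=40))  # [101] -- deleted row filtered out
print(index_scan(7, snapshot_xid=25))  # [101, 102] -- older snapshot still sees it
```

Two transactions scanning the same index entries get different answers, which
is exactly why the index entry alone cannot decide validity.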
I know you will shoot me down, but...

Why is there an entry in the index for a row if the row is not valid?
Wouldn't it be better for the index entry validity to track the row
validity? If a particular data value for a query (join, where etc.) can be
satisfied by the index entry itself, this would be a big performance gain.

Cheers,
Gary.

On 28 Apr 2004 at 0:27, Tom Lane wrote:
> brad-pgperf@duttonbros.com writes:
> > ... Wouldn't the most efficient plan be to scan the index regardless
> > of crm_id because the only columns needed are in the index?
>
> No.  People coming from other databases often have the misconception
> that queries can be answered by looking only at an index.  That is never
> true in Postgres because row validity info is only stored in the table;
> so we must always visit the table entry to make sure the row is still
> valid/visible for the current query.
>
> Accordingly, columns added to the index that aren't constrained by the
> WHERE clause are not very useful ...
>
> 			regards, tom lane
> Why is there an entry in the index for a row if the row is not valid?
> Wouldn't it be better for the index entry validity to track the row validity.
> If a particular data value for a query (join, where etc.) can be satisfied
> by the index entry itself this would be a big performance gain.

For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big
performance loss.

Chris
I can understand the performance loss on non-selects for keeping the index
validity state tracking the row validity, but would that outweigh the
performance gains on selects? Depends on your mix of selects to non-selects
I guess, but other database systems seem to imply that keeping the index on
track is worth it overall.

Cheers,
Gary.

On 28 Apr 2004 at 15:04, Christopher Kings-Lynne wrote:
> > Why is there an entry in the index for a row if the row is not valid?
> > Wouldn't it be better for the index entry validity to track the row validity.
> > If a particular data value for a query (join, where etc.) can be satisfied
> > by the index entry itself this would be a big performance gain.
>
> For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big
> performance loss.
>
> Chris
> I can understand the performance loss on non-selects for keeping the
> index validity state tracking the row validity, but would that outweigh the
> performance gains on selects? Depends on your mix of selects to non
> selects I guess, but other database systems seem to imply that keeping
> the index on track is worth it overall.

Yes, some sort of flag on index creation would be sweet :)

Chris
On Wed, 28 Apr 2004 07:35:41 +0100, "Gary Doades" <gpd@gpdnet.co.uk> wrote:

> Why is there an entry in the index for a row if the row is not valid?

Because whether a row is seen as valid or not lies in the eye of the
transaction looking at it.  Full visibility information is stored in the
heap tuple header.  The developers' consensus is that this overhead should
not be in every index tuple.

Servus
 Manfred
Manfred Koizar <mkoi-pg@aon.at> writes:
> On Wed, 28 Apr 2004 07:35:41 +0100, "Gary Doades" <gpd@gpdnet.co.uk>
> wrote:
>> Why is there an entry in the index for a row if the row is not valid?

> Because whether a row is seen as valid or not lies in the eye of the
> transaction looking at it.  Full visibility information is stored in the
> heap tuple header.  The developers' consensus is that this overhead
> should not be in every index tuple.

Storing that information would at least double the overhead space used
for each index tuple.  The resulting index bloat would significantly
slow index operations by requiring more I/O.  So it's far from clear
that this would be a win, even for those who care only about select
speed.

			regards, tom lane
On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> [ ... visibility information in index tuples ... ]
> Storing that information would at least double the overhead space used
> for each index tuple.  The resulting index bloat would significantly
> slow index operations by requiring more I/O.  So it's far from clear
> that this would be a win, even for those who care only about select
> speed.

While the storage overhead could be reduced to 1 bit (not a joke), we'd
still have the I/O overhead of locating and updating index tuples for
every heap tuple deleted/updated.

Servus
 Manfred
On 29 Apr 2004 at 19:03, Manfred Koizar wrote:

> While the storage overhead could be reduced to 1 bit (not a joke) we'd
> still have the I/O overhead of locating and updating index tuples for
> every heap tuple deleted/updated.

But this is what a lot of DBMSs do and seem to do well enough. I can see
that the MVCC system gives additional problems, but maybe it shouldn't be
dismissed so lightly.

Coming from an MS SQLServer platform, I have spent a lot of time optimising
SQL in PostgreSQL to be comparable to SQLServer. For the most part I have
done this, but some things are just slower in PostgreSQL.

Recently I have been looking at raw performance (CPU, IO) rather than the
plans. I have some test queries that (as far as I can determine) use the
same access plans on PostgreSQL and SQLServer. Getting down to the detail,
an index scan of an index on an integer column (222512 rows) takes 60ms on
SQLServer and 540ms on PostgreSQL. A full seq scan of the same table
without the index, on the other hand, takes 370ms in SQLServer and 420ms in
PostgreSQL.

I know that the platforms are different (Windows 2000 vs Linux 2.6.3), but
the statement was executed several times to make sure the index and data
were in cache (no disk IO) on both systems. Same data, same CPU, same
disks, same memory, same motherboards.

The only thing I can think of is the way that the index scan is performed
on each platform: SQLServer can use the data directly from the index. This
makes the biggest difference in multi-join statements where several of the
intermediate tables do not need to be accessed at all; the data is
contained in the join indexes. This results in almost an order of magnitude
performance difference for the same data.

It would be nice to get a feel for how much performance loss would be
incurred in maintaining the index flags against possible performance gains
for getting the data back out again.

Regards,
Gary.
> I would be nice to get a feel for how much performance loss would be
> incurred in maintaining the index flags against possible performance
> gains for getting the data back out again.

I guess the real question is: why maintain index flags and not simply drop
the index entry altogether?

A more interesting case would be to have the backend process record index
tuples that it would invalidate (if committed), then on commit send that
list to a garbage collection process.

It's still vacuum -- just the reaction time for it would be much quicker.
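The commit-time garbage-collection idea above can be sketched as a toy
Python model. Everything here is invented for illustration (this is not how
Postgres works): the deleting backend remembers which index entries its
changes would invalidate and, on commit, hands that list to a background
collector instead of waiting for a full VACUUM pass.

```python
# Toy model: commit-time handoff of invalidated index entries to a GC worker.
import queue

# Each queue item is a list of (index_key, tid) pairs killed by one commit.
gc_queue: "queue.Queue" = queue.Queue()

class Backend:
    def __init__(self):
        self.pending = []            # index entries this xact would invalidate

    def delete_row(self, index_key, tid):
        self.pending.append((index_key, tid))

    def commit(self):
        gc_queue.put(self.pending)   # reaction time: one commit, not one VACUUM
        self.pending = []

def gc_worker(index):
    """Drain the queue and physically remove dead index entries."""
    while not gc_queue.empty():
        for key, tid in gc_queue.get():
            index[key].remove(tid)

# index maps key -> list of heap tids
index = {7: [0, 1], 1: [2]}
b = Backend()
b.delete_row(7, 1)
b.commit()
gc_worker(index)
print(index)   # {7: [0], 1: [2]}
```

The open question the thread raises still applies: under MVCC an old
snapshot may legitimately still see the "deleted" row, so a real
implementation could only reclaim entries dead to all transactions.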
> I guess the real question is, why maintain index flags and not simply
> drop the index entry altogether?
>
> A more interesting case would be to have the backend process record
> index tuples that it would invalidate (if committed), then on commit
> send that list to a garbage collection process.
>
> It's still vacuum -- just the reaction time for it would be much
> quicker.

This was my original question. I guess the problem is with MVCC. The row
may have gone from your current view of the table, but not from someone
else's. I don't (yet) understand the way it works well enough to say for
sure, but I still think it is worth pursuing further by someone who does
know the deep stuff. They seem to have concluded that it is not worth it,
however.

Cheers,
Gary.
while you weren't looking, Gary Doades wrote:

> Recently I have been looking at raw performance (CPU, IO)
> rather than the plans. I have some test queries that (as far
> as I can determine) use the same access plans on PostgreSQL
> and SQLServer. Getting to the detail, an index scan of an
> index on a integer column (222512 rows) takes 60ms on
> SQLServer and 540ms on PostgreSQL.

After a recent power outage, I had the opportunity to watch both PostgreSQL
and MS SQL come back from forced shutdowns (clean, though there were active
connections, in one case a bulk insert). PostgreSQL was available and
responsive as soon as the postmaster had started. MS SQL, on the other
hand, took the better part of an hour to become remotely usable again -- on
a radically faster machine (Dell 6650, versus the 6450 we run PostgreSQL
on).

Digging a bit, I noted that once MS SQL was up again, it was using nearly
2GB of main memory even when more or less idle. From this, and having
observed the performance differences between the two, I'm left with little
alternative but to surmise that part of MS SQL's noted performance
advantage [1] is due to its forcibly storing its indices in main memory.
Its startup lag (during which it was utterly unusable; even SELECTs
blocked) could be accounted for by reindexing the tables. [2]

Granted, this is only a hypothesis, is rather unverifiable, and probably
belongs more on ADVOCACY than it does on PERFORM, but it seemed relevant.

It's also entirely possible your indices are using inaccurate statistical
information. Have you ANALYZEd recently?

/rls

[1] Again, at least in our case, the comparison is entirely invalid, as MS
SQL gets a hell of a lot more machine than PostgreSQL. Even so, for
day-to-day work and queries, even our DBA, an until-recently fervent MS SQL
advocate, can't fault PostgreSQL's SELECT, INSERT or DELETE performance. We
still can't get UPDATEs (at least bulk such) to pass muster.
[2] This is further supported by having observed MS SQL run a "recovery
process" on databases that were entirely unused, even for SELECT queries,
at the time of the outage. The only thing it might conceivably need to
recover on them is in-memory indices that were lost when power was lost.

--
Rosser Schwarz
Total Card, Inc.
> It's also entirely possible your indices are using inaccurate
> statistical information. Have you ANALYZEd recently?

In this example the statistics don't matter. The plans used were the same
for MSSQL and Postgres. I was trying to eliminate the difference in plans
between the two, which obviously does make a difference, sometimes in
MSSQL's favour and sometimes the other way round. Both systems, having
decided to do the same index scan, took noticeably different times. The
Postgres database was fully vacuumed and analysed anyway.

I agree about MSSQL recovery time: it sucks. This is why they are making a
big point about the improved recovery time in "Yukon". Although the
recovery time is important, I see this as an exception, whereas at the
moment I am interested in the everyday.

Cheers,
Gary.
Gary,

> In this example the statistics don't matter. The plans used were the same
> for MSSQL and Postgres. I was trying to eliminate the difference in plans
> between the two, which obviously does make a difference, sometimes in
> MSSQL favour and sometimes the other way round. Both systems, having
> decided to do the same index scan, took noticably different times. The
> Postgres database was fully vacuumed and analysed anyway.

It's also quite possible that MSSQL simply has a more efficient
index-scanning implementation than we do. They've certainly had incentive;
their storage system sucks big time for random lookups and they need those
fast indexes. (Just try to build a 1GB adjacency-list tree on SQL Server. I
dare ya.)

Certainly the fact that MSSQL is essentially a single-user database makes
things easier for them. They don't have to maintain multiple copies of the
index tuples in memory. I think that may be our main performance loss.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
On 29 Apr 2004 at 15:35, Kenneth Marshall wrote:

> Did you try to cluster based on the index?
>
> --Ken

Yes. This speeds up the index scan a little (12%). To me this just
reinforces the overhead that subsequently having to go and fetch the data
tuple actually has on the performance.

Cheers,
Gary.
On 29 Apr 2004 at 13:54, Josh Berkus wrote:

> Gary,
>
> It's also quite possble the MSSQL simply has more efficient index scanning
> implementation that we do. They've certainly had incentive; their storage
> system sucks big time for random lookups and they need those fast indexes.
> (just try to build a 1GB adjacency list tree on SQL Server. I dare ya).
>
> Certainly the fact that MSSQL is essentially a single-user database makes
> things easier for them. They don't have to maintain multiple copies of the
> index tuples in memory. I think that may be our main performance loss.

Possibly, but MSSQL certainly uses data from indexes and cuts out the
subsequent (possibly random-seek) data fetch. This is also why the "Index
Tuning Wizard" often recommends multi-column compound indexes in some
cases. I've tried these recommendations on occasion and they certainly
speed up the selects significantly. If anything, the scan of the new
compound index must be slower than that of the original single-column
index, and yet it still gets the data faster. This indicates to me that it
is not the scan (or IO) performance that is making the difference, but not
having to go and get the data row.

Cheers,
Gary.
"Gary Doades" <gpd@gpdnet.co.uk> writes:
> In this example the statistics don't matter.

Don't they?

A prior poster mentioned that he thought MSSQL tries to keep all its
indexes in memory.  I wonder whether you are giving Postgres a fair chance
to do the same.  What postgresql.conf settings are you using?

			regards, tom lane
On 29 Apr 2004 at 17:54, Tom Lane wrote:
> "Gary Doades" <gpd@gpdnet.co.uk> writes:
> > In this example the statistics don't matter.
>
> Don't they?
>
> A prior poster mentioned that he thought MSSQL tries to keep all its
> indexes in memory. I wonder whether you are giving Postgres a fair
> chance to do the same. What postgresql.conf settings are you using?
>
> regards, tom lane
As far as I understand it the statistics only contribute to determining the query plan. Once the access methods are determined, the stats don't matter during the running of the query.
I believe I have given Postgres exactly the same chance. The data is small enough to fit into RAM (all the tables in the query add up to around 50meg) and I executed the query several times to get a consistent figure for the explain analyze.
Having picked out an index scan as being the highest time user I concentrated on that in this case and compared the same index scan on MSSQL. At least MSSQL reported it as an index scan on the same index for the same number of rows.
There was nothing wrong with the query plan that Postgres used. As far as I could see it was probably the best one to use, it just physically took longer than the same access plan on MSSQL.
The query and plan are included below, the main thing I was looking at was the index scan on staff_booking_pkey being 676ms long.
The only postgresql.conf parameters changed from the default are:
shared_buffers = 3000
sort_mem = 4096
effective_cache_size = 15000
default_statistics_target = 100
There was no disk IO (above the small background IO) during the final run of the query as reported by vmstat (Task Mangler on Windows).
SELECT B.CONTRACT_ID,SUM(R.DURATION+1)/60.0 AS SUMDUR FROM SEARCH_REQT_RESULT TSR
JOIN STAFF_BOOKING B ON (B.STAFF_ID = TSR.STAFF_ID)
JOIN ORDER_REQT R ON (R.REQT_ID = B.REQT_ID)
JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) AND BP.BOOKING_DATE BETWEEN '2004-04-12' AND '2004-04-18' AND TSR.SEARCH_ID = 8 GROUP BY B.CONTRACT_ID
QUERY PLAN
 HashAggregate  (cost=11205.80..11209.81 rows=401 width=6) (actual time=1179.729..1179.980 rows=50 loops=1)
   ->  Nested Loop  (cost=326.47..11203.79 rows=401 width=6) (actual time=39.700..1177.149 rows=652 loops=1)
         ->  Hash Join  (cost=326.47..9990.37 rows=401 width=8) (actual time=39.537..1154.807 rows=652 loops=1)
               Hash Cond: ("outer".staff_id = "inner".staff_id)
               ->  Merge Join  (cost=320.39..9885.06 rows=3809 width=12) (actual time=38.316..1143.953 rows=4079 loops=1)
                     Merge Cond: ("outer".booking_id = "inner".booking_id)
                     ->  Index Scan using staff_booking_pkey on staff_booking b  (cost=0.00..8951.94 rows=222612 width=16) (actual time=0.218..676.219 rows=222609 loops=1)
                     ->  Sort  (cost=320.39..329.91 rows=3808 width=4) (actual time=26.225..32.754 rows=4079 loops=1)
                           Sort Key: bp.booking_id
                           ->  Index Scan using booking_plan_idx2 on booking_plan bp  (cost=0.00..93.92 rows=3808 width=4) (actual time=0.223..14.186 rows=4079 loops=1)
                                 Index Cond: ((booking_date >= '2004-04-12'::date) AND (booking_date <= '2004-04-18'::date))
               ->  Hash  (cost=5.59..5.59 rows=193 width=4) (actual time=1.139..1.139 rows=0 loops=1)
                     ->  Index Scan using fk_idx_search_reqt_result on search_reqt_result tsr  (cost=0.00..5.59 rows=193 width=4) (actual time=0.213..0.764 rows=192 loops=1)
                           Index Cond: (search_id = 8)
         ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..3.01 rows=1 width=6) (actual time=0.023..0.025 rows=1 loops=652)
               Index Cond: (r.reqt_id = "outer".reqt_id)
 Total runtime: 1181.239 ms
Cheers,
Gary.
> Having picked out an index scan as being the highest time user I
> concentrated on that in this case and compared the same index scan on
> MSSQL. At least MSSQL reported it as an index scan on the same index
> for the same number of rows.

I should have also pointed out that MSSQL reported that same index scan as
taking 65% of the overall query time. It was just "faster". The overall
query took 103ms in MSSQL.

Gary.
Josh Berkus <josh@agliodbs.com> writes:
> Certainly the fact that MSSQL is essentially a single-user database makes
> things easier for them.

Our recent testing (cf the "Xeon" thread) says that the interlocking we do
to make the world safe for multiple backends has a fairly high cost (at
least on some hardware) compared to the rest of the work in scenarios where
you are doing zero-I/O scans of data already in memory.  Especially so for
index scans.  I'm not sure this completely explains the differential that
Gary is complaining about, but it could be part of it.  Is it really true
that MSSQL doesn't support concurrent operations?

			regards, tom lane
On Fri, 30 Apr 2004, Gary Doades wrote:

> I should have also pointed out that MSSQL reported that same index scan
> as taking 65% of the overall query time. It was just "faster". The
> overall query took 103ms in MSSQL.

Are your results based on a single client accessing the database and no
concurrent updates?

Would adding more clients, and maybe having some client that
updates/inserts into the tables, still make MSSQL faster than pg? Maybe
it's as simple as pg being optimized for more concurrent users than MSSQL?

I'm just asking; I don't know much about the inner workings of MSSQL.

--
/Dennis Björklund
On 29 Apr 2004 at 19:17, Tom Lane wrote:

> Josh Berkus <josh@agliodbs.com> writes:
> > Certainly the fact that MSSQL is essentially a single-user database makes
> > things easier for them.
>
> Our recent testing (cf the "Xeon" thread) says that the interlocking we
> do to make the world safe for multiple backends has a fairly high cost
> (at least on some hardware) compared to the rest of the work in
> scenarios where you are doing zero-I/O scans of data already in memory.
> Especially so for index scans. I'm not sure this completely explains
> the differential that Gary is complaining about, but it could be part of
> it. Is it really true that MSSQL doesn't support concurrent operations?
>
> 			regards, tom lane

As far as I am aware, SQLServer supports concurrent operations. It
certainly creates more threads for each connection. None of my observations
of the system under load (50-ish concurrent users, 150-ish connections)
suggest that it is serializing queries. These tests are currently on
single-processor Athlon XP 2000+ systems.

Regards,
Gary.
On 30 Apr 2004 at 7:26, Dennis Bjorklund wrote:

> Are your results based on a single client accessing the database and no
> concurrent updates?
>
> Would adding more clients, and maybe having some client that
> updates/inserts into the tables, still make mssql faster then pg? Maybe
> it's so simple as pg being optimized for more concurrent users then mssql?
>
> I'm just asking, I don't know much about the inner workings of
> mssql.
>
> --
> /Dennis Björklund

At the moment it is difficult to set up many clients for testing concurrent
stuff. In the past I have had several SQLServer clients under test, mainly
select queries. MSSQL can certainly execute queries while other queries are
still running in the background.

Our production app is fairly heavily biased towards selects. Currently it
is about 70% selects, 20% inserts, 6% deletes and 4% updates. Very few
updates touch more than one row based on the primary key. Over 90% of the
time spent running SQL is in select queries. My limited concurrent testing
on Postgres gives very good performance on updates, inserts and deletes,
but it is suffering on the selects in certain areas, which is why I have
been concentrating my efforts there. Having got similar (or the same)
access plans in both Postgres and MSSQL, I was getting down to the next
level of checking what was going on when executing the already-planned
query.

I do have another database system I could try: Sybase SQLAnywhere. This is
not the original Sybase Enterprise, which has the same roots as MSSQL. In
the past my testing suggested that SQLAnywhere performance was as good as
or better than MSSQL. I may try to set it up with the same data for a more
detailed comparison.

Regards,
Gary.
On Apr 30, 2004, at 3:01 AM, Gary Doades wrote:

[ pg query plan, etc ]

I wonder if other parts of the plan are affecting the speed. I've recently
run into a case where a merge join plan was chosen for a query, which took
11 seconds to execute. Forcing it to pick a nested loop join dropped it to
3. (Updating my default_statistics_target to 500 caused the planner to
choose the nested loop join.)

So, is the plan really the same?

A better comparison query may be a simple "select a from mytable where a
between foo and bar" to get an index scan. In that case it's a straight-up,
vanilla index scan; nothing else getting in the way.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Manfred Koizar wrote:
> On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> [ ... visibility information in index tuples ... ]
>>
>> Storing that information would at least double the overhead space used
>> for each index tuple.  The resulting index bloat would significantly
>> slow index operations by requiring more I/O.  So it's far from clear
>> that this would be a win, even for those who care only about select
>> speed.
>
> While the storage overhead could be reduced to 1 bit (not a joke)

You mean adding an isLossy bit, so that only where it is set does the head
tuple have to be checked for visibility, and if it is not set the head
tuple does not have to be checked?

> we'd
> still have the I/O overhead of locating and updating index tuples for
> every heap tuple deleted/updated.

Would there be additional I/O for the additional bit in the index tuple (I
am unable to find the layout of index tuple headers in the docs)?

Jochem

--
I don't get it
immigrants don't work
and steal our jobs
    - Loesje
On 30 Apr 2004 at 8:32, Jeff wrote:

> A better comparision query may be a simple "select a from mytable
> where a between foo and bar" to get an index scan. In that case its a
> straight up, vanilla index scan. Nothing else getting in the way.

Yes, you're right, and I have done this just to prove to myself that it is
the index scan that is the bottleneck. I have some complex SQL that
executes very quickly with Postgres, similar to MSSQL, but the index scans
in most of those only touch a few rows for a few loops. It seems to be a
problem when the index scan is scanning very many rows: for each of these
it has to go to the table just to find out if the index entry it just
looked at is still valid.

Gary.
On 30 Apr 2004 at 9:37, Kevin Barnard wrote:

> I was always under the impression that MSSQL used leaf and row level
> locking and therefore was not a concurrent database in the same sense
> that Postgres is. It would still allow for concurrent connections and
> such, but updates will get blocked/delayed. I might be wrong.

Ultimately you may be right. I don't know enough about SQLServer internals
to say either way. Anyway, most of our system is in selects 70% of the
time. I could try and set up a test for this when I get a bit more time.

Unfortunately I suspect that this topic won't get taken much further.
Testing it would mean modifying quite a bit of code. Whether putting
additional info in the index header and not visiting the data row if all
the required data is in the index would be beneficial would require quite a
bit of work by someone who knows more than I do. I reckon that no-one has
the time to do this at the moment.

Regards,
Gary.
On Fri, 30 Apr 2004, Gary Doades wrote:

> Yes, you're right and I have done this just to prove to myself that it
> is the index scan that is the bottleneck. I have some complex SQL that
> executes very quickly with Postgres, similar to MSSQL, but the index
> scans in most of those only touch a few rows for a few loops. It seems
> to be a problem when the index scan is scanning very many rows and for
> each of these it has to go to the table just to find out if the index it
> just looked at is still valid.

Another way to speed this up is the TODO item:

	"Use bitmaps to fetch heap pages in sequential order"

For an index scan that fetches a number of rows, those rows may be anywhere
in the base table, so as each index entry is found the corresponding table
row is fetched from the heap. This is not ideal because you can be jumping
around in the heap and end up fetching the same page multiple times when
table rows are in the same page but were found in different places in the
index.

Kris Jurka
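The TODO item Kris describes can be illustrated with a short Python sketch.
The TIDs and page numbers below are made up for illustration: collecting the
index hits' (page, offset) pointers and sorting them by page before touching
the heap turns repeated visits to the same page into a single read.

```python
# Toy model: heap page fetches in index order vs. bitmap (page-sorted) order.

def pages_touched(tids):
    """Count heap page reads with a cache of just the last page touched."""
    reads, last = 0, None
    for page, _offset in tids:
        if page != last:
            reads += 1
            last = page
    return reads

# TIDs as (page, offset), in the order the index scan happens to find them:
index_order = [(8, 1), (2, 3), (8, 2), (2, 1), (5, 4), (8, 3)]

naive = pages_touched(index_order)           # jumps back and forth between pages
bitmap = pages_touched(sorted(index_order))  # visits each heap page once

print(naive, bitmap)  # 6 3
```

With only three distinct pages involved, the unsorted scan pays for page 8
and page 2 several times over; sorting first halves the page reads in this
toy case, and the win grows with the number of matching rows per page.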
On Fri, 30 Apr 2004 19:46:24 +0200, Jochem van Dieten
<jochemd@oli.tudelft.nl> wrote:

>> While the storage overhead could be reduced to 1 bit (not a joke)
>
> You mean adding an isLossy bit and only where it is set the head
> tuple has to be checked for visibility, if it is not set the head
> tuple does not have to be checked?

Yes, something like this.  Actually I imagined it the other way round: a
visible-to-all flag similar to the existing dead-to-all flag (search for
LP_DELETE and ItemIdDeleted in nbtree.c).

>> we'd
>> still have the I/O overhead of locating and updating index tuples for
>> every heap tuple deleted/updated.
>
> Would there be additional I/O for the additional bit in the index
> tuple (I am unable to find the layout of index tuple headers in
> the docs)?

Yes, the visible-to-all flag would be set as a by-product of an index scan,
if the heap tuple is found to be visible to all active transactions.  This
update is non-critical and, I think, not very expensive.

Deleting (and hence updating) a tuple is more critical, regarding both
consistency and performance: we'd have to locate all index entries pointing
to the heap tuple and set their visible-to-all flags to false.
Manfred Koizar <mkoi-pg@aon.at> writes:
> Yes, the visible-to-all flag would be set as a by-product of an index
> scan, if the heap tuple is found to be visible to all active
> transactions. This update is non-critical

Oh really? I think you need to think harder about the transition
conditions.

Dead-to-all is reasonably safe to treat as a hint bit because *it does
not ever need to be undone*. Visible-to-all does not have that property.

			regards, tom lane
Tom Lane wrote:
> Manfred Koizar <mkoi-pg@aon.at> writes:
>> Yes, the visible-to-all flag would be set as a by-product of an index
>> scan, if the heap tuple is found to be visible to all active
>> transactions. This update is non-critical
>
> Oh really? I think you need to think harder about the transition
> conditions.
>
> Dead-to-all is reasonably safe to treat as a hint bit because *it does
> not ever need to be undone*. Visible-to-all does not have that
> property.

Yes, really :-)

When a tuple is inserted the visible-to-all flag is set to false. The
effect of this is that every index scan that finds this tuple has to
visit the heap to verify visibility. If it turns out the tuple is not
only visible to the current transaction, but to all current
transactions, the visible-to-all flag can be set to true. This is
non-critical, because if it is set to false scans will not miss the
tuple; they will just visit the heap to verify visibility.

The moment the heap tuple is updated/deleted, the visible-to-all flag
needs to be set to false again in all indexes. This is critical, and
the I/O and (dead)lock costs of unsetting the visible-to-all flag are
unknown and might be big enough to offset any advantage on the selects.

But I believe that for applications with a "load, select, drop" usage
pattern (warehouses, archives etc.) having this visible-to-all flag
would be a clear winner.

Jochem
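The flag lifecycle Jochem describes (false at insert, set true as a by-product of a scan, cleared again on update/delete) can be modeled in a few lines. This is a toy Python sketch, not PostgreSQL code; all names and structure are my own invention:

```python
# Toy model of the proposed visible-to-all index flag. A set flag lets
# a scan trust the index entry without visiting the heap; an unset flag
# forces a heap visit, which may then set the flag as a by-product.

class IndexEntry:
    def __init__(self, tid):
        self.tid = tid
        self.visible_to_all = False          # always false at insert time

def index_scan(entries, heap_visible, heap_visits):
    """Return TIDs of visible tuples, recording every heap visit made."""
    found = []
    for e in entries:
        if e.visible_to_all:
            found.append(e.tid)              # trust the flag: no heap visit
        else:
            heap_visits.append(e.tid)        # must verify in the heap
            if heap_visible(e.tid):
                found.append(e.tid)
                e.visible_to_all = True      # set as a by-product of the scan
    return found

entries = [IndexEntry(t) for t in (10, 11, 12)]

visits_first, visits_second = [], []
found_first = index_scan(entries, lambda tid: True, visits_first)    # 3 heap visits
found_second = index_scan(entries, lambda tid: True, visits_second)  # 0 heap visits

# A delete must clear the flag again in every index containing the
# tuple -- the critical, potentially expensive step Jochem points out:
entries[0].visible_to_all = False
visits_third = []
found_third = index_scan(entries, lambda tid: tid != 10, visits_third)
```

The second scan touches the heap zero times, which is the whole point for the "load, select, drop" workloads mentioned above.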
On 1 May 2004 at 13:18, Jochem van Dieten wrote:

> Yes, really :-)
>
> When a tuple is inserted the visible-to-all flag is set to false.
> The effect of this is that every index scan that finds this tuple
> has to visit the heap to verify visibility. If it turns out the
> tuple is not only visible to the current transaction, but to all
> current transactions, the visible-to-all flag can be set to true.
> This is non-critical, because if it is set to false scans will
> not miss the tuple, they will just visit the heap to verify
> visibility.
>
> The moment the heap tuple is updated/deleted the visible-to-all
> flag needs to be set to false again in all indexes. This is
> critical, and the I/O and (dead)lock costs of unsetting the
> visible-to-all flag are unknown and might be big enough to offset
> any advantage on the selects.
>
> But I believe that for applications with a "load, select, drop"
> usage pattern (warehouses, archives etc.) having this
> visible-to-all flag would be a clear winner.

If need be, this index maintenance could be made a configuration
option. Database usage patterns are usually reasonably well known for a
particular installation, so the option could be set on or off depending
on the typical transactions. In my case, with frequent large/complex
selects and few very short inserts/updates, I think it could be a big
performance boost. If it works :-)

Regards,
Gary.
Jochem van Dieten <jochemd@oli.tudelft.nl> writes:
> The moment the heap tuple is updated/deleted the visible-to-all
> flag needs to be set to false again in all indexes. This is
> critical,

Exactly. This gets you out of the hint-bit semantics and into a ton of
interesting problems, such as race conditions. (Process A determines
that tuple X is visible-to-all, and goes to update the index tuple.
Before it can reacquire lock on the index page, process B updates the
heap tuple and visits the index to clear the flag bit. Once A obtains
lock it will set the flag bit. Oops.)

Basically what you are buying into with such a thing is multiple copies
of critical state. It may be only one bit rather than several words,
but updating it is no less painful than if it were a full copy of the
tuple's commit status.

			regards, tom lane
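Tom's interleaving can be replayed deterministically. The following is a toy sequential replay of his three steps (no real locks or processes; everything here is invented for illustration), showing how A's stale decision leaves the index claiming visible-to-all for a deleted tuple:

```python
# Sequential replay of the race: A checks visibility and decides to set
# the flag, B deletes the tuple and clears the flag, then A applies its
# stale decision. The lock release/reacquire window is implicit in the
# ordering of the statements.

idx = {"visible_to_all": False}     # flag starts false at insert time
heap = {"deleted": False}

# Process A: finds the flag unset, checks the heap, and concludes the
# tuple is visible to all. It must drop the index page lock before
# re-locking to set the flag.
a_decision = not heap["deleted"]          # True: A plans to set the flag

# Process B (in A's lock window): deletes the heap tuple and dutifully
# clears the flag in the index.
heap["deleted"] = True
idx["visible_to_all"] = False

# Process A: reacquires the lock and applies its now-stale decision.
if a_decision:
    idx["visible_to_all"] = True

# Broken invariant: the index asserts visible-to-all for a deleted
# tuple, so later scans would skip the heap check and return it.
inconsistent = idx["visible_to_all"] and heap["deleted"]
```

Any fix has to make A's update conditional on the state being unchanged since it was observed, which is where the two-bit scheme discussed below in the thread comes in.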
On Sat, 01 May 2004 13:18:04 +0200, Jochem van Dieten
<jochemd@oli.tudelft.nl> wrote:

> Tom Lane wrote:
>> Oh really? I think you need to think harder about the transition
>> conditions.

Indeed.

>> Dead-to-all is reasonably safe to treat as a hint bit because *it does
>> not ever need to be undone*. Visible-to-all does not have that
>> property.
>
> Yes, really :-)

No, not really :-(

As Tom has explained in a nearby message, his concern is that -- unlike
dead-to-all -- visible-to-all starts as false, is set to true at some
point in time, and is eventually set to false again. Problems arise if
one backend wants to set visible-to-all to true while at the same time
another backend wants to set it to false.

This could be curable by using a second bit as a deleted flag (it might
even be the same bit that's now used as dead-to-all, but I'm not sure).
An index tuple having both the visible flag (formerly called
visible-to-all) and the deleted flag set would cause a heap tuple
access to check visibility. But that leaves the question of what to do
after the deleting transaction has rolled back: I see no clean way from
the visible-and-deleted state back to visible-to-all.

This obviously needs another round of hard thinking ...
Manfred Koizar said:
> As Tom has explained in a nearby message his concern is that --
> unlike dead-to-all -- visible-to-all starts as false, is set to true
> at some point in time, and is eventually set to false again.
> Problems arise if one backend wants to set visible-to-all to true
> while at the same time another backend wants to set it to false.

Got it. I misinterpreted his concern as "visible-to-all should not be
set to true when the tuple is inserted".

> This could be curable by using a second bit as a deleted flag (might
> be even the same bit that's now used as dead-to-all, but I'm not
> sure). An index tuple having both the visible flag (formerly called
> visible-to-all) and the deleted flag set would cause a heap tuple
> access to check visibility.

Or, in a more generalized way: with 2 bits written at the same time you
can express 4 states, but only 3 actions need to be signalled:
dead-to-all, visible-to-all and check-heap. So we can have 2 states
that both signal check-heap.

The immediate solution to the race condition Tom presented would be to
have the transaction that invalidates the heap tuple switch the index
tuple from the one check-heap state to the other. The transaction that
wants to update to visible-to-all can now see that the state has
changed (but not the meaning) and aborts its change.

> But that leaves the question of what to do after the deleting
> transaction has rolled back. I see no clean way from the
> visible-and-deleted state to visible-to-all.

I'm afraid I don't know enough about the inner workings of rollbacks to
determine how the scenario "A determines visible-to-all should be set,
B invalidates the tuple, B rolls back, C invalidates the tuple, C
commits, A reacquires the lock on the index" would work out. I guess I
have some more reading to do. But if you don't roll back too often it
wouldn't even be a huge problem to just leave such tuples in the
visible-and-deleted state until they eventually reach the dead-to-all
state.

Jochem