Thread: planner/optimizer question

planner/optimizer question

From
brad-pgperf@duttonbros.com
Date:
Hi,

I have a query which I think should be using an index all of the time but
postgres only uses the index part of the time.  The index
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
first followed by the selected column (support_person_id).  Wouldn't the
most efficient plan be to scan the index regardless of crm_id because the
only columns needed are in the index?  Below is the table, 2 queries showing
the difference in plans, followed by the record distribution of
ticket_crm_map.  I first did a 'vacuum analyze' and am running postgres
7.4.2.

Thanks,
Brad


athenapost=> \d ticket_crm_map
                                Table "public.ticket_crm_map"
         Column         |            Type             |         Modifiers
------------------------+-----------------------------+---------------------------
tcrm_map_id            | integer                     | not null
ticket_id              | integer                     | not null
crm_id                 | integer                     | not null
support_person_id      | integer                     | not null
escalated_to_person_id | integer                     | not null
status                 | character varying(50)       | not null default 'Open'::character varying
close_date             | timestamp without time zone |
updated_date           | timestamp without time zone |
updated_by             | character varying(255)      |
created_date           | timestamp without time zone |
created_by             | character varying(255)      |
additional_info        | text                        |
subject                | character varying(255)      |
Indexes:
 "ticket_crm_map_pkey" primary key, btree (tcrm_map_id)
 "ticket_crm_map_crm_id_key" unique, btree (crm_id, ticket_id)
 "ticket_crm_map_crm_id_suppid" btree (crm_id, support_person_id)
 "ticket_crm_map_status" btree (status)
 "ticket_crm_map_ticket_id" btree (ticket_id)
Foreign-key constraints:
 "$1" FOREIGN KEY (ticket_id) REFERENCES ticket(ticket_id)
 "$2" FOREIGN KEY (crm_id) REFERENCES company_crm(crm_id)
 "$3" FOREIGN KEY (support_person_id) REFERENCES person(person_id)
 "$4" FOREIGN KEY (escalated_to_person_id) REFERENCES person(person_id)
 "$5" FOREIGN KEY (status) REFERENCES ticket_status(status)

athenapost=> explain analyze select distinct support_person_id from
ticket_crm_map where crm_id = 7;

                                  QUERY PLAN
--------------------------------------------------------------------------------
Unique  (cost=1262.99..1265.27 rows=1 width=4) (actual time=15.335..18.245 rows=20 loops=1)
  ->  Sort  (cost=1262.99..1264.13 rows=456 width=4) (actual time=15.332..16.605 rows=2275 loops=1)
        Sort Key: support_person_id
        ->  Index Scan using ticket_crm_map_crm_id_suppid on ticket_crm_map  (cost=0.00..1242.85 rows=456 width=4) (actual time=0.055..11.281 rows=2275 loops=1)
              Index Cond: (crm_id = 7)
Total runtime: 18.553 ms
(6 rows)

Time: 20.598 ms
athenapost=> explain analyze select distinct support_person_id from
ticket_crm_map where crm_id = 1;
                                  QUERY PLAN
--------------------------------------------------------------------------------
Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual time=659.102..791.517 rows=24 loops=1)
  ->  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual time=659.090..713.285 rows=93889 loops=1)
        Sort Key: support_person_id
        ->  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628 width=4) (actual time=0.027..359.299 rows=93889 loops=1)
              Filter: (crm_id = 1)
Total runtime: 814.601 ms
(6 rows)

Time: 817.095 ms
athenapost=> select count(*), crm_id from ticket_crm_map group by crm_id;
count | crm_id
 -------+--------
2554 |     63
129 |     25
 17 |     24
110 |     23
 74 |     22
 69 |     21
  2 |     20
 53 |     82
 10 |     17
 16 |     81
46637 |     16
 14 |     80
  2 |     15
1062 |     79
 87 |     78
 93 |     77
 60 |     44
363 |     76
225 |     10
  4 |     74
 83 |      9
 27 |     73
182 |      8
2275 |      7
 15 |     71
554 |      6
 44 |     70
631 |      5
 37 |      4
190 |      3
112 |      2
93889 |      1
(32 rows)

Time: 436.697 ms


Re: planner/optimizer question

From
"Atesz"
Date:
Hi,

You should try the following queries:

select support_person_id from ticket_crm_map where crm_id = 7 GROUP BY
support_person_id;
select support_person_id from ticket_crm_map where crm_id = 1 GROUP BY
support_person_id;

They can use the 'ticket_crm_map_crm_id_suppid' index.

Generally, Postgres can use a k-column index when the columns in your
conditions form a prefix of the index's column list.
For example:
CREATE INDEX test_idx on test(col1,col2,col3,col4);
SELECT * FROM test WHERE col1=3 AND col2=13;  -- This can use the index.

But the following queries cannot use the index:
SELECT * FROM test WHERE col1=3 AND col3=13;
SELECT * FROM test WHERE col2=3;
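Antal's leftmost-prefix rule is easy to check empirically. A minimal sketch follows, using SQLite from Python rather than Postgres 7.4 (so the planner output differs, and the table and index names are just the ones from the example above), but the prefix behaviour it demonstrates is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (col1 INT, col2 INT, col3 INT, col4 INT, payload TEXT);
    CREATE INDEX test_idx ON test (col1, col2, col3, col4);
""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable plan in the last column
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Conditions on a prefix (col1, col2) can seek into the index ...
print(plan("SELECT * FROM test WHERE col1=3 AND col2=13"))
# ... but a condition on col2 alone cannot, so the table is scanned instead
print(plan("SELECT * FROM test WHERE col2=3"))
```

The first plan reports a search using `test_idx`; the second falls back to a full scan of `test`, since `col2` alone is not a prefix of the indexed column list.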

If you have problems with sequential scans or sorts, you can disable them
globally or per-session:
SET enable_seqscan=0;
SET enable_sort = 0;

Regards, Antal Attila



Re: planner/optimizer question

From
Tom Lane
Date:
brad-pgperf@duttonbros.com writes:
> ... Wouldn't the most efficient plan be to scan the index regardless
> of crm_id because the only columns needed are in the index?

No.  People coming from other databases often have the misconception
that queries can be answered by looking only at an index.  That is never
true in Postgres because row validity info is only stored in the table;
so we must always visit the table entry to make sure the row is still
valid/visible for the current query.

Accordingly, columns added to the index that aren't constrained by the
WHERE clause are not very useful ...

            regards, tom lane
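Tom's point can be modelled in miniature. The following is a toy sketch only, not Postgres internals: the tuple layout, snapshot rule, and all names are invented for illustration. It shows why, even when the wanted column is in the index, every index hit still forces a visit to the heap tuple to check visibility:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    # Visibility info lives only in the heap tuple header (xmin/xmax),
    # never in the index -- the crux of the argument above.
    xmin: int            # transaction that inserted the row
    xmax: Optional[int]  # transaction that deleted it, if any
    crm_id: int
    support_person_id: int

heap = [
    HeapTuple(xmin=100, xmax=None, crm_id=7, support_person_id=1),
    HeapTuple(xmin=100, xmax=150,  crm_id=7, support_person_id=2),  # deleted
    HeapTuple(xmin=100, xmax=None, crm_id=1, support_person_id=3),
]

# The index stores (key, heap position) -- the deleted row is still in it.
index = sorted((t.crm_id, tid) for tid, t in enumerate(heap))

def visible(t, snapshot_xid):
    # Grossly simplified visibility rule for a single snapshot id.
    return t.xmin <= snapshot_xid and (t.xmax is None or t.xmax > snapshot_xid)

def index_scan(crm_id, snapshot_xid):
    # Even though support_person_id could be read straight from the index,
    # every match forces a heap visit just to check visibility.
    for key, tid in index:
        if key == crm_id and visible(heap[tid], snapshot_xid):
            yield heap[tid].support_person_id

print(list(index_scan(7, snapshot_xid=200)))  # the deleted row is filtered out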

Re: planner/optimizer question

From
"Gary Doades"
Date:
I know you will shoot me down, but...

Why is there an entry in the index for a row if the row is not valid?
Wouldn't it be better for the index entry's validity to track the row's validity?
If a particular data value for a query (join, where etc.) can be satisfied
by the index entry itself, this would be a big performance gain.

Cheers,
Gary.

On 28 Apr 2004 at 0:27, Tom Lane wrote:

> brad-pgperf@duttonbros.com writes:
> > ... Wouldn't the most efficient plan be to scan the index regardless
> > of crm_id because the only columns needed are in the index?
>
> No.  People coming from other databases often have the misconception
> that queries can be answered by looking only at an index.  That is never
> true in Postgres because row validity info is only stored in the table;
> so we must always visit the table entry to make sure the row is still
> valid/visible for the current query.
>
> Accordingly, columns added to the index that aren't constrained by the
> WHERE clause are not very useful ...
>
>             regards, tom lane
>



Re: planner/optimizer question

From
Christopher Kings-Lynne
Date:
> Why is there an entry in the index for a row if the row is not valid?
> Wouldn't it be better for the index entry validity to track the row validity.
> If a particular data value for a query (join, where etc.) can be satisfied
> by the index entry itself this would be a big performance gain.

For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big
performance loss.

Chris


Re: planner/optimizer question

From
"Gary Doades"
Date:
I can understand the performance loss on non-selects from keeping the
index validity state tracking the row validity, but would that outweigh the
performance gains on selects? It depends on your mix of selects to
non-selects, I guess, but other database systems seem to imply that keeping
the index in step is worth it overall.

Cheers,
Gary.

On 28 Apr 2004 at 15:04, Christopher Kings-Lynne wrote:

> > Why is there an entry in the index for a row if the row is not valid?
> > Wouldn't it be better for the index entry validity to track the row validity.
> > If a particular data value for a query (join, where etc.) can be satisfied
> > by the index entry itself this would be a big performance gain.
>
> For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big
> performance loss.
>
> Chris
>



Re: planner/optimizer question

From
Christopher Kings-Lynne
Date:
> I can understand the performance loss on non-selects for keeping the
> index validity state tracking the row validity, but would that outweigh the
> performance gains on selects? Depends on your mix of selects to non
> selects I guess, but other database systems seem to imply that keeping
> the index on track is worth it overall.

Yes, some sort of flag on index creation would be sweet :)

Chris


Re: planner/optimizer question

From
Manfred Koizar
Date:
On Wed, 28 Apr 2004 07:35:41 +0100, "Gary Doades" <gpd@gpdnet.co.uk>
wrote:
>Why is there an entry in the index for a row if the row is not valid?

Because whether a row is seen as valid or not lies in the eye of the
transaction looking at it.  Full visibility information is stored in the
heap tuple header.  The developers' consensus is that this overhead
should not be in every index tuple.

Servus
 Manfred

Re: planner/optimizer question

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
> On Wed, 28 Apr 2004 07:35:41 +0100, "Gary Doades" <gpd@gpdnet.co.uk>
> wrote:
>> Why is there an entry in the index for a row if the row is not valid?

> Because whether a row is seen as valid or not lies in the eye of the
> transaction looking at it.  Full visibility information is stored in the
> heap tuple header.  The developers' consensus is that this overhead
> should not be in every index tuple.

Storing that information would at least double the overhead space used
for each index tuple.  The resulting index bloat would significantly
slow index operations by requiring more I/O.  So it's far from clear
that this would be a win, even for those who care only about select
speed.

            regards, tom lane

Re: planner/optimizer question

From
Manfred Koizar
Date:
On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> [ ... visibility information in index tuples ... ]

>Storing that information would at least double the overhead space used
>for each index tuple.  The resulting index bloat would significantly
>slow index operations by requiring more I/O.  So it's far from clear
>that this would be a win, even for those who care only about select
>speed.

While the storage overhead could be reduced to 1 bit (not a joke) we'd
still have the I/O overhead of locating and updating index tuples for
every heap tuple deleted/updated.

Servus
 Manfred

Re: planner/optimizer question

From
"Gary Doades"
Date:
On 29 Apr 2004 at 19:03, Manfred Koizar wrote:

> While the storage overhead could be reduced to 1 bit (not a joke) we'd
> still have the I/O overhead of locating and updating index tuples for
> every heap tuple deleted/updated.

But this is what a lot of DBMSs do and seem to do well enough. I can see that the
MVCC system gives additional problems, but maybe it shouldn't be dismissed so lightly.

Coming from a MS SQLServer platform I have spent a lot of time optimising SQL in
PostgreSQL to be comparable to SQLServer. For the most part I have done this, but
some things are just slower in PostgreSQL.

Recently I have been looking at raw performance (CPU, IO) rather than the plans. I
have some test queries that (as far as I can determine) use the same access plans on
PostgreSQL and SQLServer. Getting down to detail, an index scan of an index on an
integer column (222512 rows) takes 60ms on SQLServer and 540ms on PostgreSQL.
A full sequential scan of the same table without the index, on the other hand, takes 370ms
on SQLServer and 420ms on PostgreSQL.

I know that the platforms are different (windows 2000 vs Linux 2.6.3), but the statement
was executed several times to make sure the index and data was in cache (no disk io)
on both systems. Same data, Same CPU, Same disks, Same memory, Same
motherboards.

The only thing I can think of is the way that the index scan is performed on each
platform, SQLServer can use the data directly from the index. This makes the biggest
difference in multi join statements where several of the intermediate tables do not need
to be accessed at all, the data is contained in the join indexes. This results in almost an
order of magnitude performance difference for the same data.

It would be nice to get a feel for how much performance would be lost in
maintaining the index flags, against the possible performance gains from getting the
data back out again.

Regards,
Gary.


Re: planner/optimizer question

From
Rod Taylor
Date:
> I would be nice to get a feel for how much performance loss would be incurred in
> maintaining the index flags against possible performance gains for getting the data back
> out again.

I guess the real question is, why maintain index flags and not simply
drop the index entry altogether?

A more interesting case would be to have the backend process record
index tuples that it would invalidate (if committed), then on commit
send that list to a garbage collection process.

It's still vacuum -- just the reaction time for it would be much
quicker.
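Rod's "still vacuum, just quicker" idea could be sketched as a commit hook feeding a garbage-collection queue. This is purely a hypothetical design with invented names; Postgres of this era does nothing of the sort:

```python
from collections import deque

gc_queue = deque()   # dead index entries awaiting cleanup
index = {("crm_id", 7, "tid-42"), ("crm_id", 1, "tid-43")}

class Backend:
    def __init__(self):
        self.pending_dead = []   # index tuples this xact would invalidate

    def delete_row(self, entry):
        # Don't touch the index yet: other transactions may still see the row.
        self.pending_dead.append(entry)

    def commit(self):
        # On commit, hand the list to the collector instead of waiting
        # for a periodic VACUUM to rediscover the dead entries.
        gc_queue.extend(self.pending_dead)
        self.pending_dead.clear()

def garbage_collect():
    # The collector must still check that no snapshot can see the row;
    # that MVCC wrinkle is omitted here.
    while gc_queue:
        index.discard(gc_queue.popleft())

b = Backend()
b.delete_row(("crm_id", 7, "tid-42"))
b.commit()
garbage_collect()
print(index)   # only the live entry remains
```

The open problem, as the follow-ups note, is the MVCC check elided in the comment: an entry is only garbage once no live snapshot can still see the row.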



Re: planner/optimizer question

From
"Gary Doades"
Date:
>
> I guess the real question is, why maintain index flags and not simply
> drop the index entry altogether?
>
> A more interesting case would be to have the backend process record
> index tuples that it would invalidate (if committed), then on commit
> send that list to a garbage collection process.
>
> It's still vacuum -- just the reaction time for it would be much
> quicker.
>
This was my original question.

I guess the problem is with MVCC. The row may have gone from your
current view of the table but not from someone else's. I don't (yet)
understand the way it works well enough to say for sure, but I still think it is
worth pursuing further by someone who does know the deep stuff. They seem
to have concluded that it is not worth it, however.

Cheers,
Gary.




Re: planner/optimizer question

From
"Rosser Schwarz"
Date:
while you weren't looking, Gary Doades wrote:

> Recently  I have been looking at raw performance (CPU, IO)
> rather than the plans. I have some test queries that (as far
> as I can determine) use the same access plans on PostgreSQL
> and SQLServer. Getting to the detail, an index scan of an
> index on a integer column (222512 rows) takes 60ms on
> SQLServer and  540ms on PostgreSQL.

After a recent power outage, I had the opportunity to watch both
PostgreSQL and MS SQL come back from forced shutdowns (clean,
though there were active connections, in one case a bulk insert).
PostgreSQL was available and responsive as soon as the postmaster
had started.  MS SQL, on the other hand, took the better part of
an hour to become remotely usable again -- on a radically faster
machine (Dell 6650, versus the 6450 we run PostgreSQL on).

Digging a bit, I noted that once MS SQL was up again, it was
using nearly 2GB main memory even when more or less idle.  From
this, and having observed the performance differences between
the two, I'm left with little alternative but to surmise that
part of MS SQL's noted performance advantage [1] is due to its
forcibly storing its indices in main memory.  Its startup lag
(during which it was utterly unusable; even SELECTs blocked)
could be accounted for by reindexing the tables. [2]

Granted, this is only a hypothesis, is rather unverifiable, and
probably belongs more on ADVOCACY than it does PERFORM, but it
seemed relevant.

It's also entirely possible your indices are using inaccurate
statistical information.  Have you ANALYZEd recently?

/rls

[1] Again, at least in our case, the comparison is entirely
    invalid, as MS SQL gets a hell of a lot more machine than
    PostgreSQL.  Even so, for day-to-day work and queries, even
    our DBA, an until-recently fervent MS SQL advocate, can't
    fault PostgreSQL's SELECT, INSERT or DELETE performance.
    We still can't get UPDATEs (at least bulk such) to pass
    muster.

[2] This is further supported by having observed MS SQL run a
    "recovery process" on databases that were entirely unused,
    even for SELECT queries, at the time of the outage.  The
    only thing it might conceivably need to recover on them
    is in-memory indices that were lost when power was lost.

--
Rosser Schwarz
Total Card, Inc.


Re: planner/optimizer question

From
"Gary Doades"
Date:
> It's also entirely possible your indices are using inaccurate
> statistical information.  Have you ANALYZEd recently?
>

In this example the statistics don't matter. The plans used were the same for
MSSQL and Postgres. I was trying to eliminate the difference in plans
between the two, which obviously does make a difference, sometimes in
MSSQL's favour and sometimes the other way round. Both systems, having
decided to do the same index scan, took noticeably different times. The
Postgres database was fully vacuumed and analysed anyway.

I agree about MSSQL recovery time. It sucks. This is why they are making a
big point about the improved recovery time in "yukon". Although the recovery
time is important, I see this as an exception, whereas at the moment I am
interested in the everyday.

Cheers,
Gary.


Re: planner/optimizer question

From
Josh Berkus
Date:
Gary,

> In this example the statistics don't matter. The plans used were the same for
> MSSQL and Postgres. I was trying to eliminate the difference in plans
> between the two, which obviously does make a difference, sometimes in
> MSSQL favour and sometimes the other way round. Both systems, having
> decided to do the same index scan, took noticably different times. The
> Postgres database was fully vacuumed and analysed anyway.

It's also quite possible that MSSQL simply has a more efficient index scanning
implementation than we do.    They've certainly had incentive; their storage
system sucks big time for random lookups and they need those fast indexes.
(just try to build a 1GB adjacency list tree on SQL Server.   I dare ya).

Certainly the fact that MSSQL is essentially a single-user database makes
things easier for them.    They don't have to maintain multiple copies of the
index tuples in memory.    I think that may be our main performance loss.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: planner/optimizer question

From
"Gary Doades"
Date:
On 29 Apr 2004 at 15:35, Kenneth Marshall wrote:

> Did you try to cluster based on the index?
>
> --Ken

Yes. This speeds up the index scan a little (12%). To me this just
reinforces how much overhead subsequently having to go and fetch the
data tuple actually adds.

Cheers,
Gary.


Re: planner/optimizer question

From
"Gary Doades"
Date:
On 29 Apr 2004 at 13:54, Josh Berkus wrote:

> Gary,
>
>
> It's also quite possible that MSSQL simply has a more efficient index scanning
> implementation than we do.    They've certainly had incentive; their storage
> system sucks big time for random lookups and they need those fast indexes.
> (just try to build a 1GB adjacency list tree on SQL Server.   I dare ya).
>
> Certainly the fact that MSSQL is essentially a single-user database makes
> things easier for them.    They don't have to maintain multiple copies of the
> index tuples in memory.    I think that may be our main performance loss.
>

Possibly, but MSSQL certainly uses data from indexes and cuts out the
subsequent (possibly random-seek) data fetch. This is also why its
"Index Tuning Wizard" often recommends multi-column compound
indexes in some cases. I've tried these recommendations on occasion
and they certainly speed up the selects significantly. If anything, the index
scan on the new compound index must be slower than the original single
column index, and yet it still gets the data faster.

This indicates to me that it is not the scan (or IO) performance that is
making the difference, but not having to go get the data row.

Cheers,
Gary.


Re: planner/optimizer question

From
Tom Lane
Date:
"Gary Doades" <gpd@gpdnet.co.uk> writes:
> In this example the statistics don't matter.

Don't they?

A prior poster mentioned that he thought MSSQL tries to keep all its
indexes in memory.  I wonder whether you are giving Postgres a fair
chance to do the same.  What postgresql.conf settings are you using?

            regards, tom lane

Re: planner/optimizer question

From
"Gary Doades"
Date:
On 29 Apr 2004 at 17:54, Tom Lane wrote:

> "Gary Doades" <gpd@gpdnet.co.uk> writes:
> > In this example the statistics don't matter.
>
> Don't they?
>
> A prior poster mentioned that he thought MSSQL tries to keep all its
> indexes in memory.  I wonder whether you are giving Postgres a fair
> chance to do the same.  What postgresql.conf settings are you using?
>
>                                  regards, tom lane

As far as I understand it the statistics only contribute to determining the query plan. Once the access methods are determined, the stats don't matter during the running of the query.

I believe I have given Postgres exactly the same chance. The data is small enough to fit into RAM (all the tables in the query add up to around 50meg) and I executed the query several times to get a consistent figure for the explain analyze.

Having picked out an index scan as being the highest time user I concentrated on that in this case and compared the same index scan on MSSQL. At least MSSQL reported it as an index scan on the same index for the same number of rows.

There was nothing wrong with the query plan that Postgres used. As far as I could see it was probably the best one to use, it just physically took longer than the same access plan on MSSQL.

The query and plan are included below, the main thing I was looking at was the index scan on staff_booking_pkey being 676ms long.

The only postgresql.conf parameters changed from the default are:

shared_buffers = 3000   
sort_mem = 4096
effective_cache_size = 15000
default_statistics_target = 100

There was no disk IO (above the small background IO) during the final run of the query as reported by vmstat (Task Mangler on Windows).

SELECT B.CONTRACT_ID,SUM(R.DURATION+1)/60.0 AS SUMDUR FROM SEARCH_REQT_RESULT TSR
JOIN STAFF_BOOKING B ON (B.STAFF_ID = TSR.STAFF_ID)
JOIN ORDER_REQT R ON (R.REQT_ID = B.REQT_ID)
JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) AND BP.BOOKING_DATE BETWEEN '2004-04-12' AND '2004-04-18' AND TSR.SEARCH_ID = 8 GROUP BY B.CONTRACT_ID

QUERY PLAN
HashAggregate  (cost=11205.80..11209.81 rows=401 width=6) (actual time=1179.729..1179.980 rows=50 loops=1)
  ->  Nested Loop  (cost=326.47..11203.79 rows=401 width=6) (actual time=39.700..1177.149 rows=652 loops=1)
        ->  Hash Join  (cost=326.47..9990.37 rows=401 width=8) (actual time=39.537..1154.807 rows=652 loops=1)
              Hash Cond: ("outer".staff_id = "inner".staff_id)
              ->  Merge Join  (cost=320.39..9885.06 rows=3809 width=12) (actual time=38.316..1143.953 rows=4079 loops=1)
                    Merge Cond: ("outer".booking_id = "inner".booking_id)
                    ->  Index Scan using staff_booking_pkey on staff_booking b (cost=0.00..8951.94 rows=222612 width=16) (actual time=0.218..676.219 rows=222609 loops=1)
                    ->  Sort  (cost=320.39..329.91 rows=3808 width=4) (actual time=26.225..32.754 rows=4079 loops=1)
                          Sort Key: bp.booking_id
                          ->  Index Scan using booking_plan_idx2 on booking_plan bp (cost=0.00..93.92 rows=3808 width=4) (actual time=0.223..14.186 rows=4079 loops=1)
                                Index Cond: ((booking_date >= '2004-04-12'::date) AND (booking_date <= '2004-04-18'::date))
              ->  Hash  (cost=5.59..5.59 rows=193 width=4) (actual time=1.139..1.139 rows=0 loops=1)
                    ->  Index Scan using fk_idx_search_reqt_result on search_reqt_result tsr (cost=0.00..5.59 rows=193 width=4) (actual time=0.213..0.764 rows=192 loops=1)
                          Index Cond: (search_id = 8)
        ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..3.01 rows=1 width=6) (actual time=0.023..0.025 rows=1 loops=652)
              Index Cond: (r.reqt_id = "outer".reqt_id)
Total runtime: 1181.239 ms


Cheers,
Gary.

Re: planner/optimizer question

From
"Gary Doades"
Date:
>
> Having picked out an index scan as being the highest time user I
> concentrated on that in  this case and compared the same index scan on
> MSSQL. At least MSSQL reported it as  an index scan on the same index
> for the same number of rows.
>

I should have also pointed out that MSSQL reported that same index scan as taking 65% of the overall query time.
It was just "faster". The overall query took 103ms in MSSQL.

Gary.


Re: planner/optimizer question

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Certainly the fact that MSSQL is essentially a single-user database makes
> things easier for them.

Our recent testing (cf the "Xeon" thread) says that the interlocking we
do to make the world safe for multiple backends has a fairly high cost
(at least on some hardware) compared to the rest of the work in
scenarios where you are doing zero-I/O scans of data already in memory.
Especially so for index scans.  I'm not sure this completely explains
the differential that Gary is complaining about, but it could be part of
it.  Is it really true that MSSQL doesn't support concurrent operations?

            regards, tom lane

Re: planner/optimizer question

From
Dennis Bjorklund
Date:
On Fri, 30 Apr 2004, Gary Doades wrote:

> I should have also pointed out that MSSQL reported that same index scan
> as taking 65% of the overall query time. It was just "faster". The
> overall query took 103ms in MSSQL.

Are your results based on a single client accessing the database and no
concurrent updates?

Would adding more clients, and maybe having some client that
updates/inserts into the tables, still make mssql faster than pg? Maybe
it's as simple as pg being optimized for more concurrent users than mssql?

I'm just asking, I don't know much about the inner workings of
mssql.

--
/Dennis Björklund


Re: planner/optimizer question

From
"Gary Doades"
Date:
On 29 Apr 2004 at 19:17, Tom Lane wrote:

> Josh Berkus <josh@agliodbs.com> writes:
> > Certainly the fact that MSSQL is essentially a single-user database makes
> > things easier for them.
>
> Our recent testing (cf the "Xeon" thread) says that the interlocking we
> do to make the world safe for multiple backends has a fairly high cost
> (at least on some hardware) compared to the rest of the work in
> scenarios where you are doing zero-I/O scans of data already in memory.
> Especially so for index scans.  I'm not sure this completely explains
> the differential that Gary is complaining about, but it could be part of
> it.  Is it really true that MSSQL doesn't support concurrent operations?
>
>             regards, tom lane

As far as I am aware SQLServer supports concurrent operations. It
certainly creates more threads for each connection. None of my
observations of the system under load (50-ish concurrent users, 150-ish
connections) suggest that it is serializing queries.

These tests are currently on single-processor Athlon XP 2000+ systems.

Regards,
Gary.

Re: planner/optimizer question

From
"Gary Doades"
Date:
On 30 Apr 2004 at 7:26, Dennis Bjorklund wrote:

> On Fri, 30 Apr 2004, Gary Doades wrote:
>
> > I should have also pointed out that MSSQL reported that same index scan
> > as taking 65% of the overall query time. It was just "faster". The
> > overall query took 103ms in MSSQL.
>
> Are your results based on a single client accessing the database and no
> concurrent updates?
>
> Would adding more clients, and maybe having some client that
> updates/inserts into the tables, still make mssql faster then pg? Maybe
> it's so simple as pg being optimized for more concurrent users then mssql?
>
> I'm just asking, I don't know much about the inner workings of
> mssql.
>
> --
> /Dennis Björklund
>

At the moment it is difficult to set up many clients for testing concurrent
stuff. In the past I have had several SQLServer clients under test,
mainly select queries. MSSQL can certainly execute queries while other
queries are still running in the background.

Our production app is fairly well biased towards selects. Currently it is
about 70% selects, 20% inserts, 6% deletes and 4% updates. Very few
updates are more than one row based on the primary key. Over 90% of
the time spend running SQL is in select queries.

My limited concurrent testing on Postgres gives very good performance
on updates, inserts, deletes, but it is suffering on the selects in certain
areas which why I have been concentrating my efforts on that area.

Having got similar (or the same) access plans in both Postgres and
MSSQL I was getting down to the next level of checking what was going
on when executing the already planned query.

I do have another database system I could try: Sybase SQLAnywhere.
This is not the original Sybase Enterprise, which has the same roots as
MSSQL. In the past my testing suggested that SQLAnywhere's
performance was as good as or better than MSSQL's. I may try to set it up
with the same data for a more detailed comparison.

Regards,
Gary.


Re: planner/optimizer question

From
Jeff
Date:
On Apr 30, 2004, at 3:01 AM, Gary Doades wrote:

[ pg query plan, etc ]

I wonder if other parts of the plan are affecting the speed.

I've recently run into a case where a merge join plan was chosen for
this query, which took 11 seconds to execute.  Forcing it to pick a
nested loop join dropped it to 3.  (Updating my
default_statistics_target to 500 caused the planner to choose nested
loop join)

So, is the plan really the same?

  A better comparison query may be a simple "select a from mytable
where a between foo and bar" to get an index scan. In that case it's a
straight-up, vanilla index scan. Nothing else getting in the way.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: planner/optimizer question

From
Jochem van Dieten
Date:
Manfred Koizar wrote:
> On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> [ ... visibility information in index tuples ... ]
>>
>> Storing that information would at least double the overhead space used
>> for each index tuple.  The resulting index bloat would significantly
>> slow index operations by requiring more I/O.  So it's far from clear
>> that this would be a win, even for those who care only about select
>> speed.
>
> While the storage overhead could be reduced to 1 bit (not a joke)

You mean adding an isLossy bit, so that the heap tuple has to be checked
for visibility only where the bit is set, and where it is clear the heap
tuple does not have to be checked?


> we'd
> still have the I/O overhead of locating and updating index tuples for
> every heap tuple deleted/updated.

Would there be additional I/O for the additional bit in the index
tuple (I am unable to find the layout of index tuple headers in
the docs)?

Jochem

--
I don't get it
immigrants don't work
and steal our jobs
     - Loesje



Re: planner/optimizer question

From
"Gary Doades"
Date:
On 30 Apr 2004 at 8:32, Jeff wrote:
>
>   A better comparision query may be a simple "select a from mytable
> where a between foo and bar"  to get an index scan.  In that case its a
> straight up, vanilla index scan.  Nothing else getting in the way.
>

Yes, you're right, and I have done this just to prove to myself that it is the index scan that
is the bottleneck. I have some complex SQL that executes very quickly with Postgres,
similar to MSSQL, but the index scans in most of those only touch a few rows for a few
loops. It seems to be a problem when the index scan is scanning very many rows and
for each of these it has to go to the table just to find out if the index entry it just looked
at is still valid.

Gary.


Re: planner/optimizer question

From
"Gary Doades"
Date:
On 30 Apr 2004 at 9:37, Kevin Barnard wrote:

>
> I was always under the impression that MSSQL used leaf- and row-level locking and was
> therefore not a concurrent database in the same sense that postgres is. It would still allow
> for concurrent connections and such, but updates would get blocked/delayed. I might be wrong.
>

Ultimately you may be right. I don't know enough about SQLServer
internals to say either way. Anyway, our system spends about 70% of its
time in selects. I could try and set up a test for this when I get a bit
more time.

Unfortunately I suspect that this topic won't get taken much further. In
order to test this it would mean modifying quite a bit of code.
Determining whether it would be beneficial to put additional info in the
index entry, and skip visiting the data row when all the required data is
in the index, would require quite a bit of work by someone who knows more
than I do. I reckon that no-one has the time to do this at the moment.

Regards,
Gary.


Re: planner/optimizer question

From
Kris Jurka
Date:

On Fri, 30 Apr 2004, Gary Doades wrote:

> Yes, you're right, and I have done this just to prove to myself that it
> is the index scan that is the bottleneck. I have some complex SQL that
> executes very quickly with Postgres, similar to MSSQL, but the index
> scans in most of those only touch a few rows for a few loops. It seems
> to be a problem when the index scan covers very many rows and for
> each of these it has to go to the table just to find out whether the
> index entry it just looked at is still valid.
>

Another way to speed this up is the TODO item: "Use bitmaps to fetch
heap pages in sequential order."  For an indexscan that fetches a number
of rows, those rows may be anywhere in the base table, so as each index
entry is found the corresponding table row is fetched from the heap.  This
is not ideal because you can be jumping around in the heap and end up
fetching the same page multiple times, because table rows that are in the
same page were found in different places in the index.
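
The approach can be sketched in a few lines. This is a toy model with
invented names, not PostgreSQL code: the idea is just to collect the
matching TIDs first, then visit each heap page exactly once, in page order.

```python
# Illustrative sketch of the "bitmap" TODO item: instead of fetching the
# heap page for each index entry as it is found, collect the matching
# TIDs first, then visit each heap page exactly once in sequential order.
# The data layout and names here are invented for illustration; they are
# not the actual PostgreSQL structures.

def bitmap_index_scan(index_entries, heap_pages):
    """index_entries: list of (key, (page_no, offset)) in index order.
    heap_pages: dict mapping page_no -> list of rows."""
    # Phase 1: scan the index, remembering which heap pages/offsets match.
    wanted = {}  # page_no -> set of offsets
    for _key, (page_no, offset) in index_entries:
        wanted.setdefault(page_no, set()).add(offset)

    # Phase 2: fetch each heap page once, in page-number order.
    rows, pages_read = [], 0
    for page_no in sorted(wanted):
        pages_read += 1
        page = heap_pages[page_no]
        for offset in sorted(wanted[page_no]):
            rows.append(page[offset])
    return rows, pages_read

# Three index entries point into only two heap pages; a naive scan could
# read page 1 twice, while the bitmap approach reads each page once.
heap = {0: ["r0", "r1"], 1: ["r2", "r3"]}
entries = [("a", (1, 0)), ("b", (0, 1)), ("c", (1, 1))]
rows, pages_read = bitmap_index_scan(entries, heap)
```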

Kris Jurka

Re: planner/optimizer question

From
Manfred Koizar
Date:
On Fri, 30 Apr 2004 19:46:24 +0200, Jochem van Dieten
<jochemd@oli.tudelft.nl> wrote:
>> While the storage overhead could be reduced to 1 bit (not a joke)
>
>You mean adding an isLossy bit, so that the heap tuple has to be
>checked for visibility only where the bit is set, and does not have
>to be checked where it is not set?

Yes, something like this.  Actually I imagined it the other way round: a
visible-to-all flag similar to the existing dead-to-all flag (search for
LP_DELETE and ItemIdDeleted in nbtree.c).

>> we'd
>> still have the I/O overhead of locating and updating index tuples for
>> every heap tuple deleted/updated.
>
>Would there be additional I/O for the additional bit in the index
>tuple (I am unable to find the layout of index tuple headers in
>the docs)?

Yes, the visible-to-all flag would be set as a by-product of an index
scan, if the heap tuple is found to be visible to all active
transactions.  This update is non-critical and, I think, not very
expensive.

Deleting (and hence updating) a tuple is more critical, regarding both
consistency and performance.  We'd have to locate all index entries
pointing to the heap tuple and set their visible-to-all flags to false.
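
The proposed flag's lifecycle might look roughly like this toy model
(all names are invented for illustration; this is not PostgreSQL code):

```python
# Toy model of the proposed visible-to-all flag as described above:
# the flag starts false, an index scan may set it once the heap tuple is
# known to be visible to every active transaction, and a delete/update
# must clear it in every index entry pointing at the tuple.

class IndexEntry:
    def __init__(self, heap_tid):
        self.heap_tid = heap_tid
        self.visible_to_all = False  # conservative initial state

def index_scan_visit(entry, heap_visible_to_all):
    """Returns True if a heap visit was needed to check visibility."""
    if entry.visible_to_all:
        return False                 # answer straight from the index
    if heap_visible_to_all:
        entry.visible_to_all = True  # hint set as a scan by-product
    return True                      # had to consult the heap this time

def delete_tuple(entries_for_tuple):
    # The expensive part: every index entry pointing to the heap tuple
    # must be located and its flag cleared.
    for e in entries_for_tuple:
        e.visible_to_all = False

entry = IndexEntry(heap_tid=(0, 1))
first_visit = index_scan_visit(entry, heap_visible_to_all=True)   # heap visited, flag set
second_visit = index_scan_visit(entry, heap_visible_to_all=True)  # index alone suffices
delete_tuple([entry])                                             # flag cleared again
```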


Re: planner/optimizer question

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
> Yes, the visible-to-all flag would be set as a by-product of an index
> scan, if the heap tuple is found to be visible to all active
> transactions.  This update is non-critical

Oh really?  I think you need to think harder about the transition
conditions.

Dead-to-all is reasonably safe to treat as a hint bit because *it does
not ever need to be undone*.  Visible-to-all does not have that
property.

            regards, tom lane

Re: planner/optimizer question

From
Jochem van Dieten
Date:
Tom Lane wrote:
> Manfred Koizar <mkoi-pg@aon.at> writes:
>>
>> Yes, the visible-to-all flag would be set as a by-product of an index
>> scan, if the heap tuple is found to be visible to all active
>> transactions.  This update is non-critical
>
> Oh really?  I think you need to think harder about the transition
> conditions.
>
> Dead-to-all is reasonably safe to treat as a hint bit because *it does
> not ever need to be undone*.  Visible-to-all does not have that
> property.

Yes, really :-)

When a tuple is inserted the visible-to-all flag is set to false.
The effect of this is that every index scan that finds this tuple
has to visit the heap to verify visibility. If it turns out the
tuple is not only visible to the current transaction, but to all
current transactions, the visible-to-all flag can be set to true.
This is non-critical, because if it is set to false scans will
not miss the tuple, they will just visit the heap to verify
visibility.

The moment the heap tuple is updated/deleted the visible-to-all
flag needs to be set to false again in all indexes. This is
critical, and the I/O and (dead)lock costs of unsetting the
visible-to-all flag are unknown and might be big enough to offset
any advantage on the selects.

But I believe that for applications with a "load, select, drop"
usage pattern (warehouses, archives etc.) having this
visible-to-all flag would be a clear winner.

Jochem



Re: planner/optimizer question

From
"Gary Doades"
Date:
On 1 May 2004 at 13:18, Jochem van Dieten wrote:

> Yes, really :-)
>
> When a tuple is inserted the visible-to-all flag is set to false.
> The effect of this is that every index scan that finds this tuple
> has to visit the heap to verify visibility. If it turns out the
> tuple is not only visible to the current transaction, but to all
> current transactions, the visible-to-all flag can be set to true.
> This is non-critical, because if it is set to false scans will
> not miss the tuple, they will just visit the heap to verify
> visibility.
>
> The moment the heap tuple is updated/deleted the visible-to-all
> flag needs to be set to false again in all indexes. This is
> critical, and the I/O and (dead)lock costs of unsetting the
> visible-to-all flag are unknown and might be big enough to offset
> any advantage on the selects.
>
> But I believe that for applications with a "load, select, drop"
> usage pattern (warehouses, archives etc.) having this
> visible-to-all flag would be a clear winner.
>
> Jochem
>

If need be, this index maintenance could be made a configuration
option. It is likely that database usage patterns are reasonably well
known for a particular installation, so the option could be set on or off
depending on the typical transaction mix.

In my case with frequent large/complex selects and few very short
insert/updates I think it could be a big performance boost. If it works :-)

Regards,
Gary.


Re: planner/optimizer question

From
Tom Lane
Date:
Jochem van Dieten <jochemd@oli.tudelft.nl> writes:
> The moment the heap tuple is updated/deleted the visible-to-all
> flag needs to be set to false again in all indexes. This is
> critical,

Exactly.  This gets you out of the hint-bit semantics and into a ton
of interesting problems, such as race conditions.  (Process A determines
that tuple X is visible-to-all, and goes to update the index tuple.
Before it can reacquire lock on the index page, process B updates the
heap tuple and visits the index to clear the flag bit.  Once A obtains
lock it will set the flag bit.  Oops.)
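
That interleaving can be replayed deterministically in a toy model
(invented names, no real locking; this is only the sequence of events,
not actual backend code):

```python
# A deterministic replay of the race described above: process A decides
# the tuple is visible-to-all, but before A reacquires the index page
# lock, process B deletes the heap tuple and clears the flag; A then
# sets the flag anyway, leaving a stale "visible-to-all" hint.

flag = {"visible_to_all": False}

def process_a_check():
    # A inspects the heap and concludes the tuple is visible to all.
    # This decision is made while NOT holding the index page lock.
    return True

def process_b_delete(f):
    f["visible_to_all"] = False  # B clears the flag on delete

def process_a_set(f, decision):
    if decision:                 # A acts on its now-stale decision
        f["visible_to_all"] = True

decision = process_a_check()   # A: tuple looks visible-to-all
process_b_delete(flag)         # B: deletes the tuple, clears the flag
process_a_set(flag, decision)  # A: sets the flag -- oops, tuple is dead
```

After this sequence the index claims the deleted tuple is visible to
all, which is exactly the multiple-copies-of-critical-state hazard.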

Basically what you are buying into with such a thing is multiple copies
of critical state.  It may be only one bit rather than several words,
but updating it is no less painful than if it were a full copy of the
tuple's commit status.

            regards, tom lane

Re: planner/optimizer question

From
Manfred Koizar
Date:
On Sat, 01 May 2004 13:18:04 +0200, Jochem van Dieten
<jochemd@oli.tudelft.nl> wrote:
>Tom Lane wrote:
>> Oh really?  I think you need to think harder about the transition
>> conditions.

Indeed.

>>
>> Dead-to-all is reasonably safe to treat as a hint bit because *it does
>> not ever need to be undone*.  Visible-to-all does not have that
>> property.
>
>Yes, really :-)

No, not really :-(

As Tom has explained in a nearby message his concern is that -- unlike
dead-to-all -- visible-to-all starts as false, is set to true at some
point in time, and is eventually set to false again.  Problems arise if
one backend wants to set visible-to-all to true while at the same time
another backend wants to set it to false.

This could be curable by using a second bit as a deleted flag (might be
even the same bit that's now used as dead-to-all, but I'm not sure).  An
index tuple having both the visible flag (formerly called
visible-to-all) and the deleted flag set would cause a heap tuple access
to check visibility.  But that leaves the question of what to do after
the deleting transaction has rolled back.  I see no clean way from the
visible-and-deleted state to visible-to-all.

This obviously needs another round of hard thinking ...


Re: planner/optimizer question

From
"Jochem van Dieten"
Date:
Manfred Koizar said:
>
> As Tom has explained in a nearby message his concern is that --
> unlike dead-to-all -- visible-to-all starts as false, is set to true
> at some point in time, and is eventually set to false again.
> Problems arise if one backend wants to set visible-to-all to true
> while at the same time another backend wants to set it to false.

Got it, I misinterpreted his concern as "visible-to-all should not be
set to true when the tuple is inserted".


> This could be curable by using a second bit as a deleted flag (might
> be even the same bit that's now used as dead-to-all, but I'm not
> sure).  An index tuple having both the visible flag (formerly called
> visible-to-all) and the deleted flag set would cause a heap tuple
> access to check visibility.

Or in a more generalized way: with 2 bits written at the same time you
can express 4 states. But only 3 actions need to be signalled:
dead-to-all, visible-to-all and check-heap. So we can have 2 states
that both signal check-heap.

The immediate solution to the race condition Tom presented would be to
have the transaction that invalidates the heap tuple switch the index
tuple from the one check-heap state to the other. The transaction that
wants to update to visible-to-all can now see that the state has
changed (but not the meaning) and aborts its change.
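
A toy replay of this idea (the state names and the compare-and-swap
helper are invented for the sketch; this is not PostgreSQL code):

```python
# Sketch of the generalization above: two bits give four states but only
# three meanings are needed, so two distinct states can both mean
# "check-heap". A deleter flips between the two check-heap encodings;
# a scanner upgrading to visible-to-all only succeeds if the state is
# still exactly the one it observed (a compare-and-swap), so the race
# from the earlier message is detected and the stale upgrade aborted.

CHECK_HEAP_A, CHECK_HEAP_B, VISIBLE_TO_ALL, DEAD_TO_ALL = range(4)

def compare_and_swap(cell, expected, new):
    if cell["state"] == expected:
        cell["state"] = new
        return True
    return False

def invalidate(cell):
    # Deleter: move to the *other* check-heap state, changing the
    # encoding but not the meaning.
    if cell["state"] in (CHECK_HEAP_A, VISIBLE_TO_ALL):
        cell["state"] = CHECK_HEAP_B
    else:
        cell["state"] = CHECK_HEAP_A

# Replay of the race: the scanner observes CHECK_HEAP_A and decides the
# tuple is visible-to-all; the deleter invalidates in between; the
# scanner's compare-and-swap now fails and the upgrade is abandoned.
cell = {"state": CHECK_HEAP_A}
observed = cell["state"]      # scanner reads the state, checks the heap
invalidate(cell)              # deleter flips to CHECK_HEAP_B
upgraded = compare_and_swap(cell, observed, VISIBLE_TO_ALL)
```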


> But that leaves the question of what to
> do after the deleting transaction has rolled back.  I see no clean
> way from the visible-and-deleted state to visible-to-all.

I'm afraid I don't know enough about the inner workings of rollbacks
to determine how the scenario "A determines visible-to-all should be
set, B invalidates tuple, B rolls back, C invalidates tuple, C
commits, A reacquires lock on index" would work out. I guess I have
some more reading to do.

But if you don't roll back too often it wouldn't even be a huge
problem to just leave them in visible-and-deleted state until
eventually they go into the dead-to-all state.

Jochem