Thread: Re: PostgreSQL and Linux 2.6 kernel.

Re: PostgreSQL and Linux 2.6 kernel.

From: Gary Doades
On 2 Apr 2004 at 22:36, pgsql-performance@postgresql.org wrote:

OK, some more detail:

Before wiping 2.4 off my test box for the second time:

SQL Statement for update:
update staff_booking set time_from = r.time_from from order_reqt r where r.reqt_id =
staff_booking.reqt_id;

Explain: (on 2.4)
QUERY PLAN
Merge Join  (cost=0.00..185731.30 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..53068.20 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..99579.21 rows=2845920 width=90)

Total execution time: 18 hours 12 minutes

vacuum full analyze: total time 3 hours 22 minutes
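Incidentally, the plan and timings for a destructive statement like this can be captured without keeping the change, since EXPLAIN ANALYZE actually executes the statement (a sketch; wrap it in a transaction and roll back):

```sql
BEGIN;

-- EXPLAIN ANALYZE runs the UPDATE for real, reporting actual
-- row counts and timings for each plan node.
EXPLAIN ANALYZE
UPDATE staff_booking
   SET time_from = r.time_from
  FROM order_reqt r
 WHERE r.reqt_id = staff_booking.reqt_id;

-- Discard the 2.8 million modified rows, keeping only the output.
ROLLBACK;
```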

Two hours to re-install 2.6, set params etc., and
restore the database.

Same SQL Statement
Explain: (on 2.6)
QUERY PLAN
Merge Join  (cost=0.00..209740.24 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..50734.20 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..117921.92 rows=2845920 width=90)

Total execution time: 2 hours 53 minutes

vacuum full analyze: total time 1 hour 6 minutes

Table definitions for the two tables involved:
CREATE TABLE ORDER_REQT
(
    REQT_ID               SERIAL,
    ORDER_ID               integer NOT NULL,
    DAYOFWEEK           smallint NOT NULL CHECK (DAYOFWEEK
BETWEEN 0 AND 6),
    TIME_FROM             smallint NOT NULL CHECK (TIME_FROM
BETWEEN 0 AND 1439),
    DURATION               smallint NOT NULL CHECK (DURATION
BETWEEN 0 AND 1439),
    PRODUCT_ID            integer NOT NULL,
    NUMBER_REQT              smallint NOT NULL DEFAULT (1),
    WROPTIONS            integer NOT NULL DEFAULT 0,
    UID_REF              integer NOT NULL,
    DT_STAMP              timestamp NOT NULL DEFAULT
current_timestamp,
    Sentinel_Priority         integer NOT NULL DEFAULT 0,
    PERIOD            smallint NOT NULL DEFAULT 1 CHECK
(PERIOD BETWEEN -2 AND 4),
    FREQUENCY            smallint NOT NULL DEFAULT 1,
    PRIMARY KEY (REQT_ID)
);

CREATE TABLE STAFF_BOOKING
(
    BOOKING_ID            SERIAL,
    REQT_ID            integer NOT NULL,
    ENTITY_TYPE           smallint NOT NULL DEFAULT 3
check(ENTITY_TYPE in(3,4)),
    STAFF_ID              integer NOT NULL,
    CONTRACT_ID          integer NOT NULL,
    TIME_FROM             smallint NOT NULL CHECK (TIME_FROM
BETWEEN 0 AND 1439),
    DURATION               smallint NOT NULL CHECK (DURATION
BETWEEN 0 AND 1439),
    PERIOD            smallint NOT NULL DEFAULT 1 CHECK
(PERIOD BETWEEN -2 AND 4),
    FREQUENCY            smallint NOT NULL DEFAULT 1,
    TRAVEL_TO             smallint NOT NULL DEFAULT 0,
    UID_REF              integer NOT NULL,
    DT_STAMP              timestamp NOT NULL DEFAULT
current_timestamp,
    SELL_PRICE            numeric(10,4) NOT NULL DEFAULT 0,
    COST_PRICE            numeric(10,4) NOT NULL DEFAULT 0,
    MIN_SELL_PRICE        numeric(10,4) NOT NULL DEFAULT 0,
    MIN_COST_PRICE        numeric(10,4) NOT NULL DEFAULT 0,
    Sentinel_Priority         integer NOT NULL DEFAULT 0,
    CHECK_INTERVAL            smallint NOT NULL DEFAULT 0,
      STATUS            smallint NOT NULL DEFAULT 0,
    WROPTIONS            integer NOT NULL DEFAULT 0,
    PRIMARY KEY (BOOKING_ID)
);

Foreign keys:

ALTER TABLE ORDER_REQT ADD
     FOREIGN KEY
    (
        ORDER_ID
    ) REFERENCES MAIN_ORDER (
        ORDER_ID
    ) ON DELETE CASCADE;

ALTER TABLE ORDER_REQT ADD
     FOREIGN KEY
    (
        PRODUCT_ID
    ) REFERENCES PRODUCT (
        PRODUCT_ID
    );

ALTER TABLE STAFF_BOOKING ADD
     FOREIGN KEY
    (
        CONTRACT_ID
    ) REFERENCES STAFF_CONTRACT (
        CONTRACT_ID
    );

ALTER TABLE STAFF_BOOKING ADD
     FOREIGN KEY
    (
        STAFF_ID
    ) REFERENCES STAFF (
        STAFF_ID
    );


Indexes:

CREATE INDEX FK_IDX_ORDER_REQT
     ON ORDER_REQT
    (
        ORDER_ID
    );

CREATE INDEX FK_IDX_ORDER_REQT_2
     ON ORDER_REQT
    (
        PRODUCT_ID
    );

CREATE INDEX ORDER_REQT_IDX ON ORDER_REQT
(
    ORDER_ID,
    PRODUCT_ID
);

CREATE INDEX ORDER_REQT_IDX4 ON ORDER_REQT
(
    REQT_ID,
    TIME_FROM,
    DURATION
);

CREATE INDEX FK_IDX_STAFF_BOOKING
     ON STAFF_BOOKING
    (
        CONTRACT_ID
    );

CREATE INDEX FK_IDX_STAFF_BOOKING_2
     ON STAFF_BOOKING
    (
        STAFF_ID
    );

CREATE INDEX STAFF_BOOK_IDX1 ON STAFF_BOOKING
(
    STAFF_ID,
    REQT_ID
);

CREATE INDEX STAFF_BOOK_IDX2 ON STAFF_BOOKING
(
    REQT_ID
);

CREATE INDEX STAFF_BOOK_IDX3 ON STAFF_BOOKING
(
    BOOKING_ID,
    REQT_ID
);


CREATE INDEX STAFF_BOOK_IDX4 ON STAFF_BOOKING
(
    BOOKING_ID,
    CONTRACT_ID
);

There are no indexes on the columns involved in the update; they are
not required for my usual select statements. This is an attempt to
slightly denormalise the design to get the performance up to a level
comparable to SQL Server 2000. We hope to move some of our databases
over to PostgreSQL later in the year and this is part of the ongoing
testing. SQLServer's query optimiser is a bit smarter than PostgreSQL's
(yet), so I am hand-optimising some of the more frequently used
SQL and/or tweaking the database design slightly.

Later, after deciphering SQLServer's graphical plans, I will attempt to
post comparative performance/access plans, using the same data of
course, if anyone is interested.

Cheers,
Gary.



On 2 Apr 2004 at 1:32, Tom Lane wrote:

> "Gary Doades" <gpd@gpdnet.co.uk> writes:
> > As a test in PosgreSQL I issued a statement to update a single column
> > of a table containing 2.8 million rows with the values of a column in
> > a table with similar rowcount.  Using the above spec I had to stop the
> > server after 17 hours. The poor thing was thrashing the hard disk and
> > doing more swapping than useful work.
>
> This statement is pretty much content-free, since you did not show us
> the table schemas, the query, or the EXPLAIN output for the query.
> (I'll forgive you the lack of EXPLAIN ANALYZE, but you could easily
> have provided all the other hard facts.)  There's really no way to tell
> where the bottleneck is.  Maybe it's a kernel-level issue, but I would
> not bet on that without more evidence.  I'd definitely not bet on it
> without direct confirmation that the same query plan was used in both
> setups.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>
>
> --
> Incoming mail is certified Virus Free.
> Checked by AVG Anti-Virus (http://www.grisoft.com).
> Version: 7.0.230 / Virus Database: 262.6.5 - Release Date: 31/03/2004
>



Re: PostgreSQL and Linux 2.6 kernel.

From: Josh Berkus
Gary,

> There are no indexes on the columns involved in the update, they are
> not required for my usual select statements. This is an attempt to
> slightly denormalise the design to get the performance up comparable
> to SQL Server 2000. We hope to move some of our databases over to
> PostgreSQL later in the year and this is part of the ongoing testing.
> SQLServer's query optimiser is a bit smarter that PostgreSQL's (yet)
> so I am hand optimising some of the more frequently used
> SQL and/or tweaking the database design slightly.

Hmmm ... that hasn't been my general experience on complex queries.   However,
it may be due to a difference in ANALYZE statistics.   I'd love to see you
increase your default_stats_target, re-analyze, and see if PostgreSQL gets
"smarter".

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: PostgreSQL and Linux 2.6 kernel.

From: Gary Doades
Actually it hasn't been my experience either. Most of my queries against
the database, large and small, are either a little quicker or show no real
difference. I have only really noticed big differences under stress, when
memory (RAM) is being squeezed. The main winner on 2.6 seems to be
write performance and memory management.

Unfortunately I only have one test machine and I can't really keep
switching between 2.4 and 2.6 to do the comparisons. I had written
down 27 timings from a set of SQL of varying complexity using the 2.4
kernel. Each SQL statement was executed 10 times and the average of
the last 5 was used. I can only really compare those timings against the
new installation on 2.6. I know that this is not ideal "real world" testing,
but it is good enough for me at the moment. Unless anyone has
contradictory indications then I will proceed with 2.6.

I did increase the default stats target from 10 to 50 and re-analysed.
The explain numbers are slightly different, but the time to run was
almost the same. Not surprising since the plan was the same.

QUERY PLAN
Merge Join  (cost=0.00..192636.20 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..52662.40 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..102529.28 rows=2845920 width=90)




Re: PostgreSQL and Linux 2.6 kernel.

From: Gary Doades
Sorry, I think I misread your post in my last reply. I thought you were still talking about
the big update...

The main thing I have noticed about SQLServer is that it seems more willing to do hash or
merge joins than PostgreSQL. I have experimented with various postgresql.conf
parameters and even turned off nested loops to see the difference. When I actually got
a merge join out of PostgreSQL where it wanted to do a nested loop, it, not
surprisingly, took longer to execute.
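Turning off nested loops for such an experiment is a session-level planner toggle, roughly as follows (a sketch using the tables above; the aggregate query is just an example workload):

```sql
-- Discourage nested-loop joins for this session only.
SET enable_nestloop = off;

-- Re-run the problem query and compare plans/timings.
EXPLAIN ANALYZE
SELECT b.contract_id, SUM(r.duration + 1) / 60.0 AS rhours
  FROM staff_booking b
  JOIN order_reqt r ON r.reqt_id = b.reqt_id
 GROUP BY b.contract_id;

-- Restore the default planner behaviour.
SET enable_nestloop = on;
```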

Looking at the SQLServer plan, it seemed to be spending MUCH less time in the sort
operations than PostgreSQL. This is probably what leads SQLServer to go for
hash/merge joins more often. The other problem is that the SQLServer timings are
skewed by its query plan caching.

For one query the SQLServer plan said it spent 2% of its time in a big sort; the same query
in PostgreSQL, when a hash join was forced, spent 23% of its time on the sort (from explain
analyse actual stats). I have played about with sort_mem, but it doesn't make much
difference.
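For reference, the sort_mem experiments would look something like this (a sketch; sort_mem is measured in kilobytes in this era, and 65536 is an arbitrary test value):

```sql
-- Give in-memory sorts more working space for this session (in KB).
-- If the Sort node's time in EXPLAIN ANALYZE barely moves afterwards,
-- extra sort memory is not the bottleneck, matching the observation above.
SET sort_mem = 65536;  -- 64 MB
```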

I have also noticed that SQLServer tends to fold more complex IN subselects into the
main query using merge joins, maybe for the same reason as above.

SQLServer seems to have some more "exotic" joins ("nested loop/left semi join","nested
loop/left anti semi join"). These are probably just variants of nested loops, but I don't
know enough about it to say if they make a difference. Clustered indexes and clustered
index seeks also seem to be a big player in the more complex queries.

I still have quite a lot of comparative testing and tuning to do before I can nail it down
further, but I will let you know when I have some hard stats to go on.




Re: PostgreSQL and Linux 2.6 kernel.

From: Gary Doades
Following on from Josh's response and my previous reply on SQLServer planning.

The main problem query is this one:

SELECT VS.*,VL.TEL1,SC.CONTRACT_ID,SC.CONTRACT_REF, SC.MAX_HOURS,
SC.MIN_HOURS,
 (SELECT COUNT(*) FROM TIMESHEET_DETAIL JOIN MAIN_ORDER ON
(MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID AND
MAIN_ORDER.CLIENT_ID = 6) WHERE TIMESHEET_DETAIL.CONTRACT_ID =
SC.CONTRACT_ID) AS VISITS,
(SELECT (SUM(R.DURATION+1))/60.0 FROM ORDER_REQT R
 JOIN STAFF_BOOKING B ON (B.REQT_ID = R.REQT_ID)
 JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) WHERE
B.CONTRACT_ID = SC.CONTRACT_ID
 AND BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18') AS RHOURS
FROM VSTAFF VS
JOIN STAFF_CONTRACT SC ON (SC.STAFF_ID = VS.STAFF_ID)
JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
JOIN SEARCH_REQT_RESULT SR ON (SR.STAFF_ID = VS.STAFF_ID)
WHERE SR.SEARCH_ID = 1 AND SC.CONTRACT_ID IN
(SELECT C.CONTRACT_ID FROM STAFF_PRODUCT P,STAFF_CONTRACT C
WHERE P.CONTRACT_ID=C.CONTRACT_ID AND C.STAFF_ID = VS.STAFF_ID AND
P.PRODUCT_ID IN (SELECT PRODUCT_ID FROM SEARCH_ORDER_REQT WHERE
SEARCH_ID = 1)  AND C.AVAIL_DATE_FROM <= '2004-06-12' AND
C.AVAIL_DATE_TO >= '2004-06-18'  GROUP BY C.CONTRACT_ID
 HAVING (COUNT(C.CONTRACT_ID) = (SELECT COUNT(DISTINCT PRODUCT_ID)
FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1)))

The explain analyze is:
QUERY PLAN
Nested Loop  (cost=101.54..1572059.57 rows=135 width=152) (actual
time=13749.100..1304586.501 rows=429 loops=1)
  InitPlan
    ->  Index Scan using fk_idx_wruserarea on wruserarea  (cost=3.26..6.52 rows=1
width=4) (actual time=0.944..0.944 rows=1 loops=1)
          Index Cond: (area_id = 1)
          Filter: (uid = $4)
          InitPlan
            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual
time=0.686..0.691 rows=1 loops=1)
                  Filter: ((username)::name = "current_user"())
  ->  Hash Join  (cost=95.02..3701.21 rows=215 width=138) (actual
time=100.476..1337.392 rows=429 loops=1)
        Hash Cond: ("outer".staff_id = "inner".staff_id)
        Join Filter: (subplan)
        ->  Seq Scan on staff_contract sc  (cost=0.00..33.24 rows=1024 width=37) (actual
time=0.114..245.366 rows=1024 loops=1)
        ->  Hash  (cost=93.95..93.95 rows=430 width=109) (actual time=38.563..38.563
rows=0 loops=1)
              ->  Hash Join  (cost=47.47..93.95 rows=430 width=109) (actual
time=15.502..36.627 rows=429 loops=1)
                    Hash Cond: ("outer".staff_id = "inner".staff_id)
                    ->  Seq Scan on staff  (cost=34.61..66.48 rows=1030 width=105) (actual
time=9.655..15.264 rows=1030 loops=1)
                          Filter: ((hashed subplan) OR $5)
                          SubPlan
                            ->  Seq Scan on staff_area  (cost=10.73..33.38 rows=493 width=4)
(actual time=8.452..8.452 rows=0 loops=1)
                                  Filter: ((hashed subplan) OR (area_id = 1))
                                  SubPlan
                                    ->  Seq Scan on wruserarea  (cost=3.26..10.72 rows=5 width=4)
(actual time=0.977..1.952 rows=1 loops=1)
                                          Filter: (uid = $1)
                                          InitPlan
                                            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4)
(actual time=0.921..0.926 rows=1 loops=1)
                                                  Filter: ((username)::name = "current_user"())
                    ->  Hash  (cost=11.79..11.79 rows=430 width=4) (actual time=5.705..5.705
rows=0 loops=1)
                          ->  Index Scan using fk_idx_search_reqt_result on search_reqt_result
sr  (cost=0.00..11.79 rows=430 width=4) (actual time=0.470..4.482 rows=429 loops=1)
                                Index Cond: (search_id = 1)
        SubPlan
          ->  HashAggregate  (cost=8.32..8.32 rows=1 width=4) (actual time=2.157..2.157
rows=1 loops=429)
                Filter: (count(contract_id) = $9)
                InitPlan
                  ->  Aggregate  (cost=1.04..1.04 rows=1 width=4) (actual time=0.172..0.173
rows=1 loops=1)
                        ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4)
(actual time=0.022..0.038 rows=1 loops=1)
                              Filter: (search_id = 1)
                ->  Hash IN Join  (cost=1.04..7.27 rows=1 width=4) (actual time=2.064..2.117
rows=1 loops=429)
                      Hash Cond: ("outer".product_id = "inner".product_id)
                      ->  Nested Loop  (cost=0.00..6.19 rows=7 width=8) (actual
time=1.112..2.081 rows=8 loops=429)
                            ->  Index Scan using fk_idx_staff_contract_2 on staff_contract c
(cost=0.00..3.03 rows=1 width=4) (actual time=0.206..0.245 rows=1 loops=429)
                                  Index Cond: (staff_id = $8)
                                  Filter: ((avail_date_from <= '2004-06-12'::date) AND (avail_date_to
>= '2004-06-18'::date))
                            ->  Index Scan using fk_idx_staff_product on staff_product p
(cost=0.00..3.08 rows=6 width=8) (actual time=0.873..1.764 rows=8 loops=429)
                                  Index Cond: (p.contract_id = "outer".contract_id)
                      ->  Hash  (cost=1.04..1.04 rows=1 width=4) (actual time=0.086..0.086
rows=0 loops=1)
                            ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4)
(actual time=0.037..0.050 rows=1 loops=1)
                                  Filter: (search_id = 1)
  ->  Index Scan using location_pkey on "location"  (cost=0.00..12.66 rows=1 width=18)
(actual time=0.876..0.887 rows=1 loops=429)
        Index Cond: ("location".location_id = "outer".location_id)
        Filter: ((area_id = 1) OR (subplan))
        SubPlan
          ->  Index Scan using fk_idx_wruserarea, fk_idx_wruserarea on wruserarea
(cost=3.26..9.64 rows=1 width=4) (never executed)
                Index Cond: ((area_id = 1) OR (area_id = $7))
                Filter: (uid = $6)
                InitPlan
                  ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (never executed)
                        Filter: ((username)::name = "current_user"())
  SubPlan
    ->  Aggregate  (cost=11233.28..11233.29 rows=1 width=2) (actual
time=3036.814..3036.815 rows=1 loops=429)
          ->  Nested Loop  (cost=10391.71..11233.21 rows=30 width=2) (actual
time=2817.923..3036.516 rows=34 loops=429)
                ->  Hash Join  (cost=10391.71..11142.43 rows=30 width=4) (actual
time=2813.349..3007.936 rows=34 loops=429)
                      Hash Cond: ("outer".booking_id = "inner".booking_id)
                      ->  Index Scan using booking_plan_idx2 on booking_plan bp
(cost=0.00..572.52 rows=23720 width=4) (actual time=0.070..157.028 rows=24613
loops=429)
                            Index Cond: ((booking_date >= '2004-06-12'::date) AND
(booking_date <= '2004-06-18'::date))
                      ->  Hash  (cost=10382.78..10382.78 rows=3571 width=8) (actual
time=2746.122..2746.122 rows=0 loops=429)
                            ->  Index Scan using fk_idx_staff_booking on staff_booking b
(cost=0.00..10382.78 rows=3571 width=8) (actual time=14.168..2733.315 rows=3815
loops=429)
                                  Index Cond: (contract_id = $0)
                ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..3.01
rows=1 width=6) (actual time=0.826..0.832 rows=1 loops=14401)
                      Index Cond: ("outer".reqt_id = r.reqt_id)
    ->  Aggregate  (cost=363.94..363.94 rows=1 width=0) (actual time=0.057..0.058
rows=1 loops=429)
          ->  Nested Loop  (cost=0.00..363.94 rows=1 width=0) (actual time=0.034..0.034
rows=0 loops=429)
                ->  Index Scan using fk_idx_main_order on main_order  (cost=0.00..4.99
rows=1 width=4) (actual time=0.031..0.031 rows=0 loops=429)
                      Index Cond: (client_id = 6)
                ->  Index Scan using fk_idx_timesheet_detail_3 on timesheet_detail
(cost=0.00..358.93 rows=1 width=4) (never executed)
                      Index Cond: ("outer".order_id = timesheet_detail.order_id)
                      Filter: (contract_id = $0)
Total runtime: 1304591.861 ms

A long time! The main issue here is that the RHOURS subselect is executed as a
correlated subplan 429 times, once per outer row, and unfortunately it is an expensive
subquery.

SQLServer executed this in just over 1 second on comparable hardware. Looking at its
execution plan, it flattens out the two subselects with merge joins, so I manually rewrote
the query using derived tables and joins:

SELECT VS.*,VL.TEL1,SC.CONTRACT_ID,SC.CONTRACT_REF, SC.MAX_HOURS,
SC.MIN_HOURS, TBOOK.RHOURS, TVIS.VISITS FROM SEARCH_REQT_RESULT
SR
JOIN STAFF_CONTRACT SC ON (SR.STAFF_ID = SC.STAFF_ID) AND
SC.AVAIL_DATE_FROM <= '2004-06-12' AND SC.AVAIL_DATE_TO >= '2004-06-18'
JOIN VSTAFF VS ON (VS.STAFF_ID = SC.STAFF_ID)
JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
LEFT OUTER JOIN (SELECT B.CONTRACT_ID, SUM(R.DURATION+1)/60.0 AS
RHOURS FROM STAFF_BOOKING B
JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) AND
BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18'
JOIN ORDER_REQT R ON (R.REQT_ID = B.REQT_ID)
 GROUP BY B.CONTRACT_ID) AS TBOOK
ON (SC.CONTRACT_ID = TBOOK.CONTRACT_ID)
LEFT OUTER JOIN (SELECT CONTRACT_ID,COUNT(*) AS VISITS FROM
TIMESHEET_DETAIL
JOIN MAIN_ORDER ON (MAIN_ORDER.ORDER_ID =
TIMESHEET_DETAIL.ORDER_ID) WHERE MAIN_ORDER.CLIENT_ID = 6
GROUP BY CONTRACT_ID) AS TVIS ON (TVIS.CONTRACT_ID = SC.CONTRACT_ID)
JOIN (SELECT P.CONTRACT_ID FROM STAFF_PRODUCT P,
SEARCH_ORDER_REQT SR
WHERE P.PRODUCT_ID = SR.PRODUCT_ID AND SR.SEARCH_ID = 1
GROUP BY P.CONTRACT_ID
HAVING (COUNT(P.CONTRACT_ID) = (SELECT COUNT(DISTINCT PRODUCT_ID)
FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1))) AS TCONT ON
(TCONT.CONTRACT_ID = SC.CONTRACT_ID)
WHERE SR.SEARCH_ID = 1

With the explain analyze as:
QUERY PLAN
Hash Join  (cost=137054.42..137079.74 rows=159 width=192) (actual
time=6228.354..6255.058 rows=429 loops=1)
  Hash Cond: ("outer".contract_id = "inner".contract_id)
  InitPlan
    ->  Index Scan using fk_idx_wruserarea on wruserarea  (cost=3.26..6.52 rows=1
width=4) (actual time=0.850..0.850 rows=1 loops=1)
          Index Cond: (area_id = 1)
          Filter: (uid = $3)
          InitPlan
            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual
time=0.670..0.675 rows=1 loops=1)
                  Filter: ((username)::name = "current_user"())
  ->  Subquery Scan tcont  (cost=152.63..161.81 rows=612 width=4) (actual
time=36.312..42.268 rows=612 loops=1)
        ->  HashAggregate  (cost=152.63..155.69 rows=612 width=4) (actual
time=36.301..40.040 rows=612 loops=1)
              Filter: (count(contract_id) = $7)
              InitPlan
                ->  Aggregate  (cost=1.04..1.04 rows=1 width=4) (actual time=0.107..0.108
rows=1 loops=1)
                      ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4)
(actual time=0.025..0.037 rows=1 loops=1)
                            Filter: (search_id = 1)
              ->  Hash Join  (cost=1.04..148.53 rows=612 width=4) (actual
time=0.419..32.284 rows=612 loops=1)
                    Hash Cond: ("outer".product_id = "inner".product_id)
                    ->  Seq Scan on staff_product p  (cost=0.00..109.91 rows=6291 width=8)
(actual time=0.117..17.943 rows=6291 loops=1)
                    ->  Hash  (cost=1.04..1.04 rows=1 width=4) (actual time=0.190..0.190
rows=0 loops=1)
                          ->  Seq Scan on search_order_reqt sr  (cost=0.00..1.04 rows=1
width=4) (actual time=0.165..0.177 rows=1 loops=1)
                                Filter: (search_id = 1)
  ->  Hash  (cost=136894.61..136894.61 rows=266 width=192) (actual
time=6191.923..6191.923 rows=0 loops=1)
        ->  Merge Left Join  (cost=136886.03..136894.61 rows=266 width=192) (actual
time=6143.315..6189.685 rows=429 loops=1)
              Merge Cond: ("outer".contract_id = "inner".contract_id)
              ->  Merge Left Join  (cost=136517.64..136525.04 rows=266 width=184) (actual
time=6142.896..6171.676 rows=429 loops=1)
                    Merge Cond: ("outer".contract_id = "inner".contract_id)
                    ->  Sort  (cost=5529.68..5530.34 rows=266 width=152) (actual
time=129.548..130.027 rows=429 loops=1)
                          Sort Key: sc.contract_id
                          ->  Nested Loop  (cost=88.35..5518.96 rows=266 width=152) (actual
time=33.213..121.666 rows=429 loops=1)
                                ->  Hash Join  (cost=88.35..143.88 rows=424 width=138) (actual
time=32.739..76.357 rows=429 loops=1)
                                      Hash Cond: ("outer".staff_id = "inner".staff_id)
                                      ->  Hash Join  (cost=47.47..93.95 rows=430 width=109) (actual
time=15.232..40.040 rows=429 loops=1)
                                            Hash Cond: ("outer".staff_id = "inner".staff_id)
                                            ->  Seq Scan on staff  (cost=34.61..66.48 rows=1030
width=105) (actual time=9.412..16.105 rows=1030 loops=1)
                                                  Filter: ((hashed subplan) OR $4)
                                                  SubPlan
                                                    ->  Seq Scan on staff_area  (cost=10.73..33.38
rows=493 width=4) (actual time=8.380..8.380 rows=0 loops=1)
                                                          Filter: ((hashed subplan) OR (area_id = 1))
                                                          SubPlan
                                                            ->  Seq Scan on wruserarea  (cost=3.26..10.72
rows=5 width=4) (actual time=0.953..1.941 rows=1 loops=1)
                                                                  Filter: (uid = $0)
                                                                  InitPlan
                                                                    ->  Seq Scan on wruser  (cost=0.00..3.26
rows=1 width=4) (actual time=0.902..0.908 rows=1 loops=1)
                                                                          Filter: ((username)::name =
"current_user"())
                                            ->  Hash  (cost=11.79..11.79 rows=430 width=4) (actual
time=5.670..5.670 rows=0 loops=1)
                                                  ->  Index Scan using fk_idx_search_reqt_result on
search_reqt_result sr  (cost=0.00..11.79 rows=430 width=4) (actual time=0.448..4.516
rows=429 loops=1)
                                                        Index Cond: (search_id = 1)
                                      ->  Hash  (cost=38.36..38.36 rows=1008 width=37) (actual
time=17.386..17.386 rows=0 loops=1)
                                            ->  Seq Scan on staff_contract sc  (cost=0.00..38.36
rows=1008 width=37) (actual time=0.222..14.063 rows=1008 loops=1)
                                                  Filter: ((avail_date_from <= '2004-06-12'::date) AND
(avail_date_to >= '2004-06-18'::date))
                                ->  Index Scan using location_pkey on "location"  (cost=0.00..12.66
rows=1 width=18) (actual time=0.043..0.050 rows=1 loops=429)
                                      Index Cond: ("location".location_id = "outer".location_id)
                                      Filter: ((area_id = 1) OR (subplan))
                                      SubPlan
                                        ->  Index Scan using fk_idx_wruserarea, fk_idx_wruserarea on
wruserarea  (cost=3.26..9.64 rows=1 width=4) (never executed)
                                              Index Cond: ((area_id = 1) OR (area_id = $6))
                                              Filter: (uid = $5)
                                              InitPlan
                                                ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4)
(never executed)
                                                      Filter: ((username)::name = "current_user"())
                    ->  Sort  (cost=130987.97..130989.96 rows=797 width=36) (actual
time=6013.254..6014.112 rows=746 loops=1)
                          Sort Key: tbook.contract_id
                          ->  Subquery Scan tbook  (cost=130933.62..130949.56 rows=797
width=36) (actual time=5993.070..6007.677 rows=746 loops=1)
                                ->  HashAggregate  (cost=130933.62..130941.59 rows=797 width=6) (actual time=5993.055..6004.099 rows=746 loops=1)
                                      ->  Merge Join  (cost=74214.90..130815.02 rows=23720 width=6) (actual time=4950.951..5807.985 rows=24613 loops=1)
                                            Merge Cond: ("outer".reqt_id = "inner".reqt_id)
                                            ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..50734.20 rows=2206291 width=6) (actual time=0.444..2753.374 rows=447439 loops=1)
                                            ->  Sort  (cost=74214.90..74274.20 rows=23720 width=8) (actual time=1822.405..1856.081 rows=24613 loops=1)
                                                  Sort Key: b.reqt_id
                                                  ->  Nested Loop  (cost=0.00..72491.19 rows=23720 width=8) (actual time=1.955..1633.124 rows=24613 loops=1)
                                                        ->  Index Scan using booking_plan_idx2 on booking_plan bp  (cost=0.00..572.52 rows=23720 width=4) (actual time=1.468..243.827 rows=24613 loops=1)
                                                              Index Cond: ((booking_date >= '2004-06-12'::date) AND (booking_date <= '2004-06-18'::date))
                                                        ->  Index Scan using staff_booking_pkey on staff_booking b  (cost=0.00..3.02 rows=1 width=12) (actual time=0.037..0.042 rows=1 loops=24613)
                                                              Index Cond: ("outer".booking_id = b.booking_id)
              ->  Sort  (cost=368.38..368.55 rows=68 width=12) (actual time=0.338..0.338 rows=0 loops=1)
                    Sort Key: tvis.contract_id
                    ->  Subquery Scan tvis  (cost=365.46..366.31 rows=68 width=12) (actual time=0.307..0.307 rows=0 loops=1)
                          ->  HashAggregate  (cost=365.46..365.63 rows=68 width=4) (actual time=0.302..0.302 rows=0 loops=1)
                                ->  Nested Loop  (cost=0.00..365.12 rows=68 width=4) (actual time=0.290..0.290 rows=0 loops=1)
                                      ->  Index Scan using fk_idx_main_order on main_order  (cost=0.00..4.99 rows=1 width=4) (actual time=0.286..0.286 rows=0 loops=1)
                                            Index Cond: (client_id = 6)
                                      ->  Index Scan using fk_idx_timesheet_detail_3 on timesheet_detail  (cost=0.00..358.63 rows=120 width=8) (never executed)
                                            Index Cond: ("outer".order_id = timesheet_detail.order_id)
Total runtime: 6266.205 ms

This now gives me the same results, but with orders of magnitude better execution
times!

Oddly enough, SQLServer really struggles with the second query, taking longer than
PostgreSQL!

Regards,
Gary.


On 3 Apr 2004 at 10:59, Josh Berkus wrote:

Gary,

> There are no indexes on the columns involved in the update, they are
> not required for my usual select statements. This is an attempt to
> slightly denormalise the design to get the performance up comparable
> to SQL Server 2000. We hope to move some of our databases over to
> PostgreSQL later in the year and this is part of the ongoing testing.
> SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet)
> so I am hand optimising some of the more frequently used
> SQL and/or tweaking the database design slightly.

Hmmm ... that hasn't been my general experience on complex queries.   However,
it may be due to a difference in ANALYZE statistics.   I'd love to see you
increase your default_stats_target, re-analyze, and see if PostgreSQL gets
"smarter".
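
Concretely, that suggestion might look something like the session below (the
parameter's full name is default_statistics_target; the value 100 is only an
illustrative choice, and the tables named are the two from the problem query):

```sql
-- Raise the planner's statistics target for this session (or set it in
-- postgresql.conf to make it permanent), then refresh the statistics:
SET default_statistics_target = 100;   -- the 7.x-era default was 10
ANALYZE order_reqt;
ANALYZE staff_booking;

-- The target can also be raised for an individual column:
ALTER TABLE staff_booking ALTER COLUMN reqt_id SET STATISTICS 100;
```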

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly


--
Incoming mail is certified Virus Free.
Checked by AVG Anti-Virus (http://www.grisoft.com).
Version: 7.0.230 / Virus Database: 262.6.5 - Release Date: 31/03/2004


Re: PostgreSQL and Linux 2.6 kernel.

From
Gary Doades
Date:
Thanks,

I know about set showplan_text, but it is only the equivalent of explain,
not explain analyze. The graphical plan gives full statistics, runtime,
percentage cost, loop execution counts etc., which is much more useful.
I don't know of a way of getting the graphical plan content in text form.
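
For what it's worth, there may be a middle ground: if memory serves, Query
Analyzer can also return the executed plan as text, including actual row and
execution counts per operator, via SET STATISTICS PROFILE. This is a sketch
from general T-SQL knowledge rather than something verified against SQL
Server 2000 here:

```sql
-- In Query Analyzer, before running the statement under test:
SET STATISTICS PROFILE ON;   -- text plan with actual Rows/Executes columns
SET STATISTICS TIME ON;      -- parse/compile and execution times

-- ... run the query ...

SET STATISTICS PROFILE OFF;
SET STATISTICS TIME OFF;
```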

Cheers,
Gary.

On 3 Apr 2004 at 6:50, @g v t c wrote:

Use "Set Show_Plan" or something of the sort in Query Analyzer.  Then
run your SQL.  This will change the graphical plan to a text plan
similar to PostgreSQL's, or at least something close to readable.

Gary Doades wrote:

>On 2 Apr 2004 at 22:36, pgsql-performance@postgresql. wrote:
>
>OK, some more detail:
>
>Before wiping 2.4 off my test box for the second time:
>
>SQL Statement for update:
>update staff_booking set time_from = r.time_from from order_reqt r where r.reqt_id =
>staff_booking.reqt_id;
>
>Explain: (on 2.4)
>QUERY PLAN
>Merge Join  (cost=0.00..185731.30 rows=2845920 width=92)
>  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
>  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..53068.20
>rows=2206291 width=6)
>  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..99579.21
>rows=2845920 width=90)
>
>Total execution time: 18 hours 12 minutes
>
>vacuum full analyze: total time 3 hours 22 minutes
>
>Wait 2 hours for re-install 2.6, set params etc.
>restore database.
>
>Same SQL Statement
>Explain: (on 2.6)
>QUERY PLAN
>Merge Join  (cost=0.00..209740.24 rows=2845920 width=92)
>  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
>  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..50734.20
>rows=2206291 width=6)
>  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..117921.92
>rows=2845920 width=90)
>
>Total execution time: 2 hours 53 minutes
>
>vacuum full analyze: total time 1 hours 6 minutes
>
>Table definitions for the two tables involved:
>CREATE TABLE ORDER_REQT
>(
>    REQT_ID               SERIAL,
>    ORDER_ID               integer NOT NULL,
>    DAYOFWEEK           smallint NOT NULL CHECK (DAYOFWEEK
>BETWEEN 0 AND 6),
>    TIME_FROM             smallint NOT NULL CHECK (TIME_FROM
>BETWEEN 0 AND 1439),
>    DURATION               smallint NOT NULL CHECK (DURATION
>BETWEEN 0 AND 1439),
>    PRODUCT_ID            integer NOT NULL,
>    NUMBER_REQT              smallint NOT NULL DEFAULT (1),
>    WROPTIONS            integer NOT NULL DEFAULT 0,
>    UID_REF              integer NOT NULL,
>    DT_STAMP              timestamp NOT NULL DEFAULT
>current_timestamp,
>    Sentinel_Priority         integer NOT NULL DEFAULT 0,
>    PERIOD            smallint NOT NULL DEFAULT 1 CHECK
>(PERIOD BETWEEN -2 AND 4),
>    FREQUENCY            smallint NOT NULL DEFAULT 1,
>    PRIMARY KEY (REQT_ID)
>);
>
>CREATE TABLE STAFF_BOOKING
>(
>    BOOKING_ID            SERIAL,
>    REQT_ID            integer NOT NULL,
>    ENTITY_TYPE           smallint NOT NULL DEFAULT 3
>check(ENTITY_TYPE in(3,4)),
>    STAFF_ID              integer NOT NULL,
>    CONTRACT_ID          integer NOT NULL,
>    TIME_FROM             smallint NOT NULL CHECK (TIME_FROM
>BETWEEN 0 AND 1439),
>    DURATION               smallint NOT NULL CHECK (DURATION
>BETWEEN 0 AND 1439),
>    PERIOD            smallint NOT NULL DEFAULT 1 CHECK
>(PERIOD BETWEEN -2 AND 4),
>    FREQUENCY            smallint NOT NULL DEFAULT 1,
>    TRAVEL_TO             smallint NOT NULL DEFAULT 0,
>    UID_REF              integer NOT NULL,
>    DT_STAMP              timestamp NOT NULL DEFAULT
>current_timestamp,
>    SELL_PRICE            numeric(10,4) NOT NULL DEFAULT 0,
>    COST_PRICE            numeric(10,4) NOT NULL DEFAULT 0,
>    MIN_SELL_PRICE        numeric(10,4) NOT NULL DEFAULT 0,
>    MIN_COST_PRICE        numeric(10,4) NOT NULL DEFAULT 0,
>    Sentinel_Priority         integer NOT NULL DEFAULT 0,
>    CHECK_INTERVAL            smallint NOT NULL DEFAULT 0,
>      STATUS            smallint NOT NULL DEFAULT 0,
>    WROPTIONS            integer NOT NULL DEFAULT 0,
>    PRIMARY KEY (BOOKING_ID)
>);
>
>Foreign keys:
>
>ALTER TABLE ORDER_REQT ADD
>     FOREIGN KEY
>    (
>        ORDER_ID
>    ) REFERENCES MAIN_ORDER (
>        ORDER_ID
>    ) ON DELETE CASCADE;
>
>ALTER TABLE ORDER_REQT ADD
>     FOREIGN KEY
>    (
>        PRODUCT_ID
>    ) REFERENCES PRODUCT (
>        PRODUCT_ID
>    );
>
>ALTER TABLE STAFF_BOOKING ADD
>     FOREIGN KEY
>    (
>        CONTRACT_ID
>    ) REFERENCES STAFF_CONTRACT (
>        CONTRACT_ID
>    );
>
>ALTER TABLE STAFF_BOOKING ADD
>     FOREIGN KEY
>    (
>        STAFF_ID
>    ) REFERENCES STAFF (
>        STAFF_ID
>    );
>
>
>Indexes:
>
>CREATE INDEX FK_IDX_ORDER_REQT
>     ON ORDER_REQT
>    (
>        ORDER_ID
>    );
>
>CREATE INDEX FK_IDX_ORDER_REQT_2
>     ON ORDER_REQT
>    (
>        PRODUCT_ID
>    );
>
>CREATE INDEX ORDER_REQT_IDX ON ORDER_REQT
>(
>    ORDER_ID,
>    PRODUCT_ID
>);
>
>CREATE INDEX ORDER_REQT_IDX4 ON ORDER_REQT
>(
>    REQT_ID,
>    TIME_FROM,
>    DURATION
>);
>
>CREATE INDEX FK_IDX_STAFF_BOOKING
>     ON STAFF_BOOKING
>    (
>        CONTRACT_ID
>    );
>
>CREATE INDEX FK_IDX_STAFF_BOOKING_2
>     ON STAFF_BOOKING
>    (
>        STAFF_ID
>    );
>
>CREATE INDEX STAFF_BOOK_IDX1 ON STAFF_BOOKING
>(
>    STAFF_ID,
>    REQT_ID
>);
>
>CREATE INDEX STAFF_BOOK_IDX2 ON STAFF_BOOKING
>(
>    REQT_ID
>);
>
>CREATE INDEX STAFF_BOOK_IDX3 ON STAFF_BOOKING
>(
>    BOOKING_ID,
>    REQT_ID
>);
>
>
>CREATE INDEX STAFF_BOOK_IDX4 ON STAFF_BOOKING
>(
>    BOOKING_ID,
>    CONTRACT_ID
>);
>
>There are no indexes on the columns involved in the update, they are
>not required for my usual select statements. This is an attempt to
>slightly denormalise the design to get the performance up comparable
>to SQL Server 2000. We hope to move some of our databases over to
>PostgreSQL later in the year and this is part of the ongoing testing.
>SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet)
>so I am hand optimising some of the more frequently used
>SQL and/or tweaking the database design slightly.
>
>Later, after deciphering SQLServer's graphical plans I will attempt to
>post comparative performance/access plans, using the same data of
>course, if anyone would be interested....
>
>Cheers,
>Gary.
>
>
>
>On 2 Apr 2004 at 1:32, Tom Lane wrote:
>
>
>
>>"Gary Doades" <gpd@gpdnet.co.uk> writes:
>>
>>
>>>As a test in PosgreSQL I issued a statement to update a single column
>>>of a table containing 2.8 million rows with the values of a column in
>>>a table with similar rowcount.  Using the above spec I had to stop the
>>>server after 17 hours. The poor thing was thrashing the hard disk and
>>>doing more swapping than useful work.
>>>
>>>
>>
>>This statement is pretty much content-free, since you did not show us
>>the table schemas, the query, or the EXPLAIN output for the query.
>>(I'll forgive you the lack of EXPLAIN ANALYZE, but you could easily
>>have provided all the other hard facts.)  There's really no way to tell
>>where the bottleneck is.  Maybe it's a kernel-level issue, but I would
>>not bet on that without more evidence.  I'd definitely not bet on it
>>without direct confirmation that the same query plan was used in both
>>setups.
>>
>>            regards, tom lane
>>
>
>
>




Re: PostgreSQL and Linux 2.6 kernel.

From
"Aaron Werman"
Date:
Almost any cross dbms migration shows a drop in performance. The engine
effectively trains developers and administrators in what works and what
doesn't. The initial migration thus compares a tuned to an untuned version.

/Aaron

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Gary Doades" <gpd@gpdnet.co.uk>; <pgsql-performance@postgresql.org>
Sent: Saturday, April 03, 2004 1:59 PM
Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.


> Gary,
>
> > There are no indexes on the columns involved in the update, they are
> > not required for my usual select statements. This is an attempt to
> > slightly denormalise the design to get the performance up comparable
> > to SQL Server 2000. We hope to move some of our databases over to
> > PostgreSQL later in the year and this is part of the ongoing testing.
> > SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet)
> > so I am hand optimising some of the more frequently used
> > SQL and/or tweaking the database design slightly.
>
> Hmmm ... that hasn't been my general experience on complex queries.
> However,
> it may be due to a difference in ANALYZE statistics.   I'd love to see you
> increase your default_stats_target, re-analyze, and see if PostgreSQL gets
> "smarter".
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>
>

Re: PostgreSQL and Linux 2.6 kernel.

From
Cott Lang
Date:
On Sat, 2004-04-03 at 03:50, Gary Doades wrote:
> On 2 Apr 2004 at 22:36, pgsql-performance@postgresql. wrote:
>
> OK, some more detail:
>
> Before wiping 2.4 off my test box for the second time:

Perhaps I missed it, but which io scheduler are you using under 2.6?



Re: PostgreSQL and Linux 2.6 kernel.

From
Mike Nolan
Date:
> Almost any cross dbms migration shows a drop in performance. The engine
> effectively trains developers and administrators in what works and what
> doesn't. The initial migration thus compares a tuned to an untuned version.

I think it is also possible that Microsoft has more programmers working
on tuning issues for SQL Server than PostgreSQL has working on the
whole project.
--
Mike Nolan

Substitute for this oracle query in postGre

From
"Kamalraj Singh Madhan"
Date:
Hi Friends,
        Does anybody know the substitute for the Oracle 'connect by
prior' construct in PostgreSQL?
  The query is basically being used to get a tree structure of records. The
query in Oracle is:

select pkmsgid
from mstmessage
connect by prior pkmsgid = msgparentid
start with msgparentid = 1

Kindly suggest.

regards
Kamal



*********************************************************************
Network Programs is a SEI CMM Level 5 Certified Company
********************************************************************
The information contained in this communication (including any attachments) is
intended solely for the use of the individual or entity to whom it is addressed
and others authorized to receive it. It may contain confidential or legally
privileged information. If you are not the intended recipient you are hereby
notified that any disclosure, copying, distribution or taking any action in
reliance on the contents of this information is strictly prohibited and may be
unlawful. If you have received this communication in error, please notify us
immediately by responding to this email and then delete it from your system.
Network Programs (India) Limited is neither liable for the proper and complete
transmission of the information contained in this communication nor for any
delay in its receipt.
*********************************************************************


Re: PostgreSQL and Linux 2.6 kernel.

From
"Gary Doades"
Date:
Possibly.

A lot of my queries show comparable performance, some a little slower
and a few a little faster. There are a few, however, that really grind on
PostgreSQL. I am learning patterns from these to try to target the
most likely performance problems and hand-tune these types
of SQL.

I'm not complaining about PostgreSQL or saying that SQLServer is
better, in most cases it is not. SQLServer seems to be more predictable
and forgiving in performance, which tends to make for lazy SQL
programming. It also has implications when the SQL is dynamically
created based on user input: there is a greater chance of PostgreSQL
hitting a performance problem than SQLServer.

Overall I'm still very impressed with PostgreSQL. The $7000 per
processor licence for SQLServer makes the case for PostgreSQL even
stronger!

Cheers,
Gary.

On 3 Apr 2004 at 17:43, Aaron Werman wrote:

Almost any cross dbms migration shows a drop in performance. The engine
effectively trains developers and administrators in what works and what
doesn't. The initial migration thus compares a tuned to an untuned version.

/Aaron

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Gary Doades" <gpd@gpdnet.co.uk>; <pgsql-performance@postgresql.org>
Sent: Saturday, April 03, 2004 1:59 PM
Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.


> Gary,
>
> > There are no indexes on the columns involved in the update, they are
> > not required for my usual select statements. This is an attempt to
> > slightly denormalise the design to get the performance up comparable
> > to SQL Server 2000. We hope to move some of our databases over to
> > PostgreSQL later in the year and this is part of the ongoing testing.
> > SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet)
> > so I am hand optimising some of the more frequently used
> > SQL and/or tweaking the database design slightly.
>
> Hmmm ... that hasn't been my general experience on complex queries.
> However,
> it may be due to a difference in ANALYZE statistics.   I'd love to see you
> increase your default_stats_target, re-analyze, and see if PostgreSQL gets
> "smarter".
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>
>





Re: PostgreSQL and Linux 2.6 kernel.

From
"Gary Doades"
Date:
Unfortunately I don't understand the question!

My background is primarily Win32. The last time I used a *nix OS
was about 20 years ago, apart from occasional dips into the Linux OS
over the past few years. If you can tell me how to find out what you want
I will gladly give you the information.

Regards,
Gary.

On 3 Apr 2004 at 16:52, Cott Lang wrote:

> On Sat, 2004-04-03 at 03:50, Gary Doades wrote:
> > On 2 Apr 2004 at 22:36, pgsql-performance@postgresql. wrote:
> >
> > OK, some more detail:
> >
> > Before wiping 2.4 off my test box for the second time:
>
> Perhaps I missed it, but which io scheduler are you using under 2.6?
>
>
>
>



Re: PostgreSQL and Linux 2.6 kernel.

From
"Gary Doades"
Date:
Unfortunately I have to try and keep both SQLServer and PostgreSQL
compatibilty. Our main web application is currently SQLServer, but we
want to migrate customers who don't care what the DB server is over to
PostgreSQL. Some of our larger customers demand SQLServer, you
know how it is!

I don't want to maintain two sets of code or SQL, so I am trying to find
common ground. The code is not a problem, but the SQL sometimes is.

Cheers,
Gary.


On 3 Apr 2004 at 17:43, Aaron Werman wrote:

> Almost any cross dbms migration shows a drop in performance. The engine
> effectively trains developers and administrators in what works and what
> doesn't. The initial migration thus compares a tuned to an untuned version.
>
> /Aaron
>
> ----- Original Message -----
> From: "Josh Berkus" <josh@agliodbs.com>
> To: "Gary Doades" <gpd@gpdnet.co.uk>; <pgsql-performance@postgresql.org>
> Sent: Saturday, April 03, 2004 1:59 PM
> Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
>
>
> > Gary,
> >
> > > There are no indexes on the columns involved in the update, they are
> > > not required for my usual select statements. This is an attempt to
> > > slightly denormalise the design to get the performance up comparable
> > > to SQL Server 2000. We hope to move some of our databases over to
> > > PostgreSQL later in the year and this is part of the ongoing testing.
> > > SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet)
> > > so I am hand optimising some of the more frequently used
> > > SQL and/or tweaking the database design slightly.
> >
> > Hmmm ... that hasn't been my general experience on complex queries.
> > However,
> > it may be due to a difference in ANALYZE statistics.   I'd love to see you
> > increase your default_stats_target, re-analyze, and see if PostgreSQL gets
> > "smarter".
> >
> > --
> > -Josh Berkus
> >  Aglio Database Solutions
> >  San Francisco
> >
> >
>
>



Re: PostgreSQL and Linux 2.6 kernel.

From
Gary Doades
Date:
On 3 Apr 2004 at 21:23, Mike Nolan wrote:

> > Almost any cross dbms migration shows a drop in performance. The engine
> > effectively trains developers and administrators in what works and what
> > doesn't. The initial migration thus compares a tuned to an untuned version.
>
> I think it is also possible that Microsoft has more programmers working
> on tuning issues for SQL Server than PostgreSQL has working on the
> whole project.
> --
> Mike Nolan
>

Agreed. Also, considering the high price of SQLServer, it is in their
interests to spend a lot of resources on tuning/performance to give it a
commercial edge over its rivals and in silly benchmark scores.

Cheers,
Gary.


> --
> Incoming mail is certified Virus Free.
> Checked by AVG Anti-Virus (http://www.grisoft.com).
> Version: 7.0.230 / Virus Database: 262.6.5 - Release Date: 31/03/2004
>



Re: PostgreSQL and Linux 2.6 kernel.

From
Paul Thomas
Date:
On 04/04/2004 09:56 Gary Doades wrote:
> Unfortunately I don't understand the question!
>
> My background is primarily Win32. The last time I used a *nix OS
> was about 20 years ago apart from occasional dips into the Linux OS
> over the past few years. If you can tell me how to find out what you want
> I will gladly give you the information.


Googling threw up

http://spider.tm/apr2004/cstory2.html

Interesting and possibly relevant quote:

"Benchmarks have shown that in certain conditions the anticipatory
algorithm is almost 10 times faster than what 2.4 kernel supports".

HTH

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for Business             |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

Re: Substitute for this oracle query in postGre

From
Christopher Kings-Lynne
Date:
Hi,

Try looking at the contrib/tablefunc add-in module.

Chris
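
For the archives, a sketch of Kamal's query rewritten with tablefunc's
connectby() (untested; the column types are assumed, and note the semantic
difference: connectby() starts from a key value, so the row with pkmsgid = 1
itself appears in the result, whereas Oracle's START WITH msgparentid = 1
starts one level below it):

```sql
-- Requires the contrib/tablefunc module to be installed.
-- Signature: connectby(relname, keyid_fld, parent_keyid_fld,
--                      start_with, max_depth) -- max_depth 0 = unlimited
SELECT pkmsgid
FROM connectby('mstmessage', 'pkmsgid', 'msgparentid', '1', 0)
     AS t(pkmsgid integer, msgparentid integer, level integer);
```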

Kamalraj Singh Madhan wrote:
> Hi Friends,
>         Does anybody know the substitute for the Oracle 'connect by
> prior' construct in PostgreSQL?
>   The query is basically being used to get a tree structure of records. The
> query in oracle is :-
>
> select pkmsgid
> from mstmessage
> connect by prior pkmsgid = msgparentid
> start with msgparentid = 1
>
> Kindly suggest.
>
> regards
> Kamal
>
>
>

Re: PostgreSQL and Linux 2.6 kernel.

From
Cott Lang
Date:
On Sun, 2004-04-04 at 01:56, Gary Doades wrote:
> Unfortunately I don't understand the question!
>
> My background is primarily Win32. The last time I used a *nix OS
> was about 20 years ago apart from occasional dips into the Linux OS
> over the past few years. If you can tell me how to find out what you want
> I will gladly give you the information.

There are two available io schedulers in 2.6 (new feature), deadline and
anticipatory. The one in use should be listed in the boot messages:

dmesg | grep scheduler

I've seen people arguing for each of the two schedulers, saying one is
better than the other for databases. I'm curious which one you're
using.  :)




Re: PostgreSQL and Linux 2.6 kernel.

From
"Gary Doades"
Date:
It says:

Using anticipatory io scheduler.

This then fits with the earlier post on observations of up to 10
times better performance, which is what I was seeing in certain
circumstances.

Cheers,
Gary.


On 4 Apr 2004 at 6:04, Cott Lang wrote:

> On Sun, 2004-04-04 at 01:56, Gary Doades wrote:
> > Unfortunately I don't understand the question!
> >
> > My background is primarily Win32. The last time I used a *nix OS
> > was about 20 years ago apart from occasional dips into the linux OS
> > over the past few years. If you can tell me how to find out what you want
> > I will gladly give you the information.
>
> There are two available io schedulers in 2.6 (new feature), deadline and
> anticipatory. It should be listed in the boot messages:
>
> dmesg | grep scheduler
>
> I've seen people arguing for each of the two schedulers, saying one is
> better than the other for databases. I'm curious which one you're
> using.  :)
>
>
>
>



Re: PostgreSQL and Linux 2.6 kernel.

From
Josh Berkus
Date:
Mike,

> I think it is also possible that Microsoft has more programmers working
> on tuning issues for SQL Server than PostgreSQL has working on the
> whole project.

Ah, but quantity != quality.    Or they wouldn't be trolling our mailing lists
trying to hire PostgreSQL programmers for the SQL Server project (really!).
And we had nearly 200 contributors between 7.3 and 7.4 ... a respectable
development staff for even a large corporation.

Point taken, though, SQL Server has done a better job in optimizing for
"dumb" queries.   This is something that PostgreSQL needs to work on, as are
self-referential updates for large tables, which also tend to be really slow.
Mind you, in SQL Server 7 I used to be able to crash the server with a big
self-referential update, so this is a common database problem.
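
A minimal sketch of the kind of self-referential update Josh means, reusing a
table from earlier in the thread purely for illustration; under MVCC every
updated row leaves a dead tuple behind, so a full-table update like this
roughly doubles the table on disk until it is vacuumed:

```sql
-- The same table is both the target and the source of the update:
UPDATE staff_booking
SET sell_price = src.min_sell_price
FROM staff_booking src
WHERE src.booking_id = staff_booking.booking_id;
```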

Unfortunately, these days only Tom and Neil seem to be seriously working on
the query planner (beg pardon in advance if I've missed someone) so I think
the real answer is that we need another person interested in this kind of
optimization before it's going to get much better.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: PostgreSQL and Linux 2.6 kernel.

From
markw@osdl.org
Date:
On  4 Apr, Cott Lang wrote:
> On Sun, 2004-04-04 at 01:56, Gary Doades wrote:
>> Unfortunately I don't understand the question!
>>
>> My background is primarily Win32. The last time I used a *nix OS
>> was about 20 years ago apart from occasional dips into the linux OS
>> over the past few years. If you can tell me how to find out what you want
>> I will gladly give you the information.
>
> There are two available io schedulers in 2.6 (new feature), deadline and
> anticipatory. It should be listed in the boot messages:
>
> dmesg | grep scheduler
>
> I've seen people arguing for each of the two schedulers, saying one is
> better than the other for databases. I'm curious which one you're
> using.  :)

Our database tests (TPC fair use implementations) show that the deadline
scheduler has an edge on the anticipatory scheduler.  Depending on the
current state of the AS scheduler, it can be within a few percent to 10%
or so.

I have some data with one of our tests here:
    http://developer.osdl.org/markw/fs/dbt2_project_results.html

Mark

Re: PostgreSQL and Linux 2.6 kernel.

From
"Gary Doades"
Date:
On 5 Apr 2004 at 8:36, Josh Berkus wrote:

>
> Point taken, though, SQL Server has done a better job in optimizing for
> "dumb" queries.   This is something that PostgreSQL needs to work on, as are
> self-referential updates for large tables, which also tend to be really slow.
> Mind you, in SQL Server 7 I used to be able to crash the server with a big
> self-referential update, so this is a common database problem.
>

I agree about the "dumb" queries (I'm not sure mine are *that* dumb :) )

When you can write SQL that looks right, feels right, gives the right
answers during testing and SQLServer runs them really fast, you stop
there and tend not to tinker with the SQL further.

You *can* (I certainly do) achieve comparable performance with
PostgreSQL, but you just have to work harder for it. Now that I have
learned the characteristics of both servers I can write SQL that is pretty
good on both. I suspect that there are people who evaluate PostgreSQL
by executing their favorite SQLServer queries against it, see that it is
slower and never bother to go further.

Cheers,
Gary.


Re: PostgreSQL and Linux 2.6 kernel.

From
Neil Conway
Date:
On Mon, 2004-04-05 at 11:36, Josh Berkus wrote:
> Unfortunately, these days only Tom and Neil seem to be seriously working on
> the query planner (beg pardon in advance if I've missed someone)

Actually, Tom is the only person actively working on the planner --
while I hope to contribute to it in the future, I haven't done so yet.

-Neil



Re: PostgreSQL and Linux 2.6 kernel.

From
"Simon Riggs"
Date:
> Josh Berkus wrote:
> Unfortunately, these days only Tom and Neil seem to be seriously
> working on the query planner (beg pardon in advance if I've missed
> someone) so I think the real answer is that we need another person
> interested in this kind of optimization before it's going to get much
> better.
>

Hmmmm. Interesting line of thought.

Is the problem "a person interested" or is there another issue there?

I was thinking the other day that maybe removing the ability to control
join order through explicitly manipulating the FROM clause might
actually be counterproductive, in terms of longer-term improvement of
the optimizer.

Treating the optimizer as a black box is something I'm very used to from
other RDBMS. My question is, how can you explicitly re-write a query now
to "improve" it? If there's no way of manipulating queries without
actually re-writing the optimizer, we're now in a position where we
aren't able to diagnose when the optimizer isn't working effectively.

For my mind, all the people on this list are potential "optimizer
developers" in the sense that we can all look at queries and see whether
there is a problem with particular join plans. Providing good cases of
poor optimization is just what's needed to assist those few that do
understand the internals to continue improving things.

I guess what I'm saying is it's not how many people you've got working
on the optimizer, it's how many accurate field reports of less-than-perfect
optimization reach them. In that case, PostgreSQL is likely in a
better position than Microsoft, since the accessibility of the pg
discussion lists makes such cases much more likely to get aired.

Any thoughts?

Best Regards, Simon Riggs


Re: PostgreSQL and Linux 2.6 kernel.

From
Bruno Wolff III
Date:
On Wed, Apr 14, 2004 at 21:12:18 +0100,
  Simon Riggs <simon@2ndquadrant.com> wrote:
>
> I guess what I'm saying is it's not how many people you've got working
> on the optimizer, its how many accurate field reports of less-than
> perfect optimization reach them. In that case, PostgreSQL is likely in a
> better position than Microsoft, since the accessibility of the pg
> discussion lists makes such cases much more likely to get aired.
>
> Any thoughts?

I have seen exactly this happen a number of times over the last several
years. However there is still only one Tom Lane implementing the
improvements.

Re: PostgreSQL and Linux 2.6 kernel.

From
"Simon Riggs"
Date:
> Bruno Wolff
>   Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > I guess what I'm saying is it's not how many people you've
> got working
> > on the optimizer, its how many accurate field reports of less-than
> > perfect optimization reach them. In that case, PostgreSQL
> is likely in a
> > better position than Microsoft, since the accessibility of the pg
> > discussion lists makes such cases much more likely to get aired.
> >
> > Any thoughts?
>
> I have seen exactly this happen a number of times over the
> last several
> years. However there is still only one Tom Lane implementing the
> improvements.
>

...and very few Mr.Microsofts too.

[I'm uncomfortable discussing, and it was not my intent to discuss, such
an issue with direct reference to particular individuals. There is no
intent to criticise or malign anybody named.]

Regards, Simon


Re: PostgreSQL and Linux 2.6 kernel.

From
Josh Berkus
Date:
Simon,

> Is the problem "a person interested" or is there another issue there?

IMHO, it's "a person interested".

> Treating the optimizer as a black box is something I'm very used to from
> other RDBMS. My question is, how can you explicitly re-write a query now
> to "improve" it? If there's no way of manipulating queries without
> actually re-writing the optimizer, we're now in a position where we
> aren't able to diagnose when the optimizer isn't working effectively.

Well, there is ... all of the various query cost parameters.
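To make the point concrete: PostgreSQL lets you change planner cost parameters per session, so you can steer or diagnose plan choices without touching optimizer source. A minimal sketch, using the UPDATE from earlier in this thread (the parameter values are illustrative only, not recommendations):

```sql
-- Forbid merge joins for this session only, to see whether the
-- planner's preferred merge join was really the best choice.
-- (Note: EXPLAIN ANALYZE actually executes the statement.)
SET enable_mergejoin = off;
EXPLAIN ANALYZE
  UPDATE staff_booking SET time_from = r.time_from
  FROM order_reqt r WHERE r.reqt_id = staff_booking.reqt_id;

-- Or nudge the cost model instead of forbidding a plan type outright;
-- a lower random_page_cost makes index scans look cheaper (default 4).
SET random_page_cost = 2;

-- Restore the defaults when done experimenting.
RESET enable_mergejoin;
RESET random_page_cost;
```

As Simon notes below, these are blunt instruments: they affect every query in the session, not just the one being diagnosed.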

> For my mind, all the people on this list are potential "optimizer
> developers" in the sense that we can all look at queries and see whether
> there is a problem with particular join plans. Providing good cases of
> poor optimization is just what's needed to assist those few that do
> understand the internals to continue improving things.

... which is what this list is for.

But, ultimately, improvements on the planner are still bottlenecked by having
only one developer actually hacking the changes.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: PostgreSQL and Linux 2.6 kernel.

From
Greg Stark
Date:
Bruno Wolff III <bruno@wolff.to> writes:

> I have seen exactly this happen a number of times over the last several
> years. However there is still only one Tom Lane implementing the
> improvements.

Ob: Well clearly the problem is we need more Tom Lanes.

--
greg

Re: PostgreSQL and Linux 2.6 kernel.

From
Geoffrey
Date:
Greg Stark wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
>
>
>>I have seen exactly this happen a number of times over the last several
>>years. However there is still only one Tom Lane implementing the
>>improvements.
>
>
> Ob: Well clearly the problem is we need more Tom Lanes.
>

my $pgGuru = "Tom Lane"; my @morepgGurus; my $howmany = 10;

while($howmany--) { push @morepgGurus, $pgGuru; }

--
Until later, Geoffrey                     Registered Linux User #108567
Building secure systems in spite of Microsoft

Re: PostgreSQL and Linux 2.6 kernel.

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Ob: Well clearly the problem is we need more Tom Lanes.

ObHHGReference: "Haven't you heard?  I come in six-packs!"

            regards, tom lane

Re: PostgreSQL and Linux 2.6 kernel.

From
"Joshua D. Drake"
Date:
>
> my $pgGuru = "Tom Lane"; my @morepgGurus; my $howmany = 10;
>
> while($howmany--) { push @morepgGurus, $pgGuru; }
>
This is just wrong...

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Re: PostgreSQL and Linux 2.6 kernel.

From
Greg Stark
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:

> > while($howmany--) { push @morepgGurus, $pgGuru; }
>
> This is just wrong...

yeah, it would have been much clearer written as:
 push @morepgGurus, ($pgGuru)x$howmany;

Or at least the perlish:
 for (1..$howmany)
instead of C style while syntax.

Ok. I stop now.

--
greg

Re: PostgreSQL and Linux 2.6 kernel.

From
"Simon Riggs"
Date:
>Josh Berkus
> > Treating the optimizer as a black box is something I'm very
> used to from
> > other RDBMS. My question is, how can you explicitly
> re-write a query now
> > to "improve" it? If there's no way of manipulating queries without
> > actually re-writing the optimizer, we're now in a position where we
> > aren't able to diagnose when the optimizer isn't working
> effectively.
>
> Well, there is ... all of the various query cost parameters.

They are very blunt instruments for such a delicate task.

Surely someone of your experience might benefit from something
more?

My feeling is, I would, though I want those tools as *a developer*
rather than for tuning specific queries for people, which is always so
sensitive to upgrades etc.

> But, ultimately, improvements on the planner are still
> bottlenecked by having
> only one developer actually hacking the changes.
>

Do we have a clear list of optimizations we'd like to be working on?

The TODO items aren't very closely tied to specific optimizations...

The only one I was aware of was deferred subselect evaluation for
DBT-3.



...sounds like there's more to discuss here, so I'll duck out now and
get back to my current project...

Best Regards, Simon Riggs