Thread: how postgresql request the computer resources

how postgresql request the computer resources

From
Sidar López Cruz
Date:
Is there something that tells postgres to take the resorces from computer
(RAM, HDD, SWAP on linux) as it need, not modifying variables on
postgresql.conf and other operating system things?

A days ago i am trying to show that postgres is better than mssql but when
execute a simple query like:

(1)
select count(*) from
(
    select archivo from fotos
        except
    select archivo from archivos
) x;
Aggregate  (cost=182162.83..182162.84 rows=1 width=0) (actual
time=133974.495..133974.498 rows=1 loops=1)
  ->  Subquery Scan x  (cost=173857.98..181830.63 rows=132878 width=0)
(actual time=109148.158..133335.279 rows=169672 loops=1)
        ->  SetOp Except  (cost=173857.98..180501.86 rows=132878 width=58)
(actual time=109148.144..132094.382 rows=169672 loops=1)
              ->  Sort  (cost=173857.98..177179.92 rows=1328775 width=58)
(actual time=109147.656..113870.975 rows=1328775 loops=1)
                    Sort Key: archivo
                    ->  Append  (cost=0.00..38710.50 rows=1328775 width=58)
(actual time=27.062..29891.075 rows=1328775 loops=1)
                          ->  Subquery Scan "*SELECT* 1"
(cost=0.00..17515.62 rows=523431 width=58) (actual time=27.052..9560.719
rows=523431 loops=1)
                                ->  Seq Scan on fotos  (cost=0.00..12281.31
rows=523431 width=58) (actual time=27.038..5390.238 rows=523431 loops=1)
                          ->  Subquery Scan "*SELECT* 2"
(cost=0.00..21194.88 rows=805344 width=58) (actual time=10.803..12117.788
rows=805344 loops=1)
                                ->  Seq Scan on archivos
(cost=0.00..13141.44 rows=805344 width=58) (actual time=10.784..5420.164
rows=805344 loops=1)
Total runtime: 134552.325 ms


(2)
select count(*) from fotos where archivo not in (select archivo from
archivos)
Aggregate  (cost=29398.98..29398.99 rows=1 width=0) (actual
time=26660.565..26660.569 rows=1 loops=1)
  ->  Seq Scan on fotos  (cost=15154.80..28744.69 rows=261716 width=0)
(actual time=13930.060..25859.340 rows=169799 loops=1)
        Filter: (NOT (hashed subplan))
        SubPlan
          ->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344
width=58) (actual time=0.319..5647.043 rows=805344 loops=1)
Total runtime: 26747.236 ms


(3)
select count(1) from fotos f where not exists (select a.archivo from
archivos a where a.archivo=f.archivo)
Aggregate  (cost=1761354.08..1761354.09 rows=1 width=0) (actual
time=89765.384..89765.387 rows=1 loops=1)
  ->  Seq Scan on fotos f  (cost=0.00..1760699.79 rows=261716 width=0)
(actual time=75.556..88880.234 rows=169799 loops=1)
        Filter: (NOT (subplan))
        SubPlan
          ->  Index Scan using archivos_archivo_idx on archivos a
(cost=0.00..13451.40 rows=4027 width=58) (actual time=0.147..0.147 rows=1
loops=523431)
                Index Cond: ((archivo)::text = ($0)::text)
Total runtime: 89765.714 ms



(4)
SELECT count(*)
FROM fotos f
LEFT JOIN archivos a USING(archivo)
WHERE a.archivo IS NULL
Aggregate  (cost=31798758.40..31798758.41 rows=1 width=0) (actual
time=114267.337..114267.341 rows=1 loops=1)
  ->  Merge Left Join  (cost=154143.73..31772412.02 rows=10538550 width=0)
(actual time=85472.696..113392.399 rows=169799 loops=1)
        Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
        Filter: ("inner".archivo IS NULL)
        ->  Sort  (cost=62001.08..63309.66 rows=523431 width=58) (actual
time=38018.343..39998.201 rows=523431 loops=1)
              Sort Key: (f.archivo)::text
              ->  Seq Scan on fotos f  (cost=0.00..12281.31 rows=523431
width=58) (actual time=0.158..4904.410 rows=523431 loops=1)
        ->  Sort  (cost=92142.65..94156.01 rows=805344 width=58) (actual
time=47453.790..50811.216 rows=805701 loops=1)
              Sort Key: (a.archivo)::text
              ->  Seq Scan on archivos a  (cost=0.00..13141.44 rows=805344
width=58) (actual time=0.206..7160.148 rows=805344 loops=1)
Total runtime: 114893.116 ms




WITH ANY OF THIS QUERIES MSSQL TAKES NOT MUCH OF 7 SECONDS....


PLEASE HELP ME

_________________________________________________________________
Consigue aquí las mejores y mas recientes ofertas de trabajo en América
Latina y USA: http://latam.msn.com/empleos/


Re: how postgresql request the computer resources

From
Richard Huxton
Date:
Sidar López Cruz wrote:
> Is there something that tells postgres to take the resorces from
> computer (RAM, HDD, SWAP on linux) as it need, not modifying variables
> on postgresql.conf and other operating system things?

Ah, and how is it to know what to share with other processes?

> A days ago i am trying to show that postgres is better than mssql but
> when execute a simple query like:
>
> (1)
> select count(*) from
> Total runtime: 134552.325 ms
>
> (2)
> select count(*) from fotos where archivo not in (select archivo from
> Total runtime: 26747.236 ms
>
> (3)
> select count(1) from fotos f where not exists (select a.archivo from
> Total runtime: 89765.714 ms
>
> (4)
> SELECT count(*)
> Total runtime: 114893.116 ms

> WITH ANY OF THIS QUERIES MSSQL TAKES NOT MUCH OF 7 SECONDS....

In which case they make a bad choice for showing PostgreSQL is faster
than MSSQL. Is this the only query you have, or are others giving you
problems too?

I think count(*) is about the weakest point in PG, but I don't think
there'll be a general solution available soon. As I'm sure someone has
mentioned, whatever else, PG needs to check the row for its visibility
information.

 From the start of your email, you seem to suspect your configuration
needs some work. Once you are happy that your settings in general are
good, you can override some by issuing set statements before your query.
For example:
    SET work_mem = 10000;
might well improve example #2 where you had a hash.

--
   Richard Huxton
   Archonet Ltd


Re: how postgresql request the computer resources

From
Michael Best
Date:
Richard Huxton wrote:
>> WITH ANY OF THIS QUERIES MSSQL TAKES NOT MUCH OF 7 SECONDS....
>
>
> In which case they make a bad choice for showing PostgreSQL is faster
> than MSSQL. Is this the only query you have, or are others giving you
> problems too?
>
> I think count(*) is about the weakest point in PG, but I don't think
> there'll be a general solution available soon. As I'm sure someone has
> mentioned, whatever else, PG needs to check the row for its visibility
> information.
>
>  From the start of your email, you seem to suspect your configuration
> needs some work. Once you are happy that your settings in general are
> good, you can override some by issuing set statements before your query.
> For example:
>     SET work_mem = 10000;
> might well improve example #2 where you had a hash.
>
> --
>   Richard Huxton
>   Archonet Ltd

Someone had suggested keeping a vector table with +1 and -1 for row
insertion and deletion and then running a cron to sum the vectors and
update a table so that you could select from that table to get the row
count.  Perhaps some sort of SUM() on a column function.

Since this seems like a reasonable approach (or perhaps there may be yet
another better mechanism), cannot someone add this sort of functionality
to Postgresql to do behind the scenes?

-Mike

Re: how postgresql request the computer resources

From
"Jim C. Nasby"
Date:
On Thu, Oct 27, 2005 at 03:58:55PM -0600, Michael Best wrote:
> Richard Huxton wrote:
> >>WITH ANY OF THIS QUERIES MSSQL TAKES NOT MUCH OF 7 SECONDS....
> >
> >
> >In which case they make a bad choice for showing PostgreSQL is faster
> >than MSSQL. Is this the only query you have, or are others giving you
> >problems too?
> >
> >I think count(*) is about the weakest point in PG, but I don't think
> >there'll be a general solution available soon. As I'm sure someone has
> >mentioned, whatever else, PG needs to check the row for its visibility
> >information.
> >
> > From the start of your email, you seem to suspect your configuration
> >needs some work. Once you are happy that your settings in general are
> >good, you can override some by issuing set statements before your query.
> >For example:
> >    SET work_mem = 10000;
> >might well improve example #2 where you had a hash.
> >
> >--
> >  Richard Huxton
> >  Archonet Ltd
>
> Someone had suggested keeping a vector table with +1 and -1 for row
> insertion and deletion and then running a cron to sum the vectors and
> update a table so that you could select from that table to get the row
> count.  Perhaps some sort of SUM() on a column function.
>
> Since this seems like a reasonable approach (or perhaps there may be yet
> another better mechanism), cannot someone add this sort of functionality
> to Postgresql to do behind the scenes?

There's all kinds of things that could be added; the issue is
ascertaining what the performance trade-offs are (there's no such thing
as a free lunch) and if the additional code complexity is worth it.

Note that your suggestion probably wouldn't work in this case because
the user isn't doing a simple SELECT count(*) FROM table;. I'd bet that
MSSQL is using index covering to answer his queries so quickly,
something that currently just isn't possible with PostgreSQL. But if you
search the -hackers archives, you'll find a discussion on adding limited
heap tuple visibility information to indexes. That would allow for
partial index covering in many cases, which would probably be a huge win
for the queries the user was asking about.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461