Thread: SELECT INTO large FKyed table is slow

SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
The database for monitoring certain drone statuses is quite simple:

CREATE TABLE samples (
    sample_id integer not null primary key,
    sample_timestamp timestamp not null default now()
);

CREATE TABLE drones (
    drone_id integer not null primary key,
    drone_log_notice character varying,
    drone_coordinates point not null,
    drone_temperature float,
    drone_pressure float
);

CREATE TABLE drones_history (
    drone_id integer not null,
    sample_id integer not null,
    drone_log_notice character varying,
    drone_temperature float,
    drone_pressure float,
    constraint drones_history_pk primary key (drone_id, sample_id),
    constraint drones_history_fk__samples foreign key (sample_id)
references samples(sample_id),
    constraint drones_history_fk__drones foreign key (drone_id) references
drones(drone_id)
);

Every ten to twenty minutes I receive a CSV file with the statuses of most
of the drones. The CSV file also includes data for new drones, as they're
put into use. When I receive new data I load the whole CSV file into the
database, then call a stored procedure that 'deals' with that data.
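
The load step itself is trivial - roughly like this (the file path and the
function name here are placeholders, not the real ones):

COPY temporary.drones (drone_id, log_notice, coord_x, coord_y, temp, press)
    FROM '/tmp/drones.csv' WITH CSV;
SELECT import_drones_csv(2771);  -- hypothetical function name; the argument
                                 -- is the sample_id for this batch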

So far I have around 6000 samples, around 160k drones, and drones_history
has around 25M rows.

The CSV file contains around 15k-20k 'rows', mostly data about existing
drones. Every now and then (on every 5th-10th CSV insert) there is data
for around 1000-5000 new drones.

Here is what I do in the stored procedure, after I have COPYed the data
from the CSV into the temporary.drones table:

First, inside the procedure, I create a temporary table that holds the rows
for the new drones:

CREATE TEMPORARY TABLE tmpNew ON COMMIT DROP AS
SELECT drone_id, log_notice, coord_x, coord_y, temp, press
FROM temporary.drones WHERE NOT EXISTS (SELECT 1 FROM public.drones WHERE
public.drones.drone_id = temporary.drones.drone_id);

This is done in milliseconds, even if the count of new drones is large
(I've tested it with 10k new drones, although in real-life operation I'd
never get more than 5k new drones per CSV).

INSERT INTO public.drones (drone_id, drone_log_notice, drone_coordinates,
drone_temperature, drone_pressure)
SELECT drone_id, log_notice, point(coord_x, coord_y) as drone_coordinates,
temp, press FROM tmpNew;
INSERT INTO public.drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT a_sample_id, drone_id, log_notice, temp, press FROM tmpNew;

This is also done in milliseconds.

Now I 'update' the data for the existing drones, and fill in the history
table for those drones. First I create a temporary table with just the
changed rows:

CREATE TEMPORARY TABLE tmpUpdate ON COMMIT DROP AS
SELECT a_sample_id, t.drone_id, t.log_notice, t.temp, t.press
    FROM temporary.drones t
    JOIN public.drones p
    ON t.drone_id = p.drone_id
WHERE p.drone_log_notice != t.log_notice OR p.drone_temperature != t.temp
OR p.drone_pressure != t.press;

Now, that part is also fast. I usually have around 100-1000 drones that
changed 'state', but sometimes as many as half of the drones (around 50k)
change state, and even then creation of tmpUpdate takes no more than ten
to twenty milliseconds.

This is the slow part:
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

For 100 rows this takes around 2 seconds. For 1000 rows this takes
around 40 seconds. For 5000 rows this takes around 5 minutes.
For 50k rows this takes around 30 minutes! This is where I start to lag,
because I get a new CSV every 10 minutes or so.

And the last part is to update the actual drones table:
UPDATE public.drones p
SET drone_log_notice = t.log_notice, drone_temperature = t.temp,
drone_pressure = t.press
FROM temporary.drones t
WHERE t.drone_id = p.drone_id
AND (t.log_notice != p.drone_log_notice OR t.temp != p.drone_temperature
OR t.press != p.drone_pressure);

This is also very fast: even when almost half the table is updated, the
UPDATE takes around 10 seconds. Usually it's around 30-50 ms.

The machine I'm doing this on has 4 GB of RAM, dual Xeon something (3 GHz),
and two SAS drives in a mirror, capable of around 100 MB/s in sequential
r/w (I know that means little on its own, but just to give an idea).

The database is around 2 GB in size (pg_database_size). When I dump and
recreate the database I can speed things up a bit, but after half a day of
operation the INSERTs are slow again.
When I do a dump/restore of the database I get around 40-50 MB/sec
reading/writing from the disk (COPYing data, PK/FK constraint creation),
but when that INSERT gets stuck the io-wait goes through the roof - iostat
shows that Postgres is mainly reading from the disks, around 800k/sec - 1024k/sec.

I've set shared_buffers to 256M, work_mem to 96M, wal_buffers to 16M and
checkpoint_segments to 16. I've turned off autovacuum; instead I do an
analyze/vacuum after each insert job is done, after TRUNCATEing the
temporary.drones table.
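
In postgresql.conf terms, that is:

shared_buffers = 256MB
work_mem = 96MB
wal_buffers = 16MB
checkpoint_segments = 16
autovacuum = off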

Out of despair I tried setting fsync=off, but that gave me only a small
performance improvement.

When I remove the foreign key constraints (drones_history_fk__samples and
drones_history_fk__drones) but leave the primary key on drones_history,
then that INSERT, even for 50k rows, takes no more than a second.

So, my question is - is there anything I can do to make INSERTs with FKs
faster? Or, since all the reference checking is done inside the procedure
for loading data, should I abandon those constraints entirely?
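
For reference, dropping and later re-adding them would be along these lines
(re-adding a FK forces a full validation scan of the 25M-row table, so that
is not free either):

ALTER TABLE drones_history DROP CONSTRAINT drones_history_fk__samples;
ALTER TABLE drones_history DROP CONSTRAINT drones_history_fk__drones;

-- ... do the bulk INSERTs ...

ALTER TABLE drones_history
    ADD CONSTRAINT drones_history_fk__samples
    FOREIGN KEY (sample_id) REFERENCES samples(sample_id);
ALTER TABLE drones_history
    ADD CONSTRAINT drones_history_fk__drones
    FOREIGN KEY (drone_id) REFERENCES drones(drone_id);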

    Mario


Re: SELECT INTO large FKyed table is slow

From
"Pierre C"
Date:
> When I remove the foreign key constraints (drones_history_fk__samples and
> drones_history_fk__drones) but leave the primary key on drones_history,
> then that INSERT, even for 50k rows, takes no more than a second.
>
> So, my question is - is there anything I can do to make INSERTs with FKs
> faster? Or, since all the reference checking is done inside the procedure
> for loading data, should I abandon those constraints entirely?
>
>     Mario

Maybe... or not. Can you post details about:

- the foreign keys
- the tables that are referred to (including indexes)


CREATE TABLE foo (x INTEGER PRIMARY KEY); INSERT INTO foo SELECT * FROM
generate_series( 1,100000 );
Time: 766.182 ms
test=> VACUUM ANALYZE foo;
Time: 71.938 ms
test=> CREATE TABLE bar ( x INTEGER REFERENCES foo(x) );
CREATE TABLE
test=> INSERT INTO bar SELECT * FROM generate_series( 1,100000 );
Time: 2834.430 ms

As you can see, 100,000 FK checks take less than 3 seconds on this very
simple example. There is probably something that needs fixing.

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 11/28/2010 07:56 PM, Pierre C wrote:
>
>> When I remove the foreign key constraints (drones_history_fk__samples and
>> drones_history_fk__drones) but leave the primary key on drones_history,
>> then that INSERT, even for 50k rows, takes no more than a second.
>>
>> So, my question is - is there anything I can do to make INSERTs with
>> FKs faster? Or, since all the reference checking is done inside the
>> procedure for loading data, should I abandon those constraints entirely?
>>
>> Mario
>
> Maybe... or not. Can you post details about :
>
> - the foreign keys
> - the tables that are referred to (including indexes)

I pasted the DDL at the beginning of my post. The only indexes the tables
have are the ones created for the PK constraints. Table drones has around
100k rows. Table drones_history has around 30M rows. I'm not sure what
additional info you'd want, but I'll be more than happy to provide more
relevant information.


> CREATE TABLE foo (x INTEGER PRIMARY KEY); INSERT INTO foo SELECT * FROM
> generate_series( 1,100000 );
> Time: 766.182 ms
> test=> VACUUM ANALYZE foo;
> Time: 71.938 ms
> test=> CREATE TABLE bar ( x INTEGER REFERENCES foo(x) );
> CREATE TABLE
> test=> INSERT INTO bar SELECT * FROM generate_series( 1,100000 );
> Time: 2834.430 ms
>
> As you can see, 100,000 FK checks take less than 3 seconds on this very
> simple example. There is probably something that needs fixing.


Yes, when the FKyed table is small enough, inserts are quite fast. But
when it grows larger the whole system slows down.

I just repeated your test and I'm getting similar results - on my
desktop. I'll try to assemble some code to recreate my workload and see
if I run into the same problems.

    Mario

Re: SELECT INTO large FKyed table is slow

From
"Pierre C"
Date:
> I pasted the DDL at the beginning of my post.

Ah, sorry, didn't see it ;)

> The only indexes tables have are the ones created because of PK
> constraints. Table drones has around 100k rows. Table drones_history has
> around 30M rows. I'm not sure what additional info you'd want but I'll
> be more than happy to provide more relevant information.

Can you post the following:

- pg version
- output of VACUUM ANALYZE VERBOSE for your 2 tables

Re: SELECT INTO large FKyed table is slow

From
Mark Kirkwood
Date:
On 29/11/10 00:46, Mario Splivalo wrote:
>
> This is the slow part:
> INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
> drone_temperature, drone_pressure)
> SELECT * FROM tmpUpdate;
>
> For 100 rows this takes around 2 seconds. For 1000 rows this takes
> around 40 seconds. For 5000 rows this takes around 5 minutes.
> For 50k rows this takes around 30 minutes! Now this is where I start
> lag because I get new CSV every 10 minutes or so.

Have you created indexes on drones_history(sample_id) and
drones_history(drone_id)? That would probably help speed up your INSERT
quite a bit if you have not done so.

Also it would be worthwhile for you to post the output of:

EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

to the list, so we can see what is taking the time.

Cheers

Mark

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 11/28/2010 10:50 PM, Pierre C wrote:
>
>> I pasted the DDL at the beginning of my post.
>
> Ah, sorry, didn't see it ;)
>
>> The only indexes tables have are the ones created because of PK
>> constraints. Table drones has around 100k rows. Table drones_history
>> has around 30M rows. I'm not sure what additional info you'd want but
>> I'll be more than happy to provide more relevant information.
>
> Can you post the following:
>
> - pg version
> - output of VACUUM ANALYZE VERBOSE for your 2 tables

Here it is:

realm_51=# vacuum analyze verbose drones;
INFO:  vacuuming "public.drones"
INFO:  scanned index "drones_pk" to remove 242235 row versions
DETAIL:  CPU 0.02s/0.11u sec elapsed 0.28 sec.
INFO:  "drones": removed 242235 row versions in 1952 pages
DETAIL:  CPU 0.01s/0.02u sec elapsed 0.03 sec.
INFO:  index "drones_pk" now contains 174068 row versions in 721 pages
DETAIL:  107716 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "drones": found 486 removable, 174068 nonremovable row versions
in 1958 out of 1958 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 64 unused item pointers.
0 pages are entirely empty.
CPU 0.22s/0.90u sec elapsed 22.29 sec.
INFO:  vacuuming "pg_toast.pg_toast_2695558"
INFO:  index "pg_toast_2695558_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_2695558": found 0 removable, 0 nonremovable row
versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.drones"
INFO:  "drones": scanned 1958 of 1958 pages, containing 174068 live rows
and 0 dead rows; 174068 rows in sample, 174068 estimated total rows
VACUUM
realm_51=# vacuum analyze verbose drones_history;
INFO:  vacuuming "public.drones_history"
INFO:  index "drones_history_pk" now contains 25440352 row versions in
69268 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.12u sec elapsed 16.56 sec.
INFO:  "drones_history": found 0 removable, 16903164 nonremovable row
versions in 129866 out of 195180 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.00s/1.42u sec elapsed 49.24 sec.
INFO:  vacuuming "pg_toast.pg_toast_2695510"
INFO:  index "pg_toast_2695510_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_2695510": found 0 removable, 0 nonremovable row
versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.drones_history"
INFO:  "drones_history": scanned 195180 of 195180 pages, containing
25440352 live rows and 0 dead rows; 600000 rows in sample, 25440352
estimated total rows
VACUUM
realm_51=# select version();
                                            version

---------------------------------------------------------------------------------------------
  PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (Debian
4.3.2-1.1) 4.3.2, 32-bit
(1 row)


    Mario

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 11/29/2010 08:11 AM, Mark Kirkwood wrote:
> On 29/11/10 00:46, Mario Splivalo wrote:
>>
>> This is the slow part:
>> INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
>> drone_temperature, drone_pressure)
>> SELECT * FROM tmpUpdate;
>>
>> For 100 rows this takes around 2 seconds. For 1000 rows this takes
>> around 40 seconds. For 5000 rows this takes around 5 minutes.
>> For 50k rows this takes around 30 minutes! Now this is where I start
>> lag because I get new CSV every 10 minutes or so.
>
> Have you created indexes on drones_history(sample_id) and
> drones_history(drone_id)? That would probably help speed up your INSERT
> quite a bit if you have not done so.

Yes, since (sample_id, drone_id) is the primary key, postgres created a
composite index on those columns. Are you suggesting I add two more
indexes, one on drone_id and one on sample_id?

> Also it would be worthwhile for you to post the output of:
>
> EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
> drone_log_notice, drone_temperature, drone_pressure)
> SELECT * FROM tmpUpdate;
>
> to the list, so we can see what is taking the time.

Is there a way to do that from inside a plpgsql function?

I can recreate the whole process within psql and then post the explain
analyze; it will just take me some time to do so. I'll post as soon as
I'm done.

    Mario

Re: SELECT INTO large FKyed table is slow

From
"Pierre C"
Date:
> realm_51=# vacuum analyze verbose drones;
> INFO:  vacuuming "public.drones"
> INFO:  scanned index "drones_pk" to remove 242235 row versions
> DETAIL:  CPU 0.02s/0.11u sec elapsed 0.28 sec.
> INFO:  "drones": removed 242235 row versions in 1952 pages
> DETAIL:  CPU 0.01s/0.02u sec elapsed 0.03 sec.
> INFO:  index "drones_pk" now contains 174068 row versions in 721 pages
> DETAIL:  107716 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.

As you can see, your index held 174068 live rows plus 242235 dead row
versions that probably should have been removed a long time ago by
autovacuum, but you seem to have it turned off. It does not take long to
vacuum this table (only 0.3 sec), so it is not a high cost; you should
enable autovacuum and let it do the job (note that this doesn't stop you
from manual vacuuming after big updates).
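
In postgresql.conf that just means:

autovacuum = on    # also needs track_counts = on, which is the default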

> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "drones": found 486 removable, 174068 nonremovable row versions
> in 1958 out of 1958 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 64 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.22s/0.90u sec elapsed 22.29 sec.

Here, the table itself seems quite normal... strange.

> INFO:  vacuuming "pg_toast.pg_toast_2695558"
> INFO:  index "pg_toast_2695558_index" now contains 0 row versions in 1
> pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "pg_toast_2695558": found 0 removable, 0 nonremovable row
> versions in 0 out of 0 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.

Since you don't have large fields, the toast table is empty...

> realm_51=# vacuum analyze verbose drones_history;
> INFO:  vacuuming "public.drones_history"
> INFO:  index "drones_history_pk" now contains 25440352 row versions in
> 69268 pages
> DETAIL:  0 index row versions were removed.

good

> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.38s/0.12u sec elapsed 16.56 sec.
> INFO:  "drones_history": found 0 removable, 16903164 nonremovable row
> versions in 129866 out of 195180 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 2.00s/1.42u sec elapsed 49.24 sec.

good

> INFO:  vacuuming "pg_toast.pg_toast_2695510"
> INFO:  index "pg_toast_2695510_index" now contains 0 row versions in 1
> pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "pg_toast_2695510": found 0 removable, 0 nonremovable row
> versions in 0 out of 0 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.

same as above, no toast


> realm_51=# select version();
>                                             version
> ---------------------------------------------------------------------------------------------
>   PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (Debian
> 4.3.2-1.1) 4.3.2, 32-bit
> (1 row)
>
>
>     Mario

ok

Try this:

CLUSTER drones_pk ON drones;

Then check if your slow query gets a bit faster. If it does, try:

ALTER TABLE drones SET ( fillfactor = 50 );
ALTER INDEX drones_pk SET ( fillfactor = 50 );
CLUSTER drones_pk ON drones; -- (again)

This will make the updates on this table less problematic. VACUUM it after
each mass update.

Re: SELECT INTO large FKyed table is slow

From
"Pierre C"
Date:
> Yes, since (sample_id, drone_id) is primary key, postgres created
> composite index on those columns. Are you suggesting I add two more
> indexes, one for drone_id and one for sample_id?

(sample_id, drone_id) covers sample_id, but if you search on drone_id
alone it is likely to be very slow since you have a large number of
sample_ids. Postgres can use any column of a multicolumn index, but it is
only interesting performance-wise if the cardinality of the first
(skipped) columns is low. If you often search on drone_id, create an
index. But this isn't what is slowing your foreign key checks.
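
If/when that need arises it is just one statement (pick whatever index name
you like):

CREATE INDEX drones_history_drone_id_idx ON drones_history (drone_id);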

>> Also it would be worthwhile for you to post the output of:
>>
>> EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
>> drone_log_notice, drone_temperature, drone_pressure)
>> SELECT * FROM tmpUpdate;
>>
>> to the list, so we can see what is taking the time.
>
> Is there a way to do so inside plpgsql function?
>
> I can recreate the whole process within psql and then post the explain
> analyze, it would just take me some time to do so. I'll post as soon as
> I'm done.

Yes, this would be interesting.

Re: SELECT INTO large FKyed table is slow

From
Mladen Gogala
Date:
I'm just back from vacation, so I apologize in advance if I missed
anything of importance. Here is something to consider:

Instead of using the statement you used to create the table, try the
following:

CREATE TABLE drones_history (
    drone_id integer not null,
    sample_id integer not null,
    drone_log_notice character varying,
    drone_temperature float,
    drone_pressure float,
    constraint drones_history_pk primary key (drone_id, sample_id),
    constraint drones_history_fk__samples foreign key (sample_id)
references samples(sample_id),
    constraint drones_history_fk__drones foreign key (drone_id) references  drones(drone_id) deferrable
);


At the beginning of the load, you should defer all of the deferrable
constraints by setting constraints deferred and issuing the COPY
statement within a transaction block, like this:

    scott=# begin;
    BEGIN
    Time: 0.203 ms
    scott=# set constraints all deferred;
    SET CONSTRAINTS
    Time: 0.201 ms
    scott=# copy test1 from '/tmp/test1.csv';
    COPY 100
    Time: 11.939 ms
    scott=# commit;
    ERROR:  insert or update on table "test1" violates foreign key
    constraint "fk_tst1_deptno"
    DETAIL:  Key (col1)=(1) is not present in table "dept".


Of course, that will require a complete rewrite of your load script,
because the errors will only be checked at commit time and the transaction
can either fail as a whole or succeed as a whole; it's an all-or-nothing
situation. How frequently do you see records with an incorrect drone_id?
If that happens only once in a blue moon, you may need no stinkin'
foreign keys in the first place; you may be able to have a batch job that
flags all the records with an invalid drone_id instead.
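
Such a batch check could be a simple anti-join, for example:

SELECT h.drone_id, h.sample_id
  FROM drones_history h
  LEFT JOIN drones d ON d.drone_id = h.drone_id
 WHERE d.drone_id IS NULL;
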
Furthermore, you can make sure that you have enough shared buffers to
cache the entire "drones" table. Also, run "strace" on the postgres
process handling your session and see whether the time is spent writing
WAL. If that is what is slowing you down, you should consider buying an
SSD or a high-end disk drive. I have never had such a problem, but you
should also check whether pg_loader can do anything for you.
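
(To check whether the drones table fits in shared_buffers:)

SELECT pg_size_pretty(pg_relation_size('drones'));        -- heap only
SELECT pg_size_pretty(pg_total_relation_size('drones'));  -- heap + indexes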

As far as speed is concerned, inserting with deferred foreign keys is
almost as fast as inserting without foreign keys:


scott=# alter table test1 drop constraint fk_tst1_deptno;
ALTER TABLE
Time: 16.219 ms
scott=# copy test1 from '/tmp/test1.csv';
COPY 100
Time: 10.418 ms

If you take a look at the example above, you will see that inserting
with a deferred FK took 11.939 milliseconds while inserting into the
same table without the FK took 10.418 milliseconds, a difference of
1.5 milliseconds per 100 rows. Your timing of 2 seconds per 100
rows looks suspiciously high. Methinks that your problem is not just
the foreign key; there must be something else devouring the time. You
should have a test instance, compiled with the "-g" option, and do
profiling.

Mario Splivalo wrote:
> [snip - full quote of the original post]


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 11/29/2010 05:47 PM, Pierre C wrote:
>> realm_51=# vacuum analyze verbose drones;
>> INFO: vacuuming "public.drones"
>> INFO: scanned index "drones_pk" to remove 242235 row versions
>> DETAIL: CPU 0.02s/0.11u sec elapsed 0.28 sec.
>> INFO: "drones": removed 242235 row versions in 1952 pages
>> DETAIL: CPU 0.01s/0.02u sec elapsed 0.03 sec.
>> INFO: index "drones_pk" now contains 174068 row versions in 721 pages
>> DETAIL: 107716 index row versions were removed.
>> 0 index pages have been deleted, 0 are currently reusable.
>
> As you can see your index contains 174068 active rows and 242235 dead
> rows that probably should have been removed a long time ago by
> autovacuum, but you seem to have it turned off. It does not take a long
> time to vacuum this table (only 0.3 sec) so it is not a high cost, you
> should enable autovacuum and let it do the job (note that this doesn't
> stop you from manual vacuuming after big updates).

Yes, you're right. I was doing some testing and I neglected to enable
vacuuming after the inserts. But what this shows is that the drones table
accumulates dead rows, since that table does get updated a lot. However, I
don't have any performance problems there: the UPDATE takes no more than
10 seconds even if I update 50k (out of 150k) rows.

I disabled autovacuum because I was getting a lot of "WARNING:  pgstat wait
timeout" messages and I could see the autovacuum job (in pg_stat_activity)
running while the plpgsql function that handles the inserts was running.

I left the autovacuum off but I do VACUUM after each CSV insert.

> good
>
>> 0 index pages have been deleted, 0 are currently reusable.
>> CPU 0.38s/0.12u sec elapsed 16.56 sec.
>> INFO: "drones_history": found 0 removable, 16903164 nonremovable row
>> versions in 129866 out of 195180 pages
>> DETAIL: 0 dead row versions cannot be removed yet.
>> There were 0 unused item pointers.
>> 0 pages are entirely empty.
>> CPU 2.00s/1.42u sec elapsed 49.24 sec.
>
> good
>
>> INFO: vacuuming "pg_toast.pg_toast_2695510"
>> INFO: index "pg_toast_2695510_index" now contains 0 row versions in 1
>> pages
>> DETAIL: 0 index row versions were removed.
>> 0 index pages have been deleted, 0 are currently reusable.
>> CPU 0.00s/0.00u sec elapsed 0.00 sec.
>> INFO: "pg_toast_2695510": found 0 removable, 0 nonremovable row
>> versions in 0 out of 0 pages
>> DETAIL: 0 dead row versions cannot be removed yet.
>> There were 0 unused item pointers.
>> 0 pages are entirely empty.
>> CPU 0.00s/0.00u sec elapsed 0.00 sec.
>
> same as above, no toast

Yes. Just to make things clear, I never update/delete drones_history. I
just INSERT, and every now and then I'll be doing SELECTs.

>
>
>> realm_51=# select version();
>> version
>> ---------------------------------------------------------------------------------------------
>>
>> PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (Debian
>> 4.3.2-1.1) 4.3.2, 32-bit
>> (1 row)
>>
>>
>> Mario
>
> ok
>
> Try this:
>
> CLUSTER drones_pk ON drones;
>
> Then check if your slow query gets a bit faster. If it does, try:
>
> ALTER TABLE drones SET ( fillfactor = 50 );
> ALTER INDEX drones_pk SET ( fillfactor = 50 );
> CLUSTER drones_pk ON drones; -- (again)
>
> This will make the updates on this table less problematic. VACUUM it
> after each mass update.

Is this going to make any difference for the slow insert on
drones_history? The INSERTs/UPDATEs on the drones table are fast. The
only noticeable difference is that drones is 150k rows 'large' while
drones_history has around 25M rows:

realm_51=# select count(*) from drones_history ;
   count
----------
  25550475
(1 row)

    Mario

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 11/29/2010 05:53 PM, Pierre C wrote:
>
>> Yes, since (sample_id, drone_id) is primary key, postgres created
>> composite index on those columns. Are you suggesting I add two more
>> indexes, one for drone_id and one for sample_id?
>
> (sample_id,drone_id) covers sample_id but if you make searches on
> drone_id alone it is likely to be very slow since you got a large number
> of sample_ids. Postgres can use any column of a multicolumn index but it
> is only interesting performance-wise if the cardinality of the first
> (ignored) columns is low. If you often make searches on drone_id, create
> an index. But this isn't what is slowing your foreign key checks.

Again, you have a point there. When I get to SELECTs on the history
table I'll be doing most of the filtering on drone_id (but also on
sample_id, because I'll seldom drill all the way back in time - I'll be
interested in just some periods), so I'll take this into consideration.

But, as you've said, that's not what is slowing my FK checks.

>
>>> Also it would be worthwhile for you to post the output of:
>>>
>>> EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
>>> drone_log_notice, drone_temperature, drone_pressure)
>>> SELECT * FROM tmpUpdate;
>>>
>>> to the list, so we can see what is taking the time.
>>
>> Is there a way to do so inside plpgsql function?
>>
>> I can recreate the whole process within psql and then post the explain
>> analyze, it would just take me some time to do so. I'll post as soon
>> as I'm done.
>
> Yes, this would be interesting.

So I did. I ran the whole script in psql, and here is the result for
the INSERT:

realm_51=# explain analyze INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM
tmp_drones_history;
                                                       QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
  Seq Scan on tmp_drones_history  (cost=0.00..81.60 rows=4160 width=48)
(actual time=0.008..5.296 rows=5150 loops=1)
  Trigger for constraint drones_history_fk__drones: time=92.948 calls=5150
  Total runtime: 16779.644 ms
(3 rows)


Now, this is only 16 seconds. In this 'batch' I inserted 5150 rows.
The batch before, which I ran 'the usual way', inserted 9922 rows
and took 1 minute and 16 seconds.

I did not, however, enclose the process in BEGIN/END.

So, here are results when I, in psql, first issued BEGIN:

realm_51=# explain analyze INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM
tmp_drones_history;
                                                       QUERY PLAN

----------------------------------------------------------------------------------------------------------------------
  Seq Scan on tmp_drones_history  (cost=0.00..79.56 rows=4056 width=48)
(actual time=0.008..6.490 rows=5059 loops=1)
  Trigger for constraint drones_history_fk__drones: time=120.224 calls=5059
  Total runtime: 39658.250 ms
(3 rows)

Time: 39658.906 ms



    Mario

Re: SELECT INTO large FKyed table is slow

From
Mark Kirkwood
Date:
On 30/11/10 05:53, Pierre C wrote:
>
>> Yes, since (sample_id, drone_id) is primary key, postgres created
>> composite index on those columns. Are you suggesting I add two more
>> indexes, one for drone_id and one for sample_id?
>
> (sample_id,drone_id) covers sample_id but if you make searches on
> drone_id alone it is likely to be very slow since you got a large
> number of sample_ids. Postgres can use any column of a multicolumn
> index but it is only interesting performance-wise if the cardinality
> of the first (ignored) columns is low. If you often make searches on
> drone_id, create an index. But this isn't what is slowing your foreign
> key checks.

Exactly, sorry - I was having a brain fade moment about which way your
foreign key checks were going when I suggested adding those indexes... :-(

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 11/30/2010 05:26 PM, Mladen Gogala wrote:
> At the beginning of the load, you should defer all of the deferrable
> constraints, setting constraints deferred and issuing the copy statement
> within a transaction block, like this:
>
> scott=# begin; BEGIN
> Time: 0.203 ms
> scott=# set constraints all deferred;
> SET CONSTRAINTS
> Time: 0.201 ms
> scott=# copy test1 from '/tmp/test1.csv';
> COPY 100
> Time: 11.939 ms
> scott=# commit;
> ERROR: insert or update on table "test1" violates foreign key
> constraint "fk_tst1_deptno"
> DETAIL: Key (col1)=(1) is not present in table "dept".
>
>
> Of course, that will require complete rewrite of your load script,
> because the errors will be checked at the commit time and transaction
> can either fail as a whole or succeed as a whole. It's all or nothing

Well, it is like that now. First I load the data from the CSV into the
temporary table (the schema is just named 'temporary'; the table itself is
a permanent table on the server). That table usually holds around 10k rows.
Then I call the function which does the job.

> situation. How frequently do you see records with an incorrect drone_id?

Seldom.

> If that happens only once in a blue moon, you may need no stinkin'
> foreign keys in the first place, you may be able
> to have a batch job that will flag all the records with an invalid
> drone_id instead.

I did have that idea, yes, but still, I'd like to know what is slowing
postgres down. Because when I look at the disk I/O, it seems very random
- I get around 800k/sec of disk reads and occasionally 1500k/sec of writes
(during the insert into the history table).

> Furthermore, you can make sure that you have enough shared buffers to
> cache the entire "drones" table. Also, do "strace" on the postgres
> process handling your session and see whether the time is spent writing
> to WAL archives. If that is slowing you down, you should consider buying
> a SSD or a high end disk drive. I have never had such problem, but you
> should also check whether pg_loader can do anything for you.
>
> As far as speed is concerned, inserting with deferred foreign keys is
> almost as fast as inserting without foreign keys:
>
> scott=# alter table test1 drop constraint fk_tst1_deptno;
> ALTER TABLE
> Time: 16.219 ms
> scott=# copy test1 from '/tmp/test1.csv';
> COPY 100
> Time: 10.418 ms
>
> If you take a look at the example above, you will see that inserting
> with a deferred FK took 11.939 milliseconds while inserting into the
> same table without the FK took 10.418 milliseconds, the difference of
> 1.5 milliseconds per 100 rows. The timing of 2 seconds per 100
> rows looks suspiciously high. Me thinks that your problem is not just
> the foreign key, there must be something else devouring the time. You
> should have a test instance, compiled with "-g" option and do profiling.

I'll have to. So far I've been doing this only on that dedicated server.
I'll try to download the database to my desktop and run the tests there.

Concerning the shared_buffers, it's 256M, and the drones table is just 15M.

I have tried your recommendation and it yielded no difference.

Now I tried removing the constraints from the history table (including
the PK) and the inserts were fast. After a few 'rounds' of inserts I added
the constraints back, and several rounds after that were fast again. But
then it was all the same again: an insert of some 11k rows took 4 seconds
(with all constraints), and the last one, of only 4k rows, took one
minute. I did vacuum after each insert.


    Mario

Re: SELECT INTO large FKyed table is slow

From
"Pierre C"
Date:
> Now I tried removing the constraints from the history table (including
> the PK) and the inserts were fast. After few 'rounds' of inserts I added
> constraints back, and several round after that were fast again. But then
> all the same. Insert of some 11k rows took 4 seconds (with all
> constraints) and now the last one of only 4k rows took one minute. I did
> vacuum after each insert.
>
>
>     Mario

Hm, so for each line of drones_history you insert, you also update the
corresponding row in the drones table to reflect the latest data, right?
How many times is the same row in "drones" updated? I.e., if you insert N
rows into drones_history, how many distinct drone_ids do you have?

Re: SELECT INTO large FKyed table is slow

From
"Joshua D. Drake"
Date:
On Sun, 2010-11-28 at 12:46 +0100, Mario Splivalo wrote:
> The database for monitoring certain drone statuses is quite simple:
>

> This is the slow part:
> INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
> drone_temperature, drone_pressure)
> SELECT * FROM tmpUpdate;
>
> For 100 rows this takes around 2 seconds. For 1000 rows this takes
> around 40 seconds. For 5000 rows this takes around 5 minutes.
> For 50k rows this takes around 30 minutes! Now this is where I start lag
> because I get new CSV every 10 minutes or so.

Have you considered making the foreign key check deferrable?

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 12/01/2010 01:51 AM, Pierre C wrote:
>
>> Now I tried removing the constraints from the history table (including
>> the PK) and the inserts were fast. After few 'rounds' of inserts I
>> added constraints back, and several round after that were fast again.
>> But then all the same. Insert of some 11k rows took 4 seconds (with
>> all constraints) and now the last one of only 4k rows took one minute.
>> I did vacuum after each insert.
>>
>>
>>     Mario
>
> Hm, so for each line of drones_history you insert, you also update the
> correspoding drones table to reflect the latest data, right ?

Yes.

> How many times is the same row in "drones" updated ? ie, if you insert N
> rows in drones_nistory, how may drone_id's do you have ?

Just once.

If I have 5000 lines in the CSV file (which I load into the 'temporary'
table using COPY) I can be sure that drone_id is unique there. That is
because the CSV file contains measurements from all the drones, one
measurement per drone. I usually have around 100 new drones, so I insert
those into drones and into drones_history. Then, for the existing ones, I
first insert into drones_history and then update those rows in drones.
Should I try doing it the other way around?

Although, I think I'm having some disk-related problems, because when
inserting into the tables my I/O throughput is pretty low. For instance,
when I drop the constraints and then recreate them, that takes around
15-30 seconds (on a 25M-row table) - disk I/O is steady, around 60 MB/s
in read and write.

It could just be that the ext3 partition is badly fragmented. I'll try
later this week on a new set of disks and an ext4 filesystem to see how it
goes.

    Mario

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 12/01/2010 02:47 AM, Joshua D. Drake wrote:
> On Sun, 2010-11-28 at 12:46 +0100, Mario Splivalo wrote:
>> The database for monitoring certain drone statuses is quite simple:
>>
>
>> This is the slow part:
>> INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
>> drone_temperature, drone_pressure)
>> SELECT * FROM tmpUpdate;
>>
>> For 100 rows this takes around 2 seconds. For 1000 rows this takes
>> around 40 seconds. For 5000 rows this takes around 5 minutes.
>> For 50k rows this takes around 30 minutes! Now this is where I start lag
>> because I get new CSV every 10 minutes or so.
>
> Have you considered making the foreign key check deferrable?
>

Yes, as Mladen Gogala advised. No noticeable change in performance -
it's still slow :)

But, just for the sake of clarification - I thought that DEFERRABLE would
matter if I did a lot of INSERTs, inside a FOR loop or something like
that. Since I'm doing INSERT INTO ... SELECT, does it make any difference?

    Mario

Re: SELECT INTO large FKyed table is slow

From
"Pierre C"
Date:
> Just once.

OK, another potential problem eliminated, it gets strange...

> If I have 5000 lines in CSV file (that I load into 'temporary' table
> using COPY) i can be sure that drone_id there is PK. That is because CSV
> file contains measurements from all the drones, one measurement per
> drone. I usualy have around 100 new drones, so I insert those to drones
> and to drones_history. Then I first insert into drones_history and then
> update those rows in drones. Should I try doing the other way around?

No, it doesn't really matter.

> Although, I think I'm having some disk-related problems because when
> inserting to the tables my IO troughput is pretty low. For instance,
> when I drop constraints and then recreate them that takes around 15-30
> seconds (on a 25M rows table) - disk io is steady, around 60 MB/s in
> read and write.
>
> It just could be that the ext3 partition is so fragmented. I'll try
> later this week on a new set of disks and ext4 filesystem to see how it
> goes.

If you CLUSTER a table, it is entirely rebuilt, so if your disk free space
isn't heavily fragmented, you can hope the table and indexes will get
allocated in a nice contiguous segment.
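
For the history table that would be, for example (note that CLUSTER takes
an exclusive lock and needs roughly the table's size in free disk space
while it rebuilds):

CLUSTER drones_history_pk ON drones_history;
ANALYZE drones_history;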

Re: SELECT INTO large FKyed table is slow

From
"Pierre C"
Date:
> So, I did. I run the whole script in psql, and here is the result for
> the INSERT:
>
> realm_51=# explain analyze INSERT INTO drones_history (sample_id, drone_id,
> drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM
> tmp_drones_history;
>                                                        QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>   Seq Scan on tmp_drones_history  (cost=0.00..81.60 rows=4160 width=48)
> (actual time=0.008..5.296 rows=5150 loops=1)
>   Trigger for constraint drones_history_fk__drones: time=92.948
> calls=5150
>   Total runtime: 16779.644 ms
> (3 rows)
>
>
> Now, this is only 16 seconds. In this 'batch' I've inserted 5150 rows.
> The batch before, I run that one 'the usual way', it inserted 9922 rows,
> and it took 1 minute and 16 seconds.
>
> I did not, however, enclose the process into begin/end.
>
> So, here are results when I, in psql, first issued BEGIN:
>
> realm_51=# explain analyze INSERT INTO drones_history (sample_id, drone_id,
> drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM
> tmp_drones_history;
>                                                        QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>   Seq Scan on tmp_drones_history  (cost=0.00..79.56 rows=4056 width=48)
> (actual time=0.008..6.490 rows=5059 loops=1)
>   Trigger for constraint drones_history_fk__drones: time=120.224
> calls=5059
>   Total runtime: 39658.250 ms
> (3 rows)
>
> Time: 39658.906 ms
>
>
>
>     Mario
>

Note that in both cases postgres reports that the FK checks take 92-120
milliseconds... which is a normal time for about 4000 rows.
Inserting 4000 lines with just a few fields like you have should take
much less than 1 s...

Where the rest of the time goes, I have no idea. Disk thrashing? Locks?
Gremlins?

- try it on a fresh copy of all your tables (CREATE TABLE, INSERT INTO
SELECT)
- try to put the WAL on a separate physical disk (or do a check with
fsync=off)
- try it on another computer
- try it on another harddisk
- run oprofile on a debug compile of postgres
- it could even be the process title updates (I don't think so but...)
- try a ramdisk tablespace (a rough sketch follows)
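
For the ramdisk tablespace, a rough sketch (the mount point is hypothetical,
and anything placed there is of course lost on reboot - test data only):

-- the directory must already exist and be owned by the postgres OS user
CREATE TABLESPACE ram_test LOCATION '/mnt/ramdisk/pgdata';
ALTER TABLE drones_history SET TABLESPACE ram_test;
ALTER INDEX drones_history_pk SET TABLESPACE ram_test;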

Re: SELECT INTO large FKyed table is slow

From
Mladen Gogala
Date:
Mario Splivalo wrote:
> Yes, as Mladen Gogala advised. No noticeable change in performance -
> it's still slow :)

Declaring constraints as deferrable doesn't do anything by itself; you have
to actually set the constraints deferred for it to have an effect, and you
have to do that within a transaction block. If done outside of a
transaction block, there is no effect.

This is what happens when "set constraints" is issued outside the
transaction block:

scott=# create table test1 (... constraint test1_pk primary key(col1) deferrable);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test1_pk" for table "test1"
CREATE TABLE
Time: 41.218 ms
scott=# set constraints all deferred;
SET CONSTRAINTS
Time: 0.228 ms
scott=# begin;
BEGIN
Time: 0.188 ms
scott=# insert into test1 values(1);
INSERT 0 1
Time: 0.929 ms
scott=# insert into test1 values(1);
ERROR:  duplicate key value violates unique constraint "test1_pk"
DETAIL:  Key (col1)=(1) already exists.
scott=# end;
ROLLBACK
Time: 0.267 ms

It works like a charm when issued within the transaction block:

scott=# begin;
BEGIN
Time: 0.202 ms
scott=# set constraints all deferred;
SET CONSTRAINTS
Time: 0.196 ms
scott=# insert into test1 values(1);
INSERT 0 1
Time: 0.334 ms
scott=# insert into test1 values(1);
INSERT 0 1
Time: 0.327 ms
scott=# end;
ERROR:  duplicate key value violates unique constraint "test1_pk"
DETAIL:  Key (col1)=(1) already exists.

I was able to insert the same value twice; it only failed at the end of
the transaction.

> But, just for the sake of clarification - I thought that DEFERRABLE would
> matter if I did a lot of INSERTs, inside a FOR loop or something like
> that. Since I'm doing INSERT INTO ... SELECT, does it make any difference?

You cannot tell which part takes a long time, the SELECT or the INSERT,
without profiling. I certainly cannot do it over the internet.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 12/01/2010 05:34 PM, Mladen Gogala wrote:
> Mario Splivalo wrote:
>>
>>
>> Yes, as Mladen Gogala had advised. No noticable change in performance -
>> it's still slow :)
>>
>
> Declaring constraints as deferrable  doesn't do anything as such, you
> have to actually set the constraints deferred to have an effect. You
> have to do it within a transaction block. If done outside of the
> transaction block, there is no effect:

I understand, I did as you suggested.

BEGIN;
SET CONSTRAINTS ALL DEFERRED;
SELECT my_insert_drones_function();
COMMIT;


> I was able to insert the same value twice, it only failed at the end of
> the transaction.
>> But, just for the sake of clarification - I tought that DEFERRABLE would
>> matter if I do a lot of INSERTs, inside a FOR loop or something like
>> that. Since I'm doing INSERT INTO ... SELECT, does it makes any difference?
>>
> You cannot tell which part takes a long time, select or insert, without
> profiling. I certainly cannot do it over the internet.

If I first SELECT into a dummy temporary table, that SELECT is fast. It is
just the INSERT INTO ... SELECT that is slow.

I'll try what Pierre suggested, on a whole new filesystem. This one got
quite filled with thousands of files that I deleted while the
database was working.

    Mario

Re: SELECT INTO large FKyed table is slow

From
Mladen Gogala
Date:
Mario Splivalo wrote:
> I'll try what Pierre suggested, on whole new filesystem. This one did
> get quite filled with thousands of files that I deleted while the
> database was working.
>
>     Mario
>

Yes, that is a good idea. That's the reason why we need a
defragmentation tool on Linux. Unfortunately, the only file system that
currently has a decent defragmentation tool is XFS and that is a paid
option, at least with Red Hat. Greg Smith has recently posted a
wonderful review of PostgreSQL on various file systems:

http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html

There is an operating system which comes with a very decent extent-based
file system and a defragmentation tool, included in the OS. The file
system is called "NTFS" and the company is in the land of Redmond, WA,
where the shadows lie. One OS to rule them all...

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: SELECT INTO large FKyed table is slow

From
Kenneth Marshall
Date:
On Wed, Dec 01, 2010 at 12:15:19PM -0500, Mladen Gogala wrote:
> Mario Splivalo wrote:
>> I'll try what Pierre suggested, on whole new filesystem. This one did get
>> quite filled with thousands of files that I deleted while the database was
>> working.
>>
>>     Mario
>>
>
> Yes, that is a good idea. That's the reason why we need a defragmentation
> tool on Linux. Unfortunately, the only file system that currently has a
> decent defragmentation tool is XFS and that is a paid option, at least with
> Red Hat. Greg Smith has recently posted a wonderful review of PostgreSQL on
> various file systems:
>
> http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html
>
> There is a operating system which comes with a very decent extent based
> file system and a defragmentation tool, included in the OS. The file system
> is called "NTFS" and company is in the land of Redmond, WA where the
> shadows lie. One OS to rule them all...
>
> --
> Mladen Gogala Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com The Leader in Integrated Media Intelligence
> Solutions
>

Red Hat 6 comes with ext4, which is an extent-based filesystem with
decent performance.

Ken

Re: SELECT INTO large FKyed table is slow

From
"Kevin Grittner"
Date:
Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:

> There is a operating system which comes with a very decent extent
> based file system and a defragmentation tool, included in the OS.
> The file system is called "NTFS"

Been there, done that.  Not only was performance quite poor compared
to Linux, but reliability suffered and managing things took more staff
time.

We had the luxury of identical hardware and the ability to load
balance a web site with millions of hits per day evenly between them
in both environments, as well as off-line saturation load testing.
At least for running a PostgreSQL database, my experience suggests
that the only reasonable excuse for running the database on a Windows
server is that you're under a mandate from ill-informed managers to
do so.

-Kevin

Re: SELECT INTO large FKyed table is slow

From
Mladen Gogala
Date:
Kenneth Marshall wrote:
> Redhat6 comes with ext4 which is an extent based filesystem with
> decent performance.
>
> Ken
>
But e4defrag is still not available. And, of course, Red Hat 6 is still
not available, either.  Maybe Red Hat 7 will do the trick? I assume it
will work beautifully with PostgreSQL 15.0.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: SELECT INTO large FKyed table is slow

From
Mladen Gogala
Date:
Kevin Grittner wrote:
> Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
>
>
>
> Been there, done that.  Not only was performance quite poor compared
> to Linux, but reliability and staff time to manage things suffered
> in comparison to Linux.
>
>
I must say that I am quite impressed with Windows 7 servers, especially
the 64-bit version. Unfortunately, I don't have any PostgreSQL instances
on those, but Exchange works very, very well. Also, personal impressions
from clicking around and running office applications are quite good.
Don't get me wrong, I am an old Unix/Linux hack and I would like nothing
better than to see Linux succeed, but I don't like what I see.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: SELECT INTO large FKyed table is slow

From
"Pierre C"
Date:
On Wed, 01 Dec 2010 18:24:35 +0100, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

> Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
>
>> There is a operating system which comes with a very decent extent
>> based file system and a defragmentation tool, included in the OS.
>> The file system is called "NTFS"
> Been there, done that.  Not only was performance quite poor compared
> to Linux, but reliability and staff time to manage things suffered
> in comparison to Linux.

Please don't start with NTFS. It is the worst excuse for a filesystem I've
ever seen.

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 12/01/2010 09:43 AM, Pierre C wrote:
>
> Note that in both cases postgres reports that the FK checks take 92-120
> milliseconds... which is a normal time for about 4000 rows.
> Inserting 4000 lines with just a few fields like you got should take
> quite much less than 1 s...
>
> Where the rest of the time goes, I have no idea. Disk thrashing ? Locks
> ? Gremlins ?
>
> - try it on a fresh copy of all your tables (CREATE TABLE, INSERT INTO
> SELECT)
> - try to put the WAL on a separate physical disk (or do a check with
> fsync=off)
> - try it on another computer
> - try it on another harddisk
> - run oprofile on a debug compile of postgres
> - it could even be the process title updates (I don't think so but...)
> - try a ramdisk tablespace

I'm already running it with fsync=off. The funny thing is, as I add a new
realm it runs fine until the history table grows to around 5M rows. After
that the slowdown is huge.

I'm trying this on new hardware this weekend, I'll post here the results.

    Mario

Re: SELECT INTO large FKyed table is slow

From
Mario Splivalo
Date:
On 12/01/2010 10:43 PM, Pierre C wrote:
> On Wed, 01 Dec 2010 18:24:35 +0100, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>
>> Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
>>
>>> There is a operating system which comes with a very decent extent
>>> based file system and a defragmentation tool, included in the OS.
>>> The file system is called "NTFS"
>> Been there, done that. Not only was performance quite poor compared
>> to Linux, but reliability and staff time to manage things suffered
>> in comparison to Linux.
>
> Please don't start with NTFS. It is the worst excuse for a filesystem
> I've ever seen.

It is OT, but could you please shed some light on that? Part of
my next project is to test performance of pg9 on both Windows and Linux
systems, so I'd appreciate any data/info you both may have.

    Mario

Re: SELECT INTO large FKyed table is slow

From
"Kevin Grittner"
Date:
Mario Splivalo <mario.splivalo@megafon.hr> wrote:

> It is OT, but, could you please shead just some light on that?
> Part of my next project is to test performance of pg9 on both
> windows and linux systems so I'd appreciate any data/info you both
> may have.

I don't know how much of it was the filesystem, but with both tuned to
the best of our ability, Linux on xfs ran much faster than Windows on
NTFS.  The lack of atomic operations and of a lockfile utility on
Windows/NTFS was something of a handicap.  I have found Linux to be
much more reliable and (once I got my knowledge of bash scripting and
common Linux utilities up to a certain level) much easier to
administer.  Getting my head around xargs was, I think, the tipping
point.  ;-)

-Kevin

Re: SELECT INTO large FKyed table is slow

From
Robert Haas
Date:
On Thu, Dec 2, 2010 at 3:36 AM, Mario Splivalo
<mario.splivalo@megafon.hr> wrote:
> On 12/01/2010 09:43 AM, Pierre C wrote:
>>
>> Note that in both cases postgres reports that the FK checks take 92-120
>> milliseconds... which is a normal time for about 4000 rows.
>> Inserting 4000 lines with just a few fields like you got should take
>> quite much less than 1 s...
>>
>> Where the rest of the time goes, I have no idea. Disk thrashing ? Locks
>> ? Gremlins ?
>>
>> - try it on a fresh copy of all your tables (CREATE TABLE, INSERT INTO
>> SELECT)
>> - try to put the WAL on a separate physical disk (or do a check with
>> fsync=off)
>> - try it on another computer
>> - try it on another harddisk
>> - run oprofile on a debug compile of postgres
>> - it could even be the process title updates (I don't think so but...)
>> - try a ramdisk tablespace
>
> I'm already running it with fsync=off. The funny thing is, as I add a new
> realm it runs fine until the history table grows to around 5M rows. After that
> the slowdown is huge.

Perhaps - that's the point at which the WAL volume becomes large
enough to force a checkpoint in the middle of the operation?  You
might try turning on log_checkpoints.
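
That is, in postgresql.conf:

log_checkpoints = on    # logs every checkpoint with its write/sync timings

followed by a reload (SELECT pg_reload_conf();), and then watch the server
log while one of the slow INSERTs runs.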

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company