Thread: PRIMARY KEY Indexes.
From: Marc Mitchell
To: pgsql-admin@postgresql.org

The online docs at
http://www.postgresql.org/idocs/index.php?sql-createtable.html state the
following about specifying PRIMARY KEY within a CREATE TABLE statement:

    "Postgres automatically creates a unique index to assure data
    integrity (see CREATE INDEX statement)."

My questions:

1) Are these indexes at all visible to a tool like PGAdmin? How are they
named?

2) If FOREIGN KEY constraints were present on tables, I assume a lookup is
done at time of INSERT to ensure the referenced value can be found. Would
such a lookup and the index used be seen in the EXPLAIN or EXPLAIN VERBOSE
output of an INSERT?

For the record:
OS is RedHat Linux 2.2.19 #8 SMP Fri Sep 21 10:04:24 CDT
Postgres is PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.96
On Mon, 28 Jan 2002, Marc Mitchell wrote:

> 1) Are these indexes at all visible to a tool like PGAdmin? How are they
> named?

The name is <table>_pkey. I'd assume they are visible, provided PGAdmin
doesn't hide them and show only the primary key constraint.

> 2) If FOREIGN KEY constraints were present on tables, I assume a lookup
> is done at time of INSERT to ensure the referenced value can be found.
> Would such a lookup and the index used be seen in the EXPLAIN or EXPLAIN
> VERBOSE output of an INSERT?

It should generally use the index. There are cases, due to the saved plan,
where the best plan might change during a session and we won't notice that
until the next session. As far as I know, however, the EXPLAIN output won't
show the foreign key lookups.
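A minimal sketch of the <table>_pkey convention described above (the table
is hypothetical, and the catalog query is just one way to list the implicit
index):

    -- Creating a PRIMARY KEY implicitly creates a unique index, here
    -- named orders_pkey.
    CREATE TABLE orders (
        order_id  integer PRIMARY KEY,
        placed_on date
    );

    -- List the table's indexes via the system catalogs:
    SELECT idx.relname AS index_name
    FROM pg_class tbl
    JOIN pg_index i   ON i.indrelid = tbl.oid
    JOIN pg_class idx ON idx.oid = i.indexrelid
    WHERE tbl.relname = 'orders';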
From: "Peter T. Brown" <peter@memeticsystems.com>
To: pgsql-admin@postgresql.org
Subject: [ADMIN] performance tuning on inserts

I'm confused because, after implementing much performance tuning advice,
I've found that PostgreSQL is no faster than when configured with the
factory defaults. The particular query I used for testing exemplifies the
type of query I need to run often.

All of my queries rely heavily on doing INSERT INTO. So is there some
special behavior with inserts, where they are flushed to disk one by one?
If I simply increase checkpoint_segments to 50 or so, would this cause
inserts to occur only in memory and be flushed to disk at a later time? As
far as I can tell, there is no performance gain in setting fsync = false.
Am I way off by having so many wal_buffers? And do many of them not even
get used, since checkpoint_segments is only set to 6?

If I can figure all this out, I think I'll make a website dedicated to
Postgres performance tuning.

Thanks very much,
Peter T. Brown

Postgres 7.1.3 is running on Redhat Linux (2.4.x), dual P3 1G, with a
large RAID array. The database I'm using has ~10 tables, most having over
1.5 million rows.

Sample SQL:

    INSERT INTO "VisitorPointer" ("CohortGroupID","VisitorID")
        SELECT 51, "Tidbit"."VisitorID" FROM "Tidbit"
        WHERE "Tidbit"."CustomFieldID" = 27
          AND "Tidbit"."Value" LIKE 'asd1834%';

    CREATE TEMP TABLE temp5946 AS
        SELECT DISTINCT ON ("VisitorID") * FROM "VisitorPointer"
        WHERE "CohortGroupID" = 51;
    DELETE FROM "VisitorPointer" WHERE "CohortGroupID" = 51;
    INSERT INTO "VisitorPointer" SELECT * FROM temp5946;
    DROP TABLE temp5946;

Part of my postgresql.conf:

    max_connections = 100      # 1-1024
    sort_mem = 32168
    shared_buffers = 65536     # min 16
    fsync = true
    wal_buffers = 100          # min 4
    wal_files = 10             # range 0-64
    wal_sync_method = fsync    # fsync or fdatasync or open_sync or open_datasync
                               # Note: default wal_sync_method varies across platforms
    wal_debug = 0              # range 0-16
    commit_delay = 100         # range 0-100000
    commit_siblings = 5        # range 1-1000
    checkpoint_segments = 6    # in logfile segments (16MB each), min 1
    checkpoint_timeout = 300   # in seconds, range 30-3600
Check this:
http://candle.pha.pa.us/main/writings/pgsql/hw_performance/index.html

I would think that you would see *some* increase in performance by turning
fsync off... I see huge increases in performance of INSERT/UPDATE with
fsync off.

-Mitch

----- Original Message -----
From: "Peter T. Brown" <peter@memeticsystems.com>
To: <pgsql-admin@postgresql.org>
Sent: Monday, January 28, 2002 11:42 AM
Subject: [ADMIN] performance tuning on inserts

> As far as I can tell, there is no performance gain in setting
> fsync = false. [...]
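A hedged sketch of what turning fsync off looks like in postgresql.conf
(the trade-off, not a recommendation):

    # Sketch only: fsync = false skips the WAL flush at commit, which can
    # speed up INSERT/UPDATE dramatically, but an OS crash or power loss
    # may then corrupt the database. Use it only for data you can rebuild
    # (e.g. a bulk load you could rerun).
    fsync = false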
"Peter T. Brown" <peter@memeticsystems.com> writes: > All of my queries rely heavily on doing INSERT INTO. So is there some > special behavior with insert's, where they are flushed to disk one by one? > If I simply increase checkpoint_segments to 50 or so would this cause > inserts to occur only in memory and be flushed to disk at a later > time? Increasing checkpoint_segments is a good idea if you do lots of bulky inserts. Basically you don't want checkpoints happening every few seconds; at most one every couple minutes would be my recommendation. If checkpoint_segments is too small then you're forcing frequent checkpoints. Whether 6 is enough is hard to tell from the data you've given. You could look at the file timestamps in pg_xlog to try to estimate how often a new segment is started. Note that there's some interaction here: reducing the frequency of checkpoints will actually reduce the volume of WAL traffic. > Sample SQL: > INSERT INTO "VisitorPointer" ("CohortGroupID","VisitorID") SELECT > 51,"Tidbit"."VisitorID" FROM "Tidbit" WHERE "Tidbit"."CustomFieldID" = 27 > AND "Tidbit"."Value" LIKE 'asd1834%' > CREATE TEMP TABLE temp5946 AS SELECT DISTINCT ON ("VisitorID") * FROM > "VisitorPointer" WHERE "CohortGroupID" = 51; DELETE FROM "VisitorPointer" > WHERE "CohortGroupID" = 51; > INSERT INTO "VisitorPointer" SELECT * FROM temp5946; DROP TABLE temp5946; Seems like a little work on improving your SQL wouldn't hurt either. Couldn't the above mess be reduced to a single command? Viz INSERT INTO "VisitorPointer" ("CohortGroupID","VisitorID") SELECT DISTINCT 51,"Tidbit"."VisitorID" FROM "Tidbit" WHERE "Tidbit"."CustomFieldID" = 27 AND "Tidbit"."Value" LIKE 'asd1834%' All that inserting of rows you're only going to delete a moment later is costing you. regards, tom lane
Regarding the SQL: The query I showed there is built dynamically from a
library of queries chosen by the application user (using a web GUI). For
now, I don't have any way to intelligently condense the often complex
series of operations into a single 'smart' query.

That being said, I still don't understand why doing all those inserts
should take so long, since the entire table should be in memory... I am
pretty sure I've allowed enough shared_buffers.

Regarding timestamps in pg_xlog: as I understand things, if wal_buffers
and checkpoint_segments are high enough, the files in pg_xlog should never
be used, right?

Thanks again,
Peter T. Brown

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, January 28, 2002 12:02 PM
To: Peter T. Brown
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] performance tuning on inserts

> Increasing checkpoint_segments is a good idea if you do lots of bulky
> inserts. [...]
"Peter T. Brown" <peter@memeticsystems.com> writes: > Regarding the SQL: The query I showed there is built dynamically from a > library of queries chosen by the application user (using a web gui). For > now, I don't have any way to intelligently condense the often complex series > of operations into a single 'smart' query. Nonetheless, I wonder whether you couldn't reconsider the structure. The fragment you showed seemed to be of two minds about whether VisitorPointer is a permanent data structure or a suitable place for temporary row insertions... > That being said, I still don't understand why doing all those inserts should > take so long since the entire table should be in memory... I am pretty sure > I've allowed enough shared_buffers. You still need WAL traffic. Or don't you care whether those inserts will survive a system crash? In-RAM buffers surely don't count as stable storage. > Regarding timestamps in pg_xlog: as I understand things, if wal_buffers and > checkpoint_segments are high enough the files in pg_xlog should never be > used, right? No, you haven't understood things at all. WAL *will* be written whenever you make changes, at the latest when the transaction is committed. regards, tom lane