Thread: PRIMARY KEY Indexes.
From: Marc Mitchell
To: pgsql-admin@postgresql.org

The online docs at
http://www.postgresql.org/idocs/index.php?sql-createtable.html state the
following about specifying PRIMARY KEY within a CREATE TABLE statement:

    "Postgres automatically creates a unique index to assure data
    integrity (see CREATE INDEX statement)."

My questions:

1) Are these indexes at all visible to a tool like PGAdmin? How are they
named?

2) If FOREIGN KEY constraints were present on tables, I assume a lookup is
done at time of INSERT to ensure the referenced value can be found. Would
such a lookup and the index used be seen in the EXPLAIN or EXPLAIN VERBOSE
output of an INSERT?

For the record:
OS is RedHat Linux 2.2.19 #8 SMP Fri Sep 21 10:04:24 CDT
Postgres is PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.96
On Mon, 28 Jan 2002, Marc Mitchell wrote:

> 1) Are these indexes at all visible to a tool like PGAdmin? How are they
> named?

The name is <table>_pkey. I'd assume they are visible, provided PGAdmin
doesn't hide them and show only the primary key constraint.

> 2) If FOREIGN KEY constraints were present on tables, I assume a lookup
> is done at time of INSERT to ensure the referenced value can be found.
> Would such a lookup and the index used be seen in the EXPLAIN or EXPLAIN
> VERBOSE output of an INSERT?

It should generally use the index. There are cases, due to the saved plan,
where the best plan might change during a session and we won't notice that
until the next session. As far as I know, however, the EXPLAIN output won't
show the foreign key lookups.
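A minimal sketch of the <table>_pkey convention described above (the table
is hypothetical, and the catalog query is just one way to list the implicit
index):

    -- Creating a PRIMARY KEY implicitly creates a unique index, here
    -- named orders_pkey.
    CREATE TABLE orders (
        order_id  integer PRIMARY KEY,
        placed_on date
    );

    -- List the table's indexes via the system catalogs:
    SELECT idx.relname AS index_name
    FROM pg_class tbl
    JOIN pg_index i   ON i.indrelid = tbl.oid
    JOIN pg_class idx ON idx.oid = i.indexrelid
    WHERE tbl.relname = 'orders';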
From: "Peter T. Brown" <peter@memeticsystems.com>
To: pgsql-admin@postgresql.org
Subject: [ADMIN] performance tuning on inserts

I'm confused because, after implementing much performance tuning advice,
I've found that PostgreSQL is no faster than when configured with the
factory defaults. The particular query I used for testing exemplifies the
type of query I need to run often.

All of my queries rely heavily on doing INSERT INTO. So is there some
special behavior with inserts, where they are flushed to disk one by one?
If I simply increase checkpoint_segments to 50 or so, would this cause
inserts to occur only in memory and be flushed to disk at a later time? As
far as I can tell, there is no performance gain in setting fsync = false.
Am I way off by having so many wal_buffers? And do many of them not even
get used, since checkpoint_segments is only set to 6?

If I can figure all this out, I think I'll make a website dedicated to
Postgres performance tuning.

Thanks very much,
Peter T. Brown

Postgres 7.1.3 is running on Redhat Linux (2.4.x), dual P3 1G, with a
large RAID array. The database I'm using has ~10 tables, most having over
1.5 million rows.

Sample SQL:

    INSERT INTO "VisitorPointer" ("CohortGroupID","VisitorID")
        SELECT 51, "Tidbit"."VisitorID" FROM "Tidbit"
        WHERE "Tidbit"."CustomFieldID" = 27
          AND "Tidbit"."Value" LIKE 'asd1834%';

    CREATE TEMP TABLE temp5946 AS
        SELECT DISTINCT ON ("VisitorID") * FROM "VisitorPointer"
        WHERE "CohortGroupID" = 51;
    DELETE FROM "VisitorPointer" WHERE "CohortGroupID" = 51;
    INSERT INTO "VisitorPointer" SELECT * FROM temp5946;
    DROP TABLE temp5946;

Part of my postgresql.conf:

    max_connections = 100      # 1-1024
    sort_mem = 32168
    shared_buffers = 65536     # min 16
    fsync = true
    wal_buffers = 100          # min 4
    wal_files = 10             # range 0-64
    wal_sync_method = fsync    # fsync or fdatasync or open_sync or open_datasync
                               # Note: default wal_sync_method varies across platforms
    wal_debug = 0              # range 0-16
    commit_delay = 100         # range 0-100000
    commit_siblings = 5        # range 1-1000
    checkpoint_segments = 6    # in logfile segments (16MB each), min 1
    checkpoint_timeout = 300   # in seconds, range 30-3600
Check this:
http://candle.pha.pa.us/main/writings/pgsql/hw_performance/index.html

I would think that you would see *some* increase in performance by turning
fsync off... I see huge increases in performance of INSERT/UPDATE with
fsync off.

-Mitch

----- Original Message -----
From: "Peter T. Brown" <peter@memeticsystems.com>
To: <pgsql-admin@postgresql.org>
Sent: Monday, January 28, 2002 11:42 AM
Subject: [ADMIN] performance tuning on inserts

> As far as I can tell, there is no performance gain in setting
> fsync = false. [...]
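A hedged sketch of what turning fsync off looks like in postgresql.conf
(the trade-off, not a recommendation):

    # Sketch only: fsync = false skips the WAL flush at commit, which can
    # speed up INSERT/UPDATE dramatically, but an OS crash or power loss
    # may then corrupt the database. Use it only for data you can rebuild
    # (e.g. a bulk load you could rerun).
    fsync = false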
"Peter T. Brown" <peter@memeticsystems.com> writes: > All of my queries rely heavily on doing INSERT INTO. So is there some > special behavior with insert's, where they are flushed to disk one by one? > If I simply increase checkpoint_segments to 50 or so would this cause > inserts to occur only in memory and be flushed to disk at a later > time? Increasing checkpoint_segments is a good idea if you do lots of bulky inserts. Basically you don't want checkpoints happening every few seconds; at most one every couple minutes would be my recommendation. If checkpoint_segments is too small then you're forcing frequent checkpoints. Whether 6 is enough is hard to tell from the data you've given. You could look at the file timestamps in pg_xlog to try to estimate how often a new segment is started. Note that there's some interaction here: reducing the frequency of checkpoints will actually reduce the volume of WAL traffic. > Sample SQL: > INSERT INTO "VisitorPointer" ("CohortGroupID","VisitorID") SELECT > 51,"Tidbit"."VisitorID" FROM "Tidbit" WHERE "Tidbit"."CustomFieldID" = 27 > AND "Tidbit"."Value" LIKE 'asd1834%' > CREATE TEMP TABLE temp5946 AS SELECT DISTINCT ON ("VisitorID") * FROM > "VisitorPointer" WHERE "CohortGroupID" = 51; DELETE FROM "VisitorPointer" > WHERE "CohortGroupID" = 51; > INSERT INTO "VisitorPointer" SELECT * FROM temp5946; DROP TABLE temp5946; Seems like a little work on improving your SQL wouldn't hurt either. Couldn't the above mess be reduced to a single command? Viz INSERT INTO "VisitorPointer" ("CohortGroupID","VisitorID") SELECT DISTINCT 51,"Tidbit"."VisitorID" FROM "Tidbit" WHERE "Tidbit"."CustomFieldID" = 27 AND "Tidbit"."Value" LIKE 'asd1834%' All that inserting of rows you're only going to delete a moment later is costing you. regards, tom lane
Regarding the SQL: The query I showed there is built dynamically from a
library of queries chosen by the application user (using a web GUI). For
now, I don't have any way to intelligently condense the often complex
series of operations into a single 'smart' query.

That being said, I still don't understand why doing all those inserts
should take so long, since the entire table should be in memory... I am
pretty sure I've allowed enough shared_buffers.

Regarding timestamps in pg_xlog: as I understand things, if wal_buffers
and checkpoint_segments are high enough, the files in pg_xlog should never
be used, right?

Thanks again,
Peter T. Brown

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, January 28, 2002 12:02 PM
To: Peter T. Brown
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] performance tuning on inserts

> Increasing checkpoint_segments is a good idea if you do lots of bulky
> inserts. [...]
"Peter T. Brown" <peter@memeticsystems.com> writes: > Regarding the SQL: The query I showed there is built dynamically from a > library of queries chosen by the application user (using a web gui). For > now, I don't have any way to intelligently condense the often complex series > of operations into a single 'smart' query. Nonetheless, I wonder whether you couldn't reconsider the structure. The fragment you showed seemed to be of two minds about whether VisitorPointer is a permanent data structure or a suitable place for temporary row insertions... > That being said, I still don't understand why doing all those inserts should > take so long since the entire table should be in memory... I am pretty sure > I've allowed enough shared_buffers. You still need WAL traffic. Or don't you care whether those inserts will survive a system crash? In-RAM buffers surely don't count as stable storage. > Regarding timestamps in pg_xlog: as I understand things, if wal_buffers and > checkpoint_segments are high enough the files in pg_xlog should never be > used, right? No, you haven't understood things at all. WAL *will* be written whenever you make changes, at the latest when the transaction is committed. regards, tom lane