[pgpool-general: 1805] Re: insert_lock hangs
Tatsuo Ishii
ishii at postgresql.org
Thu May 30 14:40:19 JST 2013
Thanks for the report. I found a nasty bug with pgpool-II. Please try
attached patches.
When insert lock is enabled and pgpool_catalog.insert_lock exists,
pgpool-II looks for the row which matches the oid of the target
table. If non, pgpool-II will insert the row to obtain a row lock.
The bug was in the process looking for the row. If extended protocol
is used like your perl script, pgpool-II was waiting for the row data
forever which will never come because there's no such a row in the
table. Attached patch considers the case.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> Hi!
>
> Sorry, it took some time to further isolate the problem, but now I have
> a testcase!
>
> Nozomi Anzai wrote:
>> Could you tell us some hints to reproduce this?
>
> pgpool with two nodes,
> pgpool running on node 0, node 1 offline because of the current problem.
>
> pgpool-II version 3.2.4 under Debian 6.0.6
> postgresql 9.1.9 under Debian 6.0.6
>
>> - pgpool.conf
>> - The mode do you use (replication mode, master slave mode with SR, etc.)
>> - Using load balancing or not
>
> replication_mode = true
> load_balance_mode = true
> replicate_select = false
> white_function_list = ''
> black_function_list = 'nextval,setval'
> insert_lock = true
> lobj_lock_table = ''
>
> enable_pool_hba = true
> master_slave_mode = false
> parallel_mode = false
>
> The table pgpool_catalog.insert_lock exists in all our databases
> (including template1), created by insert_lock.sql.
>
>> - Steps you executed
>
> As a test scenario, I created the following two tables:
>
> test=# \d kd_test_serial_pgpool
> Table "public.kd_test_serial_pgpool"
> Column | Type | Modifiers
> ---------+---------+--------------------------------------------------------------------
> id | integer | not null default
> . . nextval('kd_test_serial_pgpool_id_seq'::regclass)
> content | text |
>
> test=# \d kd_test_nonserial_pgpool
> Table "public.kd_test_nonserial_pgpool"
> Column | Type | Modifiers
> ---------+---------+-----------
> id | integer |
> content | text |
>
> Inserting stuff into kd_test_serial_pgpool should (and does) grab a lock
> on pgpool_catalog.insert_lock, while doing so for
> kd_test_nonserial_pgpool should (and does) not.
>
> The inserting script is written in perl using prepared queries as
> follows (line wrapping caused by MUA):
>
>
> #!/usr/bin/perl
> use strict;
> use DBI;
> my $dbh;
> my $sth;
> $dbh = DBI->connect('DBI:Pg:dbname=XXX;host=XXX','XXX','XXX');
> if ($dbh)
> {
> $sth = $dbh->prepare("insert into kd_test_serial_pgpool(content)
> values (?)"); # hangs
> # $sth = $dbh->prepare("insert into kd_test_nonserial_pgpool(content)
> values (?)"); # does not hang
> $sth->execute('perl lkhzfjktfjktf');
> $sth->finish;
> }
> $dbh->disconnect();
>
>
> Using the "insert into kd_test_serial_pgpool" line will hang
> indefinitely (I let it hang for about 20 days!) on calling
> $sth->execute, while "insert into kd_test_nonserial_pgpool" won't.
> (For log file content see below, as posted previously.)
>
>
> I made sure that no actual lock was set on insert_lock before running
> the test by checking the following query (and killing threads via select
> pg_terminate_backend($PID); as needed):
>
> select
> l.pid,a.datname,a.usename,t.relname,l.locktype,l.mode,l.granted,a.waiting,a.query_start,a.backend_start,a.current_query
> from pg_locks l, pg_stat_all_tables t,pg_stat_activity a where
> l.relation=t.relid and l.pid=a.procpid order by datname,relname,granted
> desc,query_start;
>
>
> I tested the perl script on several systems, with equal behaviour:
> - Debian 4.0 (32 bit), perl 5.8.8-7etch6, libdbi-perl 1.53-1etch1
> - Debian 6.0.7 (32 bit), perl 5.10.1-17squeeze6, libdbi-perl 1.612-1
> - Kubuntu 12.10 (64 bit), perl 5.14.2-13ubuntu0.2, libdbi-perl 1.622-1
>
>
> Log excerpt as posted previously (different tables, but same symptom):
>>> pgpool.log:
>>> > 2013-05-06 14:22:02.235 CEST [13003] [unknown] 00000 LOG: duration: 0.035 ms statement: BEGIN
>>> > 2013-05-06 14:22:02.235 CEST [13003] [unknown] 00000 LOG: duration: 0.070 ms parse <unnamed>:
>>> > DELETE FROM sessions
>>> > WHERE session_id = $1
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 LOG: duration: 0.088 ms bind <unnamed>:
>>> > DELETE FROM sessions
>>> > WHERE session_id = $1
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 DETAIL: parameters: $1 = '1eae4e7eb3f0adf2e2bc3f50f1446530f'
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 LOG: duration: 0.063 ms execute <unnamed>:
>>> > DELETE FROM sessions
>>> > WHERE session_id = $1
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 DETAIL: parameters: $1 = '1eae4e7eb3f0adf2e2bc3f50f1446530f'
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 LOG: duration: 0.024 ms statement: COMMIT
>>> > 2013-05-06 14:22:02.238 CEST [13003] [unknown] 00000 LOG: duration: 0.426 ms parse pgpool12980: SELECT count(*) FROM pg_catalog.pg_attrdef AS d, pg_catalog.pg_class AS c WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND c.oid = pgpool_regclass('sessions')
>>> > 2013-05-06 14:22:02.238 CEST [13003] [unknown] 00000 LOG: duration: 0.054 ms bind pgpool12980/pgpool12980: SELECT count(*) FROM pg_catalog.pg_attrdef AS d, pg_catalog.pg_class AS c WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND c.oid = pgpool_regclass('sessions')
>>> > 2013-05-06 14:22:02.239 CEST [13003] [unknown] 00000 LOG: duration: 0.993 ms execute pgpool12980/pgpool12980: SELECT count(*) FROM pg_catalog.pg_attrdef AS d, pg_catalog.pg_class AS c WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND c.oid = pgpool_regclass('sessions')
>>> > 2013-05-06 14:22:02.240 CEST [13003] [unknown] 00000 LOG: duration: 0.757 ms parse pgpool12980: SELECT attname, d.adsrc, coalesce((d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%') AND (a.atttypid = 'timestamp'::regtype::oid OR a.atttypid = 'timestamp with time zone'::regtype::oid OR a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR a.atttypid = 'time with time zone'::regtype::oid) , false) FROM pg_catalog.pg_class c, pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_attrdef d ON (a.attrelid = d.adrelid AND a.attnum = d.adnum) WHERE c.oid = a.attrelid AND a.attnum >= 1 AND a.attisdropped = 'f' AND c.oid = pgpool_regclass('sessions') ORDER BY a.attnum
>>> > 2013-05-06 14:22:02.240 CEST [13003] [unknown] 00000 LOG: duration: 0.087 ms bind pgpool12980/pgpool12980: SELECT attname, d.adsrc, coalesce((d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%') AND (a.atttypid = 'timestamp'::regtype::oid OR a.atttypid = 'timestamp with time zone'::regtype::oid OR a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR a.atttypid = 'time with time zone'::regtype::oid) , false) FROM pg_catalog.pg_class c, pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_attrdef d ON (a.attrelid = d.adrelid AND a.attnum = d.adnum) WHERE c.oid = a.attrelid AND a.attnum >= 1 AND a.attisdropped = 'f' AND c.oid = pgpool_regclass('sessions') ORDER BY a.attnum
>>> > 2013-05-06 14:22:02.241 CEST [13003] [unknown] 00000 LOG: duration: 0.787 ms execute pgpool12980/pgpool12980: SELECT attname, d.adsrc, coalesce((d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%') AND (a.atttypid = 'timestamp'::regtype::oid OR a.atttypid = 'timestamp with time zone'::regtype::oid OR a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR a.atttypid = 'time with time zone'::regtype::oid) , false) FROM pg_catalog.pg_class c, pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_attrdef d ON (a.attrelid = d.adrelid AND a.attnum = d.adnum) WHERE c.oid = a.attrelid AND a.attnum >= 1 AND a.attisdropped = 'f' AND c.oid = pgpool_regclass('sessions') ORDER BY a.attnum
>>> > 2013-05-06 14:22:02.241 CEST [13003] [unknown] 00000 LOG: duration: 0.026 ms statement: BEGIN
>>> > 2013-05-06 14:22:02.263 CEST [13003] [unknown] 00000 LOG: duration: 21.331 ms parse pgpool12980: SELECT count(*) FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid) WHERE nspname = 'pgpool_catalog' AND relname = 'insert_lock'
>>> > 2013-05-06 14:22:02.263 CEST [13003] [unknown] 00000 LOG: duration: 0.067 ms bind pgpool12980/pgpool12980: SELECT count(*) FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid) WHERE nspname = 'pgpool_catalog' AND relname = 'insert_lock'
>>> > 2013-05-06 14:22:02.263 CEST [13003] [unknown] 00000 LOG: duration: 0.038 ms execute pgpool12980/pgpool12980: SELECT count(*) FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid) WHERE nspname = 'pgpool_catalog' AND relname = 'insert_lock'
>>> > 2013-05-06 14:22:02.264 CEST [13003] [unknown] 00000 LOG: duration: 0.231 ms parse pgpool12980: SELECT 1 FROM pgpool_catalog.insert_lock WHERE reloid = pgpool_regclass('sessions') FOR UPDATE
>>> > 2013-05-06 14:22:02.264 CEST [13003] [unknown] 00000 LOG: duration: 0.044 ms bind pgpool12980/pgpool12980: SELECT 1 FROM pgpool_catalog.insert_lock WHERE reloid = pgpool_regclass('sessions') FOR UPDATE
>>> > 2013-05-06 14:22:02.267 CEST [13003] [unknown] 00000 LOG: duration: 3.819 ms execute pgpool12980/pgpool12980: SELECT 1 FROM pgpool_catalog.insert_lock WHERE reloid = pgpool_regclass('sessions') FOR UPDATE
>>> (… then nothing happens in this thread anymore.)
>>>
>>>
>>> The original hanging query isn't logged by pgpool, not even when turning
>>> on pgpool's extensive debug output, but running OTRS against postgresql
>>> directly (without pgpool) shows this as the offending query:
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 LOG: duration: 0.068 ms parse <unnamed>: INSERT INTO sessions (session_id, data_key, data_value, serialized) VALUES ($1,$2,$3,$4)
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 LOG: duration: 0.045 ms bind <unnamed>: INSERT INTO sessions (session_id, data_key, data_value, serialized) VALUES ($1,$2,$3,$4)
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 DETAIL: parameters: $1 = '11a98dce334f5dd318594a76edf08819b', $2 = 'LastScreenOverview', $3 = 'Action=AgentTicketQueue;QueueID=9;View=', $4 = '0'
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 LOG: duration: 0.106 ms execute <unnamed>: INSERT INTO sessions (session_id, data_key, data_value, serialized) VALUES ($1,$2,$3,$4)
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 DETAIL: parameters: $1 = '11a98dce334f5dd318594a76edf08819b', $2 = 'LastScreenOverview', $3 = 'Action=AgentTicketQueue;QueueID=9;View=', $4 = '0'
>>>
>>> The context of this query suggests that no other query is using/blocking
>>> pgpool's insert_lock table at this moment, especially not for
>>> minutes/hours, hence I'm quite at a loss here.
>
>
> Karsten
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
-------------- next part --------------
A non-text attachment was scrubbed...
Name: insert_lock_hangs.diff
Type: text/x-patch
Size: 617 bytes
Desc: not available
URL: <http://www.pgpool.net/pipermail/pgpool-general/attachments/20130530/249d961a/attachment.bin>
More information about the pgpool-general
mailing list