<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hello Tatsuo,<br>

    <br>

    Did the attached log provide any insight?<br>

    <br>

    Thanks,<br>

    Sean<br>

    <div class="moz-forward-container"><br>

      <br>

      -------- Original Message --------

      <table class="moz-email-headers-table" cellpadding="0"

        cellspacing="0" border="0">

        <tbody>

          <tr>

            <th nowrap="nowrap" valign="BASELINE" align="RIGHT">Subject:

            </th>

            <td>Re: [pgpool-general: 2639] Re: pcp_recovery_node failing

              in stage 1</td>

          </tr>

          <tr>

            <th nowrap="nowrap" valign="BASELINE" align="RIGHT">Date: </th>

            <td>Fri, 21 Mar 2014 10:59:21 -0230</td>

          </tr>

          <tr>

            <th nowrap="nowrap" valign="BASELINE" align="RIGHT">From: </th>

            <td>Sean Hogan <a class="moz-txt-link-rfc2396E" href="mailto:sean@compusult.net">&lt;sean@compusult.net&gt;</a></td>

          </tr>

          <tr>

            <th nowrap="nowrap" valign="BASELINE" align="RIGHT">To: </th>

            <td>Tatsuo Ishii <a class="moz-txt-link-rfc2396E" href="mailto:ishii@postgresql.org">&lt;ishii@postgresql.org&gt;</a></td>

          </tr>

          <tr>

            <th nowrap="nowrap" valign="BASELINE" align="RIGHT">CC: </th>

            <td><a class="moz-txt-link-abbreviated" href="mailto:pgpool-general@pgpool.net">pgpool-general@pgpool.net</a></td>

          </tr>

        </tbody>

      </table>

      <br>

      <br>

      <pre>I agree, it makes no sense.  The strace is attached.

Sean

On 14-03-21 10:02 AM, Tatsuo Ishii wrote:

> Ridiculous. There's no code in pgpool which sends signal 2 to the recovery

> command. Is it possible to start pgpool from strace and do the

> recovery so that we could find who sends the signal?

>

> strace -f pgpool start

>

> Best regards,

> --

> Tatsuo Ishii

> SRA OSS, Inc. Japan

> English: <a class="moz-txt-link-freetext" href="http://www.sraoss.co.jp/index_en.php">http://www.sraoss.co.jp/index_en.php</a>

> Japanese: <a class="moz-txt-link-freetext" href="http://www.sraoss.co.jp">http://www.sraoss.co.jp</a>

>

>> The stage 1 script is not careful with exit codes, so it continues

>> after the failed rsync and eventually exits with success.  This tricks

>> pgpool into continuing with stage 2, but it's definitely the stage 1

>> command that is failing.

>>

>> Sean

>>

>> On 14-03-21 06:20 AM, Tatsuo Ishii wrote:

>>>> Sorry, the subject line should have said stage *1*.

>>> Really? From what I read in the pgpool log:

>>>

>>> 2014-03-20 12:42:43 LOG:   pid 18259: 1st stage is done

>>> 2014-03-20 12:42:43 LOG:   pid 18259: starting 2nd stage

>>> 2014-03-20 12:42:47 LOG:   pid 18259: CHECKPOINT in the 2nd stage done

>>> 2014-03-20 12:42:47 LOG: pid 18259: starting recovery command: "SELECT

>>> pgpool_recovery('pgpool_recovery_pitr.sh', 'psql02.compusult.net',

>>> '/var/lib/pgsql/9.2/data')"

>>> 2014-03-20 12:42:49 LOG: pid 18259: check_postmaster_started: try to

>>> connect to postmaster on hostname:psql02.compusult.net

>>> database:postgres user:postgres (retry 0 times)

>>>

>>> I saw "1st stage is done" and I guess the first stage has

>>> succeeded but the second stage failed. What does the second stage look

>>> like?

>>>

>>> Best regards,

>>> --

>>> Tatsuo Ishii

>>> SRA OSS, Inc. Japan

>>> English: <a class="moz-txt-link-freetext" href="http://www.sraoss.co.jp/index_en.php">http://www.sraoss.co.jp/index_en.php</a>

>>> Japanese: <a class="moz-txt-link-freetext" href="http://www.sraoss.co.jp">http://www.sraoss.co.jp</a>

>>>

>>>> On 14-03-20 12:48 PM, Sean Hogan wrote:

>>>>> Hi,

>>>>>

>>>>> In my setup at the moment I have a pair of version 3.3.2 pgpool

>>>>> instances with two backend PostgreSQL 9.2.4 servers, all running on

>>>>> CentOS 6.4.  The PostgreSQL data directories are quite large - 144GB.

>>>>> I have run into a situation where pcp_recovery_node consistently fails

>>>>> with a BackendError.

>>>>>

>>>>> The stage 1 recovery command is a script called do-base-backup.sh that

>>>>> runs an rsync as follows:

>>>>>

>>>>>       rsync -Cacvv --delete \

>>>>>               --exclude postmaster.pid --exclude postmaster.opts \

>>>>>               --exclude recovery.done \

>>>>>               --exclude pg_log/\* --exclude pg_xlog/\* \

>>>>>               $SOURCE/ $DESTINATION/ 2>&1 |

>>>>>       mailx -s "rsync verbose output" <a class="moz-txt-link-abbreviated" href="mailto:sean@compusult.net">sean@compusult.net</a>

>>>>>

>>>>> For some reason this rsync is failing after some minutes (typically 10

>>>>> to 12) with undocumented exit code 255.  The verbose rsync logging

>>>>> says this:

>>>>>

>>>>>       Killed by signal 2.

>>>>>       rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]:

>>>>>       Broken pipe (32)

>>>>>       rsync: connection unexpectedly closed (50735 bytes received so far)

>>>>>       [sender]

>>>>>       rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]

>>>>>

>>>>> Googling has not brought up anything helpful other than bugs with

>>>>> large files in older versions of rsync.  I'm fairly certain that is

>>>>> not the case here, especially because of the "Killed by signal 2",

>>>>> which is suggestive of some sort of timeout on the pgpool end.

>>>>>

>>>>> The specific command line I'm using to recover the second database

>>>>> node is:

>>>>>

>>>>>       sudo -u postgres /usr/local/bin/pcp_recovery_node 10000 psql01 9898

>>>>>       postgres XXXXXX 1

>>>>>

>>>>> With such a large timeout value I shouldn't be hitting a timeout

>>>>> there.

>>>>>

>>>>> The weird thing, which makes me point the finger at either pgpool or

>>>>> pcp_recovery_node, is that if I run do-base-backup.sh manually it

>>>>> works fine (and takes much much longer, as expected).

>>>>>

>>>>> Does pgpool have some internal limit on how long it will wait for the

>>>>> 1st stage command to run?  I've attached the log file but it isn't

>>>>> very informative.  (Note that the do-base-backup.sh script isn't

>>>>> communicating the rsync failure back to pgpool, so pgpool goes ahead

>>>>> and runs stage 2.  Of course, that fails because not everything has

>>>>> been synced.)

>>>>>

>>>>> Thanks,

>>>>> Sean

>>>>>

>>>>>

>>>>> _______________________________________________

>>>>> pgpool-general mailing list

>>>>> <a class="moz-txt-link-abbreviated" href="mailto:pgpool-general@pgpool.net">pgpool-general@pgpool.net</a>

>>>>> <a class="moz-txt-link-freetext" href="http://www.pgpool.net/mailman/listinfo/pgpool-general">http://www.pgpool.net/mailman/listinfo/pgpool-general</a>

</pre>
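      <p>The exit-code problem discussed in this thread can be sketched in a few lines of shell. This is a minimal illustration, not the actual do-base-backup.sh: <code>fake_rsync</code> and <code>fake_mailx</code> are hypothetical stand-ins for the real rsync and mailx commands. It shows that the status of a pipeline is the status of its last command, and one possible way to keep the status of the copy step instead, assuming the log output is small enough to hold in a shell variable (a temporary file would suit a very verbose log better).</p>

<pre>
```shell
#!/bin/sh
# Hypothetical stand-in for the rsync step: prints a log line, then fails with 255.
fake_rsync() { echo "Killed by signal 2."; return 255; }
# Hypothetical stand-in for mailx: consumes the log it is given and succeeds.
fake_mailx() { cat > /dev/null; }

# Pattern from the thread: the pipeline's status is that of the last command,
# so the failure of the first command is lost.
fake_rsync 2>&1 | fake_mailx
piped_status=$?

# Alternative: capture the output first, so $? is the copy step's own status.
log=$(fake_rsync 2>&1)
copy_status=$?
printf '%s\n' "$log" | fake_mailx

echo "piped_status=$piped_status copy_status=$copy_status"
# A real stage 1 script could then finish with: exit "$copy_status"
```
</pre>

      <p>If the stage 1 script exits with the copy step's non-zero status rather than reporting success, the recovery should fail at stage 1 and not continue into stage 2. That behaviour is worth verifying on the pgpool version in use.</p>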

      <br>

    </div>

    <br>

  </body>

</html>