[pgpool-general: 678] Re: strange load balancing issue in Solaris
Aravinth
aravinth at mafiree.com
Fri Jun 29 18:08:05 JST 2012
Hi Tatsuo,
If I reload the config file, I face the issue. Even without reloading, I
face the same issue after a few hours of running pgpool.
Regards,
Aravinth
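
The "fork() failed. reason: Not enough space" line in the quoted log below may point at a memory or swap shortage while the parent re-forks its children. This is an assumption, not something the log proves: it relies on Solaris using that text for ENOMEM. A minimal sketch of how a parent could report such a failure and retry a bounded number of times follows. fork_with_retry() is a hypothetical helper and is not part of pgpool-II.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not pgpool-II code): call fork() and, when it fails
 * with a resource shortage (EAGAIN or ENOMEM), wait and try again, up to
 * max_attempts times in total. Returns whatever fork() returns on success.
 * Returns -1 on failure, with errno left as set by the last fork() call
 * (errno is untouched if max_attempts is not positive).
 */
pid_t fork_with_retry(int max_attempts, unsigned int delay_seconds)
{
    int attempt;

    for (attempt = 1; attempt <= max_attempts; attempt++)
    {
        pid_t pid = fork();
        int saved_errno;

        if (pid >= 0)
            return pid;     /* 0 in the child, the child's pid in the parent */

        saved_errno = errno;
        fprintf(stderr, "fork() failed (attempt %d of %d). reason: %s\n",
                attempt, max_attempts, strerror(saved_errno));

        if (saved_errno != EAGAIN && saved_errno != ENOMEM)
        {
            errno = saved_errno;
            return -1;      /* not a resource shortage, so retrying will not help */
        }

        if (attempt < max_attempts)
            sleep(delay_seconds);

        errno = saved_errno; /* sleep() may have changed errno */
    }

    return -1;
}
```

Retrying only hides a temporary shortage. With 300 children restarted at once after a reload, lowering num_init_children or adding swap may be the more relevant test.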
On Fri, Jun 29, 2012 at 2:32 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> It seems the trouble occurs after pgpool receives signal 1 (HUP).
> Did you do pgpool reload?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Guys,
> >
> > I am facing another issue on the same Solaris system.
> >
> > I have initialized 300 pre-forked connections using num_init_children in
> > streaming replication mode. Everything works perfectly for a few hours.
> >
> > After a few hours the connections drop with the error below. Also, pgpool
> > doesn't allow any new connections.
> >
> > Any ideas, guys?
> >
> >
> >
> >
> > 2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1645
> > 2012-06-07 08:31:15 DEBUG: pid 927: child 1413 exits with status 1 by signal 1
> > 2012-06-07 08:31:15 DEBUG: pid 1645: I am 1645
> > 2012-06-07 08:31:15 DEBUG: pid 1645: pool_initialize_private_backend_status: initialize backend status
> > 2012-06-07 08:31:15 DEBUG: pid 927: fork a new child pid 1646
> > 2012-06-07 08:31:15 DEBUG: pid 927: child 1410 exits with status 1 by signal 1
> > 2012-06-07 08:31:15 DEBUG: pid 1646: I am 1646
> > 2012-06-07 08:31:15 DEBUG: pid 1646: pool_initialize_private_backend_status: initialize backend status
> > 2012-06-07 08:31:17 ERROR: pid 927: fork() failed. reason: Not enough space
> > 2012-06-07 08:31:17 DEBUG: pid 2012-06-07 08:31:172012-06-07
> > 08:31:1716382012-06-07 08:31:172012-06-07 08:31:17 DEBUG: pid 2012-06-07
> > 08:31:172012-06-07 08:31:172012-06
> > -07 08:31:17 DEBUG: pid 2012-06-07 08:31:172012-06-07 08:31:172012-06-07
> > 08:31:172012-06-07 08:31:17: 2012-06-07 08:31:172012-06-07
> > 08:31:172012-06-07 08:31:172012-06-0
> > 7 08:31:172012-06-07 08:31:172012-06-07 08:31:17 DEBUG: pid 2012-06-07
> > 08:31:172012-06-07 08:31:172012-06-07 08:31:172012-06-07
> 08:31:172012-06-07
> > 08:31:172012-06-07 08
> > :31:172012-06-07 08:31:17 DEBUG: pid 2012-06-07 08:31:172012-06-07
> > 08:31:172012-06-07 08:31:172012-06-07 08:31:1716402012-06-07
> > 08:31:172012-06-07 08:31:17 DEBUG: pid 2
> > 012-06-07 08:31:172012-06-07 08:31:172012-06-07 08:31:172012-06-07
> > 08:31:172012-06-07 08:31:17 DEBUG: pid DEBUG: pid 1637 DEBUG: pid
> DEBUG:
> > pid DEBUG: pid DEBUG: pi
> > d child received shutdown request signal DEBUG: pid DEBUG: pid DEBUG:
> > pid DEBUG: pid DEBUG: pid DEBUG: pid 1641 DEBUG: pid DEBUG: pid
> DEBUG:
> > pid DEBUG: pid DE
> > BUG: pid DEBUG: pid DEBUG: pid 1642 DEBUG: pid DEBUG: pid DEBUG: pid
> > DEBUG: pid : DEBUG: pid DEBUG: pid 1636 DEBUG: pid DEBUG: pid DEBUG:
> > pid DEBUG: pid DEBU
> > G: pid 16341646: 163016351629163115163316431628164516441639:
> > 1632162716251617161516261610: 1614162216181613child received shutdown
> > request signal 16191612: 161116231621
> > 16241620: : child received shutdown request signal : : : :
> > : : : : : : child received shutdown request signal : : : : : : : child
> > received shutdown request signal : : : : 15: : child received shutdown
> > request signal : : : : : child received shutdown request signal child
> > received shutdown request signal 15child received shutdown request signal
> > child received shutdown request signal child received shutdown request
> > signal child received shutdown request signal child received shutdown
> > request signal child received shutdown request signal child received
> > shutdown request signal child received shutdown request signal child
> > received shutdown request signal child received shutdown request signal
> > 15child received shutdown request signal child received shutdown request
> > signal child received shutdown request signal child received shutdown
> > request signal child received shutdown request signal child received
> > shutdown request signal child received shutdown request signal 15child
> > received shutdown request signal child received shutdown request signal
> > child received shutdown request signal child received shutdown request
> > signal
> > child received shutdown request signal child received shutdown request
> > signal 15child received shutdown request signal child received shutdown
> > request signal child received shutdown request signal child received
> > shutdown request signal child received shutdown request signal 1515
> > 15151515151515151515
> > 15151515151515
> > 151515151515
> > 1515151515
> >
> >
> >
> > Regards,
> > Aravinth
> >
> >
> > On Thu, May 10, 2012 at 8:15 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >
> >> Good. Fix committed in master/V3_1_STABLE/V3_0_STABLE.
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese: http://www.sraoss.co.jp
> >>
> >> > It's working.
> >> >
> >> > Regards,
> >> > Aravinth
> >> >
> >> >
> >> > On Wed, May 9, 2012 at 5:26 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >
> >> >> Thanks for the hint. Attached is a patch trying to fix the
> >> >> problem. Can you please try it?
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS, Inc. Japan
> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> Japanese: http://www.sraoss.co.jp
> >> >>
> >> >> > Yes the issue is with random() function.
> >> >> >
> >> >> > Looks like I have solved the problem by using rand().
> >> >> >
> >> >> > Regards,
> >> >> > Aravinth
> >> >> >
> >> >> >
> >> >> > On Wed, May 9, 2012 at 4:02 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >> >
> >> >> >> Thanks. Apparently random() on Solaris can return a value beyond
> >> >> >> RAND_MAX! It's easy to fix the problem, but I would like to do it
> >> >> >> with respect to portability. Any ideas?
> >> >> >> --
> >> >> >> Tatsuo Ishii
> >> >> >> SRA OSS, Inc. Japan
> >> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> >> Japanese: http://www.sraoss.co.jp
> >> >> >>
> >> >> >> > From the Solaris 10 (x86) man page:
> >> >> >> >
> >> >> >> > SYNOPSIS
> >> >> >> >      #include <stdlib.h>
> >> >> >> >
> >> >> >> >      long random(void);
> >> >> >> >
> >> >> >> >      void srandom(unsigned int seed);
> >> >> >> >
> >> >> >> >      char *initstate(unsigned int seed, char *state, size_t size);
> >> >> >> >
> >> >> >> >      char *setstate(const char *state);
> >> >> >> >
> >> >> >> > DESCRIPTION
> >> >> >> >      The random() function uses a nonlinear additive feedback
> >> >> >> >      random-number generator employing a default state array size
> >> >> >> >      of 31 long integers to return successive pseudo-random
> >> >> >> >      numbers in the range from 0 to 2**31 - 1. The period of this
> >> >> >> >      random-number generator is approximately 16 x (2**31 - 1).
> >> >> >> >      The size of the state array determines the period of the
> >> >> >> >      random-number generator. Increasing the state array size
> >> >> >> >      increases the period.
> >> >> >> >
> >> >> >> >      The srandom() function initializes the current state array
> >> >> >> >      using the value of seed.
> >> >> >> >
> >> >> >> >
> >> >> >> > (...)
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > Regards,
> >> >> >> > Rafal
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > -----Original Message-----
> >> >> >> > From: pgpool-general-bounces at pgpool.net [mailto:pgpool-general-bounces at pgpool.net] On Behalf Of Tatsuo Ishii
> >> >> >> > Sent: Wednesday, May 09, 2012 11:44 AM
> >> >> >> > To: caravinth at gmail.com
> >> >> >> > Cc: pgpool-general at pgpool.net
> >> >> >> > Subject: [pgpool-general: 431] Re: strange load balancing issue in Solaris
> >> >> >> >
> >> >> >> > Thanks.
> >> >> >> >
> >> >> >> > 2012-05-09 14:31:48 LOG: pid 22459: r: 268356063.000000 total_weight: 32767.000000
> >> >> >> >
> >> >> >> > This is really weird. Here pgpool calculates this:
> >> >> >> >
> >> >> >> > r = (((double)random())/RAND_MAX) * total_weight;
> >> >> >> >
> >> >> >> > Total weight is the same as RAND_MAX. It seems your random() returns
> >> >> >> > a value bigger than RAND_MAX, which does not make sense because the
> >> >> >> > man page of random(3) on my Linux says:
> >> >> >> >
> >> >> >> >     The random() function uses a non-linear additive feedback random number
> >> >> >> >     generator employing a default table of size 31 long integers to return
> >> >> >> >     successive pseudo-random numbers in the range from 0 to RAND_MAX. The
> >> >> >> >     period of this random number generator is very large, approximately
> >> >> >> >     16 * ((2^31) - 1).
> >> >> >> >
> >> >> >> > What does your man page for random() say on your system?
> >> >> >> > --
> >> >> >> > Tatsuo Ishii
> >> >> >> > SRA OSS, Inc. Japan
> >> >> >> > English: http://www.sraoss.co.jp/index_en.php
> >> >> >> > Japanese: http://www.sraoss.co.jp
> >> >> >> >
> >> >> >> >> Sorry, I missed it.
> >> >> >> >>
> >> >> >> >> Here is the log file.
> >> >> >> >>
> >> >> >> >> --Aravinth
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Wed, May 9, 2012 at 2:07 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >> >> >>
> >> >> >> >>> > The code you have sent is same in child.c.
> >> >> >> >>>
> >> >> >> >>> No.
> >> >> >> >>>
> >> >> >> >>> pool_log("r: %f total_weight: %f", r, total_weight);
> >> >> >> >>>
> >> >> >> >>> You need to add the line above to get useful information.
> >> >> >> >>> --
> >> >> >> >>> Tatsuo Ishii
> >> >> >> >>> SRA OSS, Inc. Japan
> >> >> >> >>> English: http://www.sraoss.co.jp/index_en.php
> >> >> >> >>> Japanese: http://www.sraoss.co.jp
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> > I have attached the log file. Please check
> >> >> >> >>> >
> >> >> >> >>> >
> >> >> >> >>> > --Aravinth
> >> >> >> >>> >
> >> >> >> >>> >
> >> >> >> >>> > On Tue, May 8, 2012 at 6:20 AM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> >> >> >> >>> >
> >> >> >> >>> >> I suspect there's some portability issue with the load balance code.
> >> >> >> >>> >> The actual source code is in select_load_balancing_node() (child.c).
> >> >> >> >>> >> Please modify the source code and connect to pgpool by using psql.
> >> >> >> >>> >> Please send the log output.
> >> >> >> >>> >> --
> >> >> >> >>> >> Tatsuo Ishii
> >> >> >> >>> >> SRA OSS, Inc. Japan
> >> >> >> >>> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> >> >>> >> Japanese: http://www.sraoss.co.jp
> >> >> >> >>> >>
> >> >> >> >>> >> int select_load_balancing_node(void)
> >> >> >> >>> >> {
> >> >> >> >>> >>     int selected_slot;
> >> >> >> >>> >>     double total_weight,r;
> >> >> >> >>> >>     int i;
> >> >> >> >>> >>
> >> >> >> >>> >>     /* choose a backend in random manner with weight */
> >> >> >> >>> >>     selected_slot = MASTER_NODE_ID;
> >> >> >> >>> >>     total_weight = 0.0;
> >> >> >> >>> >>
> >> >> >> >>> >>     for (i=0;i<NUM_BACKENDS;i++)
> >> >> >> >>> >>     {
> >> >> >> >>> >>         if (VALID_BACKEND(i))
> >> >> >> >>> >>         {
> >> >> >> >>> >>             total_weight += BACKEND_INFO(i).backend_weight;
> >> >> >> >>> >>         }
> >> >> >> >>> >>     }
> >> >> >> >>> >>     r = (((double)random())/RAND_MAX) * total_weight;
> >> >> >> >>> >>     pool_log("r: %f total_weight: %f", r, total_weight);  /* <-- add this */
> >> >> >> >>> >>
> >> >> >> >>> >>     total_weight = 0.0;
> >> >> >> >>> >>     for (i=0;i<NUM_BACKENDS;i++)
> >> >> >> >>> >>     {
> >> >> >> >>> >>         if (VALID_BACKEND(i) && BACKEND_INFO(i).backend_weight > 0.0)
> >> >> >> >>> >>         {
> >> >> >> >>> >>             if(r >= total_weight)
> >> >> >> >>> >>                 selected_slot = i;
> >> >> >> >>> >>             else
> >> >> >> >>> >>                 break;
> >> >> >> >>> >>             total_weight += BACKEND_INFO(i).backend_weight;
> >> >> >> >>> >>         }
> >> >> >> >>> >>     }
> >> >> >> >>> >>
> >> >> >> >>> >>     pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
> >> >> >> >>> >>     return selected_slot;
> >> >> >> >>> >> }
> >> >> >> >>> >>
> >> >> >> >>> >>
> >> >> >> >>> >> > Hi Tatsuo, Thanks for the reply.
> >> >> >> >>> >> >
> >> >> >> >>> >> > The normalized weights are 0.5 for both nodes and the selected node is
> >> >> >> >>> >> > always the same node. So I guess it's srandom().
> >> >> >> >>> >> >
> >> >> >> >>> >> >
> >> >> >> >>> >> > Any idea how to solve this srandom() issue?
> >> >> >> >>> >> >
> >> >> >> >>> >> >
> >> >> >> >>> >> > Thanks and Regards,
> >> >> >> >>> >> > Aravinth
> >> >> >> >>> >> >
> >> >> >> >>> >> >
> >> >> >> >>> >> > ________________________________
> >> >> >> >>> >> > From: Tatsuo Ishii <ishii at postgresql.org>
> >> >> >> >>> >> > To: aravinth at mafiree.com
> >> >> >> >>> >> > Cc: pgpool-general at pgpool.net
> >> >> >> >>> >> > Sent: Tuesday, May 1, 2012 4:41 AM
> >> >> >> >>> >> > Subject: Re: [pgpool-general: 396] strange load balancing issue in Solaris
> >> >> >> >>> >> >
> >> >> >> >>> >> > First of all, please check that the "normalized" weights are as you
> >> >> >> >>> >> > expected. Run "show pool_status;" and see the "backend_weight0" and
> >> >> >> >>> >> > "backend_weight1" sections. You will see floating point numbers, which
> >> >> >> >>> >> > are the normalized weights between 0.0 and 1.0. If you see both are 0.5,
> >> >> >> >>> >> > primary and standby are given the same weight.
> >> >> >> >>> >> >
> >> >> >> >>> >> > If they are ok, I suspect the srandom() function behavior is different
> >> >> >> >>> >> > from other platforms. Pgpool-II chooses the load balance node by using
> >> >> >> >>> >> > srandom(). select_load_balancing_node() is the function which is
> >> >> >> >>> >> > responsible for selecting the load balance node. If you run pgpool-II
> >> >> >> >>> >> > with the -d (debug) option, you will see the following in the log:
> >> >> >> >>> >> >
> >> >> >> >>> >> > pool_debug("select_load_balancing_node: selected backend id is %d", selected_slot);
> >> >> >> >>> >> >
> >> >> >> >>> >> > If backend_weight in show pool_status is fine but the line above
> >> >> >> >>> >> > always shows the same number, it is a sign that we have a problem with
> >> >> >> >>> >> > srandom().
> >> >> >> >>> >> > --
> >> >> >> >>> >> > Tatsuo Ishii
> >> >> >> >>> >> > SRA OSS, Inc. Japan
> >> >> >> >>> >> > English: http://www.sraoss.co.jp/index_en.php
> >> >> >> >>> >> > Japanese: http://www.sraoss.co.jp
> >> >> >> >>> >> >
> >> >> >> >>> >> >> Hi All,
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> I am facing a strange issue in load balancing with replication mode
> >> >> >> >>> >> >> set to true on Solaris. The load balancing algorithm always selects
> >> >> >> >>> >> >> the same node, whatever the backend weight may be.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> Here is the scenario.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> I have pgpool installed on 1 server
> >> >> >> >>> >> >> 2 postgres nodes on 2 other servers
> >> >> >> >>> >> >> replication mode set to true and load balancing set to true
> >> >> >> >>> >> >> backend weight of the 2 nodes is 1.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> When I fire the queries manually using different connections or using
> >> >> >> >>> >> >> pgbench, all the queries hit the same node. The load balancing
> >> >> >> >>> >> >> algorithm always selects the same node.
> >> >> >> >>> >> >> Changing the backend weight has no effect. Only when I set the backend
> >> >> >> >>> >> >> weight to 0 do hits go to the other server.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> I face this issue only on Solaris. The same setup on other servers
> >> >> >> >>> >> >> (CentOS, RHEL, Ubuntu, etc.) does the load balancing perfectly.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> I also tried various postgres versions and pgpool versions with the
> >> >> >> >>> >> >> same result. But every version runs fine on other servers.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> Has anyone faced this issue?
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> Any information would be highly helpful.
> >> >> >> >>> >> >>
> >> >> >> >>> >> >> Regards,
> >> >> >> >>> >> >> Aravinth
> >> >> >> >>> >> _______________________________________________
> >> >> >> >>> >> pgpool-general mailing list
> >> >> >> >>> >> pgpool-general at pgpool.net
> >> >> >> >>> >> http://www.pgpool.net/mailman/listinfo/pgpool-general
> >> >> >> >>> >>
> >> >> >> >>>
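
The arithmetic behind the quoted log line (r: 268356063, total_weight: 32767) can be reproduced with a small sketch. It assumes, as the thread indicates, that RAND_MAX is 32767 on the affected Solaris system while random() returns values up to 2**31 - 1, as the quoted man page states. The names scale_random(), select_node(), pick_node() and RANDOM_RANGE_MAX are hypothetical and do not come from pgpool-II. Dividing by the documented range of random() instead of RAND_MAX is only one possible approach, and it is not claimed to be the fix that was committed.

```c
#include <stdlib.h>

/* Upper bound of random() per the quoted Solaris man page: 2**31 - 1. */
#define RANDOM_RANGE_MAX 2147483647.0

/* Map a value returned by random() onto the interval [0, total_weight]. */
double scale_random(long value, double total_weight)
{
    return ((double) value / RANDOM_RANGE_MAX) * total_weight;
}

/*
 * Same selection loop as the quoted select_load_balancing_node(), but with
 * the weights passed in as an array: walk the cumulative weights and keep
 * the last node whose starting offset is not greater than r.
 */
int select_node(double r, const double *weights, int num_nodes)
{
    int selected = 0;
    double cumulative = 0.0;
    int i;

    for (i = 0; i < num_nodes; i++)
    {
        if (weights[i] > 0.0)
        {
            if (r >= cumulative)
                selected = i;
            else
                break;
            cumulative += weights[i];
        }
    }
    return selected;
}

/* Draw one node using random() scaled by its documented range. */
int pick_node(const double *weights, int num_nodes)
{
    double total = 0.0;
    int i;

    for (i = 0; i < num_nodes; i++)
        total += weights[i];

    return select_node(scale_random(random(), total), weights, num_nodes);
}
```

With two equal weights that sum to 32767, the unscaled value 268356063 from the log is far above the total, so select_node() always lands on the last node. The same raw value scaled by 2**31 - 1 gives roughly 4095, which falls in the first node's half.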