OK, let me explain my setup better.
I am actually running Zenoss on a two-node Veritas cluster. The Zenoss filesystem, the MySQL database and the RabbitMQ Mnesia database are all on a SAN volume that can fail back and forth between both nodes. This is why I use rabbitmq-env.conf to put the Mnesia location on the SAN and also to force the hostname to be localhost (each node is localhost to itself, so the RabbitMQ node name is the same on either node).
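A minimal sketch for confirming that rabbitmq-env.conf says the same thing on both nodes. The file's actual contents aren't shown in this post, so the expected values (NODENAME=rabbit@localhost, MNESIA_BASE=/opt/zenoss/rabbitmq/mnesia) are assumptions inferred from the -sname and -mnesia dir arguments in the ps output below; adjust them to whatever the file really sets. The parser only understands plain KEY=value lines, not full shell syntax, and the script name in the usage comment is a placeholder.

```python
import shlex
import sys

# Assumed values, inferred from the -sname and -mnesia dir arguments in the ps output.
EXPECTED = {
    "NODENAME": "rabbit@localhost",
    "MNESIA_BASE": "/opt/zenoss/rabbitmq/mnesia",
}


def parse_env_conf(text):
    """Parse simple KEY=value lines from a rabbitmq-env.conf shell fragment."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and shell comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # shlex.split removes surrounding quotes the way the shell would.
        parts = shlex.split(value)
        settings[key] = parts[0] if parts else ""
    return settings


def check(settings, expected=EXPECTED):
    """Return (key, expected, actual) for every setting that differs."""
    return [(k, v, settings.get(k)) for k, v in expected.items() if settings.get(k) != v]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_env.py /etc/rabbitmq/rabbitmq-env.conf
    with open(sys.argv[1]) as fh:
        problems = check(parse_env_conf(fh.read()))
    for key, want, got in problems:
        print(f"{key}: expected {want!r}, found {got!r}")
    print("OK" if not problems else f"{len(problems)} mismatch(es)")
```

Running it on each node against its own copy of the file makes any drift between the two nodes obvious.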
None of the applications (Zenoss, MySQL, RabbitMQ) have any idea that they are running in a cluster.
Zenoss works great on node 1, but when I fail over to node 2 I get the problem I reported in my first post, where zenhub cannot connect to RabbitMQ. This worked during the original failover test, but it no longer does, and I cannot find a reason why. Everything is the same on both nodes. The Erlang cookie shouldn't matter because RabbitMQ is not in cluster mode. The Erlang cookie is also on the SAN, so it moves back and forth with RabbitMQ when it fails over.
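Because the cookie lives on the SAN, one thing that can still differ between nodes is how the file looks to each of them: ownership on shared storage is recorded as a numeric uid, and Erlang refuses a cookie file that group or others can access. A minimal sketch for checking both, assuming the cookie is at /opt/zenoss/rabbitmq/.erlang.cookie (inferred from -home /opt/zenoss/rabbitmq in the ps output below) and that the broker runs as the rabbitmq account, as ps shows. The script name in the usage comment is a placeholder.

```python
import os
import pwd
import stat
import sys


def cookie_problems(path, expected_uid=None):
    """Return a list of problems with an Erlang cookie file (empty list if none)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append(f"{path} is not a regular file")
    # Erlang rejects a cookie file that group or others can access.
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} has mode {oct(stat.S_IMODE(st.st_mode))}, expected owner-only")
    # Files on shared storage carry numeric uids, so if the account has a
    # different uid on this node, the file shows up as owned by someone else.
    if expected_uid is not None and st.st_uid != expected_uid:
        problems.append(f"{path} is owned by uid {st.st_uid}, expected {expected_uid}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_cookie.py /opt/zenoss/rabbitmq/.erlang.cookie
    uid = pwd.getpwnam("rabbitmq").pw_uid  # account name taken from the ps output
    for msg in cookie_problems(sys.argv[1], uid) or ["OK"]:
        print(msg)
```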
I didn't really want to get into this as I don't want this to get too confusing.
On a failover to node 2, RabbitMQ starts:
ps -ef | grep rabbitmq
rabbitmq 28171 1 0 Sep04 ? 00:00:00 /usr/lib64/erlang/erts-5.8.5/bin/epmd -daemon
root 41777 1 0 10:50 ? 00:00:00 runuser rabbitmq --session-command /usr/sbin/rabbitmq-server
rabbitmq 41781 41777 0 10:50 ? 00:00:00 /bin/sh /usr/sbin/rabbitmq-server
rabbitmq 41791 41781 3 10:50 ? 00:00:02 /usr/lib64/erlang/erts-5.8.5/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /opt/zenoss/rabbitmq -- -noshell -noinput -sname rabbit@localhost -boot /opt/zenoss/rabbitmq/rabbit@localhost-plugins-expand/rabbit -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@localhost.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@localhost-sasl.log"} -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/opt/zenoss/rabbitmq/mnesia/rabbit@localhost"
rabbitmq 41971 41791 0 10:50 ? 00:00:00 inet_gethost 4
rabbitmq 41972 41971 0 10:50 ? 00:00:00 inet_gethost 4
root 45458 40092 0 10:51 pts/0 00:00:00 grep rabbitmq
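To confirm that node 2 starts the broker with the same node name, HOME and Mnesia directory as node 1, the values can be pulled straight out of the beam.smp arguments. A minimal sketch: it reads /proc/<pid>/cmdline on Linux (or parses a pasted ps line) and strips the literal double quotes that the ps output above shows around the -mnesia dir value. The script name in the usage comment is a placeholder.

```python
import shlex
import sys


def beam_settings(args):
    """Extract node name, HOME and Mnesia dir from a beam.smp argument list."""
    found = {}
    for i, arg in enumerate(args):
        nxt = args[i + 1] if i + 1 < len(args) else None
        if arg == "-sname" and nxt:
            found["sname"] = nxt
        elif arg == "-home" and nxt:
            found["home"] = nxt
        elif arg == "-mnesia" and nxt == "dir" and i + 2 < len(args):
            # ps shows this value wrapped in literal double quotes; drop them.
            found["mnesia_dir"] = args[i + 2].strip('"')
    return found


def from_ps_line(line):
    """Parse a command line as printed by `ps -ef` (shlex removes the quotes)."""
    return beam_settings(shlex.split(line))


def from_proc(pid):
    """Read the exact argument vector of a running process (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as fh:
        raw = fh.read().decode()
    return beam_settings([a for a in raw.split("\0") if a])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python beam_args.py <beam.smp pid>
    for key, value in sorted(from_proc(sys.argv[1]).items()):
        print(f"{key:12} {value}")
```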
rabbitmqctl list_connections
Listing connections ...
zenoss 127.0.0.1 43314 running
zenoss 127.0.0.1 43288 running
zenoss 127.0.0.1 43301 running
zenoss 127.0.0.1 43284 running
zenoss 127.0.0.1 43277 running
zenoss 127.0.0.1 43313 running
zenoss 127.0.0.1 43297 running
...done.
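All of these connections come from 127.0.0.1, so at least some Zenoss daemons are reaching the broker on node 2. A small sketch for tallying them by user and state, handy for comparing the list before and after running zenhub start. It assumes the default four columns shown above (user, peer address, peer port, state) and only calls rabbitmqctl when given a hypothetical --live flag.

```python
import subprocess
import sys
from collections import Counter


def connection_states(output):
    """Count connections per (user, state) in default `rabbitmqctl list_connections` output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Data rows have four columns; the "Listing connections ..." banner
        # and the "...done." trailer do not.
        if len(fields) != 4:
            continue
        user, _peer_address, _peer_port, state = fields
        counts[(user, state)] += 1
    return counts


if __name__ == "__main__" and sys.argv[1:] == ["--live"]:
    out = subprocess.check_output(["rabbitmqctl", "list_connections"], universal_newlines=True)
    for (user, state), n in sorted(connection_states(out).items()):
        print(f"{user:12} {state:10} {n}")
```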
service zenoss status
Daemon: zeneventserver program running; pid=43636
Daemon: zopectl not running
Daemon: zenrrdcached program running; pid=43835
Daemon: zenhub not running
Daemon: zenjobs program running; pid=45012
Daemon: zeneventd program running; pid=45063
Daemon: zenping program running; pid=45153
Daemon: zensyslog program running; pid=45205
Daemon: zenstatus program running; pid=45242
Daemon: zenactiond program running; pid=45280
Daemon: zentrap program running; pid=45315
Daemon: zenmodeler program running; pid=45368
Daemon: zenrender program running; pid=45398
Daemon: zenperfsnmp program running; pid=45438
Daemon: zencommand program running; pid=45487
Daemon: zenprocess program running; pid=45517
Daemon: zenmail program running; pid=45558
Daemon: zredis program running; pid=45561
Daemon: zenjmx program running; pid=45604
Daemon: zenwinperf program running; pid=45650
Daemon: zenwin program running; pid=45693
Daemon: zeneventlog program running; pid=45746
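Of all the daemons, only zopectl and zenhub are down. A sketch for pulling the stopped daemons out of this output, assuming every relevant line has exactly the "Daemon: NAME program running; pid=N" or "Daemon: NAME not running" form shown above. The script name in the usage comment is a placeholder.

```python
import re
import sys

STATUS_RE = re.compile(r"^Daemon: (\S+) (?:program running; pid=(\d+)|(not running))$")


def parse_status(output):
    """Split `service zenoss status` output into running {name: pid} and stopped [names]."""
    running, stopped = {}, []
    for line in output.splitlines():
        m = STATUS_RE.match(line.strip())
        if not m:
            continue  # ignore anything not in the "Daemon: ..." form
        name, pid, not_running = m.groups()
        if not_running:
            stopped.append(name)
        else:
            running[name] = int(pid)
    return running, stopped


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): service zenoss status > status.txt; python zen_status.py status.txt
    with open(sys.argv[1]) as fh:
        running, stopped = parse_status(fh.read())
    print("not running:", ", ".join(stopped) or "none")
```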
zenhub start
(/opt/zenoss/log/zenhub.log)
2013-09-04 15:48:27,231 INFO zen.ZenHub: Worker (28552) reports 2013-09-04 15:48:27,230 CRITICAL zen.zenoss.protocols.amqp: Could not use exchange $RawZenEvents: Could not connect to RabbitMQ: [111] Connection refused
2013-09-04 15:48:27,231 INFO zen.ZenHub: Worker (28552) reports 2013-09-04 15:48:27,231 CRITICAL zen.Events: Unable to publish event to <Products.ZenMessaging.queuemessaging.publisher.EventPublisher object at 0x5eb77d0>: Could not connect to RabbitMQ: [111] Connection refused
2013-09-04 15:48:27,246 INFO zen.ZenHub: Worker (28552) reports 2013-09-04 15:48:27,245 CRITICAL zen.zenoss.protocols.amqp: Could not use exchange $RawZenEvents: Could not connect to RabbitMQ: [111] Connection refused
2013-09-04 15:48:27,246 INFO zen.ZenHub: Worker (28552) reports 2013-09-04 15:48:27,245 CRITICAL zen.Events: Unable to publish event to <Products.ZenMessaging.queuemessaging.publisher.EventPublisher object at 0x5eb77d0>: Could not connect to RabbitMQ: [111] Connection refused
2013-09-04 15:48:27,267 INFO zen.ZenHub: Worker (28552) reports 2013-09-04 15:48:27,267 CRITICAL zen.zenoss.protocols.amqp: Could not use exchange $RawZenEvents: Could not connect to RabbitMQ: [111] Connection refused
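[111] Connection refused is ECONNREFUSED on Linux: nothing accepted a TCP connection on the address and port zenhub tried. Since the RabbitMQ log below shows connections being accepted on 127.0.0.1:5672, one way to narrow this down is to try every address the broker host name resolves to on each node; "localhost" can resolve to ::1 as well as 127.0.0.1, depending on /etc/hosts. A minimal sketch; the host and port zenhub is actually configured with are not shown in this post, so localhost and 5672 (the listener in the RabbitMQ log) are assumptions, and the script name in the usage comment is a placeholder.

```python
import errno
import socket
import sys


def probe(host, port, timeout=3.0):
    """TCP-connect to every address `host` resolves to; return (address, result) pairs."""
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            results.append((sockaddr[0], "open"))
        except ConnectionRefusedError:
            # Nothing is listening on this address/port (ECONNREFUSED, 111 on Linux).
            results.append((sockaddr[0], f"refused (errno {errno.ECONNREFUSED})"))
        except OSError as exc:
            # Timeouts, unreachable networks and similar failures.
            results.append((sockaddr[0], f"error: {exc}"))
        finally:
            sock.close()
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python probe.py localhost 5672
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 5672
    for addr, result in probe(sys.argv[1], port):
        print(f"{addr:40} {result}")
```

If node 2 reports "open" for 127.0.0.1 but "refused" for ::1 (or the other way round), the difference is in name resolution rather than in RabbitMQ itself.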
/var/log/rabbitmq/rabbit@localhost.log
=INFO REPORT==== 5-Sep-2013::10:53:08 ===
accepting AMQP connection <0.350.0> (127.0.0.1:43405 -> 127.0.0.1:5672)
=WARNING REPORT==== 5-Sep-2013::10:53:09 ===
closing AMQP connection <0.350.0> (127.0.0.1:43405 -> 127.0.0.1:5672):
connection_closed_abruptly
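This excerpt shows a client on 127.0.0.1 being accepted on port 5672 and closed abruptly one second later, so the listener was up at that moment. A sketch for pairing accept and close events by Erlang connection pid across the whole log, based only on the two message formats shown here; other message types are ignored, and the script name in the usage comment is a placeholder.

```python
import re
import sys

# Matches the two message forms in the log excerpt; the close reason
# follows on the line after "closing AMQP connection ...:".
EVENT_RE = re.compile(
    r"(accepting|closing) AMQP connection (<[\d.]+>) "
    r"\(([^\s)]+) -> ([^\s)]+)\)(?::\s*(\S+))?"
)


def connection_events(log_text):
    """Map each connection pid to its peer, listener and close reason (None if still open)."""
    conns = {}
    for m in EVENT_RE.finditer(log_text):
        action, pid, peer, listener, reason = m.groups()
        if action == "accepting":
            conns[pid] = {"peer": peer, "listener": listener, "closed": None}
        else:
            entry = conns.setdefault(pid, {"peer": peer, "listener": listener, "closed": None})
            entry["closed"] = reason
    return conns


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python amqp_conns.py /var/log/rabbitmq/rabbit@localhost.log
    with open(sys.argv[1]) as fh:
        for pid, info in connection_events(fh.read()).items():
            print(pid, info["peer"], "->", info["listener"], info["closed"] or "open")
```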
free
total used free shared buffers cached
Mem: 148535120 4199564 144335556 0 349992 1838108
-/+ buffers/cache: 2011464 146523656
Swap: 4194288 0 4194288
df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg0-root 1032088 377348 602312 39% /
tmpfs 74267560 0 74267560 0% /dev/shm
/dev/sde1 148742 33391 107671 24% /boot
/dev/mapper/vg0-home 1032088 34176 945484 4% /home
/dev/mapper/vg0-opt 8256952 845196 6992328 11% /opt
/dev/mapper/vg0-srv 558356472 202408 530014544 1% /srv
/dev/mapper/vg0-usr 4128448 1602120 2316616 41% /usr
/dev/mapper/vg0-var 4128448 855076 3063660 22% /var
none 1048576 238772 809804 23% /tmp
tmpfs 4 0 4 0% /dev/vx
/dev/vx/dsk/zenmasterDG/zenoss-vol01
1073708032 1369014 1005317891 1% /opt/zenoss