Spacewalk OSAD Client Unable to Connect to Jabber on Spacewalk Server

From DFWLPiki
Jump to: navigation, search

Summary

Summary Statement
Spacewalk can occasionally get into a state where clients cannot contact the server. Constant communication from the client to the server is required for proper operation.

Impact Statement
Spacewalk shows all clients as "Offline for unknown" as OSAD status. Spacewalk clients are unable to immediately receive scheduled commands from the spacewalk server.

Analysis

Symptoms and Steps to Reproduce
Spacewalk Clients all show as offline.

Restarting OSAD on the client fails to reconnect to the server.

Looking in /var/log/osad you will see messages like this:

2015-10-24 07:29:10 jabber_lib.main: Unable to connect to jabber servers, sleeping 120 seconds
2015-10-24 07:33:00 jabber_lib.main: Unable to connect to jabber servers, sleeping 68 seconds
2015-10-24 07:35:57 jabber_lib.main: Unable to connect to jabber servers, sleeping 98 seconds
2015-10-24 07:39:27 jabber_lib.main: Unable to connect to jabber servers, sleeping 60 seconds
2015-10-24 07:42:19 jabber_lib.main: Unable to connect to jabber servers, sleeping 71 seconds
2015-10-24 07:45:20 jabber_lib.main: Unable to connect to jabber servers, sleeping 110 seconds
2015-10-24 07:48:06 jabber_lib.main: Unable to connect to jabber servers, sleeping 118 seconds

On the Spacewalk server, in /var/log/messages, you can see each client making connections and disconnections, but still the clients show offline.

[root@rco-spw01 rhn]# cat /var/log/messages|grep -v snmp
Oct 18 03:48:27 rco-spw01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1519" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Oct 18 03:48:27 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.18, port=43546] connect
Oct 18 03:48:27 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.18, port=43546] disconnect jid=unbound, packets: 0
Oct 18 03:48:28 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.94.36, port=53017] connect
Oct 18 03:48:28 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.94.36, port=53017] disconnect jid=unbound, packets: 0
Oct 18 03:48:28 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.18, port=43547] connect
Oct 18 03:48:28 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.18, port=43547] disconnect jid=unbound, packets: 0
Oct 18 03:48:29 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.94.36, port=53018] connect
Oct 18 03:48:29 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.94.36, port=53018] disconnect jid=unbound, packets: 0
Oct 18 03:48:30 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.94.36, port=53019] connect
Oct 18 03:48:30 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.94.36, port=53019] disconnect jid=unbound, packets: 0
Oct 18 03:48:31 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.107.17, port=54746] connect
Oct 18 03:48:31 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.107.17, port=54746] disconnect jid=unbound, packets: 0
Oct 18 03:48:33 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.107.17, port=54747] connect
Oct 18 03:48:33 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.107.17, port=54747] disconnect jid=unbound, packets: 0
Oct 18 03:48:34 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.107.17, port=54748] connect
Oct 18 03:48:34 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.107.17, port=54748] disconnect jid=unbound, packets: 0
Oct 18 03:48:36 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40359] connect
Oct 18 03:48:36 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40359] disconnect jid=unbound, packets: 0
Oct 18 03:48:37 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40360] connect
Oct 18 03:48:37 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40360] disconnect jid=unbound, packets: 0
Oct 18 03:48:38 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40361] connect
Oct 18 03:48:38 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40361] disconnect jid=unbound, packets: 0
Oct 18 03:48:49 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40362] connect
Oct 18 03:48:49 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40362] disconnect jid=unbound, packets: 0
Oct 18 03:48:50 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40363] connect
Oct 18 03:48:50 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40363] disconnect jid=unbound, packets: 0
Oct 18 03:48:51 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40364] connect
Oct 18 03:48:51 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.96.23, port=40364] disconnect jid=unbound, packets: 0
Oct 18 03:48:53 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.94.16, port=56421] connect
Oct 18 03:48:53 rco-spw01 jabberd/c2s[2345]: [7] [::ffff:10.23.94.16, port=56421] disconnect jid=unbound, packets: 0

A verbose restart of the OSAD client shows that the client cannot connect to the server:

[root@rco-spw01 log]# osad -N -v -v -v -v
2015-10-24 07:54:08 osad._setup_config: Updating configuration
2015-10-24 07:54:09 osad._setup_config: Time drift 1
2015-10-24 07:54:09 osad._setup_config: Client name de460d732f0d5f4c
2015-10-24 07:54:09 osad._setup_config: Shared key d3aea140753bb07e33b33a7900513ab67bd5d01b
2015-10-24 07:54:09 jabber_lib.setup_connection: Connecting to spacewalk.corp.repairsvc.com
2015-10-24 07:54:09 jabber_lib._get_jabber_client: 
2015-10-24 07:54:09 jabber_lib._get_jabber_client: Connecting to spacewalk.corp.repairsvc.com
2015-10-24 07:54:09 jabber_lib.__init__: 
2015-10-24 07:54:09 jabber_lib.__init__: 
2015-10-24 07:54:09 jabber_lib.check_cert: Loading cert <X509Name object '/C=US/ST=Texas/L=Coppell/O=RSI/OU=spacewalk.corp.repairsvc.com/CN=spacewalk.corp.repairsvc.com'>
2015-10-24 07:54:09 jabber_lib.connect: 
2015-10-24 07:54:09 jabber_lib.process: 300
Error connecting to jabber server: [Errno 104] Connection reset by peer
2015-10-24 07:55:58 jabber_lib.main: Unable to connect to jabber servers, sleeping 76 seconds
2015-10-24 07:57:14 osad.setup_config: Skipping config setup; counter=1; interval=72
2015-10-24 07:57:14 jabber_lib.setup_connection: Connecting to spacewalk.corp.repairsvc.com
2015-10-24 07:57:14 jabber_lib._get_jabber_client: 
2015-10-24 07:57:14 jabber_lib._get_jabber_client: Connecting to spacewalk.corp.repairsvc.com
2015-10-24 07:57:14 jabber_lib.__init__: 
2015-10-24 07:57:14 jabber_lib.__init__: 
2015-10-24 07:57:14 jabber_lib.check_cert: Loading cert <X509Name object '/C=US/ST=Texas/L=Coppell/O=RSI/OU=spacewalk.corp.repairsvc.com/CN=spacewalk.corp.repairsvc.com'>
2015-10-24 07:57:14 jabber_lib.connect: 
2015-10-24 07:57:14 jabber_lib.process: 300
Error connecting to jabber server: [Errno 104] Connection reset by peer
2015-10-24 07:59:05 jabber_lib.main: Unable to connect to jabber servers, sleeping 110 seconds

Return to Service

After much googling for this problem, this solution was found: https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD

The solution was to purge the jabber database, and restart jabber.

/etc/init.d/jabberd stop
/etc/init.d/osa-dispatcher stop
rm -rf /var/lib/jabberd/db/*
/etc/init.d/jabberd start
/etc/init.d/osa-dispatcher start

Immediately following this, all clients began to show as "Online" in their OSAD status. Some clients that I impatiently did not wait for them to check in, i logged in and restarted OSAD, and they immediately switched back to online as well (they likely would have come to online on their own).