Spacewalk, osad and jabberd miscommunication
Spacewalk is a really handy tool if you want to keep your Linux infrastructure up to date, especially if you run a Red Hat based distro, since it is essentially the community-supported version of Red Hat’s product called Satellite.
But it has a nasty issue, at least in our case: the clients stop responding to commands from the server. There are two mechanisms for delivering an action from the server to a client: through rhnsd, which by default connects to the server every 4 hours and checks whether there is any action it should execute, or through the Jabber protocol. In the latter case a client receives an action request, e.g. install the latest packages or execute a command, almost instantaneously. As I mentioned, though, this cool feature stops working for no obvious reason. Everything seems to be working just fine: jabberd and osa-dispatcher are up and running and all clients connect to the server flawlessly, but an action request never reaches its target, as if it had never been sent or got lost in between. Anyway, it seems that the only way out of this annoying situation is the following:
/etc/init.d/jabberd stop
/etc/init.d/osa-dispatcher stop
rm -f /var/lib/jabberd/db/*
su - postgres -c psql
delete from rhnPushDispatcher;
delete from rhnpushclient;
This is how we have to fix it from time to time. Afterwards, start jabberd and osa-dispatcher again, and don’t forget to restart the osad daemon on all of your clients to re-establish their connection to your Spacewalk server. If you have more than 10 servers, this part can be a huge PITA. Hopefully the 1.8 release does not have this problem.
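Restarting osad on many clients can be scripted to take some of the pain away. The following is only a minimal sketch, not something taken from Spacewalk itself. It assumes that you can ssh to every client as root with key-based authentication, that the clients use the /etc/init.d/osad init script, and that you keep a plain text file with one client hostname per line. The function name restart_osad_on_clients and the SSH_CMD variable are made up for this example.

```shell
#!/bin/sh
# Restart osad on every client listed in a hosts file (one hostname per line).
# SSH_CMD is a hypothetical override for dry runs; it defaults to ssh.
restart_osad_on_clients() {
    hosts_file=$1
    failed=0
    # Read the list on fd 3 so that ssh cannot swallow it through stdin.
    while IFS= read -r host <&3; do
        # Skip blank lines and comment lines.
        case $host in
            ''|'#'*) continue ;;
        esac
        # Assumption: root login over ssh and a SysV init script for osad.
        if ${SSH_CMD:-ssh} "root@$host" /etc/init.d/osad restart; then
            echo "ok: $host"
        else
            echo "FAILED: $host" >&2
            failed=1
        fi
    done 3< "$hosts_file"
    # Non-zero if at least one client could not be restarted.
    return "$failed"
}
```

Call it as restart_osad_on_clients clients.txt. To see what would be executed without touching any client, run it with SSH_CMD=echo first. Failed hosts are reported on stderr, so they are easy to collect and retry.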