Missing agents

The GoCD agent sometimes runs into memory issues and exits unexpectedly. To bring it back up, SSH to the agent host and first check the service status:

~$ sudo systemctl status go-agent
● go-agent.service - go-agent
   Loaded: loaded (/etc/systemd/system/go-agent.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Tue 2020-02-18 10:46:11 UTC; 6s ago
 Main PID: 51694 (code=killed, signal=KILL)

Feb 10 19:37:58 s0 su[27465]: pam_unix(su:session): session closed for user ops
Feb 10 19:37:58 s0 sudo[27365]: pam_unix(sudo:session): session closed for user root
Feb 10 19:38:21 s0 sudo[27699]:       go : TTY=unknown ; PWD=/var/lib/go-agent/pipelines/MyPipeline ; USER=root ; COMMAND=/usr/bin/salt-call ubiquitous_platform.php_fpm_rollover ops
Feb 10 19:38:21 s0 sudo[27699]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 10 19:38:27 s0 sudo[27699]: pam_unix(sudo:session): session closed for user root
Feb 10 19:39:00 s0 sudo[28103]:       go : TTY=unknown ; PWD=/var/lib/go-agent/pipelines/MyPipeline ; USER=root ; COMMAND=/usr/local/bin/do-something
Feb 10 19:39:00 s0 sudo[28103]: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 10 19:39:05 s0 sudo[28103]: pam_unix(sudo:session): session closed for user root
Feb 18 10:46:11 s0 systemd[1]: go-agent.service: Main process exited, code=killed, status=9/KILL
Feb 18 10:46:11 s0 systemd[1]: go-agent.service: Failed with result 'signal'

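In this example the agent's main process was killed with SIGKILL (status=9/KILL). That is what you would see after a manual kill -9 by root or after the kernel's out-of-memory killer stepped in; the status output alone does not say which. Either way, we can start the service again: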

sudo systemctl start go-agent
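
If this happens often, the status check and the restart can be combined. The following is only a minimal sketch and not part of any existing tooling: the function names needs_start and ensure_agent_running are made up for illustration, and it assumes the unit is called go-agent as in the output above. It relies on `systemctl is-active`, which prints the unit state (for example active, inactive or failed).

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): succeed when the state printed
# by `systemctl is-active` means the unit is down and should be started.
needs_start() {
  case "$1" in
    failed|inactive) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: start go-agent only if it is currently down.
ensure_agent_running() {
  # is-active exits non-zero for anything but "active", so ignore its exit
  # status and look at the printed state instead.
  state="$(systemctl is-active go-agent || true)"
  if needs_start "$state"; then
    sudo systemctl start go-agent
  fi
}
```

Source the file and call ensure_agent_running on the agent host. Transitional states such as activating are deliberately left alone so the sketch does not fight a start that is already in progress.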

When troubleshooting, you can find the logs at /var/log/go-agent. It's generally easier to tail them all at once:

sudo -s
cd /var/log/go-agent
tail -f *.log
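
A process that dies from SIGKILL, as in the status output above, cannot catch the signal and so gets no chance to log anything. The agent's own logs may therefore not explain the exit. If memory is the suspect, the kernel log is worth checking as well. This is a minimal sketch, assuming a systemd host where `journalctl -k` shows kernel messages and assuming the OOM killer logs a line containing "Killed process" (the exact wording varies between kernel versions). The helper name oom_kills is hypothetical.

```shell
# Hypothetical helper: read kernel log lines on stdin and keep only the
# ones that look like OOM-killer kills. `|| true` keeps the exit status
# zero when nothing matches.
oom_kills() {
  grep -i 'killed process' || true
}

# Usage (run on the agent host):
#   sudo journalctl -k | oom_kills
```

If this prints a line whose timestamp matches the time of the failure, the agent was most likely killed for memory rather than by hand.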
