Server1 (id=1) has a mirror, Server2 (id=11). About a month ago, when I cleaned out old databases and database users, two jobs got stuck, and they have been stuck ever since. Today I noticed six additional stuck jobs. I've updated both servers to the latest version.

The following changes are not yet populated to all servers:
Update main config: 6
Delete database user: 2

Normal changes seem to work: when I make a change in the ISPConfig GUI and check the debug output, the new job is processed and the balloon value drops back to 8.

Code:
root@server2 # /usr/local/ispconfig/server/server.sh
Set Lock: /usr/local/ispconfig/server/temp/.ispconfig_lock
18.11.2024-14:48 - DEBUG [plugins.inc:155] - Calling function 'check_phpini_changes' from plugin 'webserver_plugin' raised by action 'server_plugins_loaded'.
18.11.2024-14:48 - DEBUG [server:224] - Remove Lock: /usr/local/ispconfig/server/temp/.ispconfig_lock
finished server.php.

From sys_datalog I think the following are in the queue (no errors):

datalogid  server_id  dbtable  dbidx
94407      0          sys_ini  sysini_id:1
94409      0          sys_ini  sysini_id:1
94405      0          sys_ini  sysini_id:1
94403      0          sys_ini  sysini_id:1
94401      0          sys_ini  sysini_id:1
94399      0          sys_ini  sysini_id:1

And from the server table:

server_id  updated
1          94411
11         94351

From the forum I've read that sys_datalog entries with server_id = 0 are broadcasts, but since server 11 is a mirror, shouldn't it have picked them up automatically? I have MySQL database replication enabled, so I suspect the database could have been removed before the job was processed; but if that were the cause, I would expect it to happen more often. Now that there are six new jobs, I'm out of clues.
Is server 11 running and fetching jobs from the queue correctly? Either server 11 is not able to fetch changes from the master, or it is no longer able to update the 'updated' column to the last processed datalog_id.
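For reference, the pending set for a node is essentially every sys_datalog row whose datalog_id is newer than that node's 'updated' pointer and whose server_id is either the node's own id or 0 (broadcast). A minimal sketch of that selection logic in Python, using the numbers from the post above (this is an illustration of the mechanics, not ISPConfig's actual code):

```python
# Sketch of how pending jobs are selected for a node, based on the
# sys_datalog / server.updated mechanics described in this thread.
# Illustration only, not ISPConfig's actual implementation.

def pending_jobs(datalog_rows, node_id, updated):
    """Return datalog ids still pending for a node.

    A row is pending when its datalog_id is newer than the node's
    'updated' pointer and it targets this node directly or is a
    broadcast (server_id == 0).
    """
    return sorted(
        row["datalog_id"]
        for row in datalog_rows
        if row["datalog_id"] > updated
        and row["server_id"] in (0, node_id)
    )

# Rows taken from the post: six broadcast sys_ini changes.
rows = [{"datalog_id": i, "server_id": 0, "dbtable": "sys_ini"}
        for i in (94407, 94409, 94405, 94403, 94401, 94399)]

# Server 11 is stuck at updated=94351, so all six rows are pending.
print(pending_jobs(rows, node_id=11, updated=94351))
# → [94399, 94401, 94403, 94405, 94407, 94409]

# Server 1 is at updated=94411, so nothing is pending for it.
print(pending_jobs(rows, node_id=1, updated=94411))
# → []
```

This matches the symptom in the post: server 1 has caught up past the six rows, while server 11's pointer never advances.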
I've had a similar problem. It turned out node X had lost its connection to the ISPConfig master database, so jobs queued up on the master for node X while the other nodes did their work correctly. In short: verify that the database connections from all other nodes to the master node are operational. My guess is at least one is not.
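One quick first check is whether the node can still reach the master's MySQL port at the TCP level. A small sketch (the hostname below is a made-up placeholder; use the master host configured on the node, and note that a successful TCP connect does not prove the MySQL credentials still work):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical master hostname; replace with your master's address.
print(can_connect("master.example.com", 3306))
```

If the TCP connect succeeds but jobs still queue up, the next suspects are the MySQL grants for the node's user on the master.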
I guess we have to build a tool to diagnose such issues more easily and add an alert, e.g., when there has been no new monitoring data from a node for a certain amount of time.
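The core of such an alert would be a staleness check over the last-seen timestamp per node. A rough sketch of the idea (the threshold and the timestamps are made up; no such tool exists in ISPConfig today):

```python
from datetime import datetime, timedelta

def stale_nodes(last_seen, now, max_age=timedelta(minutes=10)):
    """Return node ids whose last monitoring update is older than max_age."""
    return sorted(nid for nid, ts in last_seen.items() if now - ts > max_age)

# Hypothetical last-seen monitoring timestamps per node id.
now = datetime(2024, 11, 18, 14, 48)
last_seen = {
    1: now - timedelta(minutes=2),   # reporting normally
    11: now - timedelta(hours=5),    # silent for hours -> should alert
}
print(stale_nodes(last_seen, now))   # → [11]
```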
Yes, thank you, you are correct. The error is on my side. For some reason my monitoring hadn't noticed that the replication had stalled (some WP options transient duplicate keys). Now it is catching up; two of the jobs are already gone.
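For anyone hitting the same thing: replication health on the mirror can be checked with `SHOW SLAVE STATUS\G` in the MySQL client; the fields to watch are Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master, and Last_SQL_Error. A small sketch that parses that output and decides whether replication is healthy (the sample text below is fabricated to resemble a replica stalled on a duplicate-key error, like mine was):

```python
def replication_healthy(status_text, max_lag=60):
    """Parse `SHOW SLAVE STATUS\\G` output and report replication health.

    Healthy means both replication threads run, lag is within
    max_lag seconds, and there is no pending SQL error.
    """
    fields = {}
    for line in status_text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
        and fields.get("Seconds_Behind_Master", "NULL").isdigit()
        and int(fields["Seconds_Behind_Master"]) <= max_lag
        and fields.get("Last_SQL_Error", "") == ""
    )

# Made-up sample resembling a replica stalled on a duplicate key.
stalled = """\
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
        Seconds_Behind_Master: NULL
               Last_SQL_Error: Duplicate entry '_transient_x' for key 'option_name'
"""
print(replication_healthy(stalled))   # → False
```

Wiring something like this into the monitoring would have caught my stall long before the jobs piled up.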