Jobqueue stuck for some jobs for server_id = 0

Discussion in 'General' started by linus, Nov 18, 2024 at 1:53 PM.

  1. linus

    linus Member

    Server1 (id=1) has a mirror, Server2 (id=11). About a month ago, when I cleaned out old databases and database users, two jobs got stuck, and they have been stuck ever since. Today I noticed six additional stuck jobs. I've updated both servers to the latest version.

    The following changes are not yet populated to all servers:
    • Update main config: 6
    • Delete database user: 2

    Normal changes seem to be working: when I make a change in the ISPConfig GUI and check the debug output, the new job is processed and the balloon value drops back to 8.

    Code:
    root@server2 # /usr/local/ispconfig/server/server.sh
    Set Lock: /usr/local/ispconfig/server/temp/.ispconfig_lock
    18.11.2024-14:48 - DEBUG [plugins.inc:155] - Calling function 'check_phpini_changes' from plugin 'webserver_plugin' raised by action 'server_plugins_loaded'.
    18.11.2024-14:48 - DEBUG [server:224] - Remove Lock: /usr/local/ispconfig/server/temp/.ispconfig_lock
    finished server.php.
    
    From the sys_datalog I think the following are in the queue (none of them show errors):
    datalog_id  server_id  dbtable  dbidx
    94407       0          sys_ini  sysini_id:1
    94409       0          sys_ini  sysini_id:1
    94405       0          sys_ini  sysini_id:1
    94403       0          sys_ini  sysini_id:1
    94401       0          sys_ini  sysini_id:1
    94399       0          sys_ini  sysini_id:1

    server_id  updated
    1          94411
    11         94351

    From the forum I've read that server_id = 0 in sys_datalog means broadcast, but since server 11 is a mirror, shouldn't it have processed everything automatically? I have MySQL database replication enabled, so I suspect that at least the database could have been removed before the job was processed, but if so, I would expect that to happen more often. Now that there are six new jobs, I'm out of ideas.
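
    The comparison between the datalog IDs and each server's 'updated' value can be sketched in a few lines of Python. The numbers are copied from the listing in this post. The selection rule is an assumption made for illustration: a job counts as pending for a server when its ID is higher than that server's 'updated' value and its server_id is either 0 (broadcast) or the server's own ID. The function name pending_jobs is hypothetical, and the sketch ignores any special handling ISPConfig applies to mirrors.

```python
# (datalog_id, server_id, dbtable) rows copied from the post
datalog = [
    (94407, 0, "sys_ini"),
    (94409, 0, "sys_ini"),
    (94405, 0, "sys_ini"),
    (94403, 0, "sys_ini"),
    (94401, 0, "sys_ini"),
    (94399, 0, "sys_ini"),
]

# server_id -> last processed datalog ID ('updated' value from the post)
server_updated = {1: 94411, 11: 94351}


def pending_jobs(server_id, updated, entries):
    """Return the datalog IDs this server has not processed yet.

    Assumed rule: the entry is newer than 'updated' and is addressed
    either to all servers (0) or to this server specifically.
    """
    return sorted(
        datalog_id
        for datalog_id, target, _table in entries
        if datalog_id > updated and target in (0, server_id)
    )


for server_id, updated in server_updated.items():
    print(server_id, pending_jobs(server_id, updated, datalog))
```

    Under that assumption, server 1 has nothing pending, while all six sys_ini entries are still waiting for server 11, which matches the "Update main config: 6" counter.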
     
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    Is server 11 running and fetching jobs from the queue correctly? Either server 11 is not able to fetch changes from master, or it is not able to update the value in the 'updated' column to the last processed datalog_id anymore.
     
  3. remkoh

    remkoh Active Member HowtoForge Supporter

    I've had a similar problem.
    It turned out that node X had lost its connection to the ISPConfig master database.
    So jobs queued up on the master node for node X while other nodes did their job correctly.

    In short: verify that the database connections from all other nodes to the master node are operational.
    My guess is at least one is not.
     
  4. till

    till Super Moderator Staff Member ISPConfig Developer

    I guess we have to build a tool to diagnose such issues more easily and add an alert, e.g., when there has been no new monitoring data for a node for a certain amount of time.
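
    The kind of alert described here could look roughly like the following sketch. It is not an existing ISPConfig feature: the function name stale_nodes, the node names, the timestamps, and the 30-minute threshold are all hypothetical and chosen only to illustrate the idea of flagging nodes whose last monitoring data is too old.

```python
from datetime import datetime, timedelta


def stale_nodes(last_seen, now, max_age):
    """Return the names of nodes whose newest monitoring record is older than max_age."""
    return sorted(node for node, seen in last_seen.items() if now - seen > max_age)


# Hypothetical sample data: newest monitoring timestamp per node
now = datetime(2024, 11, 18, 15, 0)
last_seen = {
    "server1": datetime(2024, 11, 18, 14, 55),
    "server2": datetime(2024, 10, 20, 9, 0),
}

print(stale_nodes(last_seen, now, timedelta(minutes=30)))
```

    With this sample data only server2 is reported, because its last record is weeks old while server1 reported five minutes ago.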
     
  5. linus

    linus Member

    Yes, thank you, you are correct. The error is on my side. For some reason my monitoring hadn't noticed that the replication had stalled (duplicate keys on some WP options transients). Now it is catching up, and two of the jobs are gone.
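
    A monitoring check for this situation might inspect the replication status fields, roughly as sketched below. The field names are those reported by SHOW SLAVE STATUS (newer MySQL releases use SHOW REPLICA STATUS with Replica_* names instead). The function name replication_problem and the sample values are hypothetical; fetching the status from the database is left out, so the sketch only covers the decision logic.

```python
def replication_problem(status):
    """Return a short description if replication looks stalled, else None.

    'status' is assumed to be one row of SHOW SLAVE STATUS as a dict.
    """
    if status.get("Slave_IO_Running") != "Yes":
        return "I/O thread not running"
    if status.get("Slave_SQL_Running") != "Yes":
        errno = status.get("Last_SQL_Errno", 0)
        return f"SQL thread stopped (error {errno})"
    if status.get("Seconds_Behind_Master") is None:
        return "replication lag unknown"
    return None


# Hypothetical sample: SQL thread stopped on a duplicate-key error (1062)
sample = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "No",
    "Last_SQL_Errno": 1062,
    "Seconds_Behind_Master": None,
}

print(replication_problem(sample))
```

    A check that only tests whether the database port answers would miss this case, since the server keeps running while the SQL thread is stopped.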
     
