Validator 'sync_state' changed to 'Stalled', although the internet connection was up and the services were running

Hello everyone,

I am running a PulseChain validator on Mainnet.
I used the scripts and procedure outlined by RHMax on GitHub, and everything was fine for the first few days.
After about a day the node was synced and the first rewards were coming in.

Some time ago, however, I checked the status of the Geth, LH-Beacon and LH-Validator services; they were all running.
But when I checked the JSON output of LH-Health, I noticed that the value of sync_state was “Stalled”, whereas in a normal, healthy state it should be:

"sync_state": "Synced"
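
That healthy/unhealthy distinction can be checked mechanically instead of by eye. Below is a minimal Python sketch, assuming the LH-Health output is JSON with a sync_state key somewhere in it (possibly nested, which is why it searches recursively). It does not fetch anything itself; you would pass it whatever text your LH-Health check returns. Treating values that start with “Syncing” as healthy is an assumption, in case the client reports a more specific syncing state.

```python
import json
from typing import Any, Optional


def find_sync_state(data: Any) -> Optional[str]:
    """Return the first "sync_state" value found anywhere in parsed JSON."""
    if isinstance(data, dict):
        if "sync_state" in data:
            return str(data["sync_state"])
        children = list(data.values())
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        found = find_sync_state(child)
        if found is not None:
            return found
    return None


def is_healthy(json_text: str) -> bool:
    """True if sync_state is "Synced" or starts with "Syncing"."""
    try:
        state = find_sync_state(json.loads(json_text))
    except ValueError:  # empty or non-JSON output (json.JSONDecodeError)
        return False
    if state is None:
        return False  # missing field: treat as unhealthy so it gets noticed
    return state == "Synced" or state.startswith("Syncing")
```

Missing or unparseable output is deliberately reported as unhealthy: a check that silently passes when the endpoint is down would repeat the "services are running, so all is fine" trap described below.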

When I checked my validators, I saw that they had already been offline for several days, ‘earning’ negative income.
Damn... checking only whether the services were ‘running’ had clearly not been enough.

So, what does the ‘Stalled’ status mean, and what might have caused it?

So far, the only suspicious log output I have found is in this file:

/opt/lighthouse/data/beacon/beacon/logs/beacon.log

This is what I found:

“Subnet peer discovery did not find sufficient peers. Reached max retry limit.”
service: libp2p
module: lighthouse_network::discovery:591
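
To see how often and since when this warning appears (and whether it lines up with the moment the node stalled), the log can be scanned for it. A minimal sketch, assuming the message text appears verbatim in each log line; the path and message are the ones quoted in this post.

```python
from pathlib import Path
from typing import Iterable, List

# Path quoted in this post; adjust if your data directory differs.
BEACON_LOG = Path("/opt/lighthouse/data/beacon/beacon/logs/beacon.log")

# Message quoted in this post. Assumption: it appears verbatim in the log.
PEER_WARNING = "Subnet peer discovery did not find sufficient peers"


def matching_lines(lines: Iterable[str], needle: str = PEER_WARNING) -> List[str]:
    """Return the lines that contain the warning text, without newlines."""
    return [line.rstrip("\n") for line in lines if needle in line]


def scan_log(path: Path = BEACON_LOG) -> List[str]:
    """Read the beacon log (replacing undecodable bytes) and return matches."""
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        return matching_lines(fh)


if __name__ == "__main__":
    if BEACON_LOG.exists():
        hits = scan_log()
        print(f"{len(hits)} matching line(s)")
        for line in hits[-5:]:  # show the most recent few
            print(line)
```

Since the log lines start with a timestamp, the first and last hits give a rough window for when peer discovery started failing.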

I checked my router, and the internet connection was working normally around that time.

Of course, I also tried to remember what I had changed on the validator after it was activated on the chain.
There was only one thing: I added some firewall rules to allow an additional home computer to access the server.
After that I ran: sudo ufw reload, which should apply the extra rules without having to restart anything.
Could this have caused the ‘stalling’?
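
One way to test the firewall hypothesis is to confirm that the peer-to-peer ports are still explicitly allowed after the reload, since a node that cannot find peers matches the discovery warning above. A minimal sketch that parses `sudo ufw status` output; the port numbers (Geth 30303, Lighthouse 9000, both TCP and UDP) are assumed client defaults, not taken from this post, so check your own service files. It only looks at explicit ALLOW rules: if your default incoming policy is allow, or you use port ranges such as 9000:9001, the result needs interpreting by hand.

```python
import subprocess
from typing import Dict, Set

# Assumption: default P2P ports, Geth 30303 and Lighthouse 9000, each on
# TCP and UDP. Check your own service files; your setup may differ.
P2P_PORTS = {"30303": "Geth", "9000": "Lighthouse beacon"}


def current_status() -> str:
    """Return `ufw status` output (must run as root, like `ufw reload`)."""
    return subprocess.run(
        ["ufw", "status"], capture_output=True, text=True, check=True
    ).stdout


def allowed_protocols(ufw_status: str) -> Dict[str, Set[str]]:
    """Map port -> protocols that have an ALLOW rule in `ufw status` output.

    A rule without "/tcp" or "/udp" covers both protocols.
    """
    allowed: Dict[str, Set[str]] = {}
    for line in ufw_status.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or "ALLOW" not in tokens:
            continue  # header, separator or non-ALLOW rule
        port, _, proto = tokens[0].partition("/")
        protos = {proto} if proto in ("tcp", "udp") else {"tcp", "udp"}
        allowed.setdefault(port, set()).update(protos)
    return allowed


def missing_p2p_rules(ufw_status: str) -> Dict[str, Set[str]]:
    """Return P2P ports lacking an ALLOW rule for TCP and/or UDP."""
    if "Status: inactive" in ufw_status:
        return {}  # firewall off: nothing is being blocked by ufw
    allowed = allowed_protocols(ufw_status)
    missing: Dict[str, Set[str]] = {}
    for port in P2P_PORTS:
        gap = {"tcp", "udp"} - allowed.get(port, set())
        if gap:
            missing[port] = gap
    return missing
```

Usage (as root): `print(missing_p2p_rules(current_status()))`. An empty result means both ports have ALLOW rules for TCP and UDP.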

A server reboot brought my validators back online (slowly).

========================================================

I took a serious hit in negative income because of this, so I am thinking of running a smart script (bash/python)
that does the following (in words):

Check the sync_state value in the JSON dictionary: if it is “Synced” or “Syncing”, do nothing (all is good).
If it is neither “Synced” nor “Syncing”:
Stop all 3 services gracefully via systemd (maybe adding some extra pause time).
Then reboot the machine.
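
The stop-pause-reboot part of this plan could look roughly like the Python sketch below. The systemd unit names are hypothetical placeholders (check `systemctl list-units --type=service` for the names your setup uses), the 30-second pause is arbitrary, and the default is a dry run that only prints the commands. It stops the validator first and the execution client last.

```python
import subprocess
import time
from typing import List

# Hypothetical unit names: check `systemctl list-units --type=service`
# for the names your setup actually uses. Validator is stopped first,
# the execution client (Geth) last.
SERVICES = ["lighthouse-validator", "lighthouse-beacon", "geth"]
PAUSE_SECONDS = 30  # the extra "pause" before rebooting


def is_ok(state: str) -> bool:
    """Healthy per the plan above; "Syncing..." variants count too (assumption)."""
    return state == "Synced" or state.startswith("Syncing")


def plan_commands(state: str) -> List[List[str]]:
    """Commands to run for a given sync_state; empty list if healthy."""
    if is_ok(state):
        return []
    cmds = [["systemctl", "stop", svc] for svc in SERVICES]
    cmds.append(["systemctl", "reboot"])
    return cmds


def remediate(state: str, dry_run: bool = True) -> List[List[str]]:
    """Stop the services one by one, pause, then reboot. Needs root."""
    cmds = plan_commands(state)
    for cmd in cmds:
        if dry_run:
            print("DRY RUN:", " ".join(cmd))
            continue
        if cmd[1] == "reboot":
            time.sleep(PAUSE_SECONDS)
        # `systemctl stop` blocks until the unit has stopped (or timed out).
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("warning:", " ".join(cmd), "exited with", result.returncode)
    return cmds
```

Strictly speaking, `systemctl reboot` already stops all units in dependency order; stopping them explicitly first just makes the order visible and lets you see in the output if one of them fails to stop. Try it with `remediate("Stalled")` (dry run) before passing `dry_run=False`.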

Then add the script to cron to run once a day, or at another interval.
Even better, as an extra monitoring safeguard: send an email if the status is not “Synced” or “Syncing”.
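
For the email part, Python's standard library is enough. A minimal sketch using `smtplib` with STARTTLS; the SMTP host, port, addresses and password below are all hypothetical placeholders for your own mail provider's settings (many providers require an app-specific password for this).

```python
import smtplib
from email.message import EmailMessage

# All of these are hypothetical placeholders; fill in your own mail settings.
SMTP_HOST = "smtp.example.com"
SMTP_PORT = 587
SMTP_USER = "alerts@example.com"
SMTP_PASSWORD = "app-password-here"
ALERT_TO = "you@example.com"


def build_alert(state: str, host: str) -> EmailMessage:
    """Compose an alert mail for an unhealthy sync_state."""
    msg = EmailMessage()
    msg["Subject"] = f"[validator] sync_state is {state!r} on {host}"
    msg["From"] = SMTP_USER
    msg["To"] = ALERT_TO
    msg.set_content(
        f"The beacon node on {host} reports sync_state={state!r}.\n"
        "Expected 'Synced' or 'Syncing'. Check the services and beacon.log."
    )
    return msg


def send_alert(msg: EmailMessage) -> None:
    """Send via SMTP with STARTTLS (common on port 587)."""
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(SMTP_USER, SMTP_PASSWORD)
        smtp.send_message(msg)
```

Sending the mail before triggering the reboot means you still hear about the problem even if the machine does not come back cleanly. For scheduling, a line such as `*/30 * * * * /usr/bin/python3 /root/validator_watchdog.py` in root's crontab (`sudo crontab -e`; the script path is hypothetical) would run the check every 30 minutes; running only once a day could still leave you offline for up to a day before anything happens.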

========================================================

Maybe there are validator-runners out there who have implemented better or simpler solutions.

Greetz, Laurens (5555 to all)
