OctoScreen causing OctoPrint to build up excessive thread count when printer is offline

I am currently running OctoPrint version 1.4.2 on octopi 0.15.1 with an Anycubic i3 Mega.
This setup worked fine for many months. I don't print very much, so the printer is powered off 90% of the time, but the pi with Octoprint is running 24/7. When I power the printer I usually just have to click "Connect" in the webpage and Octoprint wil reconnect successfully and I can print.

I happened about 1 week ago that suddenly the Octoprint webpage did no longer load for me after trying to print something again, when I had printed 1 or 2 days ago.
Not knowing exactly what might be the problem I just restarted octoprint (via systemctl restart octoprint) and everything was fine again. However the problem reoccured over and over again.
I thought it might have to do with a recent plugin or plugin-update so I disabled them all (e.g. TouchUI). The problem persisted and required me to restart Octoprint on error.

Finally looking at the logs I spotted the problem. The logs are FLOODED with hundreds of thousands of repeated errors:

octoprint.log (excerpt)

2020-08-25 21:22:10,553 - octoprint.util.comm - INFO - Changing monitoring state from "Error: No more candidates to test, and no working port/baudrate combination detected." to "Offline (Error: No more candidates to test, and no working port/baudrate combination detected.)"
2020-08-25 21:22:10,564 - octoprint.plugins.action_command_notification - INFO - Notifications cleared
2020-08-25 21:22:12,517 - octoprint.util.comm - INFO - Changing monitoring state from "Offline" to "Detecting serial connection"
2020-08-25 21:22:12,549 - octoprint.util.comm - INFO - Serial detection: Performing autodetection with 0 port/baudrate candidates:
2020-08-25 21:22:12,550 - octoprint.util.comm - INFO - Changing monitoring state from "Detecting serial connection" to "Error: No more candidates to test, and no working port/baudrate combination detected."
2020-08-25 21:22:12,556 - octoprint.util.comm - INFO - Changing monitoring state from "Error: No more candidates to test, and no working port/baudrate combination detected." to "Offline (Error: No more candidates to test, and no working port/baudrate combination detected.)"
2020-08-25 21:22:12,565 - octoprint.plugins.action_command_notification - INFO - Notifications cleared
2020-08-25 21:22:14,570 - octoprint.util.comm - INFO - Changing monitoring state from "Offline" to "Detecting serial connection"
2020-08-25 21:22:14,601 - octoprint.util.comm - INFO - Serial detection: Performing autodetection with 0 port/baudrate candidates:
2020-08-25 21:22:14,603 - octoprint.util.comm - INFO - Changing monitoring state from "Detecting serial connection" to "Error: No more candidates to test, and no working port/baudrate combination detected."
2020-08-25 21:22:14,607 - octoprint.util.comm - INFO - Changing monitoring state from "Error: No more candidates to test, and no working port/baudrate combination detected." to "Offline (Error: No more candidates to test, and no working port/baudrate combination detected.)"
2020-08-25 21:22:14,617 - octoprint.plugins.action_command_notification - INFO - Notifications cleared
2020-08-25 21:22:16,588 - octoprint.util.comm - INFO - Changing monitoring state from "Offline" to "Detecting serial connection"
2020-08-25 21:22:16,619 - octoprint.util.comm - INFO - Serial detection: Performing autodetection with 0 port/baudrate candidates:
2020-08-25 21:22:16,621 - octoprint.util.comm - INFO - Changing monitoring state from "Detecting serial connection" to "Error: No more candidates to test, and no working port/baudrate combination detected."
2020-08-25 21:22:16,624 - octoprint.util.comm - INFO - Changing monitoring state from "Error: No more candidates to test, and no working port/baudrate combination detected." to "Offline (Error: No more candidates to test, and no working port/baudrate combination detected.)"
2020-08-25 21:22:16,635 - octoprint.plugins.action_command_notification - INFO - Notifications cleared
2020-08-25 21:22:18,598 - octoprint.util.comm - INFO - Changing monitoring state from "Offline" to "Detecting serial connection"
2020-08-25 21:22:18,629 - octoprint.util.comm - INFO - Serial detection: Performing autodetection with 0 port/baudrate candidates:
2020-08-25 21:22:18,631 - octoprint.util.comm - INFO - Changing monitoring state from "Detecting serial connection" to "Error: No more candidates to test, and no working port/baudrate combination detected."
2020-08-25 21:22:18,634 - octoprint.util.comm - INFO - Changing monitoring state from "Error: No more candidates to test, and no working port/baudrate combination detected." to "Offline (Error: No more candidates to test, and no working port/baudrate combination detected.)"
2020-08-25 21:22:18,645 - octoprint.plugins.action_command_notification - INFO - Notifications cleared
2020-08-25 21:22:20,554 - octoprint.util.comm - INFO - Changing monitoring state from "Offline" to "Detecting serial connection"
2020-08-25 21:22:20,557 - octoprint - ERROR - Exception on /api/connection [POST]
Traceback (most recent call last):
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/pi/oprint/local/lib/python2.7/site-packages/octoprint/server/util/flask.py", line 1315, in decorated_view
    return func(*args, **kwargs)
  File "/home/pi/oprint/local/lib/python2.7/site-packages/octoprint/vendor/flask_principal.py", line 199, in _decorated
    rv = f(*args, **kw)
  File "/home/pi/oprint/local/lib/python2.7/site-packages/octoprint/server/api/connection.py", line 69, in connectionCommand
    printer.connect(port=port, baudrate=baudrate, profile=printerProfile)
  File "/home/pi/oprint/local/lib/python2.7/site-packages/octoprint/printer/standard.py", line 270, in connect
    self._comm.start()
  File "/home/pi/oprint/local/lib/python2.7/site-packages/octoprint/util/comm.py", line 662, in start
    self.sending_thread.start()
  File "/usr/lib/python2.7/threading.py", line 736, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread
2020-08-25 21:22:20,570 - tornado.access - ERROR - 500 POST /api/connection (127.0.0.1) 103.56ms
2020-08-25 21:22:20,605 - octoprint.util.comm - INFO - Serial detection: Performing autodetection with 0 port/baudrate candidates:
2020-08-25 21:22:20,606 - octoprint.util.comm - INFO - Changing monitoring state from "Detecting serial connection" to "Error: No more candidates to test, and no working port/baudrate combination detected."
2020-08-25 21:22:20,612 - octoprint.util.comm - INFO - Changing monitoring state from "Error: No more candidates to test, and no working port/baudrate combination detected." to "Offline (Error: No more candidates to test, and no working port/baudrate combination detected.)"
2020-08-25 21:22:20,620 - octoprint.plugins.action_command_notification - INFO - Notifications cleared
2020-08-25 21:22:22,582 - octoprint.util.comm - INFO - Changing monitoring state from "Offline" to "Detecting serial connection"
2020-08-25 21:22:22,584 - octoprint - ERROR - Exception on /api/connection [POST]
Traceback (most recent call last):
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pi/oprint/local/lib/python2.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)

This excerpt is the very beginning when the exceptions started and the exception stack traces repeat over and over again.

Any clue about the cause of that problem?
logs.zip (2.1 MB)

Highly possible there is some kind of corruption going on there. If it was a general issue with uploads, believe me we would have heard about it...

You can try a force reinstall of OctoPrint, using th following command, from the command line of the raspberry pi ( eg. Via SSH)
~/oprint/bin/pip install --force-reinstall --no-cache-dir octoprint

That would make sure there was no corruption in OctoPrint's install.

The other option may be re-imaging the SD card. You can take a backup from within OctoPrint, then restore it on a fresh image flashed to the SD card.

Dear Charlie, thanks for that suggestion. I can certainly try that but IMHO a problem a installation corruption problem would not explain why everything works after a restart of the service. Not sure why you say "upload"? The problem is not related to the upload feature/filemanager - the whole UI is unavailable after a certain time. Is there a seperate way of logging the tornado webserver maybe to see what causes it to error out with code 500?

At the end of the exception is a "error: can't start new thread" - maybe due to an out-of-memory/filehandle/... error after too many retries of failed connection attempts? I shall have a look at the system stats the next time the problem occurs.

Yeah I'm not sure why I said upload too.... Must have been confused with a different thing at the time. Anyway I've read the post again now

The reason it's sending back a 500 is internal server error, which is explained from the exceptions in the logs, from what I can tell there is no need to log it separately, since it is logging to octoprint.log. The exception that causes it is always right above the logging of the 500 error.

I'll admit that I didn't look through the rest of the log after the first few exceptions, and I now see lots of MemoryError too. What Pi are you using? It could be a memory leak issue I guess, but I would have thought it would take a long time to build up enough, last I checked OctoPrint used minimal amounts of memory (<200mb) since it runs on all pi models.

Do have a look at memory usage (maybe open an SSH session with htop running) when this happens.

Maybe something is not being closed in the serial comm properly. If you try a reinstall (just to confirm, there's no harm done I guess) and the issue still persists, maybe a report on the bug tracker might be in order.

It seems that issue in another form is already well known and discussed between the Octoprint and OctoScreen team already.... https://github.com/OctoPrint/OctoPrint/issues/3532
From what I read both parties seem to consider the problem not theirs and ask the other one to change. As mentioned in my last comment on that issue I could imagine a fix on the Octoscreen side for that which might be rather simple... :-/

I see it, I've been getting the emails on that too, since I follow all conversations in OctoPrint as I'm doing ticket triage.

The issue with OctoScreen is, it is far easier to set up a timer to repeatedly poll the server than to integrate OctoPrint's websocket as is done in the standard UI, so that's what people do.

Hope it gets solved soon...

I am currently cloning the repo and looking at that stuff with VS Code and do the docker build... I have no clue about Go, but it is pretty good to understand what's going on... at first I'll try to rebuild the binary as the last release in the repo is from December 2019...

I honestly don't understand why the f OctoScreen even keeps on constantly trying to connect. Polling API endings via GET is one thing, with a sensible rate, but when you keep spamming a server with POST requests that have side effects you are doing it wrong and are abusing the server and API.

Implementing rate limiting in tornado (web server that octoprint uses internally) is hard though, if possible at all. My research only found recommendations by the tornado developers to do that through different means, outside of the application.

Cloned the repo and changed the code (just here offline for myself) so the connection interval is configurable. Now running at 10s interval, not the default 2s. Will monitor for emerging problems with this setting.I understand this is no "good fix" for the underlying root cause but the best I can offer for the moment as I am no Go dev. Will keep you updated.

1 Like

Results first: The situation does not seem to improve.
I made further tweaks and it seems even at e.g. 15sec connection intervals OctoPrint is building up a thread count :frowning:

Added some further log messages (I don't have any other means of debugging it right now :-/) and see the following:

time="2020-09-03T12:54:05+02:00" level=info msg="OctoPrint's config file found: \"/home/pi/.octoprint/config.yaml\"" file=main.go func=main.readConfig line=118
time="2020-09-03T12:54:05+02:00" level=info msg="Using \"http://localhost:5000\" as server address" file=main.go func=main.init.0 line=54
time="2020-09-03T12:54:05+02:00" level=info msg="Found API key at \"/home/pi/.octoprint/config.yaml\" file" file=main.go func=main.init.0 line=60
time="2020-09-03T12:54:06+02:00" level=debug msg="New background task started" file=common.go func="ui.(*BackgroundTask).Start" line=136
time="2020-09-03T12:54:06+02:00" level=debug msg="ConnectionAttempts: 0" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T12:54:06+02:00" level=error msg="unexpected status code: 404" file=ui.go func="ui.(*UI).loadSettings" line=198
time="2020-09-03T12:54:06+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T12:54:17+02:00" level=info msg="OctoPrint's config file found: \"/home/pi/.octoprint/config.yaml\"" file=main.go func=main.readConfig line=118
time="2020-09-03T12:54:17+02:00" level=info msg="Using \"http://localhost:5000\" as server address" file=main.go func=main.init.0 line=54
time="2020-09-03T12:54:17+02:00" level=info msg="Found API key at \"/home/pi/.octoprint/config.yaml\" file" file=main.go func=main.init.0 line=60
time="2020-09-03T12:58:44+02:00" level=info msg="OctoPrint's config file found: \"/home/pi/.octoprint/config.yaml\"" file=main.go func=main.readConfig line=118
time="2020-09-03T12:58:44+02:00" level=info msg="Using \"http://localhost:5000\" as server address" file=main.go func=main.init.0 line=54
time="2020-09-03T12:58:44+02:00" level=info msg="Found API key at \"/home/pi/.octoprint/config.yaml\" file" file=main.go func=main.init.0 line=60
time="2020-09-03T12:58:45+02:00" level=debug msg="New background task started" file=common.go func="ui.(*BackgroundTask).Start" line=136
time="2020-09-03T12:58:45+02:00" level=debug msg="ConnectionAttempts: 0" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T12:58:45+02:00" level=error msg="unexpected status code: 404" file=ui.go func="ui.(*UI).loadSettings" line=198
time="2020-09-03T12:58:45+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T12:59:00+02:00" level=debug msg="ConnectionAttempts: 0" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T12:59:00+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T12:59:15+02:00" level=debug msg="ConnectionAttempts: 1" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T12:59:15+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T12:59:30+02:00" level=debug msg="ConnectionAttempts: 2" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T12:59:30+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T12:59:45+02:00" level=debug msg="ConnectionAttempts: 3" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T12:59:45+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:00:00+02:00" level=debug msg="ConnectionAttempts: 4" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:00:00+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:00:15+02:00" level=debug msg="ConnectionAttempts: 5" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:00:15+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:00:30+02:00" level=debug msg="ConnectionAttempts: 6" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:00:30+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:00:45+02:00" level=debug msg="ConnectionAttempts: 7" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:00:45+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:01:00+02:00" level=debug msg="ConnectionAttempts: 8" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:01:00+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:01:15+02:00" level=debug msg="ConnectionAttempts: 9" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:01:15+02:00" level=debug msg="ConnectionAttemps > 8 - putting on hold" file=ui.go func="ui.(*UI).update" line=207
time="2020-09-03T13:01:30+02:00" level=debug msg="ConnectionAttempts: 9" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:01:30+02:00" level=debug msg="ConnectionAttemps > 8 - putting on hold" file=ui.go func="ui.(*UI).update" line=207
time="2020-09-03T13:01:45+02:00" level=debug msg="ConnectionAttempts: 9" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:01:45+02:00" level=debug msg="ConnectionAttemps > 8 - putting on hold" file=ui.go func="ui.(*UI).update" line=207
time="2020-09-03T13:02:00+02:00" level=debug msg="ConnectionAttempts: 9" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:02:00+02:00" level=debug msg="ConnectionAttemps > 8 - putting on hold" file=ui.go func="ui.(*UI).update" line=207
time="2020-09-03T13:02:15+02:00" level=debug msg="ConnectionAttempts: 9" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:02:15+02:00" level=debug msg="ConnectionAttemps > 8 - putting on hold" file=ui.go func="ui.(*UI).update" line=207
time="2020-09-03T13:02:30+02:00" level=debug msg="ConnectionAttempts: 9" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:02:30+02:00" level=debug msg="ConnectionAttemps > 8 - putting on hold" file=ui.go func="ui.(*UI).update" line=207
time="2020-09-03T13:02:31+02:00" level=info msg="OctoPrint's config file found: \"/home/pi/.octoprint/config.yaml\"" file=main.go func=main.readConfig line=118
time="2020-09-03T13:02:31+02:00" level=info msg="Using \"http://localhost:5000\" as server address" file=main.go func=main.init.0 line=54
time="2020-09-03T13:02:31+02:00" level=info msg="Found API key at \"/home/pi/.octoprint/config.yaml\" file" file=main.go func=main.init.0 line=60
time="2020-09-03T13:02:32+02:00" level=debug msg="New background task started" file=common.go func="ui.(*BackgroundTask).Start" line=136
time="2020-09-03T13:02:32+02:00" level=debug msg="ConnectionAttempts: 0" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:02:32+02:00" level=error msg="unexpected status code: 404" file=ui.go func="ui.(*UI).loadSettings" line=198
time="2020-09-03T13:02:32+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:02:47+02:00" level=debug msg="ConnectionAttempts: 0" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:02:47+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:03:02+02:00" level=debug msg="ConnectionAttempts: 1" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:03:02+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:03:17+02:00" level=debug msg="ConnectionAttempts: 2" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:03:17+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:03:32+02:00" level=debug msg="ConnectionAttempts: 3" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:03:32+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:03:47+02:00" level=debug msg="ConnectionAttempts: 4" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:03:47+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:04:02+02:00" level=debug msg="ConnectionAttempts: 5" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:04:02+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127
time="2020-09-03T13:04:17+02:00" level=debug msg="ConnectionAttempts: 6" file=ui.go func="ui.(*UI).update" line=205
time="2020-09-03T13:04:17+02:00" level=debug msg="In verifyConnection()" file=ui.go func="ui.(*UI).verifyConnection" line=127

So it seems that though the code says "putting on hold" after 9 failed attempts Octoscreen kind of restarts as it is relogging the startup messages of reading the config file etc.

I also noticed that when configuring a poll interval >=10 the systemd WatchDog Timeout fires (WatchdogSec=10s), so I created an override.

pi@octopi:/lib/systemd/system $ systemctl cat octoscreen
# /lib/systemd/system/octoscreen.service
[Unit]
Description=OctoScreen
Conflicts=getty@tty7.service
After=systemd-user-sessions.service getty@tty7.service plymouth-quit.service

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=/etc/octoscreen/config
ExecStart=/usr/bin/xinit /usr/bin/OctoScreen -- :0 -nolisten tcp -nocursor
ExecStartPost=/bin/bash /etc/octoscreen/disablescreenblank.sh 0
StandardOutput=journal
Restart=always
WatchdogSec=10s

[Install]
WantedBy=graphical.target
Alias=display-manager.service

# /etc/systemd/system/octoscreen.service.d/override.conf
[Service]
WatchdogSec=90s

What startles me however is that despite the much longer connection interval 2s -> 15s OctoPrint is still building up a thread count... @foosel Can you imagine why this is? I understood the increase in thread count is due to firing requests too fast for tornado to handle? I presumed 15s should be more than enough? Is there anything I can configure in octoprint to log the main cause why the threads keep increasing (Waiting for something maybe?)

I also checked the code for commands to close the connection but apparently this is not required at all. It seems Octoscreen's code is just firing JSON requests towards the API so all the low level handling is beneath and should be done properly.

No, sorry, I don't get it either at the moment. Just to rule out the obvious, does it happen in safe mode as well? Also, just to verify, this is against 1.4.2, right?

I'd say so, yes.... restarted in safe-mode:

pi@octopi:~ $ curl -H "X-Api-Key: xxxxx" http://localhost:5000/api/version
{"api":"0.1","server":"1.4.2","text":"OctoPrint 1.4.2"}

pi@octopi:~ $ while true; do { date && ps hH -p 5851 | wc -l &&  sleep 10; } done
Thu  3 Sep 14:50:14 CEST 2020
19
Thu  3 Sep 14:50:24 CEST 2020
20
Thu  3 Sep 14:50:34 CEST 2020
20
Thu  3 Sep 14:50:45 CEST 2020
21
Thu  3 Sep 14:50:55 CEST 2020
22
Thu  3 Sep 14:51:05 CEST 2020
22
Thu  3 Sep 14:51:15 CEST 2020
23
Thu  3 Sep 14:51:25 CEST 2020
24
Thu  3 Sep 14:51:35 CEST 2020
24
Thu  3 Sep 14:51:45 CEST 2020
25
Thu  3 Sep 14:51:56 CEST 2020
26
Thu  3 Sep 14:52:06 CEST 2020
26

After several hours of fiddling around with various Go versions, dlv, modules and loooooots of problems I finally managed to debug OctoScreen in VS Code and... to my surprise... the behavior is here exactly how I expected it when I looked first at the code statically... The connection poller tries 9 times to connect and then display a box on the UI "Cannot connect, click here to try again" and afterwards no connection attempt is made anymore. So basically exactly how @foosel proposed the app should behave. I copied the new compiled binary over to octopi and replaced the binary in the filesystem. The behavior is now correct: Octoscreen just tries 9 times and then waits for user interaction. There is no increase in the thread count anymore for some minutes and it is consistent with my observation in VS Code that no new network I/O is performed.

I don't know why this didn't work today when building OctoScreen via docker built script.
However one of the problems I had with recompiling OctoScreen was that the go-octoprint module lacks a certain file, that OctoScreen references. This file is "printerprofiles.go" and it is NOT present in the referenced module github.com/mcuadros/go-octoprint but only in github.com/ugurgudelek/go-octoprint.
So I have no idea how it could compile at all today, but there were many WT* moments anyway today when doing a crash course of Go...

I'll observe the behavior tomorrow to see whether everything works as designed and will then think about how to make this new version available for others.

I think I identified another and potentially the core bug that hindered OctoScreen from actually pausing properly. After 9 retries it goes to sleep - but doesn't update the systemd watchdog which it does otherwise. The systemd then will restart the octoscreen unit restarting the whole game once more, so in effect octoscreen keeps on bugging octoprint over and over again.
I now "satisfies" the watchdog even when OctoScreen is "asleep". Even after some hours, the thread count does not increase anymore now :smiley:

However, @foosel I would like to understand: Apart from those connection attempts OctoScreen ofc queries the rest of the API also extensively for e.g. displaying the current status on the screen:

  • Is this consided "Ok" or do you see the same problem here as it just another call basically which is then hammering the server
  • If this is not the way to go, what would be the best variant to actually query that kind of information for an external application to display it. If it is within my capabilities I could try to implement something like that which won't stress octoprint the same way OctoScreen does today.

Thank you

Querying information that OctoPrint has readily available is something different than telling OctoPrint over and over to do something that it cannot do, but has to spend resources to try to do each time.

1 Like

Anything that just GETs an API endpoint should be no issue (as long as it doesn't start doing that excessively = hundreds of times per second). POSTing repeatedly is a code smell - if you need to do that from client code you should have a really good reason as POST requests can and will have side effects, e.g. "create connection", "upload file", "issue tool command" etc.

In general, there's a websocket available which can be used to get all the constant status updates pushed from the server instead of having to pull it over and over again. That is what the default UI utilizes as well. If a client needs to keep really up to date with the server backend, it should use that instead of aggressive pulling.

So, best practice: limit pulling requests, don't keep requesting all data again and again but limit it to stuff that changed, use the websocket to only query stuff if needed (e.g when you get an event pushed that the file list changed, refresh the file list, otherwise don't)

Thank you! I will have a look! I'm not really a dev anymore as my professional focus is more on security but I think this is nothing I could not get my feet wet with now that I even fixed that Go stuff so far :rofl:
Thank you for providing the references and I'll see what I can do.