Stuck on "Connecting" even after actually connecting

What is the problem?

Most of the time, after disconnecting from my printer, the connection process freezes on reconnect. From all outward appearances, it looks like a connection has been made (Terminal says it's connected and provides updates from the printer, temperature graph updates), but the state of the connection never updates. I can't start a print or control the printer.


Serial logs show that when the connection is successfully made, there is a handshake attempt after a connection attempt, a few GCODE commands are sent and the status is updated from "Detecting serial connection" to "Operational"

2024-01-25 18:42:33,264 - Changing monitoring state from "Offline" to "Detecting serial connection"
2024-01-25 18:42:33,475 - Performing autodetection with 7 port/baudrate candidates: /dev/ttyACM0@115200, /dev/ttyACM0@250000, /dev/ttyACM0@230400, /dev/ttyACM0@57600, /dev/ttyACM0@38400, /dev/ttyACM0@19200, /dev/ttyACM0@9600
2024-01-25 18:42:33,476 - Trying port /dev/ttyACM0, baudrate 115200
2024-01-25 18:42:33,498 - Connecting to port /dev/ttyACM0, baudrate 115200
2024-01-25 18:42:33,609 - Handshake attempt #1 with timeout 2.0s
2024-01-25 18:42:33,622 - Connected to: Serial<id=0x71a1f9a0, open=True>(port='/dev/ttyACM0', baudrate=115200, bytesize=8, parity='N', stopbits=1, timeout=2.0, xonxoff=False, rtscts=False, dsrdtr=False), starting monitor
2024-01-25 18:42:33,626 - Send: N0 M110 N0*125
2024-01-25 18:42:33,635 - Recv: ok
2024-01-25 18:42:33,667 - Changing monitoring state from "Detecting serial connection" to "Operational"
2024-01-25 18:42:33,710 - Send: N0 M110 N0*125
2024-01-25 18:42:33,792 - Recv: ok
2024-01-25 18:42:33,803 - Send: N1 M115*39

Safemode serial logs show that, during the failure, a connection was completed. However, there is no handshake attempt, and the status is updated from "Detecting serial connection" to "Connecting"

2024-01-25 18:43:40,992 - Changing monitoring state from "Offline" to "Opening serial connection"
2024-01-25 18:43:41,007 - Connecting to port /dev/ttyACM0, baudrate 115200
2024-01-25 18:43:41,031 - Changing monitoring state from "Opening serial connection" to "Connecting"
2024-01-25 18:43:41,036 - Connected to: Serial<id=0x6f0e6a48, open=True>(port='/dev/ttyACM0', baudrate=115200, bytesize=8, parity='N', stopbits=1, timeout=10.0, xonxoff=False, rtscts=False, dsrdtr=False), starting monitor
2024-01-25 18:43:41,588 - Recv: X:0.00 Y:220.00 Z:31.20 E:3670.10 Count X:0 Y:17512 Z:12543
2024-01-25 18:43:41,746 - Recv: Not SD printing
2024-01-25 18:43:42,615 - Recv:  T:16.21 /0.00 B:16.51 /0.00 @:0 B@:0
2024-01-25 18:43:42,733 - Recv: Not SD printing
2024-01-25 18:43:43,734 - Recv: Not SD printing
2024-01-25 18:43:44,609 - Recv:  T:16.24 /0.00 B:16.62 /0.00 @:0 B@:0
2024-01-25 18:43:44,733 - Recv: Not SD printing
2024-01-25 18:43:45,734 - Recv: Not SD printing

What did you already try to solve it?

Refreshing browser: Still stuck on connecting
Opening WebUI in Incognito mode: Shows stuck on connecting.
Disconnect and reconnect: Off and on successes, mostly off.
Disconnect and wait for a few hours before reconnecting: 75% effective
Restart Octoprint: Connects on restart, if disconnected, won't reconnect reliably
Restart Octoprint in safe mode: Connects on restart, if disconnected, won't reconnect reliably
Restart system: Connects on startup, if disconnected, won't reconnect reliably
Power cycling the printer and reconnecting: (Just started trying this moments ago) seems to reliably connect....

Have you tried running in safe mode?

Yes, problem persists

Did running in safe mode solve the problem?

No, seeing exact same problem.

Systeminfo Bundle

octoprint-systeminfo-20240125165725.zip (179.1 KB)

A failure captured in serial on safe mode
serial (1).log (3.1 KB)

A success on restart captured in serial logs, and a later failure on disconnect/reconnect on safe mode
serial (2).log (14.1 KB)

Syslog

Additional information about your setup

OctoPrint Version: 1.9.3
OctoPi version: Build 2023.07.18.111854 with "webcamd", based on OctoPi 1.0.0, running on Raspberry Pi 3 Model B Rev 1.2

Ender 3 Pro, SKR Mini E3 V3

Custom Marlin bugfix-2.1.x Build (just printing parameters for max temp, z-probe, etc.... Nothing connection-y)
Chrome Version 120.0.6099.225 (Official Build) (64-bit) on Windows 11

This only surfaced in the last few days. I've been running on that firmware and this version of octopi/octoprint for months.

7 "Serial" USB ports?

Is that correct?

No, just one. Are you referring to

Performing autodetection with 7 port/baudrate candidates: .....

If so, my read there is that's the cross product of ports and possible baudrates, which in itself is weird, because I have the baudrate set (Not auto detect).

Unless you are seeing something else that suggests 7 ports that I'm missing (which is totally possible, first time needing to put effort into debugging Octoprint.)

You are correct. I missed the @baudrate portion.

That said, the initial 115K connection is failing when it should not. Assuming no kernel problems that leads to a possible issue with the cable (or even backpowering?) or damaged port.

Do you have anything else plugged into the PI? Have you tried a different USB port in the Pi?

Only other item plugged in is a webcam.
And I said 1 port above, but it's a Pi, 4 physical ports.
I tried switching ports. That caused a successful auto-connection, but once again, cannot disconnect and reconnect via the UI
Can you help me understand how you see that 115k baud connection is failing? My read is that it seems to form a connection that transmits data, but Octoprint never runs the handshake. To be more clear, it seems that they are communicating just fine, but Octoprint isn't "holding up" it's end of the deal by formalizing the connection. There doesn't seem to be any "Failure" in the logs, just no indication that Octoprint actually attempted a handshake, which would be in logs.

Also can confirm I'm not backpowering. Double checked jumpers and voltages on the board.

If I look at my logs, the port is opened at 115K and no subsequent attempts are made made to open it again (it is already open).
Then the initial request of info from the printer transpires.

From your log the port cannot be opened at 115k so Octoprint requests opening the port at other baud rates.

Remember that Octoprint itself does not handle the port. Port handling is done by the kernel which establishes the hardware as a file type I/O device. All Octoprint does is open the port, much like a file, then send or receive data to it. Octoprint does not really use serial handshaking (e.g. Xon-Xoff, RTS/CTS, etc.) It merely pipes data to and from the presumably open port once the "file" is opened. Your copy of Octoprint is not getting that far.

Note that there are "advanced" serial settings in Octoprint you could try such as increasing the initial connection timeout in case your printer is not ready for data once the port is opened. You may want to increase those values a bit to see if that improves things.

You could try test the port by installing "cu" (sudo apt install cu) on your Pi and try to talk to the port directly. That too may fail if Octoprint cannot open it too and could help you debug the issue.

That the port is not being opened at 115K upon attempt or at other baud rates indicates either a Kernel/hardware support issue (not likely since you are running on the PI), a problem or defect with the USB hardware on your PI, an issue with your USB cable, overloading of the USB chain from other devices, too much reduction in USB voltage or signal (hardware issues again), or a problem with the printer's USB port.

I think I'm still confused. In my logs, when we have a successful connection, I see a handshake attempt:

2024-01-25 18:42:33,609 - Handshake attempt #1 with timeout 2.0s
2024-01-25 18:42:33,622 - Connected to: Serial<id=0x71a1f9a0, open=True>(port='/dev/ttyACM0', baudrate=115200, bytesize=8, parity='N', stopbits=1, timeout=2.0, xonxoff=False, rtscts=False, dsrdtr=False), starting monitor
2024-01-25 18:42:33,626 - Send: N0 M110 N0*125
2024-01-25 18:42:33,635 - Recv: ok
2024-01-25 18:42:33,667 - Changing monitoring state from "Detecting serial connection" to "Operational"

As this is in the logs without a "Recv", this appears to be Octoprint starting the handsake. Not sure what the handshake is, but it's certainly says its doing one. Looks like it's setting a line number, per the M110 call, probably as to initialize commands to the printer.
But when it fails to update to "Connect", I see that a connection is made at 115200, but no handshake attempt is made, and the Status is changed to "Connecting". It also begins reading from the printer's port. I receive telemetry from the printer. So, my read is that there is successful connection established, but there is no handshake attempt and no attempt to initialize commands, and that leads to a stuck status of "Connecting" even though it is actually connected and receiving data from the serial port. This all happens in half a second, so it's doesn't look like a timeout problem.

2024-01-25 18:43:41,031 - Changing monitoring state from "Opening serial connection" to "Connecting"
2024-01-25 18:43:41,036 - Connected to: Serial<id=0x6f0e6a48, open=True>(port='/dev/ttyACM0', baudrate=115200, bytesize=8, parity='N', stopbits=1, timeout=10.0, xonxoff=False, rtscts=False, dsrdtr=False), starting monitor
2024-01-25 18:43:41,588 - Recv: X:0.00 Y:220.00 Z:31.20 E:3670.10 Count X:0 Y:17512 Z:12543
2024-01-25 18:43:41,746 - Recv: Not SD printing
2024-01-25 18:43:42,615 - Recv:  T:16.21 /0.00 B:16.51 /0.00 @:0 B@:0

Semantics are getting in the way. Serial communication can perform hardware handshaking (CTS/RTS) and/or software handshaking (Xon/Xoff) to verify the data being sent back and forth. This happens at a "lower" level so to speak and is usually not visible to the communicating devices. That is what I am referring to.

Once communication is established the higher level processes begin to exchange data, which is what you are referring to. At that point in time communication of sorts has already been established after the ports have been opened/accessed.

In your original log the opening of the ports permitting the initial exchange of data with Octoprint is not happening. That points to an issue having nothing to do with Octoprint but with the Kernel or hardware.

You stated:

"I receive telemetry from the printer. So, my read is that there is successful connection established, but there is no handshake attempt and no attempt to initialize commands, and that leads to a stuck status of "Connecting" even though it is actually connected and receiving data from the serial port."

From that statement it appears that your PI is reading data but is incapable of sending data to the printer. Check your cables or ports.

Let me open up by saying how much I appreciate the time you are taking here, and admit my "software" knowledge is way better than my hardware knowledge. And I might be a guy whose only tool is a software hammer, and every problem looks like a software nail, but this still feels software-y to me.

I agree that we're arguing symantics, but I want to make sure we're using the same terminology. The handshake I'm referring to from here on out I'll call a "Octoprint Handshake", which appears to just be an attempt to reset the command line number and get a response from the printer via M110 to verify a successful connection (a quick read of source seems to confirm this). I'll refer to physical connections and logical connections as well. Everything I'm seeing suggests that the physical connection is fine.

A few things here make me question the theory that the physical connection is bad:

  1. When I successfully connect, I'm able to print just fine. Resend ratio is "0%".
  2. I would expect to see an attempt to Send in logs if it was a physical failure, but it doesn't appear to be writing anything to the line, and if a send occurred and failed or timed out, I'd also expect to see that error in either serial or system logs, but nothing there as well.
  3. Other people who do indeed have a bad physical connection on here report that the state changes to some sort of error. Mine never changes from "Connecting". I see the same process occur if I throw a fatal error on the printer while keeping it powered on. It gets an error when it attempts the handshake on initial connection, but just hangs
  4. I always get a successful connection after restarting the Octoprint service while the host machine remains running, which should have no impact on the quality of a physical connection. I could come up with a hardware story related to it succeeding on machine reboot, but not for it succeeding on service reboot.

I cu'd into the port at 115200, and was able to submit commands consistently and receive responses, even when Octoprint fails to complete the connection process. I've also changed USB cables and ports. Seeing the exact same symptoms.

On that note, I'm going to do some digging into the source code to see if I can verify that I should be seeing a "Octoprint Handshake", and verify that I should be seeing sends and errors when it attempts to connect.

Not necessarily.

Remember that Octoprint is not doing more than attempting to open a device type file to initiate the subsequent exchange of data (what you are calling the Octoprint "handshake").

The problem is that Octoprint doesn't get that far from what I saw in your log. It is unable to open the "device" which is beyond Octoprint itself. The opening of the port "device" should then be followed by an attempt to send and receive data from the printer... that is not occurring. Once it fails, Octoprint then retries to open the port at a different baud rate, hence the 7 attempts. Unless the timeout value for the serial port is way too short or the baud rate too high, that should not happen. 115k is normally fairly safe.

One other thing that can affect this beyond a kernel/device or hardware problem is a permissions issue.

If the Octoprint "user" (octoprint merely runs as another "user" of the computer) does not have read and write, "rights" to the group and/or port then you can get a similar result.

You say that sometimes you get a successful connection and can even print with a zero percent resend ratio. So what is different between then and when Octoprint cannot connect? Does it successfully connect after a full power off and restart when all power is removed?

Okay, I think I see where we're missing each other. In the original post, I mentioned that it works 100% of the time on what I'll now call a "Fresh" connection with the Octoprint service. That is by power cycling the printer, doing a full restart on the host, or doing a service restart of Octoprint. All of those cause the Octoprint handshake (test the line for port/baudrate candidates, reset the line number via M110, receive a response, all seen in logs) to occur on the next connection attempt, and that leads to a fully successful logical Octoprint connection. Octoprint says I'm connected, and I can print. After I started messing with the USB cable, I also noticed that unplugging and replugging in the cable caused a handshake and a successful connection.
The fact that restarting the Octoprint service leads to a success screams "software" to me. If it was indeed a hardware issue, restarting the service would still lead to failure.
This fails when I kick off a Octoprint disconnect (click the disconnect button) and then attempt to reconnect by clicking the connect button. In that case, the connection is "set up" and it begins reading from the line, but it never tries to send any data to the printer. It just hangs there reading from the line, stuck on "Connecting" In my original post, I mentioned that if I waited a few hours it sometimes kicked back in, but I haven't been able to reproduce that. Octoprint Reconnection is failing 100% of the time right now. I also mentioned that this was new after using this for months, but it occurs to me that I probably never tried to disconnect and reconnect. I was messing around with OctoRelay (it's been uninstalled, and all of this occurs in safe mode) which has a setting to automatically disconnect on print completion. The relay was not actually hooked up to the printer, I was just testing it out, so the printer remained powered on. That's probably the first time I've ever had to try and reconnect on v1.3.x.

I read through the Connection source code last night, and there is indeed a separate code path for initial connection and a reconnect (https://github.com/OctoPrint/OctoPrint/blob/a93b4d397248dc406a75e634c1326fb1b8ddcb63/src/octoprint/util/comm.py#L2240 and https://github.com/OctoPrint/OctoPrint/blob/a93b4d397248dc406a75e634c1326fb1b8ddcb63/src/octoprint/util/comm.py#L3368). Haven't wrapped my head around exactly what they are doing.

The only evidence I have for this not being a software bug is that.... no one else is getting it but me....

I would agree, reading through your post that it sometimes works almost rules out permissions & hardware.

If you've worked out that it works on a service restart, that means we can narrow it down quite a lot. Based on this, I would suggest that when you restart OctoPrint, it is using 'Auto' to find the serial port, then after that it is just trying the known working combo. So my question would be is it related to using auto detection vs. manually setting the serial port in the dropdown? Or still service restart based?

I have not reviewed the code.

If I hot reboot my PI's I will sometimes see that printer offline but clicking on "connect" invariably always works no matter if I have the settings on Auto or manual.

The logs show a successful opening of the port upon the very first attempt followed by the exchange of data with the printer each time I hit "connect".

Have you increased the timeout values in the "advanced" section of the serial tab?

There's no evidence in the log of OctoPrint sending the M110 "hello" command, so I don't think increasing the timeout would help as the printer is never going to respond regardless of how long you wait.

So, you've got a good connection when you hot reboot and connect. What happens if you then disconnect via Octoprint and reconnect?

What if the printer is taking a while to re-init after startup causing the Pi to fail to open the port.

Doesn't the timeout help with that.

It works fine each time.

The Pi looks to be opening the port fine, since there is the line in the serial log 'Connected to: XXX', which means that the port is in use by OctoPrint. The "Opening serial connection" stage is generally dependent on the OS/Hardware side, then "Connecting" is while OctoPrint is trying to do it's "hello" handshake to ensure this is a serial port that belongs to a printer talking gcode.

The log goes for at least 10 seconds, this should be long enough for the printer to reset. The SKR Mini E3 V3 board should not need to reset on connect, as I believe this is a 32 bit board - the older 8 bit ones normally reset on connection. The timeouts are for the time difference between OctoPrint sending a command and getting an acknowledgement, but the first is just not happening for some reason.

Yes I see. The 7 queued attempts were from the "autoconnect" option, not an actual attempt.

Thanks.