Communication timeout error after upgrading to 1.3.9


#1

What is the problem?
Attempted to print a calibration tower and it failed with a communication timeout error. Problem occurred just after updating to OctoPrint 1.3.9. Log seems to indicate it could be related to the Exclude Region plug-in.

Can you provide any insight as to what may be causing this? Thanks!

What did you already try to solve it?
Rebooted OctoPrint.

Additional information about your setup (OctoPrint version, OctoPi version, printer, firmware, octoprint.log, serial.log or output on terminal tab, ...)

OctoPrint version: 1.3.9
OctoPi version: 0.15.1
Printer: Prusa i3 MK3
Printer Firmware: 3.3.1-845
Pi: Raspberry Pi 3B+
Camera: Raspberry Pi Camera Module v2

OctoPrint Log (last few lines before failure):

2018-07-26 20:41:07,522 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 92.6, 'old': 93.0}
2018-07-26 20:41:14,460 - octoprint.util.comm - INFO - Got a resend request from the printer: requested line = 8759, current line = 8766
| Last lines in terminal:
| Recv: ok
| Send: N8759 G1 E0.80000 F2100.00000*35
| Recv: ok
| Send: N8760 M204 S800*87
| Recv: ok
| Send: N8761 G1 F900*79
| Recv: ok
| Send: N8762 G1 X137.275 Y117.275 E0.00609*84
| Send: N8763 G1 X112.725 Y117.275 E0.83099*86
| Recv: echo:busy: processing
| Recv: echo:busy: processing
| Recv: echo:busy: processing
| Recv: ok
| Send: N8764 M105*26
| Recv: Full RX Buffer
| Recv: tmc2130_goto_step 3 28 2 1000
| Recv: ok
| Send: N8765 G1 X112.725 Y92.725 E0.83099*108
| Recv: Error:Line Number is not Last Line Number+1, Last Line: 8758
| Recv: Resend: 8759
2018-07-26 20:41:14,500 - octoprint.util.comm - INFO - Ignoring resend request for line 8759, that still originates from lines we sent before we got the first resend request
2018-07-26 20:41:14,597 - octoprint.plugins.excluderegion - WARNING - Encountered recovery without a corresponding retraction: G1 E-0.04000 F2100.00000
2018-07-26 20:41:14,598 - octoprint.plugins.excluderegion - INFO - recovery: excluding=False, originalCmd=G1 E-0.04000 F2100.00000, returnCommands=[u'G1 E-0.04000 F2100.00000']
2018-07-26 20:41:14,615 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 93.2, 'old': 92.6}
2018-07-26 20:41:14,621 - octoprint.plugins.excluderegion - INFO - Event received: event=Dwelling payload=None
2018-07-26 20:41:14,635 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 92.8, 'old': 93.2}
2018-07-26 20:41:18,694 - octoprint.server.heartbeat - INFO - Server heartbeat <3
2018-07-26 20:41:21,528 - octoprint.plugins.excluderegion - WARNING - Encountered recovery without a corresponding retraction: G1 E0.80000 F2100.00000
2018-07-26 20:41:21,530 - octoprint.plugins.excluderegion - INFO - recovery: excluding=False, originalCmd=G1 E0.80000 F2100.00000, returnCommands=[u'G1 E0.80000 F2100.00000']
2018-07-26 20:41:21,665 - octoprint.plugins.excluderegion - WARNING - Encountered recovery without a corresponding retraction: G1 E-0.04000 F2100.00000
2018-07-26 20:41:21,666 - octoprint.plugins.excluderegion - INFO - recovery: excluding=False, originalCmd=G1 E-0.04000 F2100.00000, returnCommands=[u'G1 E-0.04000 F2100.00000']
2018-07-26 20:41:21,687 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 93.4, 'old': 92.8}
2018-07-26 20:41:21,692 - octoprint.plugins.excluderegion - INFO - Event received: event=Dwelling payload=None
2018-07-26 20:41:21,703 - octoprint.plugins.excluderegion - WARNING - Encountered recovery without a corresponding retraction: G1 E0.80000 F2100.00000
2018-07-26 20:41:21,705 - octoprint.plugins.excluderegion - INFO - recovery: excluding=False, originalCmd=G1 E0.80000 F2100.00000, returnCommands=[u'G1 E0.80000 F2100.00000']
2018-07-26 20:41:21,713 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 93.0, 'old': 93.4}
2018-07-26 20:41:28,607 - octoprint.plugins.excluderegion - WARNING - Encountered recovery without a corresponding retraction: G1 E-0.04000 F2100.00000
2018-07-26 20:41:28,608 - octoprint.plugins.excluderegion - INFO - recovery: excluding=False, originalCmd=G1 E-0.04000 F2100.00000, returnCommands=[u'G1 E-0.04000 F2100.00000']
2018-07-26 20:41:28,631 - octoprint.plugins.excluderegion - INFO - Event received: event=Dwelling payload=None
2018-07-26 20:41:28,636 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 93.6, 'old': 93.0}
2018-07-26 20:41:28,647 - octoprint.plugins.excluderegion - WARNING - Encountered recovery without a corresponding retraction: G1 E0.80000 F2100.00000
2018-07-26 20:41:28,648 - octoprint.plugins.excluderegion - INFO - recovery: excluding=False, originalCmd=G1 E0.80000 F2100.00000, returnCommands=[u'G1 E0.80000 F2100.00000']
2018-07-26 20:41:28,661 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 93.2, 'old': 93.6}
2018-07-26 20:41:35,705 - octoprint.plugins.excluderegion - WARNING - Encountered recovery without a corresponding retraction: G1 E-0.04000 F2100.00000
2018-07-26 20:41:35,707 - octoprint.plugins.excluderegion - INFO - recovery: excluding=False, originalCmd=G1 E-0.04000 F2100.00000, returnCommands=[u'G1 E-0.04000 F2100.00000']
2018-07-26 20:41:35,735 - octoprint.plugins.excluderegion - INFO - Event received: event=Dwelling payload=None
2018-07-26 20:41:35,752 - octoprint.plugins.excluderegion - WARNING - Encountered recovery without a corresponding retraction: G1 E0.80000 F2100.00000
2018-07-26 20:41:35,754 - octoprint.plugins.excluderegion - INFO - recovery: excluding=False, originalCmd=G1 E0.80000 F2100.00000, returnCommands=[u'G1 E0.80000 F2100.00000']
2018-07-26 20:41:35,758 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 93.8, 'old': 93.2}
2018-07-26 20:41:35,771 - octoprint.plugins.excluderegion - INFO - Event received: event=ZChange payload={'new': 93.4, 'old': 93.8}
2018-07-26 20:41:53,810 - octoprint.util.comm - INFO - No response from printer after 6 consecutive communication timeouts, considering it dead.
2018-07-26 20:41:53,833 - octoprint.plugins.excluderegion - INFO - Event received: event=Error payload={'reason': 'timeout', 'error': 'Too many consecutive timeouts, printer still connected and alive?'}
2018-07-26 20:41:53,834 - octoprint.plugins.excluderegion - INFO - Printing stopped
2018-07-26 20:41:53,845 - octoprint.plugins.excluderegion - INFO - Event received: event=UpdatedFiles payload={'type': 'gcode'}
2018-07-26 20:41:53,857 - octoprint.util.comm - INFO - Changing monitoring state from "Printing" to "Offline (Error: Too many consecutive timeouts, printer still connected and alive?)"
2018-07-26 20:41:53,887 - octoprint.plugins.excluderegion - INFO - Event received: event=Disconnected payload=None
2018-07-26 20:41:53,894 - octoprint.plugins.excluderegion - INFO - Event received: event=PrinterStateChanged payload={'state_string': 'Offline', 'state_id': 'OFFLINE'}
2018-07-26 20:41:53,929 - octoprint.plugins.excluderegion - INFO - Event received: event=MetadataStatisticsUpdated payload={'path': u'pla-mk-3-ecor-tower.gcode', 'storage': 'local'}
2018-07-26 20:41:56,445 - octoprint.plugins.excluderegion - INFO - Event received: event=PrintFailed payload={'origin': 'local', 'name': u'pla-mk-3-ecor-tower.gcode', 'file': u'/home/pi/.octoprint/uploads/pla-mk-3-ecor-tower.gcode', 'time': 3557.72283411026, 'path': u'pla-mk-3-ecor-tower.gcode', 'size': 245309L, 'filename': u'pla-mk-3-ecor-tower.gcode'}
2018-07-26 20:41:56,446 - octoprint.plugins.excluderegion - INFO - Printing stopped

Serial Log (last few lines before failure):

2018-07-26 20:41:35,694 - Send: N8822 G1 X137.275 Y117.275 E-0.04401*124
2018-07-26 20:41:35,699 - Send: N8823 G1 F8160*112
2018-07-26 20:41:35,700 - Recv: ok
2018-07-26 20:41:35,714 - Recv: ok
2018-07-26 20:41:35,722 - Recv: ok
2018-07-26 20:41:35,723 - Send: N8824 G1 X134.346 Y117.275 E-0.71599*122
2018-07-26 20:41:35,737 - Recv: ok
2018-07-26 20:41:35,734 - Send: N8825 G1 E-0.04000 F2100.00000*6
2018-07-26 20:41:35,743 - Recv: ok
2018-07-26 20:41:35,747 - Send: N8826 G1 Z93.800 F10200.000*49
2018-07-26 20:41:35,748 - Recv: ok
2018-07-26 20:41:35,753 - Send: N8827 G4 S0*91
2018-07-26 20:41:35,761 - Send: TMC_SET_STEP_E932
2018-07-26 20:41:35,764 - Send: N8828 G1 Z93.400 F10200.000*51
2018-07-26 20:41:35,767 - Send: N8829 G1 E0.80000 F2100.00000*43
2018-07-26 20:41:35,769 - Send: N8830 M204 S800*93
2018-07-26 20:41:35,779 - Recv: Full RX Buffer
2018-07-26 20:41:53,812 - No response from printer after 6 consecutive communication timeouts, considering it dead. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
2018-07-26 20:41:53,855 - Changing monitoring state from "Printing" to "Offline (Error: Too many consecutive timeouts, printer still connected and alive?)"
2018-07-26 20:41:53,872 - Connection closed, closing down monitor

#2

To isolate the problem, I would first of all disable the plugin, restart Octoprint and try a new print to see whether it still occurs.


#3

As @BerndJM suggested, please try if that reproduces in safe mode. Also please share the full serial.log - I cannot analyse what happened here just based on that excerpt :confused:


#4

Thanks for your responses. I tried to print the same gcode in Safe Mode but got the same timeout error, so it doesn't appear the issue is related to a plugin.

I have attempted to share the full serial.log file, but unfortunately it's too big to put in a post unless I break it up into multiple posts. Any suggestions on how I can best share it?


#5

Attaching the file to a post instead of pasting it into the post should work.


#6

Thank you! (Didn't know you could attach files. :grinning:) The octoprint and serial logs are attached. Both were recorded in safe mode.

octoprint_safemode.log (32.8 KB)
serial_safemode.log (309.4 KB)


#7

So one thing that somewhat jumps to eye in your serial.log is that your printer somehow from somewhere produced an extra ok right at the beginning:

2018-07-27 06:25:10,635 - Send: N24 M105*17
2018-07-27 06:25:10,647 - Recv: ok T:214.7 /215.0 B:60.4 /60.0 T0:214.7 /215.0 @:37 B@:0 P:39.8 A:40.5
2018-07-27 06:25:10,654 - Send: N25 G21*45
2018-07-27 06:25:10,664 - Recv: Unknown G code: $25 G21
2018-07-27 06:25:10,665 - Recv: ok                                                      <- this belongs to N25
2018-07-27 06:25:10,671 - Send: N26 G90*36
2018-07-27 06:25:10,675 - Recv: ok                                                      <- N26
2018-07-27 06:25:10,680 - Send: N27 M83*45
2018-07-27 06:25:10,688 - Recv: ok                                                      <- N27
2018-07-27 06:25:10,693 - Send: TMC_SET_WAVE_E0                                         <- what triggers this output? M83?
2018-07-27 06:25:10,697 - Send: N28 G1 E-0.80000 F2100.00000*7
2018-07-27 06:25:10,701 - Recv: tmc2130_set_wave 3 0
2018-07-27 06:25:10,702 - Recv:  factor: 0.000
2018-07-27 06:25:10,745 - Recv: ok                                                      <- N28?
2018-07-27 06:25:10,750 - Send: N29 G1 Z0.600 F10200.000*10
2018-07-27 06:25:10,751 - Recv: ok                                                      <- N29?
2018-07-27 06:25:10,762 - Recv: ok                                                      <- but if so, where does THIS come from?
2018-07-27 06:25:10,764 - Send: N30 G4 S0*93
2018-07-27 06:25:10,769 - Send: TMC_SET_STEP_E0
2018-07-27 06:25:10,774 - Send: N31 G1 X108.811 Y90.972 F10200.000*100
2018-07-27 06:25:10,868 - Recv: ok                                                      <- N30?
2018-07-27 06:25:10,873 - Send: N32 G1 Z0.200 F10200.000*4
2018-07-27 06:25:10,875 - Recv: tmc2130_goto_step 3 0 2 1000
2018-07-27 06:25:10,885 - Recv: ok                                                      <- N31?
2018-07-27 06:25:10,889 - Send: N33 G1 E0.80000 F2100.00000*32
2018-07-27 06:25:10,896 - Recv: ok                                                      <- N32?
2018-07-27 06:25:10,901 - Send: N34 M204 S1000*96
2018-07-27 06:25:10,902 - Recv: ok                                                      <- N33?
2018-07-27 06:25:10,907 - Recv: ok                                                      <- N34?
2018-07-27 06:25:10,909 - Send: N35 G1 F1800*113
2018-07-27 06:25:10,914 - Recv: ok                                                      <- N35?
2018-07-27 06:25:10,919 - Send: N36 G1 X109.404 Y89.993 E0.03588*107
2018-07-27 06:25:10,924 - Send: N37 G1 X110.339 Y89.142 E0.03964*97
2018-07-27 06:25:10,928 - Recv: ok                                                      <- N36?
2018-07-27 06:25:10,936 - Send: N38 G1 X110.994 Y88.805 E0.02309*104
2018-07-27 06:25:10,936 - Recv: ok                                                      <- N37?
2018-07-27 06:25:10,945 - Send: N39 G1 X112.038 Y88.541 E0.03376*96
2018-07-27 06:25:10,948 - Recv: ok                                                      <- N38?
2018-07-27 06:25:10,954 - Send: N40 G1 X137.949 Y88.544 E0.81241*108
2018-07-27 06:25:10,957 - Recv: ok                                                      <- N39?
2018-07-27 06:25:10,961 - Send: N41 G1 X138.746 Y88.705 E0.02550*104
2018-07-27 06:25:10,964 - Recv: ok                                                      <- N40?
2018-07-27 06:25:10,968 - Send: N42 G1 X139.488 Y89.033 E0.02545*108
2018-07-27 06:25:10,969 - Recv: ok                                                      <- N41?
2018-07-27 06:25:10,976 - Send: N43 G1 X140.530 Y89.926 E0.04302*111
2018-07-27 06:25:10,977 - Recv: ok                                                      <- N42?
2018-07-27 06:25:10,983 - Send: N44 G1 X141.163 Y90.910 E0.03669*105
2018-07-27 06:25:10,986 - Recv: ok                                                      <- N43?
2018-07-27 06:25:10,990 - Send: N45 G1 X141.459 Y92.038 E0.03655*106
2018-07-27 06:25:10,994 - Recv: ok                                                      <- N44?
2018-07-27 06:25:10,998 - Recv: ok                                                      <- N45?
2018-07-27 06:25:11,001 - Send: N46 G1 X141.459 Y117.912 E0.81126*93
2018-07-27 06:25:11,006 - Send: N47 G1 X141.189 Y119.028 E0.03602*81
2018-07-27 06:25:11,175 - Recv: ok                                                      <- N46?
2018-07-27 06:25:11,180 - Send: N48 G1 X140.596 Y120.007 E0.03588*83
2018-07-27 06:25:11,389 - Recv: ok                                                      <- N47?
2018-07-27 06:25:11,394 - Send: N49 G1 X139.661 Y120.858 E0.03964*91
2018-07-27 06:25:11,529 - Recv: ok                                                      <- N48?
2018-07-27 06:25:11,537 - Send: N50 G1 X139.006 Y121.195 E0.02309*93
2018-07-27 06:25:11,745 - Recv: ok                                                      <- N49?
2018-07-27 06:25:11,750 - Send: N51 G1 X137.962 Y121.459 E0.03376*85
2018-07-27 06:25:11,810 - Recv: ok                                                      <- N50?
2018-07-27 06:25:11,814 - Send: N52 G1 X112.088 Y121.459 E0.81126*81
2018-07-27 06:25:11,855 - Recv: ok                                                      <- N51?
2018-07-27 06:25:11,860 - Send: N53 G1 X110.972 Y121.189 E0.03602*93
2018-07-27 06:25:11,900 - Recv: ok                                                      <- N52?
2018-07-27 06:25:11,905 - Send: N54 G1 X109.993 Y120.596 E0.03588*87
2018-07-27 06:25:11,946 - Recv: ok                                                      <- N53?
2018-07-27 06:25:11,951 - Send: N55 G1 X109.142 Y119.661 E0.03964*93
2018-07-27 06:25:11,969 - Recv: ok                                                      <- N54?
2018-07-27 06:25:11,974 - Send: N56 G1 X108.805 Y119.006 E0.02309*82
2018-07-27 06:25:12,002 - Recv: ok                                                      <- N55?
2018-07-27 06:25:12,007 - Send: N57 G1 X108.541 Y117.962 E0.03376*82
2018-07-27 06:25:12,875 - Recv: ok                                                      <- N56?
2018-07-27 06:25:12,879 - Send: N58 M105*26
2018-07-27 06:25:12,900 - Recv: ok                                                      <- N58?
2018-07-27 06:25:12,905 - Send: N59 G1 X108.541 Y92.088 E0.81126*96
2018-07-27 06:25:12,925 - Recv: ok                                                      <- N59?
2018-07-27 06:25:12,933 - Send: N60 G1 X108.797 Y91.030 E0.03414*109
2018-07-27 06:25:12,938 - Recv: ok T:214.5 /215.0 B:60.2 /60.0 T0:214.5 /215.0 @:39 B@:16 P:39.9 A:40.6 <- no, THIS is N58!

See my annotations to the right. Around N29 at least one extra ok suddenly shows up. There's no command triggering it (for preceding commands all oks are accounted for). The point where this extra shows up coincides with the occurrence of that TMC_SET_WAVE_E0 output. By the time the next M105 gets triggered and replied to, it looks like there are already two extra oks. Scrolling further through the log, that number increases. Sooner or later, this causes OctoPrint to have too many "yep, please send the next line" flags that it overwhelms the printer's RX buffer. Further through the print, that gets so severe that communication breaks down.

So ... update of OctoPrint or not, something's up with this firmware. There shouldn't be any extra oks injected into the serial communication, and them being present causes havock with the serial communication.

I've so far never seen this apparently out-of-band TMC output before, and it being unprompted by a command sent to the printer makes me suspicious that the firmware might accidentally send extra oks with this output when it's generated (we've had this in the past with Prusa firmware in a different case if my memory isn't betraying me).

Based on what I see in this serial.log there, OctoPrint is behaving correctly and the firmware's extra oks are at fault for the communication issues ultimately leading to a break down and timeout. And I can't fix that. If the firmware here is really injecting extra oks, that needs to be fixed in the firmware by the firmware vendor. So I suggest you open a ticket with them. Feel free to point to the analysis above and also tag me in it.


#8

Thank you Gina! I appreciate the thorough analysis.

The gcode causing the communication timeouts is the extruder linearity correction calibration tower, from the Prusa firmware GitHub (https://github.com/prusa3d/Prusa-Firmware/wiki/Extruder-linearity-correction-calibration). It uses the TMC commands. So are you saying that the latest Prusa firmware simply doesn't play nice with this particular gcode - because of the TMC comands? Or do you suspect an issue with the firmware in general, with all model gcode?

I ask because I've been able to successfully print other gcode (for non-calibration models) using OctoPrint (in normal mode) without an issue. Inspection of the serial logs for those prints didn't show any extra oks.

Thanks!


#9

:woman_facepalming: I'm sorry... for some reason I saw this TMC stuff as output instead of it being sent by OctoPrint to the firmware.

Now that of course explains a lot. This isn't a valid GCODE command (defined as either G, M or T followed by a decimal number). So it's an unknown command for OctoPrint. By default, OctoPrint will not expect unknown commands to have to be acknowledged by the firmware (since there are firmwares out there like e.g. SmoothieWare which support non-GCODE commands sent to control more sophisticated functionality like underlying configuration and such which are handled in a different way and hence don't return an ok and also must not be checksummed and numbered). Hence it will not consume an ok for such a command sent to the printer. But since the printer does in fact return an ok for those commands after all it will lead to stuff getting out of sync.

Temporary fix: edit your config.yaml to have it set serial.unknownCommandsNeedAck to true:

serial:
  unknownCommandsNeedAck: true

Restart OctoPrint and stuff should start working.

Long term the Prusa guys need to switch to proper GCODE commands for these calibration commands. I will not start adding exceptions for vendor specific commands that are also supposed to be treated as GCODE even though they in fact aren't. If I start doing this for one vendor, I'll have to do it for all and managing this flee circus of a firmware landscape is already tricky enough as is without adding this on top of everything.


#10

Thank you Gina! Good to know.