Curious issue with print progress

Nah, if there was I'd already have asked for it :sweat_smile: I'm trying to wrap my head around this but so far I don't have a clue I gotta admit. Especially since I'm not seeing it myself, which is really odd.

I'll try to come up with some way to be able to simulate with your files and see if maybe that way I can see it myself. I went through some of the files that are involved here that could be at fault and looked at the changes between 1.3.12 and 1.4.0 but so far could not spot anything that could be the reason.

Just for the record, the way the gcode viewer maps the progress is that it gets the progress in the file pushed from the socket, then looks that up in the whole model tree (which gets stuffed into an AVL tree for easier mapping by progress) and then fetches the closest leaf and renders until then. Nothing about this mechanism in the frontend was changed. The only way I can see how this could run out of sync with what is actually going over the line to the printer is if the progress being pushed to the frontend was off. That would also reflect in the progress bar but not in the z height reported by plugins hooked into that event that gets generated on sending. So my current guess is that something's off with the progress reporting, that it either gets read wrong or that the socket push is delayed.

If I whipped up a quick plugin to print progress as reported in the backend to a log, would you be willing to throw that on your machine and run a test print with error reproduction for me, @welshjohn? That might give us some insight if it's overall progress reporting that's weird with Cura files, or just something on the socket.

Sure, happy to try anything to get to the bottom of it. Will you post here or email - or something else? Can you see my email address on here or on Patreon?

Would post it here, maybe some of the others would also be willing to gather some data. But can also see your email address :wink:

Very good. Standing by. :slightly_smiling_face:

I see the same behavior as well.

Slicer Cura 4.4 and 4.5 same behavior on same model sliced in both versions
Printer Anet A8 (still has Anet board and a few other Anet parts)
Raspberry Pi 3B
Marlin 1.1.9

One thing that might help is I printed a model that had a pause for a filament change, the printer paused and did what it was supposed to, but the viewer still showed it printing away on the previous layer. On a 23 hr print 2 days ago, the viewer was about 10 layers behind when the print finished, so it definately gets farther behind the longer it prints.

Does that mean it was continuing to show animation as well? Like, you saw the virtual print head move, the lines getting added and so on? If so that would be a strong vote for data delay on the socket!

I'm not sure if it contimued until the layer finished or stopped shortly after the printer paused. I'll rerun that print and let you know the exact behavior after the printer pauses.

As promised I have prepared a plugin that can log print progress in both backend and frontend.

To help debug this issue, please

  1. Install https://github.com/OctoPrint/OctoPrint-Progress-Logger/archive/0.1.0.zip via the Plugin Manager (Get More > from URL). After the restart you'll see a new tab pop up (probably behind the overflow menu):

  1. Start a print (ideally a small one because the logs can be BIG), reproduce the issue and - most importantly - keep the browser open during the whole print (hence also the recommendation to make the print as small as possible while still reproducible).
  2. Once the print has finished, download plugin_progress_logger_output.log through Settings > Logging and copy all that's in the new output on the "Progress Logger Plugin" (you can use the "Copy all" link for that) and paste it on gist.github.com in a new gist.
  3. Upload the downloaded log & share the link to the copied output here. (edit: clarified)

I'm hoping that the progress logs from both backend and frontend will show a drift, which would underline the idea that it's something about the socket that's getting out of sync. If they don't show a drift it's something strictly located in the backend that is causing these issues for some of you. I sadly still haven't been able to reproduce it locally (and I've switched to Cura for good now).

@wrench_bender would also be thrilled about info on your observations during a pause, and if you could take a screen recording of that that would be wonderful. Btw, pausing when the gcode viewer is behind would also be an interesting experiment for you @welshjohn.

Never used Gist before so I hope I have done it correctly. I hope you can see the output at https://gist.github.com/welshjohn60/9fbb4441a3491994f3fa56d510e60702

If you can't I still have both the log file and the copy/paste from the tab so I can try again.

I can confirm the lag was present during this test print.

I had thought about doing a pause after the post from @wrench_bender. I will see if I can do that tomorrow and let you know.

@welshjohn the gist only seems to contain the frontend output, I also need the log from the backend. You should be able to just upload that here (sorry, I wasn't clear on that in my post)

Hi @foosel - no you were very clear and both are there when I access the page. If you scroll to about half way down the very long page you should see the start of the log file. I can also upload them here if that would help?
Sorry for the delay btw, the site is not always alerting me when a new post is made.

Oh, you are right. I scrolled too far and skipped right past the file boundary :woman_facepalming: Apologies... I gotta admit I'm a bit off my usual game these days with everything going on.

Will look into this right away and figure out if I see something or not. Thank you!

Update on the Pause test.

  1. Start a test print
  2. Wait until about half way through the first layer - the gcode viewer is about 5 or 6 lines behind head pos
  3. Click on Pause. The gcode viewer stops immediately. Printer continues to print another 4 or 5 lines (buffer?) then pauses
  4. Click on Resume. The gcode viewer jumps forward but only to around the point where the head was when Pause was clicked. The head begins printing again with the viewer still 4 or 4 lines behind

Not sure if that helps or not?

With @wrench_bender it sounds like the pause was in the gcode. Perhaps that is the next thing to test?

I use chrome on my PC, but I see the lag even on my android galaxy phone (though I guess maybe thats chrome too... sorta)

For me it only effects the gcode viewer, the left side stub shows the correct layer height at all times, including hops and such (within a second of it happening anyway).

So the good news is that this doesn't look like it's an issue with the socket. The percentages and timestamps for both backend and frontend line up pretty much exactly, with a slight clock drift in favor of the frontend. Which means whatever issue this is, it's limited to the processing that happens in the frontend. I'll have to think a bit about a next set of test steps in order to narrow things down further.

1 Like

Further good news: I can reproduce it with the file provided by @welshjohn (initially I didn't dare to send it to my printer but after a quick look through the code it looked compatible enough and it was). Or at least I'm fairly sure I can, as the viewer is currently lagging behind on the first layer by quite a number of infill segments.

Still no idea what's causing this but this makes it easier to investigate.

Agreed - very good news.

I found an old issue on github for the gcodeviewer (https://github.com/hudbrog/gCodeViewer/issues/24), and one difference between the Cura file and others is that by default it uses absolute extrusion rather than relative. However there is a Cura option (in Special Modes) to use relative extrusion. I tested it and it made no difference :pensive: but I thought I would mention it in case it might give a clue to what's going on.

@foosel - I am in lockdown :worried: so let me know if I can help with any more tests/debug/logging

I'm narrowing down on it. Put an M600 after a layer change and checked what byte position inside the file was being pushed when it triggered. And it was several lines off. So something's fishy here with the overall progress reporting after all (which matches the narrative that backend and frontend values are correct - they seem to just both be off).

I noticed that the Cura file has CRLF line endings and the Prusa Slicer as well as my own Cura files have LF. Might be a clue, still looking into it.

And hey, it's not officially called a lockdown here yet, but it comes close and I've practicing voluntary self isolation ever since this started to boil over anyhow, so - lockdown buddies!

Hey lockdown buddy!

Good spot on the CRLF vs just a LF. I have just stripped out all the CRs from the Cura file and ran another test. Perfect - with the viewer running slightly ahead of the print head all the way through. So it looks like the CRs in the Cura files are causing it. I wonder why?

I am sure you are very close!

BTW I am not really on lockdown but I am one of those who the Government "strongly recommend" to stay at home.

Yep, that seems to have been it. And the reason is simply that I have to calculate the position in the file myself from the read lines (due to using a buffered line based reader, which is always a tad ahead). I do this by adding up the lengths of the read lines (byte count). And what I didn't know is that unless I set a specific flag on the file handle opened through io.open, python will fiddle around with the line endings and always return LF, even if the file contains CRLF. Which of course introduces a one-off on the byte count for every single line read from the file for CRLF files, which in turn adds up through the course of a print job and will cause things to lag behind significantly.

And this only started to happen with 1.4.0 because in 1.4.0 I switched the file open method from codecs.open to io.open during the Python 3 migration. I don't know about you but that sounds to me like this mystery can be considered solved.

I've prepped a plugin to work around this for those of you affected by it, you can find it here:

I've tested it with my own printer and with it installed I can no longer reproduce the issue. To install it just throw https://github.com/OctoPrint/OctoPrint-FixCRLFProgressReporting/archive/master.zip into the plugin manager's "Get more... from URL" field. I'm not sure yet if I should register it in the plugin repo to be honest since this will be fixed in 1.4.1 anyhow and it seems to only affect a hand full of people...

What I don't understand though is why Cura does generate LF files for me and CRLF files for you. Weird, but eh, I'll live with that remaining unsolved and be happy about having found a cause and solution for this whackyness :wink:

1 Like