Is there a way to reliably identify a specific print job in a plugin?
When I was writing the event handler in my plugin, I realized it'd be desirable to correlate individual events by print job. For instance, I'd like to know that a PrintFailed event belongs to the same print job as a PrintCancelled event from some earlier point, and a PrintResumed event from even earlier.
I'm currently tracking these events using some pretty complicated logic, but apparently I missed some corner cases. My users have reported mis-identified print jobs.
Ideally I'd want OctoPrint to track a print job id and provide it to plugins, probably by adding an id field to the return value of plugin._printer.get_current_job(). Since OctoPrint controls the lifecycle of a print job, it should be able to do this much more reliably than the patchy code in my plugin that tries to reverse-engineer the print job's lifecycle.
But I might have missed some API/field that already exists to provide this info.
I thought it was in the REST API's job query response... but apparently I'm mistaken. The file endpoint contains some of this data: a query for a single file gives you an array of information, and you're interested in the date field's value.
I do know that each folder/subfolder within ~/.octoprint/uploads contains a hidden .metadata.json file. It includes a variety of information, including a history. Those are likely Unix-style timestamps, which would uniquely identify any particular print session.
You could in theory create your own id/timestamp outside of the printer's job info using the PrintStarted event, and then reference that internal variable when you are storing information.
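A minimal sketch of that idea, assuming an EventHandlerPlugin mixin (the class name, the _current_print_id attribute and the _store helper are just for illustration):

import uuid

import octoprint.plugin
from octoprint.events import Events


class JobTrackerPlugin(octoprint.plugin.EventHandlerPlugin):

    def __init__(self):
        self._current_print_id = None  # our own job id, independent of OctoPrint's job info

    def on_event(self, event, payload):
        if event == Events.PRINT_STARTED:
            # Mint a fresh id the moment a print starts...
            self._current_print_id = str(uuid.uuid4())
        # ...and attach it to anything we store or send while the job is running
        if self._current_print_id is not None:
            self._store(dict(payload or {}, print_job_id=self._current_print_id, event=event))

    def _store(self, record):
        # Placeholder for whatever persistence/forwarding the plugin does
        pass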
That's what I did. But it has happened quite frequently that my code apparently mis-identified some prints. My best guess is that either some "critical events" such as PrintStarted and PrintCancelled occasionally don't get delivered to my plugin, or there are bugs deep in my event handler that occasionally cause exceptions.
Since there must already be a Job object somewhere in OctoPrint, I guess it'd be a lot easier and more robust to add a UUID to that object and expose it to plugins...
I don't see why that couldn't be added to the core OctoPrint code. Maybe submit a feature request or a PR that adds that functionality. The problem here, though, is that if your code is erroring, the same issue could occur even when depending on that UUID from the get_job_info callback, especially if your events aren't firing.
However, my problem is a bit more complicated. Since my plugin has a server component, I have to deal with the possibility that some events get delivered to my plugin but then lost between my plugin and the server.
I already built logic in the server to deal with the possible loss of events, based on the Job UUID the plugin generates. However, that logic depends on the assumption that while events may get lost, each received event still carries the correct Job UUID.
My code would have to be exponentially more complicated if that assumption no longer holds. Whereas if OctoPrint could guarantee that assumption, both my plugin code and my server code would be simpler and more robust, because they would only have to deal with uncertainty along one dimension (event loss).
Hi @Kenneth_Jiang,
I looked into your code and now I have some remarks/questions:
The creation of the print_ts (aka UUID) looks good to me (small improvement: use Events.PRINT_STARTED instead of just "PrintStarted").
You use a daemon thread in the on_after_startup hook to start the message communication and to make sure the thread is killed during server shutdown. I am not a Python threading expert, but from my current understanding the child thread is killed if the parent thread is killed. I think the startup hook is not called from the main thread, so the question is: what if the startup thread is killed? Is the message thread killed too?
You should log the message payload and a timestamp whenever you send data via the websocket in post_printer_status, maybe to a separate log file. I did this in my DisplayLayerProgress plugin (_event_file_logger). I think it is necessary to verify that all the messages you send have the correct print_ts.
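A sketch of such a dedicated logger (the logger name, the messages.log file name and the data folder are just examples; it assumes the plugin mixin's get_plugin_data_folder() and is set up once, e.g. in on_after_startup):

import logging
import logging.handlers
import os


def _setup_message_logger(self):
    # Dedicated logger so the websocket payloads don't end up in octoprint.log
    self._message_logger = logging.getLogger("octoprint.plugins.myplugin.messages")
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(self.get_plugin_data_folder(), "messages.log"),
        maxBytes=1024 * 1024, backupCount=3)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    self._message_logger.addHandler(handler)
    self._message_logger.setLevel(logging.INFO)
    self._message_logger.propagate = False


def post_printer_status(self, data, throwing=False):
    # Log payload + timestamp before every send, so wrong print_ts values show up in the file
    self._message_logger.info("sending: %r", data)
    ...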
Or, a little bit more complicated solution: implement handshake communication instead of just fire-and-forget. Or simply send an incremental id to the server with each message, so the server stores this id and compares it the next time a request is received (it must be: last id + 1). That way the server can identify a gap in the communication.
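A rough sketch of that gap detection on the server side (all names are made up):

class SequenceChecker:
    """Server-side helper: detects gaps in a monotonically increasing message id."""

    def __init__(self):
        self.last_seq = None

    def check(self, seq):
        # Returns True if one or more messages were lost since the last one seen
        gap = self.last_seq is not None and seq != self.last_seq + 1
        self.last_seq = seq
        return gap


checker = SequenceChecker()
assert checker.check(1) is False
assert checker.check(2) is False
assert checker.check(5) is True   # ids 3 and 4 never arrived, so a gap is detected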
Last question: you say "reported mis-identified print jobs" - do you have more details about this mis-identification?
I really like your suggestion of logging all the messages to a separate log file. I think I'll do that.
One common symptom of "mis-identified print jobs" is that the server receives a message whose print_job_id is the same as that of the previously ended print. This has been happening so often that I have started to log a server error for it. This error has appeared hundreds of times.
I don't think the child thread will be killed after the parent thread ends. Actually, one of the reasons why I chose to do almost everything in a child thread is so that the parent thread, which is whoever calls on_after_startup, can finish ASAP. Previously, when I ran the forever loop in that thread, OctoPrint sometimes exhibited strange behavior such as not responding to the TERM signal to cleanly shut down.
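The pattern is roughly this (simplified sketch; _message_loop stands in for the forever loop):

import threading
import time

import octoprint.plugin


class MyPlugin(octoprint.plugin.StartupPlugin):

    def on_after_startup(self):
        # Start the long-running work in a daemon child thread so this hook
        # (and whatever thread calls it) returns immediately
        self._message_thread = threading.Thread(target=self._message_loop)
        self._message_thread.daemon = True  # won't keep the process alive at shutdown
        self._message_thread.start()

    def _message_loop(self):
        while True:
            # talk to the server, post printer status, etc.
            time.sleep(5)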
The creation of the print_ts is done in the event thread; after that you send the event payload via websocket, also in the event thread.
In parallel you read the print_ts in the message thread and send data with the same websocket client.
I wrote a test client to reproduce the issue, but without success. Each print_ts was transferred correctly.
So my assumption is that this combination (two different threads using the same websocket client to send data) creates the issue.
Try to implement single-threaded communication. E.g. put each message (event or normal message) into a queue, and have a single websocket client thread pick messages off the queue and send them to the server.
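For example (the class and attribute names are just illustrations, and it assumes a websocket client with a send() method):

import queue
import threading


class SenderThread(threading.Thread):
    """Single thread that owns the websocket client; everything else only enqueues."""

    def __init__(self, ws_client):
        super().__init__(daemon=True)
        self._ws = ws_client
        self._outbox = queue.Queue()

    def send(self, payload):
        # Safe to call from the event thread and the message thread alike
        self._outbox.put(payload)

    def run(self):
        while True:
            payload = self._outbox.get()  # blocks until something is queued
            self._ws.send(payload)        # only this thread ever touches the socket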
Or you can reuse the message thread: just check whether an event message is present and then send it:
while True:
    if self.eventMessagePayload is not None:
        self.post_printer_status(self.eventMessagePayload, throwing=True)
        self.eventMessagePayload = None  # reset so the same event isn't sent again
But I think the best approach to solve this issue is to collect more logging data.
Good luck...and if you find the problem, please let me know!
I now suspect the problem is with the sequence of events for print ending.
Currently I'm assuming these event sequences:
If print finishes successfully:
PrintStarted
...
PrintDone
If print is canceled:
PrintStarted
...
PrintCancelled
PrintFailed
However, this is just based on my observations of my own printer; it's not documented anywhere. Now I'm wondering if there are situations where different event sequences occur.
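A handler built on that assumption would look roughly like this (sketch; _report_job_end and _current_print_id are just placeholders), which is why the exact order of the end events matters:

import uuid

from octoprint.events import Events

END_EVENTS = {Events.PRINT_DONE, Events.PRINT_CANCELLED, Events.PRINT_FAILED}


def on_event(self, event, payload):
    if event == Events.PRINT_STARTED:
        self._current_print_id = str(uuid.uuid4())
    elif event in END_EVENTS:
        if self._current_print_id is None:
            # e.g. the PrintFailed that follows a PrintCancelled: the job was already closed
            return
        self._report_job_end(self._current_print_id, event)
        self._current_print_id = None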
So we just need to create a pull-request, right!?!
Besides the doc improvement, I looked into standard.py where the cancel event is fired (def on_comm_print_job_cancelled), and my question is: why is a PrintFailed event also fired in the finalize method? This is a mystery to me.