Identify a print job

Is there a way to reliably identify a specific print job in a plugin?

When I was writing the event handler in my plugin, I realized that it'd be desirable to be able to correlate individual events by print job. For instance, I'd like to know that a PrintFailed event is for the same print job as a PrintCancelled event at some earlier point, and a PrintResumed event at another point earlier still.

I'm currently tracking these events using some pretty complicated logic, but apparently I've missed some corner cases. My users have reported mis-identified print jobs.

Ideally I'd want OctoPrint to track a print job id and provide it to plugins, probably by adding an id field to the return value of plugin._printer.get_current_job(). Since OctoPrint controls the lifecycle of a print job, it should be able to do this much more reliably than the patchy code in my plugin that tries to reverse-engineer the print job's lifecycle.

But I might have missed some API/field that already exists to provide this info.

Any help is really appreciated!

Kenneth

Wouldn't filename concatenated with the start time suffice for a unique ID?

201908090827_5mmCamRing.gcode

No, unfortunately. When a gcode file is printed again, it's a different print job but the filename will be the same.

...and the start time will be different, right?

201908071658_5mmCamRing.gcode


Very good point!

Somehow I can't find start_time in the return value of plugin._printer.get_current_job() . Did I miss something obvious?

I thought it was in the REST API's job query response... but apparently I'm mistaken. The file endpoint contains some of this data; here's a typical JSON response for a single-file query. A call to this will give you an array of information; you're interested in the date field's value.

I do know that each folder/subfolder within ~/.octoprint/uploads contains a hidden .metadata.json file. It includes a variety of information, including a history. Those are likely Linux-style timestamps, which would uniquely identify any particular print session.

        "history": [
            {
                "timestamp": 1549577452.725463,
                "printerProfile": "_default",
                "success": false
            },
            {
                "timestamp": 1549577506.415093,
                "printerProfile": "_default",
                "success": false
            },
            ...

Thank you @OutsourcedGuru for all these pointers.

Unfortunately print history or stats won't help my situation. What I need is an id while the print is still ongoing (or has just finished).

You could in theory just create your own id/timestamp outside of the printer's job info using the print start event, and then just reference that internal variable when you are storing information.
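A minimal sketch of that approach, assuming you key everything off the event names OctoPrint fires (the class and method names here are hypothetical, not part of the OctoPrint API):

```python
import uuid

class PrintJobTracker:
    """Mints a local job id on PrintStarted and forgets it when the job ends,
    so a stray late event shows up as having no open job."""

    TERMINAL_EVENTS = {"PrintDone", "PrintCancelled", "PrintFailed"}

    def __init__(self):
        self.current_job_id = None

    def on_event(self, event, payload):
        if event == "PrintStarted":
            # New job: mint a fresh id.
            self.current_job_id = str(uuid.uuid4())
        job_id = self.current_job_id
        if event in self.TERMINAL_EVENTS:
            # Job over: clear the id so late duplicates are detectable.
            self.current_job_id = None
        # Attach job_id to whatever you send to the server.
        return job_id
```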


That's what I did. But it has happened quite frequently that my code apparently mis-identified some prints. My best guess is that either some "critical events" such as PrintStarted and PrintCancelled occasionally don't get delivered to my plugin, or there are bugs deep in my event handler that occasionally cause exceptions.

Since there must already be a Job object somewhere in OctoPrint, I guess it'd be a lot easier and more robust to add a UUID to that object and expose it to plugins...

I don't see why that couldn't be added to the core OctoPrint code. Maybe submit a feature request or PR that adds that functionality. The problem here, though, is that if your code is erroring, there's a chance the same issue could occur even if you're depending on that UUID from the get_job_info callback, especially if your events aren't firing.

You've got a good point.

However, my problem is a bit more complicated. Since my plugin has a server component, I have to deal with the possibility that some events get delivered to my plugin but are lost between my plugin and the server.

I already built logic in the server to deal with the possible loss of events, based on the Job UUID the plugin generates. However, that logic depends on the assumption that, although events may get lost, each received event still has the correct Job UUID.

My code would have to be exponentially more complicated if that assumption no longer held. Whereas if OctoPrint could supply such a guarantee, both my plugin code and server code would be simpler and more robust, because they would only have to deal with uncertainty from one dimension (event loss).

Hope this makes more sense now.

Just opened a GitHub issue.

Thanks again @OutsourcedGuru @jneilliii for your help!

Hi @Kenneth_Jiang,
I looked into your code and now I have some remarks/questions:

  • The creation of the print_ts (aka UUID) looks good to me (small improvement: use Events.PRINT_STARTED instead of just "PrintStarted").
  • You use a daemon thread in the on_after_startup hook to start the message communication and to make sure the thread is killed during server shutdown. I am not a Python thread expert, but from my current understanding the child thread is killed if the parent thread is killed. I think the startup hook is not called from the main thread, so the question is: what if the startup thread is killed? Is the message thread killed too?
  • You should log the message payload and the timestamp whenever you send data via the web socket (def post_printer_status), maybe in a separate log file. I did this in my DisplayLayerProgress plugin (_event_file_logger). I think it is necessary to see that "all messages" you send have the correct print_ts.
  • Or, a maybe little bit more complicated solution: implement hand-shake communication instead of just fire-and-forget. Or just send an "incremental id" to the server each time, so that the server stores this id and compares it the next time a request is received (it must be: last id + 1). That way the server can identify a gap in the communication.
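The incremental-id idea from the last bullet could look roughly like this on the server side (a sketch with illustrative names, not code from the actual plugin):

```python
class GapDetector:
    """Detects lost or reordered messages by checking that each incoming
    incremental id is exactly the previous id plus one."""

    def __init__(self):
        self.last_id = None

    def receive(self, msg_id):
        """Record msg_id; return True if at least one message was missed."""
        gap = self.last_id is not None and msg_id != self.last_id + 1
        self.last_id = msg_id
        return gap
```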

Last question: you say "reported mis-identified print jobs". Do you have more details about this "mis-identification"?

BR
Olli

I really like your suggestion of logging all the messages to a separate log file. I think I'll do that.

One common symptom of "mis-identified print jobs" is that the server receives a message with a print_job_id that is the same as the previously ended job's. This has been happening so often that I have started to log a server error for it; this error has appeared hundreds of times.

I don't think a child thread will be killed after the parent thread ends. Actually, one of the reasons why I chose to do almost everything in a child thread is so that the parent thread, which is whoever calls on_after_startup, can finish ASAP. Previously, when I ran the forever loop in that thread, OctoPrint sometimes exhibited strange behavior, such as not responding to the TERM signal to cleanly shut down.

Thanks again for your help @OllisGit

I analysed your code a little bit deeper:

The creation of the print_ts is done in the event thread; after that you send the event payload via web socket, also in the event thread.
In parallel, you read the print_ts in the message thread and send the data with the same web socket client.
I wrote a test client to reproduce the issue, but without success: each print_ts was transferred correctly.

So, my assumption is that this combination (two different threads using the same web socket client to send data) creates the issue.
Try to implement single-threaded communication. E.g. put each message (event or normal message) into a queue, and have a single web socket client thread pick messages off the queue and send them to the server.
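That queue idea might be sketched like this (assuming a send_func standing in for the real web socket send; all names here are illustrative):

```python
import queue
import threading

def start_sender(send_func, stop_event):
    """Start a single sender thread: everyone enqueues, only this thread
    ever touches the (not necessarily thread-safe) web socket client."""
    q = queue.Queue()

    def sender():
        while not stop_event.is_set():
            try:
                msg = q.get(timeout=0.5)
            except queue.Empty:
                continue  # wake up periodically to check stop_event
            send_func(msg)  # e.g. self._ws.send(json.dumps(msg))
            q.task_done()

    t = threading.Thread(target=sender, daemon=True)
    t.start()
    return q, t
```

Event handlers and the periodic status loop would both just call q.put(...), so message ordering within the queue is guaranteed and no two threads share the socket.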

Or you can reuse the message thread: just check whether an event message is present and then send it:

while True:
    if self.eventMessagePayload is not None:
        self.post_printer_status(self.eventMessagePayload, throwing=True)
        self.eventMessagePayload = None  # clear it so the same event isn't sent twice

But I think the best approach to solving this issue is to collect more logging data.

Good luck...and if you find the problem, please let me know!


Thank you!

A race condition could be the issue. Although I was hoping Python's websocket client is thread-safe, I haven't found any documentation that officially claims so.
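If the client's thread-safety is unknown, one defensive option is to serialize sends yourself with a lock rather than rely on the library. A minimal sketch (the wrapper class and the recording client are made up for illustration; any object with a send() method would fit):

```python
import threading

class LockedSender:
    """Wraps a websocket-like client so that concurrent sends from
    different threads are serialized through one lock."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def send(self, data):
        with self._lock:
            self._client.send(data)

# Usage with a stand-in client that just records what was sent:
class _RecordingClient:
    def __init__(self):
        self.sent = []
    def send(self, data):
        self.sent.append(data)

client = _RecordingClient()
sender = LockedSender(client)
sender.send("hello")
```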

Are you aware of any situations in which some of the events don't get delivered to plugins, or get delivered multiple times (maybe some retry mechanism)?

I now suspect the problem is with the sequence of events for print ending.

Currently I'm assuming these event sequences:

  1. If the print finishes successfully:
PrintStarted
...
PrintDone
  2. If the print is canceled:
PrintStarted
...
PrintCancelled
PrintFailed

However, this is just based on my observations of my own printer; it's not documented anywhere. Now I'm wondering if there are situations in which event sequences like the following can happen:

PrintStarted
...
PrintFailed
PrintCancelled

PrintStarted
...
PrintCancelled
PrintFailed
PrintDone

PrintStarted
...
PrintCancelled

(No PrintFailed event in the last one.)

If any of these sequences happens, it'll result in the bug that my users have reported.
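A small checker for these assumptions could be run over the event log to see whether any of the suspect sequences actually occur. This is just a debugging aid under the assumed model above (PrintStarted opens a job; PrintDone or PrintCancelled closes it; PrintFailed is allowed either to close a job or to trail a PrintCancelled), not anything OctoPrint provides:

```python
def find_violations(events):
    """Return (index, description) pairs for events that break the
    assumed PrintStarted -> ... -> PrintDone/PrintCancelled model."""
    violations = []
    in_job = False
    for i, ev in enumerate(events):
        if ev == "PrintStarted":
            if in_job:
                violations.append((i, "PrintStarted while a job was open"))
            in_job = True
        elif ev in ("PrintDone", "PrintCancelled"):
            if not in_job:
                violations.append((i, ev + " without an open job"))
            in_job = False
        elif ev == "PrintFailed":
            # May close a job, or trail a PrintCancelled that already
            # closed it; both are allowed under the assumed model.
            in_job = False
    return violations
```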

Is the event sequence documented somewhere that I missed?

I also monitored this "cancel behaviour" when I press cancel:

PrintStarted
...
PrintCancelled
PrintFailed

My solution in my plugin was to use a simple boolean flag to identify a single "stop event":

elif Events.PRINT_STARTED == event:
  self.alreadyCanceled = False
  ...

elif Events.PRINT_DONE == event:
  self._printJobFinished("success", payload)

elif Events.PRINT_FAILED == event:
  if self.alreadyCanceled == False:
    self._printJobFinished("failed", payload)

elif Events.PRINT_CANCELLED == event:
  self.alreadyCanceled = True
  self._printJobFinished("canceled", payload)

...and on my to-do list is also "look for documentation or ask in the forum about this behaviour" :wink:

Documentation improvements welcome :wink:

Short question: is the source of the API documentation here: https://docs.octoprint.org/en/master/events/
the index.rst on GitHub: https://github.com/foosel/OctoPrint/blob/8409fdaa48ff692bbb050086e8d7c09940b58c59/docs/events/index.rst?

So we just need to create a pull request, right!?

Besides the doc improvement, I looked into standard.py, where the cancel event is fired (def on_comm_print_job_cancelled), and my question is: why is a print-failed event also fired in the finalize method? This is a mystery to me :wink:

			eventManager().fire(Events.PRINT_CANCELLED, payload)
...
			def finalize():
				self._fileManager.log_print(payload["origin"],
				                            payload["path"],
				                            time.time(),
				                            payload["time"],
				                            False,
				                            self._printerProfileManager.get_current_or_default()["id"])
				eventManager().fire(Events.PRINT_FAILED, payload)