Is there a way to reliably identify a specific print job in a plugin?
When I was writing the event handler in my plugin, I realized it'd be desirable to correlate individual events by print job. For instance, I'd like to know that a PrintFailed event belongs to the same print job as a PrintCancelled event from some earlier point, and a PrintResumed event from even earlier.
I'm currently tracking these events using some pretty complicated logic, but apparently I missed some corner cases. My users have reported mis-identified print jobs.
Ideally I'd want OctoPrint to track a print job id and provide it to plugins, probably by adding an id field to the return value of plugin._printer.get_current_job(). Since OctoPrint controls the lifecycle of a print job, it should be able to do this much more reliably than the patchy code in my plugin that tries to reverse-engineer the print job's lifecycle.
But I might have missed some API/field that already exists to provide this info.
I thought it was in the REST API's job query response... but apparently I'm mistaken. The file endpoint contains some of this data: a query for a single file gives you an array of information, and you're interested in the date field's value.
I do know that each folder/subfolder within ~/.octoprint/uploads contains a hidden .metadata.json file. It includes a variety of information, including a history. Those are likely Unix-style timestamps, which would uniquely identify any particular print session.
You could in theory create your own id/timestamp outside of the printer's job info using the PrintStarted event, and then reference that internal variable when you are storing information.
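A minimal sketch of that idea, assuming an EventHandlerPlugin mixin (the class name, the _current_print_id attribute and the _store helper are just for illustration):

import uuid

import octoprint.plugin
from octoprint.events import Events


class JobTrackerPlugin(octoprint.plugin.EventHandlerPlugin):

    def __init__(self):
        self._current_print_id = None  # our own job id, independent of OctoPrint's job info

    def on_event(self, event, payload):
        if event == Events.PRINT_STARTED:
            # Mint a fresh id the moment a print starts...
            self._current_print_id = str(uuid.uuid4())
        # ...and attach it to anything we store or send while the job is running
        if self._current_print_id is not None:
            self._store(dict(payload or {}, print_job_id=self._current_print_id, event=event))

    def _store(self, record):
        # Placeholder for whatever persistence/forwarding the plugin does
        pass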
That's what I did. But it has happened quite frequently that my code apparently mis-identified some prints. My best guess is that either some "critical events" such as PrintStarted and PrintCancelled occasionally don't get delivered to my plugin, or there are bugs deep in my event handler that occasionally cause exceptions.
Since there must already be a Job object somewhere in OctoPrint, I guess it'd be a lot easier and more robust to add a UUID to that object and expose it to plugins...
I don't see why that couldn't be added to the core OctoPrint code. Maybe submit a feature request or a PR that adds that functionality. The problem here, though, is that if your code is erroring, the same issue could occur even when depending on that UUID from the get_job_info callback, especially if your events aren't firing.
However, my problem is a bit more complicated. Since my plugin has a server component, I have to deal with the possibility that some events get delivered to my plugin but then lost between my plugin and the server.
I already built logic in the server to deal with the possible loss of events, based on the Job UUID the plugin generates. However, that logic depends on the assumption that while events may get lost, each received event still carries the correct Job UUID.
My code would have to be exponentially more complicated if that assumption no longer holds. Whereas if OctoPrint could guarantee that assumption, both my plugin code and my server code would be simpler and more robust, because they would only have to deal with uncertainty along one dimension (event loss).
Hi @Kenneth_Jiang,
I looked into your code and now I have some remarks/questions:
The creation of the print_ts (aka UUID) looks good to me (small improvement: use Events.PRINT_STARTED instead of just "PrintStarted").
You use a daemon thread in the on_after_startup hook to start the message communication and to make sure the thread is killed during server shutdown. I am not a Python threading expert, but from my current understanding the child thread is killed if the parent thread is killed. I think the startup hook is not called from the main thread, so the question is: what if the startup thread is killed? Is the message thread killed too?
You should log the message payload and a timestamp whenever you send data via the websocket in post_printer_status, maybe to a separate log file. I did this in my DisplayLayerProgress plugin (_event_file_logger). I think it is necessary to verify that all the messages you send have the correct print_ts.
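A sketch of such a dedicated logger (the logger name, the messages.log file name and the data folder are just examples; it assumes the plugin mixin's get_plugin_data_folder() and is set up once, e.g. in on_after_startup):

import logging
import logging.handlers
import os


def _setup_message_logger(self):
    # Dedicated logger so the websocket payloads don't end up in octoprint.log
    self._message_logger = logging.getLogger("octoprint.plugins.myplugin.messages")
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(self.get_plugin_data_folder(), "messages.log"),
        maxBytes=1024 * 1024, backupCount=3)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    self._message_logger.addHandler(handler)
    self._message_logger.setLevel(logging.INFO)
    self._message_logger.propagate = False


def post_printer_status(self, data, throwing=False):
    # Log payload + timestamp before every send, so wrong print_ts values show up in the file
    self._message_logger.info("sending: %r", data)
    ...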
Or, a little bit more complicated solution: implement handshake communication instead of just fire-and-forget. Or simply send an incremental id to the server with each message, so the server stores this id and compares it the next time a request is received (it must be: last id + 1). That way the server can identify a gap in the communication.
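A rough sketch of that gap detection on the server side (all names are made up):

class SequenceChecker:
    """Server-side helper: detects gaps in a monotonically increasing message id."""

    def __init__(self):
        self.last_seq = None

    def check(self, seq):
        # Returns True if one or more messages were lost since the last one seen
        gap = self.last_seq is not None and seq != self.last_seq + 1
        self.last_seq = seq
        return gap


checker = SequenceChecker()
assert checker.check(1) is False
assert checker.check(2) is False
assert checker.check(5) is True   # ids 3 and 4 never arrived, so a gap is detected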
Last question: you say "reported mis-identified print jobs" - do you have more details about this mis-identification?
I really like your suggestion of logging all the messages to a separate log file. I think I'll do that.
One common symptom of "mis-identified print jobs" is that the server receives a message whose print_job_id is the same as that of the previously ended print. This has been happening so often that I have started to log a server error for it. This error has appeared hundreds of times.
I don't think the child thread will be killed after the parent thread ends. Actually, one of the reasons why I chose to do almost everything in a child thread is so that the parent thread, which is whoever calls on_after_startup, can finish ASAP. Previously, when I ran the forever loop in that thread, OctoPrint sometimes exhibited strange behavior such as not responding to the TERM signal to cleanly shut down.
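The pattern is roughly this (simplified sketch; _message_loop stands in for the forever loop):

import threading
import time

import octoprint.plugin


class MyPlugin(octoprint.plugin.StartupPlugin):

    def on_after_startup(self):
        # Start the long-running work in a daemon child thread so this hook
        # (and whatever thread calls it) returns immediately
        self._message_thread = threading.Thread(target=self._message_loop)
        self._message_thread.daemon = True  # won't keep the process alive at shutdown
        self._message_thread.start()

    def _message_loop(self):
        while True:
            # talk to the server, post printer status, etc.
            time.sleep(5)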
The creation of the print_ts is done in the event thread; after that you send the event payload via websocket, also in the event thread.
In parallel you read the print_ts in the message thread and send data with the same websocket client.
I wrote a test client to reproduce the issue, but without success. Each print_ts was transferred correctly.
So my assumption is that this combination (two different threads using the same websocket client to send data) creates the issue.
Try to implement single-threaded communication. E.g. put each message (event or normal message) into a queue, and have a single websocket client thread pick messages off the queue and send them to the server.
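For example (the class and attribute names are just illustrations, and it assumes a websocket client with a send() method):

import queue
import threading


class SenderThread(threading.Thread):
    """Single thread that owns the websocket client; everything else only enqueues."""

    def __init__(self, ws_client):
        super().__init__(daemon=True)
        self._ws = ws_client
        self._outbox = queue.Queue()

    def send(self, payload):
        # Safe to call from the event thread and the message thread alike
        self._outbox.put(payload)

    def run(self):
        while True:
            payload = self._outbox.get()  # blocks until something is queued
            self._ws.send(payload)        # only this thread ever touches the socket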
Or you can reuse the message thread: just check whether an event message is present and then send it:
while True:
    if self.eventMessagePayload is not None:
        self.post_printer_status(self.eventMessagePayload, throwing=True)
        self.eventMessagePayload = None  # reset so the same event isn't sent again
But I think the best approach to solve this issue is to collect more logging data.
Good luck...and if you find the problem, please let me know!
I now suspect the problem is with the sequence of events for print ending.
Currently I'm assuming these event sequences:
If print finishes successfully:
PrintStarted
...
PrintDone
If print is canceled:
PrintStarted
...
PrintCancelled
PrintFailed
However, this is just based on my observations of my own printer; it's not documented anywhere. Now I'm wondering if there are situations where different event sequences occur.
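A handler built on that assumption would look roughly like this (sketch; _report_job_end and _current_print_id are just placeholders), which is why the exact order of the end events matters:

import uuid

from octoprint.events import Events

END_EVENTS = {Events.PRINT_DONE, Events.PRINT_CANCELLED, Events.PRINT_FAILED}


def on_event(self, event, payload):
    if event == Events.PRINT_STARTED:
        self._current_print_id = str(uuid.uuid4())
    elif event in END_EVENTS:
        if self._current_print_id is None:
            # e.g. the PrintFailed that follows a PrintCancelled: the job was already closed
            return
        self._report_job_end(self._current_print_id, event)
        self._current_print_id = None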
So we just need to create a pull-request, right!?!
Besides the doc improvement, I looked into standard.py where the cancel event is fired (def on_comm_print_job_cancelled), and my question is: why is a PrintFailed event also fired in the finalize method? This is a mystery to me.