LineProcessorStream Python 3 behaviour

OllisGit · February 16, 2020, 10:25am

FYI: I just realised that in Python 3, io.BufferedReader.readline() returns a line of type bytes. In Python 2 it returns str.
So, if your plugin uses LineProcessorStream, which includes a BufferedReader, make sure that you decode the line before you do any string checks/manipulation and encode it again afterwards.

def process_line(self, origLine):
  # origLine arrives as bytes on Python 3
  line = origLine.decode('utf-8')
  # do your thing
  ...
  # hand bytes back to the stream
  line = line.encode('utf-8')
  return line
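
The difference is easy to reproduce without OctoPrint. A minimal sketch, assuming an in-memory io.BytesIO as a stand-in for the uploaded file; the sample G-code and the helper name `read_first_line` are made up for illustration:

```python
import io


def read_first_line(raw_bytes):
    """Read one line through io.BufferedReader, as a stream wrapper would."""
    reader = io.BufferedReader(io.BytesIO(raw_bytes))
    return reader.readline()


line = read_first_line(b"G28 ; home all axes\nG1 X10\n")
print(type(line))  # bytes on Python 3

# String checks need a decode first (or bytes literals on both sides)
text = line.decode("utf-8")
is_home = text.startswith("G28")

# Encode again before handing the line back to a byte stream
line_out = text.encode("utf-8")
```

Calling `line.startswith("G28")` directly on the bytes object would raise a TypeError on Python 3, which is the kind of failure plugins tend to hit here.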

OutsourcedGuru · February 16, 2020, 4:05pm

Is it bin or bytes?

And yeah, I definitely got burned by that.

OllisGit · February 16, 2020, 6:17pm

Oops... changed to bytes

foosel · February 17, 2020, 9:50am

FYI, this change in LineProcessorStream behaviour is something I might put a workaround in for 1.4.0 (with yet another RC) because it might bite plugin authors. Got a ticket opened for that too:

And just as a side note, these bytes vs str issues are precisely what I expected to cause the most pain in the transition and what I've been cursing most ever since the first versions of Python 3 were released.

OutsourcedGuru · February 17, 2020, 4:10pm

Oh, and watch out for constructs like this:

bytesMessage = (command + self.eol).encode(self.encoding)

...where self.eol somewhere higher in the code got this value injected as... u'/n'. So what happens is the already-bytes EOL just... gets... ignored (wtf?) by Python 3 and doesn't make it into the bytes stream going out. You can't imagine how long that took to troubleshoot.
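
One way to guard against mixed types at this spot is to normalize both pieces before concatenating. This is only a sketch: `build_message` is a hypothetical helper, not part of OctoPrint or pyserial. Note that plain Python 3 raises a TypeError on `str + bytes` rather than dropping anything, so a silently missing EOL may point at something else in the surrounding code (an assumption, since that code isn't shown here).

```python
def build_message(command, eol, encoding="utf-8"):
    """Return command + eol as bytes, accepting str or bytes for either part."""

    def as_text(value):
        # Decode bytes so both pieces are str before concatenation
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value

    return (as_text(command) + as_text(eol)).encode(encoding)


message = build_message("M105", b"\n")
print(repr(message))

# Plain Python 3 refuses to mix the two types in a concatenation
try:
    "M105" + b"\n"
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
```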

foosel · February 18, 2020, 10:46am

Quoting myself from the github issue:

So I took a close, long and hard look at this, and came to the conclusion that I cannot do anything about this on my end.

The thing is, the class in question implements io.RawIOBase, which means I have to adhere to its API. Which means that it needs to operate on bytes (str in Python 2, bytes in Python 3), not unicode (unicode in Python 2, str in Python 3). If I start decoding stuff into unicode before passing it to process_line I get into all kinds of trouble when further processing the result from that inside read. I could turn it back into bytes, but that just screams trouble too...

The thing is, if you work on the file system level, you work with bytes in Python. You always have, their data type was just called str in Python 2. Client code needs to recognize that. I try to make the py2/3 compat migration as painless as possible for plugins, but that's something I cannot do for them without breaking parts of OctoPrint.
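
If a plugin only needs simple checks, it can stay in bytes the whole way and skip decoding entirely. A minimal sketch, written as a free function rather than a real LineProcessorStream subclass; treating None as "drop this line" is an assumption of this example:

```python
def process_line(line):
    """Drop comment-only lines and strip trailing comments, staying in bytes."""
    stripped = line.strip()
    if not stripped or stripped.startswith(b";"):
        # Assumption for this sketch: None means "skip this line"
        return None
    # Everything before the first ';' is the actual command
    code = stripped.split(b";", 1)[0].rstrip()
    return code + b"\n"


print(process_line(b"G1 X10 ; move\n"))
```

Using bytes literals (`b";"`) on both sides of each comparison avoids the encoding question altogether, at the cost of not being able to treat non-ASCII content as text.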

So I'm leaving this as is but added some warnings to the code which should make it into the documentation. I'll also finally look into whipping up some kind of list of common pitfalls for plugin authors to refer to for the coming months of migrations. Should have done that already but other stuff was even more urgent.

OllisGit · April 10, 2020, 8:56am

Hi @foosel,
I received a bug report because converting from bytes to str failed in my plugin.
In my plugin I used the utility method you suggested, octoprint.util.to_unicode()
https://docs.octoprint.org/en/master/plugins/python3_migration.html#bytes-vs-unicode

But then I received a UnicodeDecodeError. I already created a bug report: https://github.com/OctoPrint/OctoPrint/issues/3513

I mention it here because the solution I suggested at the top of this thread needs to be improved.
Use ISO-8859-1 instead of utf-8

import octoprint.util
# gcode_line_as_str = "M117 Priming Filamentâ{¦"
gcode_line_as_bytes = b'M117 Priming Filament\xe2{\xa6\n'
print(gcode_line_as_bytes)

# ISO-8859-1 can decode any byte value, so this does not raise
gcode_decoded = gcode_line_as_bytes.decode('ISO-8859-1')
print(gcode_decoded)

unicode_line = octoprint.util.to_unicode(gcode_line_as_bytes)
# --> BOOOM: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 21: invalid continuation byte
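
The reason ISO-8859-1 helps is that it maps every byte value 0x00 to 0xFF to exactly one code point, so decoding cannot fail and encoding restores the original bytes. A small sketch in plain Python (no OctoPrint import), reusing the sample line from above:

```python
raw = b"M117 Priming Filament\xe2{\xa6\n"

# ISO-8859-1 decodes any byte sequence and round-trips it unchanged
text = raw.decode("iso-8859-1")
round_trip = text.encode("iso-8859-1")

# Strict UTF-8 rejects the stray 0xe2 byte
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# errors="replace" also avoids the exception, but the original bytes are lost
lossy = raw.decode("utf-8", errors="replace")
```

The trade-off is that ISO-8859-1 will show mojibake for lines that really are UTF-8, so it suits pass-through processing better than display.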

OutsourcedGuru · April 10, 2020, 3:53pm

Not sure if it helps, but at the top of all my Python files is the line:

# -*- coding: iso-8859-15 -*-

From a serial-reading snippet:

# readline() returns bytes
line = self.cSerial.readline().decode('utf-8')
# line is now a str
...
# command at this point is a str, as is eol, and encoding is 'utf-8'
message = (command + self.eol).encode(self.encoding)
# message is bytes here
self.cSerial.write(message)
self.cSerial.flush()

In a case like this, I might use logging with the str(type(variablename)) sort of construct to verify what everything is during development.
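
A sketch of that kind of type logging, using the standard logging module; the logger name and the `describe` helper are hypothetical:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("serial_debug")  # hypothetical logger name


def describe(name, value):
    """Return a short 'name is type' string for debug logging."""
    return "%s is %s" % (name, type(value).__name__)


command = "M105"
message = (command + "\n").encode("utf-8")

logger.debug(describe("command", command))
logger.debug(describe("message", message))
```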