Compile and Import C++ Module


Does anyone know if it's possible to compile a C++ file using distutils (or some other way) within an OctoPrint plugin into a module, and then import it within a .py file? I've got a gcode-parsing function that is just too slow in Python, and am looking to speed it up. I've got a good start on the C++ function already, but can't figure out how to get it to build and import.

Here is the class definition in C++:

class ParsedCommand
{
public:
	string cmd;
	vector<GcodeParameter> parameters;
	string gcode;
	string error;
};

I also have a function that parses a string and returns the ParsedCommand object with the following signature:

ParsedCommand parse(string gcode);

Any help would be greatly appreciated.


Apparently it is possible. I found some links:
A PDF file

And the Google search.


I saw that already, but thanks for digging it up! I also tried out this document and basically pasted the 'superfastcode' example in and added the setup code to my setup script, but nothing seems to happen. Here is the code I added, which might need adjustment for OctoPrint I suppose:

sfc_module = Extension('superfastcode', sources=['module.cpp'])
additional_setup_parameters = {"ext_modules": [sfc_module]}

I verified that the ext_modules parameter is being passed into setup via setup(**setup_parameters), but I can't tell if it's doing anything. When I later try to import the module like so:

import sfc_module

It just says no module available.
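For reference, a minimal standalone setup.py along those lines might look like the sketch below (nothing OctoPrint-specific; the names mirror the snippet above). One thing worth noting: the name you import comes from the first argument to Extension, i.e. superfastcode, not from the Python variable sfc_module, so `import sfc_module` would fail even after a successful build.

```python
# setup.py -- minimal sketch for building the 'superfastcode' example.
# On Python 3.12+ distutils is gone; "from setuptools import setup, Extension"
# provides the same API.
from distutils.core import setup, Extension

sfc_module = Extension("superfastcode", sources=["module.cpp"])

setup(
    name="superfastcode",
    version="1.0",
    description="C++ gcode parsing helper",
    ext_modules=[sfc_module],
)
```

Building in place with `python setup.py build_ext --inplace` drops a superfastcode.pyd (Windows) or superfastcode.so (Linux) next to the script, after which `import superfastcode` should work.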


It looks like the GPX plugin does this. I'm taking a look.


Did you do the part where you update your project References? Otherwise, VS won't be able to find it.

Honestly, I don't love using Microsoft tools in the Python space. You should consider going the ctypes route.

Not that it's really pertinent to what we're talking about, but I recently wrote a step-by-step guide for compiling Python into C++ (opposite direction).


@OutsourcedGuru, thanks for your tips! I did get my C++ code to compile eventually. It turns out my working directory was not correct in PyCharm, so my setup script wasn't being executed, but somehow my plugin code has been building properly for over a year! I was wondering why my breakpoints weren't being hit :slight_smile:

Also, I'm only using Visual Studio to run and test the C++, not to run Python. PyCharm doesn't have any C capabilities that I can see, or else I would use it.

I'll look into the ctypes method. Currently I'm using Python.h and have created a method table, and am using PyCFunction types and PyObject returns. I had a heck of a time figuring out how to pass object data back to Python, but finally started using Py_BuildValue, which pretty much works as expected.
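In case it helps anyone comparing the two approaches, the ctypes route looks roughly like this. Since the compiled parser library isn't shown here, this sketch demonstrates the mechanics against the standard C math library instead; for the real thing you'd load your own .so/.dll (the "./parser.so" name in the comment is hypothetical) and declare parse()'s signature the same way:

```python
import ctypes
import ctypes.util

# Load a shared C library. For the gcode parser you would do something like
# ctypes.CDLL("./parser.so") -- that filename is hypothetical. Here we use
# the C math library so the example runs on Linux/macOS as-is.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature, double cos(double); without argtypes/restype,
# ctypes assumes int and you get garbage from floating-point functions.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

The trade-off versus the Python.h route: returning a rich object like ParsedCommand through ctypes means designing a C-compatible struct or serializing to a string, whereas Py_BuildValue hands back real Python objects directly.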

I'm sure I'll run into other roadblocks, though, so I will keep this open for now.


A long time ago, we used to mix and match assembly code with C programs. Since something written in assembly would run faster, you'd write, say, a trigonometric function in that and then link it in with your C code, and your early graphics programs would then be faster. Decades later, I wrote embedded SQL in C code for a Microsoft SQL Server extended stored procedure, or maybe an ISAPI function perhaps. It took years for Microsoft to figure things out, so inventive people had to solve problems early on.

It's been a long time since I've returned to this stuff.


Update: I have gotten my module working! It compiles without issue on the Pi and in Windows (provided you have a C++ compiler installed). Now I just need to detect problems during compilation and include a fallback, and all will be well!

FYI, I got approximately a 4X speed boost. It was 10X natively, but it takes a lot of time to turn the results into Python objects :frowning: I may try ctypes to see if it's faster.


So what does it do, exactly? Is it just a gcode parser?


Yes, that's all it does. It returns the command and all of the parameters in a dict. It doesn't handle every command properly yet, but all the ones I need :slight_smile:

For example, if you supply the line G1 X1.00 Y2.00 E2.00 it would return:

("G1",{"X":1.00, "Y":2.00, "E":2.00})

It handles some commands with text only parameters like M117. For example, the command M117 Print Starting would return:

("M117",{"TEXT":"Print Starting"})

The other odd one it handles is the T command. For example, T? would return:


and Tc would return:


I managed to further boost the speed by spawning a process that parses a set number of commands in a file (the number depending on the parameters), returns them in a queue, then continues processing. This allows the parsing to happen in the background so that my functions can pull results out of the queue as necessary. The parsing time in this case is close to 0, though the inter-process communication adds a bit of overhead; not too much, thanks to the chunking behavior.

One odd thing about the way I'm importing the routine is that it takes about 2-3 seconds to initialize the module in the new process, which was killing a decent bit of the multiprocess performance gains. I worked around this by spawning a daemon process that first initializes the dynamic module, then waits for a filename to be supplied via a multiprocess Queue before parsing the data in chunks.

My next step is to migrate my position and state calculation routines to C as well. Right now my PC can parse a test 38MB file in about 23 seconds, but it takes nearly 3 minutes on the Pi. I'm not sure exactly what the bottleneck is yet because it's difficult to profile multithreaded/multiprocess applications (I still need to figure out how to do that), but I suspect it's the position/state routines. On my PC that part of the code takes about 51% of the time (depending on how the profiler feels). If it's an IO bottleneck, I might have fixed it already with my multiprocess technique. I've not tested that on the Pi yet.
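One workaround for profiling child processes (a sketch, nothing OctoPrint-specific): since a profiler started in the parent never sees the children, run cProfile inside the worker function itself and ship the formatted stats back as a string:

```python
import cProfile
import io
import pstats

def profile_to_string(func, *args, **kwargs):
    """Run func under cProfile and return (result, formatted stats report).

    Call this at the top of a worker process and send the report string
    back through your result queue, or write it to a per-process log file.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    buf = io.StringIO()
    # Show the 10 most expensive calls by cumulative time.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()
```

For example, `result, report = profile_to_string(parse_file, path)` inside the daemon process would tell you whether the time is going to parsing, the position/state routines, or the queue traffic.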

All of this was done so that I could preprocess a gcode file in a reasonable amount of time. My first attempt with my old parser and position calculations took around 170 seconds to complete (about 45 seconds of that was parsing). I've managed an 86.5% reduction in time for the same sample file, so lots of progress has been made! It gets harder and harder though :thinking:


That's an awesome project. I suppose it's not too unexpected that C code is outperforming its Python equivalent. Maybe I am a little surprised at how much faster this is.

I fondly remember strtok() in the C world. I once wrote a realtime stock market pricing analyzer that had to be super-fast at parsing the stock feed. I imagine I'd still try to use that, to be honest. There was a way of setting up a per-line while loop and then just strtok'ing the pieces off that, deciding what they were and then storing them.

If this were on Linux, I'd suggest htop for profiling. Are you using threading.Thread() or something else?


Yeah, I used to use spaces or other separators to parse gcode, but you'd be amazed how many people have gcode like this:

G1X100Y1 0 0 

Spaces don't matter on many machines, and I got lots of complaints. If OctoPrint can't parse the code it's no big deal (usually), but if Octolapse can't parse the code things can go haywire.

Worse yet are parenthetical comments (allowed on some machines) like so:

G1(this is a comment(What about this??))X100 ; and there is another comment

I removed the parenthetical comment handling temporarily (my python version handles this) and may re-add it if I can make it performant.

What I do is first strip out all whitespace from the command, then look for a valid letter for gcodes (G, M, T, etc..) followed by a number, maybe followed by a period and more numbers. Then I expect the next character to be a parameter (unless it's text only or the T command above). After that I look for floats by default (+ or - + digits + maybe a period + maybe more digits). It's done recursively by passing the current index to the function and the entire command string, finding one parameter, then recursing deeper until the index is greater than the string length. At the end all of the parameters are returned along with the command itself.
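The steps above, sketched in Python with regexes rather than the recursive index-walking of the C++ version (text-only commands like M117, the T quirks, and parenthetical comments are all omitted here, and every name is illustrative):

```python
import re

# Command: a known letter followed by a number (e.g. G1, M117, T0).
CMD_RE = re.compile(r"([GMT])(\d+(?:\.\d+)?)")
# Parameter: a letter followed by an optional float value (e.g. X100, Y-1.5).
PARAM_RE = re.compile(r"([A-Z])([-+]?\d*\.?\d*)")

def parse_gcode(line):
    # Drop ';' comments, then strip ALL whitespace -- spaces are optional
    # on many machines, so "G1X100Y1 0 0" must parse as G1 X100 Y100.
    line = "".join(line.split(";")[0].split()).upper()
    m = CMD_RE.match(line)
    if m is None:
        return None  # unrecognized line
    cmd = m.group(1) + m.group(2)
    params = {}
    for letter, value in PARAM_RE.findall(line[m.end():]):
        params[letter] = float(value) if value else None
    return cmd, params

print(parse_gcode("G1X100Y1 0 0"))  # ('G1', {'X': 100.0, 'Y': 100.0})
```

Regexes are convenient for a sketch like this; the character-by-character C++ walk exists precisely because it avoids the allocation and backtracking costs on millions of lines.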


I use threading, but I'm also using multiprocessing.Process so that I can partially avoid the GIL.


:laugh: Don't forget OCTO801 which on my printer will do R2D2's get-your-attention whistle. See the Gcode System Commands plugin.

I actually wrote a C compiler in Allen Holub's class (Lex/Yacc) there at U.C. Berkeley and managed to scrape an A in his fourth-year class.


Also Octolapse's snap command :slight_smile: I think OCTO801 would be returned as None in the current parser. In a previous version it would be returned as

'O': {'c':None, 'T':None, 'O': 801.0}


Never made a compiler, but would be interesting to tinker.


If you want to write a compiler, you should begin with Brainfuck to get an appreciation of the simplicity of what actually happens behind-the-scenes.


Here's another update: I finished porting the position tracker code over to C++ and combined it with an enhanced parser (can you say pointer math?) for an incredible speed boost. Here are my best numbers for processing 1.3 million gcodes BEFORE the update:

Desktop - 23.5 seconds
RPI - 170 seconds (about 2 min 50 seconds)

After combining parsing with the new C position tracker:
Desktop - 3 seconds
RPI - 18 seconds

Also, when I run the program natively (exe file) it completes processing the same file in 1.456 seconds. I have no idea why it takes less than half the time compared to running the routine from Python. I also wasn't expecting a 6X slowdown on the Raspberry Pi (over 12X if we're going from the fastest time possible), though I haven't figured out how to profile it there yet (maybe it's IO related?). In any case, 6 times slower seems REALLY steep for a single-threaded application.

This could all be compiler related, since I'm using 3 different compilers (GCC on the Pi, MSVC for the Python build, and MSVC 2017 for C++ development). Any insight would be appreciated, though I now believe the performance is adequate for me to start moving forward again.


The Pi has something like 1GB of RAM versus your PC. Some of that is dedicated to being shared with the GPU (maybe 128MB as the default in /boot/config.txt for the OctoPi image). So the difference is your working space.

When memory fills up, it's supposed to go to the swapfile which is on the microSD and not very fast. If that's happening (memory getting pushed to the swapfile) then this will slow things down.

"Swappiness... is a warm gun" /BadlyRememberedBeatlesLyrics


Thanks for your thoughts!

However, the process uses at most 30MB of memory according to my most recent profile, and somewhere around 27MB of that is the Python interpreter.


Maybe it's just the read time from the microSD (I/O, as you've suggested).