Yes I'm very aware of that (I helped write a chunk of the USB stack used on the LPC1768 Marlin implementation), which was why I said that the "UARTs (on the Pi) don't come into it". My comment was in reply to OutsourcedGuru who had said that you may need to use a "good UART" on the Pi to get good performance. My understanding is that this thread is about printers connected to the Pi by USB (even if that USB connection then uses a UART), not (the relatively small number of) printers connected directly to the Pi via a UART on the Pi board.
But the thing is that even with a reasonable USB stack running directly on a 32-bit board you may still not get very high communication speeds. I've just been running some tests transferring a 20 Mbyte file using the "Upload to SD" button in Octoprint. This uses exactly the same code to send the gcode to the printer as is used during printing. It takes 61 minutes to upload the file. The same file uploaded from Repetier Host takes just under 7 minutes. Which one of these do you think is less likely to have issues feeding large numbers of small move gcode operations like the ones you described into a printer?
The above tests generated no errors, no resends and no extra logging, so they are reasonably comparable. They were run on different host-side hardware: Octoprint was running on a Pi 2 and Repetier on a relatively low-end PC. The Pi had a RPi Cam attached but was not streaming (activating streaming slows things down to about 72 mins, which is not surprising given that the network connection runs via the same USB controller). But that same file can be transferred to the printer's SD card over USB (using the feature that shares the printer's SD card as a "USB drive") in just over 30 seconds from the PC and around 56 seconds from the same Pi used to run Octoprint. So although the Pi/USB/printer transfer is slower than the PC/USB/printer one, it is not that much slower. All of these tests are transferring roughly the same volume of data (and using the same USB bulk endpoints).
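To put those timings on a common scale, here is a rough sketch that converts them into effective throughput. It assumes "20 Mbyte" means 20,000,000 bytes and uses the rounded durations quoted above, so the results are ballpark figures only; the dictionary labels and the helper name are just for illustration.

```python
# Assumption: "20 Mbyte" is taken as 20,000,000 bytes.
FILE_BYTES = 20 * 1000 * 1000

# Durations in seconds, rounded from the figures quoted in the post.
runs = {
    "Octoprint upload to SD (Pi 2)": 61 * 60,
    "Octoprint upload, camera streaming": 72 * 60,
    "Repetier Host upload (PC)": 7 * 60,
    "USB drive copy (PC)": 30,
    "USB drive copy (Pi 2)": 56,
}


def throughput_kb_s(num_bytes, seconds):
    """Effective throughput in kB/s (1 kB = 1000 bytes)."""
    return num_bytes / seconds / 1000


rates = {name: throughput_kb_s(FILE_BYTES, secs) for name, secs in runs.items()}

for name, rate in rates.items():
    print(f"{name:36s} {rate:8.1f} kB/s")
```

On these assumptions the Octoprint upload works out at roughly 5 kB/s, Repetier Host at a little under 50 kB/s, and the raw USB drive copies at several hundred kB/s.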
So why is an Octoprint transfer so much slower? To be honest, I'm not really sure. The protocol used by Octoprint (so-called ping pong mode) is not as efficient as the buffer state tracking used by Repetier, but even switching Repetier Host to use ping pong mode only extends the time to around 14 minutes. Even if we combine that with the roughly 2-fold slowdown for the Pi versus the PC USB speed seen above, we only get to around 28 minutes, less than half of the 61 minutes observed. So it looks like there may be something else going on here. But whatever the reason, it would seem that there may be some room for improving the Octoprint to printer transfer rate.
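For anyone who wants to check the back-of-envelope estimate, here is the same sum as a small sketch. It uses only the rounded figures from this post, and it assumes the Pi versus PC slowdown measured on the USB drive copy carries over unchanged to the serial-over-USB transfer, which may well not hold.

```python
# Rounded figures from the post; all approximate.
octoprint_observed_min = 61   # Octoprint "Upload to SD" on the Pi 2
repetier_ping_pong_min = 14   # Repetier Host on the PC, ping pong mode
pc_usb_copy_s = 30            # USB drive copy from the PC
pi_usb_copy_s = 56            # USB drive copy from the Pi 2

# Assumption: the USB drive slowdown applies equally to gcode streaming.
usb_penalty = pi_usb_copy_s / pc_usb_copy_s

# What ping pong mode alone would predict for the Pi.
predicted_min = repetier_ping_pong_min * usb_penalty

# What is left over after accounting for protocol and host speed.
unexplained_factor = octoprint_observed_min / predicted_min

print(f"USB penalty (Pi vs PC):      {usb_penalty:.2f}x")
print(f"Predicted ping pong on Pi:   {predicted_min:.0f} min")
print(f"Unexplained slowdown factor: {unexplained_factor:.1f}x")
```

Using the measured ratio rather than rounding it up to 2 gives a slightly lower prediction than the 28 minutes above, so the unexplained gap is, if anything, a little larger than a factor of two.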