Large File Download Memory Issues

Hi everyone!

I've implemented some file browsers in my plugin, and noticed a really bad memory issue. I'm sending some rather large files to the client, and no matter what I do the server consumes a terrible amount of memory, crashing my Pi pretty fast. I've tried streaming the files using a generator + stream_with_context, and I've tried send_from_directory. No dice. htop even crashed while I was monitoring the usage, so the memory is filling up completely.

My working theory is that OctoPrint is processing the response behind the scenes and buffering it in memory. I looked at the LargeResponseHandler, but it would be complicated to use since I'm dynamically generating some of the data. Perhaps it could be modified somehow? I also need to be able to delete files after the request has finished.

Any ideas? I'm desperate for a solution, because a ton of functionality I've added is rendered useless if I can't download a few 100 MB files in a row without crashing the server.

Thanks!

Actually, the problem is worse than I thought. I downloaded a 150 MB file using send_from_directory (just returning it), and my memory usage went from 190 MB to 675 MB! That's 485 MB used to download a 150 MB file. Nuts. There also appears to be a leak, because the memory used just keeps increasing after each download. What is going on here?

The problem here is that Flask runs in a WSGI context within the single-threaded Tornado framework, so it cannot stream. If you want to stream responses that you generate on the fly, you'll probably need to go the route of implementing the endpoint in Tornado instead. There's a hook for that ^^

I've actually thought about implementing most or even all of the API in Tornado, and it's still something I want to look into, especially because of issues like this.


I did some research into WSGI, and I understand now. I already started implementing the octoprint.server.http.routes hook, so thanks for confirming that's the way to go. I'm going to try to generalize the implementation somewhat to make it easier to add routes in the future.

Implementing the entire API in Tornado sounds like quite the project, and one worthy of OctoPrint. Thanks for considering that, and for answering my question! You are always so helpful :)

It's the same issue I had when streaming a modified video stream and large CSV files.


Thanks! I noticed that very same hook, and have been working to implement it. I've been having a bit of difficulty maintaining backwards compatibility with the old permission system, since I'm forced to create my own custom request handler, but I'm working out the issues.

Once I get it working, I'm going to try to write some decorators to keep the API looking a bit more consistent. If they end up being helpful, I'll post my code here.
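The decorator idea is simple enough to sketch now (illustrative names; the permission check itself is whatever your plugin needs — this is not OctoPrint's API):

```python
import functools

import tornado.web


def require_permission(check):
    """Reject the request with 403 unless check(handler) passes."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if not check(self):
                raise tornado.web.HTTPError(403)
            return method(self, *args, **kwargs)
        return wrapper
    return decorator
```

It would sit above a handler method, e.g. `@require_permission(lambda h: h.current_user is not None)` on `get`.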

Well, the hook works! It now uses only about 10 MB of memory when downloading a 150 MB file (I'm assuming that's a buffer). Additional simultaneous requests don't seem to take any additional memory, hooray! Now to clean up the code, generalize, and replace some other bits of the API that should stream this way :)