System.OutOfMemoryException when reading a large XLSX workbook

Mar 25, 2014 at 4:36 PM
I'm getting a System.OutOfMemoryException when I attempt to open a large XLSX file (~150 megs).

The exception is thrown on the following line: https://github.com/tonyqus/npoi/blob/master/ooxml/openxml4Net/Util/ZipInputStreamZipEntrySource.cs#L134

I have done a little research and found that each process has memory limitations, and these are often hit when converting a large stream to an array of bytes: the byte array needs one contiguous block of address space, and once the address space is fragmented a block that large may simply not be available. I believe this could be solved either by passing a reference to the stream around instead, or by temporarily writing the decompressed data to disk. I have not tried either solution yet.

http://blogs.msdn.com/b/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx
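For anyone who wants to experiment with the temp-file idea, here is a rough sketch. It is not NPOI's code: ZipArchiveEntry from System.IO.Compression stands in for whatever zip entry type the library actually uses, and SpoolToTempFile is a name I made up. The key point is that CopyTo moves the data in small buffered chunks, so nothing ever has to be allocated as one giant byte array.

using System.IO;
using System.IO.Compression;

static class ZipEntrySpooler
{
    // Hypothetical helper: copy a decompressed zip entry to a temporary file
    // instead of materializing it as one large byte[] in memory.
    public static Stream SpoolToTempFile(ZipArchiveEntry entry)
    {
        string tempPath = Path.GetTempFileName();
        using (Stream entryStream = entry.Open())
        using (FileStream tempFile = File.Create(tempPath))
        {
            entryStream.CopyTo(tempFile); // streams in small chunks
        }
        // DeleteOnClose removes the temp file when the caller disposes the stream.
        return new FileStream(tempPath, FileMode.Open, FileAccess.Read,
                              FileShare.Read, 4096, FileOptions.DeleteOnClose);
    }
}

The caller would then feed that FileStream to the XML parser instead of a byte array.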
Apr 1, 2014 at 5:35 PM
I'm having the same issue.
The problem is in the XmlDocument object. It crashes with big XML files.
A solution might be to switch to LINQ to XML, but that would mean a lot of work.
Apr 2, 2014 at 10:08 AM
LINQ to XML is not working either... the next step is XmlReader.
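For reference, this is the shape of the XmlReader approach I plan to try next: a forward-only scan that never builds the whole document in memory. The "row" element name assumes a worksheet's sheetData layout; adapt it to whatever part you actually need.

using System.IO;
using System.Xml;

static class SheetScanner
{
    // Stream through a worksheet XML forward-only, counting rows,
    // without ever loading the whole file into an XmlDocument.
    public static int CountRows(Stream sheetXml)
    {
        int rows = 0;
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(sheetXml, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "row")
                    rows++;
            }
        }
        return rows;
    }
}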
Apr 2, 2014 at 12:38 PM
As stated in my original post, the OutOfMemoryException occurs when decompressing the XML from the XLSX file, not when reading the XML. The root cause is the call to the ToArray method on the decompressed stream, which materializes the entire decompressed content in the current process's memory as a single byte array.

The only way to solve something like this is to change the architecture to pass a stream reference instead of a byte array.
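To illustrate the difference I mean (this is not NPOI's actual code, just a sketch; LoadBuffered and ParseStreaming are names I made up):

using System;
using System.IO;
using System.Xml;

static class EntryReader
{
    // Buffering style, roughly what happens today: the whole decompressed
    // entry is copied into one contiguous byte[] before anything parses it.
    public static XmlDocument LoadBuffered(Stream decompressed)
    {
        using (var buffer = new MemoryStream())
        {
            decompressed.CopyTo(buffer);
            byte[] data = buffer.ToArray(); // the huge allocation that throws
            var doc = new XmlDocument();
            doc.Load(new MemoryStream(data));
            return doc;
        }
    }

    // Streaming style: hand the decompressed stream straight to the parser,
    // so only the parser's own small buffers are ever allocated.
    public static void ParseStreaming(Stream decompressed, Action<XmlReader> onElement)
    {
        using (XmlReader reader = XmlReader.Create(decompressed))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    onElement(reader);
            }
        }
    }
}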

BTW, I'm not saying that there is not an issue with the XmlDocument object and large files. I'm saying that the issue I am reporting is occurring well before it even gets to that point in the code.
Coordinator
Apr 12, 2014 at 8:53 PM
This discussion has been copied to a work item.