This project is read-only.
6
Vote

XSSFWorkbook Write corrupts XLSX File

description

This may be a duplicate of 12616, but after much research, I haven't been able to track down a solution.

I compiled NPOI from source in Visual Studio 2013 with a direct pull from GitHub - the NPOI version is 2.0.8. The issue occurs with the NPOI NuGet package as well. I am opening the file with Excel 2010. Consider the following:
IWorkbook wb = null;
string file = "numbers.xlsx";

using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read)) {
    if (Path.GetExtension(file).Contains("xlsx")) {
        wb = new XSSFWorkbook(fs);
    } else {
        wb = new HSSFWorkbook(fs);
    }
}

using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.ReadWrite)) {
    wb.Write(fs);
}
This will render the output XLSX file unusable. Case in point on Excel's reaction.

Workbooks created/written in-code with the XSSF API open correctly. XLS files opened/written with the HSSF API are fine with respect to Excel.

The FileMode/FileAccess/FileShare options for the FileStream do not seem to make a difference.

In practice I have been opening far more complex XLSX files, writing a few new cells to the rows therein, and writing back, where after the spreadsheet cannot be opened or subsequently recovered.

I've attached the aforementioned XLSX spreadsheet for testing purposes. Thanks for having a look.

file attachments

comments

qspore wrote Mar 27, 2014 at 3:52 PM

One more thing: this file can't be decompressed with WinRar. You can see the manifest, but WinRar flags it as corrupt:

Example here.

wrote Apr 11, 2014 at 2:49 AM

wrote Apr 12, 2014 at 10:42 PM

wrote May 7, 2014 at 4:53 PM

ddelella wrote May 7, 2014 at 4:54 PM

I have confirmed this issue exists. When I click "yes" for the repair, the file data appears to be correct but the initial message is not good.

wrote May 10, 2014 at 2:57 AM

lukeautry wrote May 10, 2014 at 2:57 AM

I'm running into the same problem. Here's a quick console app to repro the problem (create a new Console app, pull in NPOI package through NuGet, and replace program.cs with the code below).

Attached is the spreadsheet produced by this code.
using System.IO;
using NPOI.XSSF.UserModel;

namespace XSSFWorkbookTest
{
    class Program
    {
        static void Main()
        {
            //Make sure this path exists before you run the console application
            const string path = "C:\\Tests\\TestFile1.xlsx";

            var xlsxWorkbook = new XSSFWorkbook();

            //Using the same example block provided in the sample pack
            GenerateData(xlsxWorkbook);

            var stream = new MemoryStream();
            xlsxWorkbook.Write(stream);

            var fileStream = new FileStream(path, FileMode.Truncate, FileAccess.Write);
            byte[] bytes = stream.GetBuffer();

            fileStream.Write(bytes, 0, bytes.Length);
            fileStream.Close();
            stream.Close();
        }

        private static void GenerateData(XSSFWorkbook xlsxWorkbook)
        {
            var sheet = xlsxWorkbook.CreateSheet("Sheet1");

            sheet.CreateRow(0).CreateCell(0).SetCellValue("This is a Sample");
            int x = 1;
            for (int i = 1; i <= 15; i++)
            {
                var row = sheet.CreateRow(i);
                for (int j = 0; j < 15; j++)
                {
                    row.CreateCell(j).SetCellValue(x++);
                }
            }
        }
    }
}

lukeautry wrote May 10, 2014 at 6:06 AM

After more investigation, I think this has to do with the way that NPOI compresses .xlsx files. Try the following steps:

1) Generate "corrupted"/"unreadable" file using the code above
2) Change the file to a .zip
3) Extract the zip
4) Compress contents back into a .zip (I did this using the native Windows 7 zip support)
5) Rename the file back to .xlsx - file now opens with no complaints.

I will probably look at this some more tomorrow to see if I can debug it from source. Wondering if it could be an issue with the SharpZipLib dependency...

lukeautry wrote May 11, 2014 at 4:44 AM

Stepped through today but didn't find any obvious problems, but I'm also not very familiar with the library. I did a few more tests, and what I did find is that there's no actual difference between the extracted contents of a broken file and a fixed file. Note that all you have to do to a fix a file is rename, unzip, and compress the contents. No repair in Excel necessary.

I also found that the problem could be reproduced with a blank spreadsheet. Also reproduced using the XSSFWorkbook sample in the latest source code.

It appears that a blank .xlsx produced by NPOI is 8 KB. When unzipped, rezipped, and renamed, it goes down to 5 KB. So, obviously, the code doing the zipping is adding way more bytes than necessary.

wrote May 11, 2014 at 4:57 AM

lukeautry wrote May 11, 2014 at 10:22 PM

Back again. I have a fix, although it's a fix that may have complications later down the road.

In ZipPackage.cs, after line 432 (inside the SaveImpl method), I've added this snippet:
     //Zip64 is problematic in Windows environments; set to false
     zos.UseZip64 = UseZip64.Off;
With this change only, XSSF workbooks are produced without error..

Perhaps you could expose a public method on the XSSFWorkbook class, something like "AllowZip64", which users could set to false? I have a feeling that this issue may only impact Windows systems, although I haven't actually tested that theory.

tonyqus wrote May 14, 2014 at 1:21 AM

Hi Lukeautry,

I used the latest code in github () to read numbers.xlsx and write back to a xlsx file. I didn't see any error. It looks your environment is a bit different from mine. Let's figure out this first.

lukeautry wrote May 14, 2014 at 1:52 AM

Hmm, weird. Here's what I'm using:
  • Windows 7 64-bit (en-us locale)
  • Visual Studio 2013 Ultimate
  • Excel 2013 (although I observed the same behavior in Open Office suite)
What does your test environment look like?

Also, have you tried writing a blank workbook (i.e. not reading from an existing XLSX and saving back to it, but creating a brand new XSSF workbook from scratch)?

Thanks for your time,

Luke

tonyqus wrote May 20, 2014 at 4:52 AM

Please contact me via skype: tonyqus. We need to align the dev environment first.

antondd wrote May 20, 2014 at 9:04 PM

I can confirm that this is the same issue as I have reported in [#12685]. The corrupted test files I created there become also readable after unzipping them and zipping them back as described by Luke.
I did the unzip/zip in this case under Linux and then can read the file in LibreOffice like a normal xlsx file.
The corrupted file was created in a program build under a Win7-64 virtual machine using Sharpdevelop set for a target of .Net 3.5 and a 32bit CPU.

If I add the "fix" in ZipPackage.cs as mentioned by Luke the generated xlsx file works just fine.

tonyqus wrote May 21, 2014 at 12:02 AM

Hi antondd,

Your issue looks to be a different issue because I can unzip oo-mod.zip with windows explorer or 7zip. It should be a different issue I think. I'll work with lukeautry first to figure out if it is a issue.

wrote Jun 23, 2014 at 8:33 AM

nvanesch wrote Nov 4, 2014 at 7:17 AM

Not sure if it helps, or if it is the same issue

We forgot to add a reference to ICSharpCode.SharpZipLib in our project and once added it didn't produce the same issue. Originally any alteration to the document (docx) caused it to be saved corrupt.

wrote Dec 4, 2014 at 4:56 PM

antondd wrote Dec 22, 2014 at 5:04 PM

Thanks for the comment nvanesch, adding the ICSharpCode.SharpZipLib library has indeed solved my problem with corrupted xlsx files.

suterma wrote Feb 1, 2016 at 7:30 AM

I can confirm this issue, with the following
  • Windows 7
  • Visual Studio 2013
  • NPOI Version 2.1.3.1 via NuGet
  • SharpZipLib in the Project: Version 0.86.0.518 (received via NPOI-Nuget-Package)
When opening, manipulating and storing a valid XLSX, the file becomes corrupt the way already mentioned. When I manually unzip and re-zip the file, MS Excel 2013 is happy.

When explicitly referencing the SharpZipLib, 0.86.0 the issue persists, however. Can someone tell which version of SharpZipLib they installed, when it fixed their problem?

SachaK wrote Nov 10, 2016 at 2:28 PM

If the file stream is created with FileMode.Create instead of FileMode.Open it works fine. It's confusing because it works with both for xls files.

See http://stackoverflow.com/questions/36142337