Adventures with OpenXML Spreadsheets and Performance

4 April 2011

Following on from my previous post about my foray into OpenXML, I ran my little console application to extract data from a SharePoint list with about 5000 items, and very quickly noticed the huge amount of CPU and memory consumed by it. Turns out that the reason behind this is the large number of Data Objects being created (and not being collected/disposed) in memory. So after a bit of research on good old Google, I made a few modifications and turned the running time from 30 minutes to roughly 2 seconds; and it's actually pretty easy to do.

Instead of using the Sheet, Row and Cell objects directly, Microsoft provides the OpenXmlReader and OpenXmlWriter classes which allow us to manage OpenXML documents in a much more performant way (with the obvious downside that it's a slight bit more technical, but not by much - you just need to know XML and to put the document together in the right order). And strangely enough, using OpenXmlWriter means you don't need to set CellReferences, so the problem I had last time with the incorrect column names is now irrelevant! An aside - there's a lot of posts referring to using OpenXmlReader/Writer as a SAX-based approach, and I'd just like to point out purely for my own sake, that SAX stands for Simple API for XML...

Enough talk, some code - this snippet creates a workbook with one sheet, and one row and one cell:

using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Create(sOutputPath, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook))
    //create workbook part
    WorkbookPart wbp = spreadsheet.AddWorkbookPart();
    wbp.Workbook = new Workbook();
    Sheets sheets = wbp.Workbook.AppendChild<Sheets>(new Sheets());

    //create worksheet part, and add it to the sheets collection in workbook
    WorksheetPart wsp = wbp.AddNewPart<WorksheetPart>();                                               
    Sheet sheet = new Sheet() { Id = spreadsheet.WorkbookPart.GetIdOfPart(wsp), SheetId = 1, Name = list.Title };                                             

    OpenXmlWriter writer = OpenXmlWriter.Create(wsp);
    writer.WriteStartElement(new Worksheet());
    writer.WriteStartElement(new SheetData());

    writer.WriteStartElement(new Row());
    writer.WriteElement(new Cell { CellValue = new CellValue("Cell Data"), DataType = CellValues.String });                      
    writer.WriteEndElement(); //end of Row

    writer.WriteEndElement(); //end of SheetData
    writer.WriteEndElement(); //end of worksheet

I'd highly recommend using this approach if you're comfortable with XML, as well as checking out these great resources for working with OpenXML:

Tags: OpenXML, performance, spreadsheet

Add a Comment

No Comments