
Working with large XML files in C# .NET

Working with large (huge) XML files is always a pain in the … The reason? These files can't be loaded into memory. On my desktop, which has 2 GB of memory, I can't even open the file in Notepad. I was recently presented with the challenge of manipulating one such large XML file. The file was 550+ MB. I know many will say they have seen bigger XML files than this. But the heart of the matter is that if I can't open a 550+ MB file in Notepad or in XmlDocument in C#, then I can't open anything bigger either. Hence the logic for handling these files stays the same.

The scenario: We have an XML file from which we want to remove a single node without removing its children. In the sample XML fragment below, the Features node has to be removed. Its child nodes must then be attached to the Features node's parent.








One
Two

100.22
GoodDay



3
4
Five

200.09
CrackJack






Proposed solution: To start with, I tried to work with XmlDocument, because it is the easiest way to manipulate XML data. But as I mentioned earlier, the XML file is too large (675 MB). When I pass the file to XmlDocument and try to load it, the system stops responding for a couple of minutes and then throws an out-of-memory exception. So I switched to the XmlReader and XmlWriter classes provided by .NET. In the code below, I read the XML file using XmlReader and write to a new XML file using XmlWriter. XmlReader reads the file one node at a time, so the whole document never has to sit in memory. In the process I leave out the Features node and write its child nodes under its parent.
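Before the full method, it may help to see what XmlReader actually hands back on each call to Read(). The snippet below is only a minimal sketch: the class name XmlNodeStreamSketch, the method Describe and the tiny input document are made up for illustration and are not part of the original solution.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class XmlNodeStreamSketch
{
    // Returns one "NodeType:name-or-text" entry per node that XmlReader.Read() yields.
    public static List<string> Describe(string xml)
    {
        var seen = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Text nodes have an empty Name, so show their Value instead.
                string label = reader.NodeType == XmlNodeType.Text ? reader.Value : reader.Name;
                seen.Add(reader.NodeType + ":" + label);
            }
        }
        return seen;
    }
}
```

For the made-up input `<a x="1"><b>hi</b><c/></a>`, the list comes back as Element:a, Element:b, Text:hi, EndElement:b, Element:c, EndElement:a. Attributes do not show up as nodes of their own, and the empty element `<c/>` yields an Element with no matching EndElement, which any copying loop has to account for.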



// Requires "using System.Xml;" at the top of the file.

/// <summary>
/// Creates an XmlReader over the large XML file and an XmlWriter for a new file that receives
/// the processed data, then loops through the reader and copies each node across.
/// </summary>
private void XMLReaderRealScan1()
{
    // Use XmlReaderSettings to drop insignificant white space.
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    //XmlWriterSettings xws = new XmlWriterSettings();
    //xws.Indent = true; //indenting would increase the file size. It went beyond 2 gigs.

    // The using blocks close both files even if an exception is thrown.
    using (XmlReader xR = XmlReader.Create("D:\\xmlfile\\large.xml", settings))
    using (XmlWriter xW = XmlWriter.Create("D:\\xmlfile\\large_New.xml"))
    {
        // Read the XML file one node at a time.
        while (xR.Read())
        {
            // XmlReader gives no handle on the node itself, so we decide what to do based on
            // NodeType. The cases below are the node types that can occur in our XML file.
            switch (xR.NodeType)
            {
                case XmlNodeType.Element:
                    if (xR.Name == "Features")
                    {
                        // Don't write the Features node itself. Hand its subtree to another
                        // method that writes only the children, so they land under the parent.
                        using (XmlReader features = xR.ReadSubtree())
                        {
                            RemoveFeaturesNode(features, xW);
                        }
                    }
                    else
                    {
                        // Write the start element, keeping any namespace prefix.
                        xW.WriteStartElement(xR.Prefix, xR.LocalName, xR.NamespaceURI);
                        // Called on an element, WriteAttributes copies all of its attributes.
                        xW.WriteAttributes(xR, false);
                        // An empty element such as <a/> produces no EndElement node, so close it here.
                        if (xR.IsEmptyElement)
                            xW.WriteEndElement();
                    }
                    break;
                case XmlNodeType.Text:
                    // Write the text inside the node.
                    xW.WriteString(xR.Value);
                    break;
                case XmlNodeType.CDATA:
                    // Copy CDATA sections so their content is not lost.
                    xW.WriteCData(xR.Value);
                    break;
                case XmlNodeType.ProcessingInstruction:
                    xW.WriteProcessingInstruction(xR.Name, xR.Value);
                    break;
                case XmlNodeType.Comment:
                    xW.WriteComment(xR.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    xW.WriteWhitespace(xR.Value);
                    break;
                case XmlNodeType.EndElement:
                    // Never write an end tag for the Features node.
                    if (xR.Name != "Features")
                    {
                        xW.WriteEndElement();
                    }
                    break;
            }
        }
    }
}

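/// <summary>
/// Writes all the child nodes of the Features node to the new XML file using XmlWriter,
/// leaving out the Features node itself.
/// </summary>
/// <param name="xmlrd">The reader returned by xR.ReadSubtree() while positioned on the Features node.</param>
/// <param name="xW">The XmlWriter that writes the new XML file.</param>
private void RemoveFeaturesNode(XmlReader xmlrd, XmlWriter xW)
{
    while (xmlrd.Read())
    {
        // xR.ReadSubtree() (passed to this method as xmlrd) returns the Features node as well
        // as all its children, so the Features start and end tags have to be skipped.
        if (xmlrd.Name != "Features")
        {
            switch (xmlrd.NodeType)
            {
                case XmlNodeType.Element:
                    xW.WriteStartElement(xmlrd.Prefix, xmlrd.LocalName, xmlrd.NamespaceURI);
                    // Called on an element, WriteAttributes copies all of its attributes.
                    xW.WriteAttributes(xmlrd, false);
                    // An empty element such as <a/> produces no EndElement node, so close it here.
                    if (xmlrd.IsEmptyElement)
                        xW.WriteEndElement();
                    break;
                case XmlNodeType.Text:
                    xW.WriteString(xmlrd.Value);
                    break;
                case XmlNodeType.CDATA:
                    xW.WriteCData(xmlrd.Value);
                    break;
                case XmlNodeType.EndElement:
                    xW.WriteEndElement();
                    break;
            }
        }
    }
}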
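The same streaming idea can be tried on a small in-memory document before pointing it at a huge file. The sketch below is an assumption-laden variant, not the method above: it uses a single loop that skips the wrapper's start and end tags instead of calling ReadSubtree, and the names XmlUnwrapSketch, Unwrap, Root, Item, Name and Price are hypothetical. It also omits the XML declaration so the result is easy to compare as a string.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class XmlUnwrapSketch
{
    // Streams XML from input to output, dropping every start/end tag named
    // wrapperName while keeping its children in place under the parent.
    public static void Unwrap(TextReader input, TextWriter output, string wrapperName)
    {
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
        // Assumption for the sketch: no XML declaration in the output.
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(input, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.LocalName == wrapperName) break; // skip the wrapper's start tag
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, false);
                        // Empty elements never produce an EndElement node.
                        if (reader.IsEmptyElement) writer.WriteEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        if (reader.LocalName == wrapperName) break; // skip the wrapper's end tag
                        writer.WriteEndElement();
                        break;
                }
            }
        }
    }
}
```

Called with a StringReader over `<Root><Item id="1"><Features><Name>One</Name><Price>100.22</Price></Features></Item></Root>`, a StringWriter and "Features", it should leave `<Root><Item id="1"><Name>One</Name><Price>100.22</Price></Item></Root>` in the StringWriter. For real files, passing a StreamReader and StreamWriter keeps memory use flat in the same way. Comments and processing instructions are left out of this sketch for brevity.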


The output: The output will look something like this. With this approach, we have made a one-time change to the large XML file without loading it into memory.







One
Two
100.22
GoodDay


3
4
Five
200.09
CrackJack








Feel free to contact me in case you need help.
-Vighnesh Bendre

Comments

Anonymous said…
Hi, Vighnesh.

I'm trying to process a large XML file with XPath.
Can you help me? How can I proceed with this?

Regards
Anonymous said…
Hi Vighnesh,

I have a large XML file and I want to write it into another XML file with some updates, i.e. by adding extra nodes. How can I proceed? My email is rekhaingulkar@gmail.com
Mark said…
Hello Vighnesh,
I have one small question, if you can help me...
I wonder why it is necessary to use the function 'RemoveFeaturesNode()'. I mean, when I find the node I want to remove, instead of calling RemoveFeaturesNode(), why not just do nothing and simply not write the current node to the XmlWriter?

Thanks.
