Thursday, March 11, 2010

Concatenating row values in a single statement using CROSS APPLY

One more SQL issue, one more solution provided. I feel good about it now ;). I am adding it to my collection because it really helps in reducing the duration to execute the process. It is about a new feature which was introduced way back in SQL 2005. CROSS APPLY.

The Scenario: Let me lay down the scenario, I have a table which has some 100 thousand + records. This is a very (deliberately) de normalized table. The structure and contents of the table are as below.




IDFirstNameEducation
1Vighnesh
BSC
KUD
1995


2Vighnesh


MSC
KUD
1997

3Vighnesh

PHD
MIT
2009

4Mark


BCOM
KUD
2000

5Mark
 
MBA
IIMB
2008

6Vicky


BE
VIT
2008




From the above table, I want to get only distinct records, with the education column concatenated. Something like this…



ID

FirstName

Education

1

Vighnesh

 BSCKUD
1995
MSC
KUD1997
PHDMIT2009

4

Mark


BCOMKUD
2000
MBAIIMB2008


6

Vicky


BE
VIT2008




This can be achieved with a simple query…


SELECT DISTINCT FirstName, CAST(Concat_Education as nvarchar(max)) as Education_ctn
FROM EducationHistory p1
CROSS APPLY
(
SELECT cast(Education + '' as xml)
FROM EducationHistory p2
where p2.Name = p1.Name
ORDER BY Education
FOR XML PATH(''), TYPE
)
V ( Concat_Education )


In the above query we are using FOR XML PATH(‘’), TYPE. If we don’t use this along with casting the Education column as xml (cast(Education + '' as xml)), we will get the output xml as “<” and “>”.
While searching for the possible solution, I searched the web with different approaches. This also shows the evolution in my thinking from start to end. Sample this:
1. assigning value to variable in sub query
2. can I use variable in sub query?
3. coalesce in sql server
4. combine value of 2 rows sql
5. combine value of 2 rows sql in DISTINCT query
6. concat subquery rows
7. concat using coalesce in subquery – here is where I got the answer (thanks to this blog)
8. FOR XML PATH('') < >
9. FOR XML PATH('') < with < get rid
10. FOR XML PATH('') is replacing < with <
11. FOR XML PATH('') sql server

Feel free to contact me in case you need help.
-Vighnesh Bendre

Thursday, March 4, 2010

Working with xml data type in MS SQL server

Previously I had worked with xml data type in SQL 2005. There I was serializing the object in to xml and putting it into xml variable in SQL table. And whenever needed I would pull the xml from SQL table and create an object (deserialize). But recently I was presented with a challenge to manipulate the xml in SQL (T-SQL) itself.

In this scenario, xml text was being stored in a table as nvarchar(max). Let’s say the column name is “xmlFrag”. A function was written to return rows based on a set parameter. But for the set parameter, if the result set returns multiple rows, the function would combine the xmlFrag column (from both rows) and return one xml fragment. Below is the example.

1st row of the resultset.




Gud Day Biscuits


Cashew nuts


Honey






2nd row of the resultset.




Britannia Marie


Wheat


Barley






Then the function would combine these 2 rows data into one single xml fragment. The combined xml fragment would look like…




Gud Day Biscuits


Cashew nuts


Honey




Britannia Marie


Wheat


Barley






If you notice the highlighted node, the order of the feature is repeating (1,2,1,2). This was creating a problem in processing the nodes. And hence I was asked to make the order unique (1,2,3,4).

We had 2 approaches, one to make changes in the c# code itself. Two, make changes in the SQL function. Regarding making changes in the c# code, it would have been a huge task because this feature was being used in many places. So I decided to go ahead with changing the SQL function itself. Below are the steps followed to achieve the same.

Step 1. As we already know the function will return merged xml fragment from 2 rows. In the first step we will have to take the merged xml fragment into an xml variable and count the number of nodes in the xml fragment. So the total nodes are 4 in our case.

Step 2. After we know the total number of nodes where we have to make changes, we will have to loop through the nodes and modify the same. In the second step, we will have to use the .modify method on the xml variable. In this scenario, we actually have 2 variants. One, which node we are pointing to. Two, what value we want to replace. In both situations, the @counter variable will be used.

So in the modify method, the first part is “replace value of”. In here we will have to point to the correct node. This can be achieved by pointing to the node in the hierarchy (/Product/Item/Features/Feature) and then pointing to the attribute of the node (/@Order). Here since there are 4 nodes, we will have to point to the single node by specifying the position. This can be achieved by [position()=sql:variable(“@counter”)].

The second part (replacing with) we will have to specify the counter variable. This can be achieved by sql:variable(“@counter”). And of course we will have to increment the @counter variable.


Below is the complete solution.


DECLARE @processed_xml as xml
DECLARE @counter as INT
DECLARE @node_count as INT
-- This is the previous return statement in function
-- RETURN '' + replace(replace(@new_text_all,'',''),'','') + ''

-- New code to add sequence number to order attribute of Feature node.
set @counter = 1;
set @processed_xml = '' + replace(replace(@new_text_all,'',''),'','') + ''

-- Count of Feature Nodes in xml
SELECT @node_count = @processed_xml.query(' {count(/Product/Item/Features/Feature)}').value('count[1]','int')

WHILE @counter <= @node_count
BEGIN
SET @processed_xml.modify('replace value of ((/Product/Item/Features/Feature/@Order)[position()=sql:variable("@counter")][1])
with sql:variable("@counter")')
SET @counter = @counter + 1
END

-- New return statement for function
RETURN cast(@processed_xml as nvarchar(max))


Challenges faced were related to using SQL variable in both replace value of clause as well as with clause. The final output would look like below.




Gud Day Biscuits


Cashew nuts


Honey




Britannia Marie


Wheat


Barley






Feel free to contact me in case you need help.
-Vighnesh Bendre

Working with large xml files in c# .net

Working with large (huge) xml files is always a pain in the … The reason? These files can’t be loaded in to memory. On my desktop, where I have 2 gigs memory, I can’t open the file in even notepad. I was presented with a challenge recently to manipulate one such large xml file. The xml file was of 550+ MB. I know many would say I have seen bigger xml files than this. But the heart of the matter is if I can’t open 550+ MB file in notepad or in xmldocument in c#, then I can’t open any file bigger than this. And hence the logic to play with these files would remain same.

The scenario: We have an xml file from which we want to remove a single node without removing its children. In the below sample xml fragment, the node has to be removed. The children nodes, must then be attached to ( node’s parent) node.








One
Two

100.22
GoodDay



3
4
Five

200.09
CrackJack






Proposed Solution: To start with, I tried to work with xmlDocument, because this is the easiest way to manipulate xml data. But as I mentioned earlier, the xml file size is too large (675 MB). When I try to pass the xml file and create an object of xmlDocument, the system stops responding for couple of minutes and then throws an out of memory exception. Then I switched to xmlReader and xmlWriter provided by .net. In the below code what I am trying to do is to read the xml file using xmlReader and then write to a new xml file using xmlWriter. The xmlReader reads the xml file one element at a time. In the process I remove node and add nodes to its parent .



///
/// This method creates xmlReader with the large xml file. It also creates new xml file for writing
/// processed xml data. In here, I loop through xmlReader (large xmlfile) and process it further.
///

private void XMLReaderRealScan1()
{
//take xmlReaderSettings to remove white space.
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
//XmlWriterSettings xws = new XmlWriterSettings();
//xws.Indent = true; //indenting would increase the file size. It went beyond 2 gigs.

XmlReader xR = XmlReader.Create("D:\\xmlfile\\large.xml", settings);
XmlWriter xW = XmlWriter.Create("D:\\xmlfile\\large_New.xml");

string sNode = "";
//read the xml file...
while (xR.Read())
{
//since xmlReader does not give handle on the node/attribute, we have to determine with
//the help of NodeType. I have mentioned possible nodetypes being used in our xml file.
switch (xR.NodeType)
{
case XmlNodeType.Element:
if (xR.Name == "Features")
{
//if it is features node, then dont add the node in the xmlwriter. instead call
//another method to write all its children nodes to parent node.
RemoveFeaturesNode(xR.ReadSubtree(), xW);
}
else
{
//write start element in the new xml file using xmlWriter.
xW.WriteStartElement(xR.Name);
//also write all the attributes in the node.
while (xR.MoveToNextAttribute())
xW.WriteAttributes(xR, false);
}
break;
case XmlNodeType.Text:
//write the text in the node
xW.WriteString(xR.Value);
break;
case XmlNodeType.CDATA:
break;
case XmlNodeType.ProcessingInstruction:
xW.WriteProcessingInstruction(xR.Name, xR.Value);
break;
case XmlNodeType.Comment:
xW.WriteComment(xR.Value);
break;
case XmlNodeType.Whitespace:
xW.WriteWhitespace(xR.Value);
break;
case XmlNodeType.SignificantWhitespace:
break;
case XmlNodeType.EndElement:
//if the nodeType is features, then dont write it to the processed xml file.
if (xR.Name != "Features")
{
xW.WriteEndElement();
}
break;
}
}
xW.Close();
}

///
/// this method will consider all the child nodes of node and write them to xml file using xmlwriter.
///

/// xmlreader.ReadSubTree must be passed
/// the xml writer to write xml file
private void RemoveFeaturesNode(XmlReader xmlrd, XmlWriter xW)
{
while (xmlrd.Read())
{
//the xR.ReadSubTree (which is passed to this method as xmlrd), will give node as well
//as all the children nodes. and hence we will have to omit node.
if (xmlrd.Name != "Features")
{
switch (xmlrd.NodeType)
{
case XmlNodeType.Element:
xW.WriteStartElement(xmlrd.Name);
while (xmlrd.MoveToNextAttribute())
xW.WriteAttributes(xmlrd, false);
break;
case XmlNodeType.Text:
xW.WriteString(xmlrd.Value);
break;
case XmlNodeType.EndElement:
xW.WriteEndElement();
break;
}
}
}
}


The Output: The output would look something like this. With this approach, we have achieved an one time activity to change the large xml file without taking it into memory.







One
Two
100.22
GoodDay


3
4
Five
200.09
CrackJack








Feel free to contact me in case you need help.
-Vighnesh Bendre