Monday, December 29, 2008

Select DISTINCT on DataTable

In a project I'm doing I needed to basically do a DISTINCT on a couple fields in a DataTable that I had used earlier in code. Did some Google searching and came up with this MS KB on the subject. I was hoping to find it in the Framework, but hey, writing a little extra code never hurt anyway. Looking over the code though I found that I just didn't really like it. Biggest reason being it only took one field. I wanted to be able to pass in n number of fields, so I rewrote it and did a little tweaking to it. Here's what I came up with in both VB and C#
VB
Public Shared Function SelectDistinct(ByVal SourceTable As DataTable, ByVal ParamArray FieldNames() As String) As DataTable Dim lastValues() As Object Dim newTable As DataTable If FieldNames Is Nothing OrElse FieldNames.Length = 0 Then Throw New ArgumentNullException("FieldNames") End If lastValues = New Object(FieldNames.Length - 1) {} newTable = New DataTable For Each field As String In FieldNames newTable.Columns.Add(field, SourceTable.Columns(field).DataType) Next For Each Row As DataRow In SourceTable.Select("", String.Join(", ", FieldNames)) If Not fieldValuesAreEqual(lastValues, Row, FieldNames) Then newTable.Rows.Add(createRowClone(Row, newTable.NewRow(), FieldNames)) setLastValues(lastValues, Row, FieldNames) End If Next Return newTableEnd Function
Private Shared Function fieldValuesAreEqual(ByVal lastValues() As Object, ByVal currentRow As DataRow, ByVal fieldNames() As String) As Boolean Dim areEqual As Boolean = True For i As Integer = 0 To fieldNames.Length - 1 If lastValues(i) Is Nothing OrElse Not lastValues(i).Equals(currentRow(fieldNames(i))) Then areEqual = False Exit For End If Next Return areEqualEnd Function
Private Shared Function createRowClone(ByVal sourceRow As DataRow, ByVal newRow As DataRow, ByVal fieldNames() As String) As DataRow For Each field As String In fieldNames newRow(field) = sourceRow(field) Next Return newRowEnd Function
Private Shared Sub setLastValues(ByVal lastValues() As Object, ByVal sourceRow As DataRow, ByVal fieldNames() As String) For i As Integer = 0 To fieldNames.Length - 1 lastValues(i) = sourceRow(fieldNames(i)) NextEnd Sub
C#
private static DataTable SelectDistinct(DataTable SourceTable, params string[] FieldNames){ object[] lastValues; DataTable newTable; DataRow[] orderedRows; if (FieldNames == null FieldNames.Length == 0) throw new ArgumentNullException("FieldNames"); lastValues = new object[FieldNames.Length]; newTable = new DataTable(); foreach (string fieldName in FieldNames) newTable.Columns.Add(fieldName, SourceTable.Columns[fieldName].DataType); orderedRows = SourceTable.Select("", string.Join(", ", FieldNames)); foreach (DataRow row in orderedRows) { if (!fieldValuesAreEqual(lastValues, row, FieldNames)) { newTable.Rows.Add(createRowClone(row, newTable.NewRow(), FieldNames)); setLastValues(lastValues, row, FieldNames); } } return newTable;}
private static bool fieldValuesAreEqual(object[] lastValues, DataRow currentRow, string[] fieldNames){ bool areEqual = true; for (int i = 0; i < fieldNames.Length; i++) { if (lastValues[i] == null !lastValues[i].Equals(currentRow[fieldNames[i]])) { areEqual = false; break; } } return areEqual;}
private static DataRow createRowClone(DataRow sourceRow, DataRow newRow, string[] fieldNames){ foreach (string field in fieldNames) newRow[field] = sourceRow[field]; return newRow;}
private static void setLastValues(object[] lastValues, DataRow sourceRow, string[] fieldNames){ for (int i = 0; i < fieldNames.Length; i++) lastValues[i] = sourceRow[fieldNames[i]];}
I thought I'd also point out a downfall from the KB that I corrected in my example. The beginning of the For Each calls SourceTable.Select("", FieldName)) in it. This can hinder performance a bit in certain situations, because in C# that statement will be evaulated for every loop iteration. You can see in the C# code that I have I actually call the Select Method and store the results in a variable and use it in the loop instead. One other interesting thing to note about the different between the VB and the C# code is that I DID NOT put the results of the Select Method in a variable in the VB code even though I did in the C# code. This is because VB works differently in that the actual For Each loop only evaluates the first iteration of the loop, which is nice and saves and extra line or two of code. Another thing that's different between the two (sorry, I think the differences between the two are interesting) is that to instantiate a new Object Array in the lastValues variable in VB you have to put {} after it, and in C# you don't have to. This is because since VB uses the same characters for Methods vs. Indexers, the compiler wouldn't be able to tell the difference between calling a constructor with one parameter vs. creating a new array, so it's necessary. In C# it knows since you're using the indexer characters, you mean a new array, not a new object in this case. That said, if you like using the VB Functions, you could also just call ReDim lastValues(FieldNames.Length), but I tend to stay away from those since I do a lot of code in both VB and C#.

No comments: