c# - Count duplicate rows in a DataTable - Stack Overflow

I want to perform a select distinct on a DataTable using columns whose names are stored in a string array: string[] columnsToBeUnique.

This is what I have at the moment but it doesn't return any values...

                var result = dataTable1
                    .AsEnumerable()
                    .DistinctBy(x => x.Table.Columns.Contains(string.Join(",", columnsToBeUnique)))
                    .ToArray();

Could someone assist me?

Asked yesterday by Rico Strydom; edited yesterday by Panagiotis Kanavos.
  • 1 Please supply a minimal reproducible example. – Gert Arnold Commented yesterday
  • 2 What results do you expect to begin with? LINQ's DistinctBy is very, very different from SQL's DISTINCT. While SQL's DISTINCT will return distinct rows, LINQ's DistinctBy will return the first row for each set of key values. In either case, you DON'T need to concatenate column names – Panagiotis Kanavos Commented yesterday
  • 2 If you want distinct rows in the SQL sense, you don't need LINQ at all. An ADO.NET DataTable has methods for filtering, sorting and projecting columns, and ways to create a DataView. That view can in turn be converted to a table with ToTable, optionally with distinct values. So dataTable1.DefaultView.ToTable(true) will return distinct rows in the SQL sense. You can select and return only specific columns with dataTable1.DefaultView.ToTable(true, columnsToBeUnique) – Panagiotis Kanavos Commented yesterday
  • 1 If you want LINQ's DistinctBy, remember you're working with DataRow objects. x is a DataRow. You get specific column values with x["someName"]. You can get the desired row values with columnsToBeUnique.Select(name=>x[name]), e.g. .DistinctBy(x => columnsToBeUnique.Select(name=>x[name]).ToArray()). Note that arrays compare by reference, so DistinctBy also needs an element-wise IEqualityComparer<object[]> as its second argument here – Panagiotis Kanavos Commented yesterday
  • 1 @RicoStrydom "I just want a record count of the duplicate record": that's a completely different question. In SQL you'd do a GROUP BY ... HAVING COUNT(*)>1. Same in ADO.NET and LINQ. DISTINCT and DistinctBy will return single rows as well, not just duplicates. – Panagiotis Kanavos Commented yesterday

2 Answers

To count duplicates in SQL you'd use GROUP BY ... HAVING COUNT(*)>1, not DISTINCT. DISTINCT returns single rows, not just duplicates.

In .NET 9, CountBy can be used as a shortcut:

var dups = dataTable1.AsEnumerable()
                     .CountBy(row => keyCols.Select(name => row[name]).ToArray())
                     .Where(pair => pair.Value > 1);
var dup_count = dups.Count();

The code could be cleaned up a bit by creating an extension method (declared in a static class) that returns the values of selected columns in a row:

public static object[] GetColumnValues(this DataRow row, string[] columns)
{
    return columns.Select(name => row[name]).ToArray();
}
...

var dups = dataTable1.AsEnumerable()
                     .CountBy(row => row.GetColumnValues(keyCols))
                     .Where(pair => pair.Value > 1);
var dup_count = dups.Count();

keyCols.Select(name => row[name]).ToArray() collects the values of all the key columns in a row. It works because AsEnumerable() returns an IEnumerable<DataRow>. In turn, DataRow has an Item[] indexer that allows accessing values by column name or index. One caveat: arrays compare by reference, so the object[] keys only match by value if an element-wise IEqualityComparer<object[]> is passed as the key comparer (the optional second argument of both CountBy and GroupBy); without it, every row ends up with its own key.

In previous .NET versions we'd need GroupBy to group the rows, then a Select to return each group's row count:

var dups = dataTable1.AsEnumerable()
                     .GroupBy(row => row.GetColumnValues(keyCols))
                     .Select(g => new { Key = g.Key, Count = g.Count() })
                     .Where(pair => pair.Count > 1);
var dup_count = dups.Count();
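Both the CountBy and the GroupBy versions above key on an object[], and arrays are compared by reference unless a key comparer says otherwise. The following is a minimal sketch of such a comparer together with a helper that counts the key combinations occurring more than once. ColumnValuesComparer, DuplicateCounter and CountDuplicateKeys are hypothetical names introduced for illustration, not part of ADO.NET or LINQ, and the helper uses GroupBy so that it does not depend on .NET 9.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: compares object[] keys element by element.
public sealed class ColumnValuesComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[]? x, object[]? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        // SequenceEqual uses object.Equals per element, so boxed values
        // and DBNull.Value are compared by value.
        return x.SequenceEqual(y);
    }

    public int GetHashCode(object[] key)
    {
        // Combine the element hashes; equal sequences give equal hashes.
        var hash = new HashCode();
        foreach (var value in key) hash.Add(value);
        return hash.ToHashCode();
    }
}

public static class DuplicateCounter
{
    // Returns how many distinct key combinations occur in more than one row.
    public static int CountDuplicateKeys(DataTable table, string[] keyCols)
    {
        return table.AsEnumerable()
            .GroupBy(row => keyCols.Select(name => row[name]).ToArray(),
                     new ColumnValuesComparer())
            .Count(g => g.Count() > 1);
    }
}
```

The same comparer instance can be passed as the second argument of CountBy or DistinctBy when those are available.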

If the question was how to get distinct rows from the DataTable, there would be no need for LINQ :

var uniques=dataTable1.DefaultView.ToTable(true,columnsToBeUnique);

A DataTable already allows filtering and sorting. It's also possible to create DataView objects that show a filtered and sorted subset of the data. The view contents can be copied into a new DataTable with DataView.ToTable(bool distinct, params string[] columnNames), possibly discarding duplicates.
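If the goal is a count rather than the rows themselves, the distinct table can also be compared against the original row count. This is only a sketch under the assumption that "duplicates" means the rows beyond the first occurrence of each key combination; DistinctViaDataView and CountSurplusRows are hypothetical names. It builds a fresh DataView instead of using DefaultView so that a RowFilter set elsewhere does not skew the result.

```csharp
using System.Data;

public static class DistinctViaDataView
{
    // Counts the rows beyond the first occurrence of each key combination.
    public static int CountSurplusRows(DataTable table, string[] keyCols)
    {
        // A new DataView has no RowFilter, so it sees every current row.
        var view = new DataView(table);

        // distinct: true keeps one row per combination of keyCols values.
        DataTable uniques = view.ToTable(true, keyCols);

        return view.Count - uniques.Rows.Count;
    }
}
```

For a table with the key values (1, x), (1, x), (2, y) this returns 1, because one row repeats an earlier key combination.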

Your approach doesn't work because Columns.Contains only checks whether the table has a column with the given name, and it returns a bool, so DistinctBy sees the same key for every row. Furthermore, you are joining the column names with commas, which yields a single string that is not a column name:

.DistinctBy(x => x.Table.Columns.Contains(string.Join(",", columnsToBeUnique)))

You want to remove all duplicate rows according to these columns, so this approach should work:

var result = dataTable1
    .AsEnumerable()
    .DistinctBy( row => string.Join( separator, columnsToBeUnique.Select( c => row[c]?.ToString() ?? string.Empty ) ) )
    .ToArray();

However, this is not fail-safe: for example, if the separator occurs in any of the row's fields, you would get a wrong result. A more robust approach is to use a custom IEqualityComparer<DataRow>:

public class DataRowComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _columnsToBeUnique;

    public DataRowComparer(string[] columnsToBeUnique)
    {
        _columnsToBeUnique = columnsToBeUnique;
    }

    public bool Equals(DataRow? x, DataRow? y)
    {
        if (x == null && y == null) return true;
        if (x == null || y == null) return false;

        foreach (string column in _columnsToBeUnique)
        {
            if (!object.Equals(x[column], y[column]))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(DataRow obj)
    {
        // unchecked: let the hash wrap around instead of throwing in a checked build
        unchecked
        {
            int hash = 17;
            foreach (string column in _columnsToBeUnique)
            {
                object value = obj[column];
                hash = hash * 23 + (value?.GetHashCode() ?? 0);
            }
            return hash;
        }
    }
}

Now you can use that for the Distinct:

var result = dataTable1
    .AsEnumerable()
    .Distinct(new DataRowComparer(columnsToBeUnique))
    .ToArray();
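
The separator problem of the joined-string key can be reproduced with two rows whose fields contain the separator. The sketch below only rebuilds that key so it can be inspected; SeparatorPitfall and JoinedKey are hypothetical names introduced for illustration.

```csharp
using System.Data;
using System.Linq;

public static class SeparatorPitfall
{
    // Rebuilds the joined-string key used by the DistinctBy approach.
    public static string JoinedKey(DataRow row, string[] cols, string separator) =>
        string.Join(separator, cols.Select(c => row[c]?.ToString() ?? string.Empty));
}
```

With "," as the separator, a row holding ("a,b", "c") and a row holding ("a", "b,c") both produce the key "a,b,c", so DistinctBy would treat them as duplicates, whereas a comparer that checks each column separately keeps them apart.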