
How to get distinct items from a list using LINQ?

The grouping is easy enough, but doing an efficient "MinBy" with standard LINQ to Objects is slightly messy:

    var lowestByID = items.GroupBy(x => x.ID)
                          .Select(group => group.Aggregate((best, next) =>
                              best.ExpirationTime < next.ExpirationTime ? best : next));

With a MinBy operator available, it becomes much cleaner:

    var lowestByID = items.GroupBy(x => x.ID)
                          .Select(group => group.MinBy(x => x.ExpirationTime));
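A self-contained sketch of both variants, assuming .NET 6 or later (for Enumerable.MinBy) and C# 9 records. The Event record with ID and ExpirationTime is a hypothetical stand-in, since the question's class isn't shown, and LowestById is an illustrative name. The Aggregate comparison here is written so that, on a tie, the event seen first is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type: the question's class isn't shown, so the ID and
// ExpirationTime members are borrowed from the answers on this page.
public record Event(int ID, DateTime ExpirationTime);

public static class LowestById
{
    // Aggregate keeps a running "best" per group in a single O(k) pass.
    // The strict < means that on a tie the event seen first is kept.
    public static List<Event> WithAggregate(IEnumerable<Event> items) =>
        items.GroupBy(x => x.ID)
             .Select(group => group.Aggregate((best, next) =>
                 next.ExpirationTime < best.ExpirationTime ? next : best))
             .ToList();

    // Enumerable.MinBy (.NET 6 or later) expresses the same selection directly.
    public static List<Event> WithMinBy(IEnumerable<Event> items) =>
        items.GroupBy(x => x.ID)
             .Select(group => group.MinBy(x => x.ExpirationTime))
             .ToList();
}
```

Both return one event per ID, in the order in which each ID first appears in the input.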

At last, the right answer! – LukeH Feb 9 at 15:51

+1 This is the most efficient way. – David B Feb 9 at 15:51

@LukeH: I might be wrong, but I think I picked up the Aggregate trick for an O(n) MinBy from one of your answers. :) – Ani Feb 9 at 15:55

Using Aggregate to implement Max, very clever! – ohadsc Feb 9 at 15:57

@David B: It's the most efficient in the sense that it's O(N); however, I don't think it's actually the most efficient way. See my answer for a more efficient (but less "clean"-looking) approach. – Dan Tao Feb 9 at 17:38

Simple! You want to group them and pick a winner out of each group.

    List<Event> distinctEvents = allEvents
        .GroupBy(e => e.Id)
        .Select(g => g.OrderBy(e => e.ExpirationTime).First())
        .ToList();
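A runnable version of this answer, assuming a hypothetical Event record (Id and ExpirationTime, plus a Name that exists only so tied events can be told apart) and an illustrative EarliestPerId class. One property worth knowing: Enumerable.OrderBy is documented as a stable sort, so when two events in a group share the earliest ExpirationTime, First() returns the one that appeared first in the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type; Name exists only so tied events can be told apart.
public record Event(int Id, DateTime ExpirationTime, string Name);

public static class EarliestPerId
{
    // Sort each group by expiration and keep the head. A group of size k
    // costs O(k log k). OrderBy is stable, so among equal ExpirationTimes
    // the event that appeared first in allEvents is returned.
    public static List<Event> Pick(IEnumerable<Event> allEvents) =>
        allEvents.GroupBy(e => e.Id)
                 .Select(g => g.OrderBy(e => e.ExpirationTime).First())
                 .ToList();
}
```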

Nice! However, note that sorting is O(n log n) whereas max is O(n). – ohadsc Feb 9 at 15:53

@ohadsc You are correct. I'm deliberately trading away a little performance for ease of use and reading. Also, one would expect each group to be quite a bit smaller than the total list, so these mini-orderings are faster than ordering the whole list. – David B Feb 9 at 15:56
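David B's point about small groups can be made precise. Suppose the list splits into m groups of sizes k_1 to k_m, with n elements in total. Every k_i is at most n, so sorting each group separately never costs more than sorting the whole list:

$$
\sum_{i=1}^{m} k_i \log k_i \;\le\; \sum_{i=1}^{m} k_i \log n \;=\; n \log n,
\qquad n = \sum_{i=1}^{m} k_i
$$

If no group has more than k elements, the same argument gives a bound of n log k, which is O(n) for a fixed k. The Min- or Aggregate-based answers stay O(n) regardless of how the groups are sized.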

Assuming you can implement IComparable<Event> on your Event class (since LINQ's Min doesn't otherwise have an overload that returns the original item), you can do:

    var distinct = events.GroupBy(evt => evt.Id)
                         .Select(grp => grp.Min());

Example:

    void Main()
    {
        var events = new List<Event>
        {
            new Event(1, DateTime.Now),
            new Event(1, DateTime.Now.AddDays(1)),
            new Event(2, DateTime.Now.AddDays(2)),
            new Event(2, DateTime.Now.AddDays(-22)),
        };

        var distinct = events.GroupBy(evt => evt.Id)
                             .Select(grp => grp.Min());
    }

    // Events are ordered by expiration time, so Min() returns the earliest one.
    public class Event : IComparable<Event>
    {
        public Event(int id, DateTime exp)
        {
            Id = id;
            Expiration = exp;
        }

        public int Id { get; set; }
        public DateTime Expiration { get; set; }

        public int CompareTo(Event other)
        {
            return Expiration.CompareTo(other.Expiration);
        }
    }
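On .NET 6 and later, Enumerable.Min also has an overload that takes an IComparer<TSource>, which returns the original item without Event having to implement IComparable. A minimal sketch assuming .NET 6 or later; the Event record and the MinWithComparer class are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain event type with no IComparable implementation (hypothetical; mirrors
// the Id/Expiration members used in the answer above).
public record Event(int Id, DateTime Expiration);

public static class MinWithComparer
{
    // The ordering is supplied from outside the class instead of via IComparable.
    static readonly IComparer<Event> ByExpiration =
        Comparer<Event>.Create((a, b) => a.Expiration.CompareTo(b.Expiration));

    // Enumerable.Min(source, comparer) requires .NET 6 or later.
    public static List<Event> Pick(IEnumerable<Event> events) =>
        events.GroupBy(evt => evt.Id)
              .Select(grp => grp.Min(ByExpiration))
              .ToList();
}
```

Keeping the comparer outside the type avoids committing Event to a single "natural" ordering when other code may want to sort it differently.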

Using Min this way is pretty cool. +1 – David B Feb 9 at 16:02

I believe this should outperform the GroupBy suggestion (see brief explanation below):

    IEnumerable<Event> DistinctEvents(IEnumerable<Event> events)
    {
        var dict = new Dictionary<int, Event>();

        foreach (Event e in events)
        {
            Event existing;
            if (!dict.TryGetValue(e.Id, out existing) || e.ExpirationTime < existing.ExpirationTime)
            {
                dict[e.Id] = e;
            }
        }

        return dict.Values;
    }

Explanation: this approach and the GroupBy approach have the same algorithmic complexity, but this one does less work in practice.

First, GroupBy internally builds a lookup (very similar to a Dictionary<TKey, List<TElement>>), which actually populates internal collections with the contents of the input sequence. This requires more memory and also has a performance impact, particularly because, while the sub-collections have amortized O(1) insertion time, they occasionally need to resize themselves, which is O(N) (where N is the size of the sub-collection). This is not a big deal, but it's still more work than you really need to be doing.

A consequence of the first point is that GroupBy must iterate over every element of the input sequence before it can provide its first group (so execution is deferred, but the entire input sequence is iterated before you can iterate over the result of GroupBy). Then you iterate over each group again in the call to Aggregate; in all, you iterate over the elements of the input sequence twice, which is more than necessary for the task at hand. The algorithmic complexity of the two approaches is the same, so they should be equally scalable; this one is simply faster.
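The buffering described above is easy to observe. This hypothetical helper counts how many source elements GroupBy has pulled: none when the query is built (execution is deferred), but all of them as soon as the first group is requested. The names GroupByBuffering, Source and Observe are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByBuffering
{
    // Produces 0..count-1, invoking onPull each time an element is handed out.
    static IEnumerable<int> Source(int count, Action onPull)
    {
        for (int i = 0; i < count; i++)
        {
            onPull();
            yield return i;
        }
    }

    // Before: elements pulled right after building the query (deferred, so 0).
    // After:  elements pulled once the first group has been requested.
    public static (int Before, int After) Observe(int count)
    {
        int pulled = 0;
        var groups = Source(count, () => pulled++).GroupBy(i => i % 3);
        int before = pulled;

        using var e = groups.GetEnumerator();
        e.MoveNext(); // ask for just the first group
        return (before, pulled);
    }
}
```

Observe(10) reports that nothing is read while the query is being built and that all 10 elements have been read by the time the first group is available.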

I took the liberty of testing both approaches (out of curiosity, mostly) and found the above to execute in roughly half the time and cause fewer GC collections (a rough approximation of memory usage) than the GroupBy approach. These are minute concerns, which it would normally be a waste of time to think too much about. The only reason I mention them is that you asked for an efficient solution (and even bolded the term); so I figured you would want to take these kinds of factors into consideration.
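The benchmark itself isn't shown, so the following is only a rough sketch of that kind of comparison, not the author's harness. It runs both approaches side by side, using Stopwatch for elapsed time and GC.CollectionCount(0) as a crude stand-in for allocation pressure. The Event record, the Bench class, the data generator and all other names are hypothetical. Timings depend on machine, runtime and data shape, and a dedicated tool such as BenchmarkDotNet gives more trustworthy numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical event type standing in for the question's class.
public record Event(int Id, DateTime ExpirationTime);

public static class Bench
{
    // GroupBy + Aggregate: buffers every element into groups, then scans each group.
    public static List<Event> ViaGroupBy(IEnumerable<Event> events) =>
        events.GroupBy(e => e.Id)
              .Select(g => g.Aggregate((best, next) =>
                  next.ExpirationTime < best.ExpirationTime ? next : best))
              .ToList();

    // Single pass that keeps only the current winner per Id (as in DistinctEvents).
    public static List<Event> ViaDictionary(IEnumerable<Event> events)
    {
        var dict = new Dictionary<int, Event>();
        foreach (var e in events)
        {
            if (!dict.TryGetValue(e.Id, out var existing) || e.ExpirationTime < existing.ExpirationTime)
                dict[e.Id] = e;
        }
        return dict.Values.ToList();
    }

    // Runs body once to warm up (JIT), then `iterations` more times, reporting
    // wall-clock time and how many gen-0 collections happened meanwhile.
    public static (TimeSpan Elapsed, int Gen0Collections) Measure(Action body, int iterations)
    {
        body();
        int gcBefore = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) body();
        sw.Stop();
        return (sw.Elapsed, GC.CollectionCount(0) - gcBefore);
    }

    // Seeded pseudo-random input: `count` events spread over `distinctIds` Ids.
    public static List<Event> MakeEvents(int count, int distinctIds, int seed)
    {
        var rng = new Random(seed);
        var t0 = new DateTime(2000, 1, 1);
        return Enumerable.Range(0, count)
            .Select(_ => new Event(rng.Next(distinctIds), t0.AddMinutes(rng.Next(100_000))))
            .ToList();
    }
}
```

For example, build the input with Bench.MakeEvents(100_000, 1_000, seed: 1), then compare Bench.Measure(() => Bench.ViaGroupBy(events), 100) against the same call with ViaDictionary.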

Nice, that's a lot of effort; benchmarking and all. (This is one of the issues with the 'information pipeline' in LINQ to Objects: the operators don't have big-picture knowledge, so the entire query cannot be optimized on that basis.) – Ani Feb 9 at 17:42

@Ani: Yeah, and to be fair I see that the OP did specifically ask for a "LINQ query"; my answer doesn't really fit that description. I always find it a little odd, though, when developers seek the most "efficient" solution to a problem and add the requirement that it must use LINQ (kind of like "I want the best tool for this job, and that tool must be a hammer"). As for the benchmarking, it's something I do so often that I have a little sandbox project with all the benchmarking tools included; essentially I pop in delegates and see how they perform over a bunch of iterations. – Dan Tao Feb 9 at 17:51

@Ani: ...which isn't to say that I don't spend too much time on SO (I clearly do)! – Dan Tao Feb 9 at 17:52

@Dan: And there's no reason that you couldn't make this method into a generic PartitionedMinBy extension method that could be usable in a LINQ query. Accept partitionKeySelector and compareKeySelector delegates as arguments and away you go... – LukeH Feb 9 at 18:02

@LukeH: You're totally right; I guess I just felt that this was such specialized behavior that a generic version might be more trouble than it's worth (the ol' YAGNI principle). For instance, I would think an ideal generic version, in addition to two selector functions, would also accept an optional IEqualityComparer for the key selector and an IComparer for the value selector. And I doubt it would get used much. But you're right that it's definitely doable. – Dan Tao Feb 9 at 18:07
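Here is a sketch of how the generic operator discussed in these comments might look. The name PartitionedMinBy and the parameters partitionKeySelector and compareKeySelector come from LukeH's comment, and the optional comparers follow Dan Tao's reply. It reuses the single-pass dictionary idea from DistinctEvents. Nothing here is an existing LINQ operator; PartitionedExtensions and the comparer parameter names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionedExtensions
{
    // Hypothetical operator: for each distinct partition key, yield the element
    // with the smallest compare key. One pass, one dictionary entry per partition.
    // Output order follows Dictionary enumeration and is not guaranteed.
    public static IEnumerable<TSource> PartitionedMinBy<TSource, TPartition, TCompare>(
        this IEnumerable<TSource> source,
        Func<TSource, TPartition> partitionKeySelector,
        Func<TSource, TCompare> compareKeySelector,
        IEqualityComparer<TPartition>? partitionComparer = null,
        IComparer<TCompare>? compareComparer = null)
        where TPartition : notnull
    {
        var compare = compareComparer ?? Comparer<TCompare>.Default;
        var best = new Dictionary<TPartition, (TSource Item, TCompare Key)>(partitionComparer);

        foreach (var item in source)
        {
            var partition = partitionKeySelector(item);
            var key = compareKeySelector(item);
            // Strict < keeps the first element seen when compare keys tie.
            if (!best.TryGetValue(partition, out var current) || compare.Compare(key, current.Key) < 0)
                best[partition] = (item, key);
        }

        foreach (var entry in best.Values)
            yield return entry.Item;
    }
}
```

For the question's data this would read events.PartitionedMinBy(e => e.Id, e => e.ExpirationTime). Passing a reversed IComparer turns it into a "max by", and passing an IEqualityComparer such as StringComparer.OrdinalIgnoreCase changes what counts as the same partition.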

    Events.GroupBy(e => e.ID)
          .Select(g => new { ID = g.Key, Time = g.Min(e => e.ExpirationTime) });

This doesn't return Events. – David B Feb 9 at 15:59

I think this should do it:

    events.GroupBy(x => x.ID, (key, items) =>
        items.First(y => y.ExpirationTime == items.Min(z => z.ExpirationTime)))

This groups by ID and selects, as the result for each group, the event in items (where items represents all the events with the same ID) with the smallest ExpirationTime.

This won't give distinct results, because 1) Where produces an IEnumerable<Event>, so you have to flatten it with SelectMany, and 2) Where can include several Events that have the same ExpirationDate. – Andrey Feb 9 at 15:51

Where(Min) is O(n^2). – David B Feb 9 at 16:00

You're right, but First should also work. – Kirk Woll Feb 9 at 16:01
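David B's O(n^2) remark also applies to the First version as written. Because items.Min(...) sits inside the First predicate, the minimum is recomputed for every element that First examines. The sketch below compares that with a version that computes the minimum once per group. The Event record, the FirstWithMin class and the SelectorCalls counter are hypothetical, added only to make the number of Min passes visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Event(int ID, DateTime ExpirationTime);

public static class FirstWithMin
{
    // Counts how often the Min selector runs (instrumentation only).
    public static int SelectorCalls;

    static DateTime Expiry(Event z) { SelectorCalls++; return z.ExpirationTime; }

    // As in the answer: Min is re-evaluated for each element First tests,
    // so a group of size k can cost O(k^2).
    public static List<Event> Inline(IEnumerable<Event> events) =>
        events.GroupBy(x => x.ID, (key, items) =>
                  items.First(y => y.ExpirationTime == items.Min(z => Expiry(z))))
              .ToList();

    // Minimum computed once per group, then one scan to find it: O(k).
    public static List<Event> Hoisted(IEnumerable<Event> events) =>
        events.GroupBy(x => x.ID, (key, items) =>
              {
                  var min = items.Min(z => Expiry(z));
                  return items.First(y => y.ExpirationTime == min);
              })
              .ToList();
}
```

For a single group of four events whose earliest one comes last, Inline runs the selector 16 times and Hoisted runs it 4 times. The same hoisting applies to any First(e => e.X == g.Max(...)) pattern.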

    List<Event> events = new List<Event>(); // populate with the events to de-duplicate
    events.GroupBy(e => e.ID)
          .Select(g => g.First(e => e.ExpirationTime == g.Max(t => t.ExpirationTime)));

Nice; however, this will require at most 2 passes over the list, as opposed to max, which requires 1. – ohadsc Feb 9 at 15:55


