Simplest way to get the top n elements of a Scala Iterable?

My solution (bound to Int, but should be easily changed to Ordered (a few minutes please).

My solution (bound to Int, but should be easily changed to Ordered (a few minutes please): def top (n: Int, li: List Int) : ListInt = { def updateSofar (sofar: List Int, el: Int) : List Int = { // println (el + " - " + sofar) if (el _) else sofar } /* better readable: val sofar = li. Take (n). SortWith (_ > _) val rest = li.

Drop (n) (sofar /: rest) (updateSofar (_, _)) */ (li. Take (n). SortWith (_ > _) /: li.

Drop (n)) (updateSofar (_, _)) } usage: val li = List (4, 3, 6, 7, 1, 2, 9, 5) top (2, li) For above list, take the first 2 (4, 3) as starting TopTen (TopTwo). Sort them, such that the first element is the bigger one (if any). Repeatedly iterate through the rest of the list (li.

Drop(n)), and compare the current element with the maximum of the list of minimums; replace, if neccessary, and resort again. Improvements: Throw away Int, and use ordered. Throw away (_ > _) and use a user-Ordering to allow BottomTen.(Harder: pick the middle 10 :) ) Throw away List, and use Iterable instead update (abstraction): def extremeN T(n: Int, li: List T) (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)): ListT = { def updateSofar (sofar: List T, el: T) : List T = if (comp1 (el, sofar.

Head)) (el :: sofar. Tail). SortWith (comp2 (_, _)) else sofar (li.

Take (n) . SortWith (comp2 (_, _)) /: li. Drop (n)) (updateSofar (_, _)) } /* still bound to Int: def top (n: Int, li: List Int) : ListInt = { extremeN (n, li) ((_ _)) } def bottom (n: Int, li: List Int) : ListInt = { extremeN (n, li) ((_ > _), (_ Boolean), comp2: ((T, T) => Boolean)): ListT = { def sortedIns (el: T, list: ListT): ListT = if (list.

IsEmpty) List (el) else if (comp2 (el, list. Head)) el :: list else list. Head :: sortedIns (el, list.

Tail) def updateSofar (sofar: List T, el: T) : List T = if (comp1 (el, sofar. Head)) sortedIns (el, sofar. Tail) else sofar (li.

Take (n) . SortWith (comp2 (_, _)) /: li. Drop (n)) (updateSofar (_, _)) } top/bottom method and usage as above.

For small groups of top/bottom Elements, the sorting is rarely called, a few times in the beginning, and then less and less often over time. For example, 70 times with top (10) of 10 000, and 90 times with top (10) of 100 000.

Thanks for your solution. It's short and easy to understand. I adopted it to work with Iterables of arbitrary types.

Unfortunately, stackoverflow does not allow me to put my code in this comment. – Stefan Endrullis Apr 15 '11 at 15:31 Adopted the original, or the second solution with extremN? You may append to your question (but mark it '# update:', to make the discussion understandable) or open an answer yourself.

– user unknown Apr 15 '11 at 15:46 I used your original version. I will append the code to the question. – Stefan Endrullis Apr 15 '11 at 16:09 Meanwhile I updated my lower solution a bit farther; top and bottom use Ordering too.

– user unknown Apr 15 '11 at 16:15 Instead of sorting, you should do sorted insertion in the tail. It will be much faster. – Daniel C.

Sobral Apr 15 '117 at 0:52.

You don't need to sort the entire collection in order to determine the top N elements. However, I don't believe that this functionality is supplied by the raw library, so you would have to roll you own, possibly using the pimp-my-library pattern. For example, you can get the nth element of a collection as follows: class PimpA, Repr error("Cannot get " + n + " element of empty collection") case Some(piv) => trav.

Foreach { a => val cf = ord. Compare(piv, a) if (cf == 0) etp ::= a else if (cf > 0) ltp ::= a else mtp ::= a } if (n if (ord. Compare(tt, elem) if (ord.

Compare(tt, elem) == 0) { be += tt soFar += 1 } if (soFar == n) break } } } b.result() } Unfortunately I'm having trouble getting this pimp to be discovered via this implicit: implicit def t2nA, Repr List(4, 3, 6, 7, 1, 2, 8, 5). TopN(4) :9: error: could not find implicit value for evidence parameter of type (ListInt) => scala.collection. TraversableLikeA,ListInt List(4, 3, 6, 7, 1, 2, 8, 5).

TopN(4) ^ However, the code actually works OK: scala> new Pimp(List(4, 3, 6, 7, 1, 2, 8, 5)). TopN(4) res3: ListInt = List(3, 1, 2, 4) And scala> new Pimp("ioanusdhpisjdmpsdsvfgewqw"). TopN(6) res2: java.lang.

String = adddfe.

Yet another version: val big = (1 to 100000) def maxesA(n:Int)(l:TraversableA)(implicit o:OrderingA) = l. FoldLeft(collection.immutable.SortedSet. EmptyA) { (xs,y) => if (xs.

Size import o. _ if (xs. Size Sort(lt _) else { val first = xs.

Head if (first.

If the goal is to not sort the whole list then you could do something like this (of course it could be optimized a tad so that we don't change the list when the number clearly shouldn't be there): List(1,6,3,7,3,2). FoldLeft(ListInt()){(l, n) => (n :: l).sorted. Take(2)}.

Sorted.reverse. Take(2) to get the top 2. It's simple but I'm not sure of the efficiency as sorted is built on top of java.util.Arrays.

Sort, so this may create a lot of temp arrays. – huynhjl Apr 15 '11 at 12:08 And also it fails if the goal was to avoid sorting the complete list. – thoredge Apr 15 '11 at 12:30 Just to be sure we are on the same page, your solution returns the bottom 2 and this is why I mentioned reverse to make your solution work.

– huynhjl Apr 15 '11 at 12:50 You're correct. I misunderstood your first comment. – thoredge Apr 15 '11 at 13:01.

I implemented such an ranking algorithm recently in the Rank class of Apache Jackrabbit (in Java tough). See the take method for the gist of it. The basic idea is to quicksort but terminate prematurely as soon as the top n elements have been found.

Oh, that's good to know. It's probably the most efficient solution. The only drawback I see is that it generates an array with all elements of the Iterable.

Thus, in extreme cases it may not be applicable because of restricted memory. – Stefan Endrullis Apr 15 '11 at 15:46.

For small values of n and large lists, getting the top n elements can be implemented by picking out the max element n times: def topT(n:Int, iter:IterableT)(implicit ord: OrderingT): IterableT = { def partitionMax(acc: IterableT, it: IterableT): IterableT = { val max = it. Max(ord) val (nextElems, rest) = it. Partition(ord.

Gteq(_, max)) val maxElems = acc ++ nextElems if (maxElems. Size >= n || rest. IsEmpty) maxElems.

Take(n) else partitionMax(maxElems, rest) } if (iter. IsEmpty) iter. Take(0) else partitionMax(iter.

Take(0), iter) } This does not sort the entire list and takes an Ordering. I believe every method I call in partitionMax is O(list size) and I only expect to call it n times at most, so the overall efficiency for small n will be proportional to the size of the iterator. Scala> top(5, List.

Range(1,1000000)) res13: IterableInt = List(999999, 999998, 999997, 999996, 999995) scala> top(5, List. Range(1,1000000))(OrderingInt. On(- _)) res14: IterableInt = List(1, 2, 3, 4, 5) You could also add a branch for when n gets close to size of the iterable, and switch to iter.toList.

SortBy(_. MyAttr). Take(n).

It does not return the type of collection provided, but you can look at How do I apply the pimp-my-library pattern to Scala collections? If this is a requirement.

I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.

Related Questions