Mathematica fast 2D binning algorithm?

I intend to do a rewrite of the code below because of Szabolcs' readability concerns. Until then, know that if your bins are regular, and you can use Round Floor or Ceiling (with a second argument) in place of Nearest the code below will be much faster. On my system, it tests faster than the GatherBy solution also posted.

I intend to do a rewrite of the code below because of Szabolcs' readability concerns. Until then, know that if your bins are regular, and you can use Round, Floor, or Ceiling (with a second argument) in place of Nearest, the code below will be much faster. On my system, it tests faster than the GatherBy solution also posted.

Assuming I understand your requirements, I propose: data = RandomReal100, {75, 3}; bins = {0, 20, 40, 60, 80, 100}; Reap Sow{#3, #2}, bins ~Nearest~ # & @@@ data, bins, ReapSow#, bins ~Nearest~ #2 & @@@ #2, bins, Tr@#2 &2 & 2 ~Flatten~ 1 ~Total~ {3} // MatrixForm Refactored: fbins_ := ReapSow{##2}, bins ~Nearest~ #& @@@ #, bins, #22 & bin2Ddata_, X_, Y_ := fXdata, fY#2, #2~Total~2 & & ~Flatten~ 1 ~Total~ {3} Use: bin2Ddata, xbins, ybins.

Your code is really difficult to follow! – Szabolcs Nov 18 at 8:48 @Szabolcs, can you suggest how I might make it more readable? – Mr.Wizard Nov 18 at 8:55 I will once I figure out why it doesn't give the same result as mine ... those numbers in bins, they are the centres of the bins, right?

– Szabolcs Nov 18 at 9:03 2 My first comment was just a cry of frustration, not directly aimed at you :-) But seriously, I do think it's objectively hard to read (despite "easy to read" being a subjective and cultural thing, depending on what we're used to). Take for example the line ReapSow#1, bins~Nearest~#2 & @@@ #2, bins, Tr@#2 &. There are three occurrences of #2, each of which are arguments of a different function.

There's the misleading use of Tr as a shorthand for Total. Unorthodox use of inline notation. – Szabolcs Nov 18 at 9:34 2 I'd like to suggest a change.

Using Nearest is fine, but it must do a lot of preprocessing every time it is used. I'd suggest wrapping everything in f with With{nn=Nearestbins}, ... and use nn# instead of bins ~Nearest~ #. This does all the preprocessing required once, and on my machine over 10^6 points it shaves off nearly 7 seconds.

– rcollyer Nov 18 at 13:49.

Here is a method based on Szabolcs's post that is about about an order of magnitude faster. Data = RandomReal5, {500000, 3}; (*500k values*) zvalues = dataAll, 3; epsilon = 1*^-10;(*prevent 101 index*) (*rescale and round (x,y) coordinates to index pairs in the 1..100 range*) indexes = 1 + Floor(1 - epsilon) 100 RescaledataAll, {1, 2}; res2 = Module{gb = GatherByTranspose{indexes, zvalues}, First}, SparseArray gbAll, 1, 1 -> TotalgbAll, All, 2, {2}; // AbsoluteTiming Gives about {2.012217, Null} AbsoluteTiming System`SetSystemOptions "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}; res3 = SparseArrayindexes -> zvalues; System`SetSystemOptions "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}; Gives about {0.195228, Null} res3 == res2 True "TreatRepeatedEntries" -> 1 adds duplicate positions up.

I am having trouble understanding what it does. – acl Nov 20 at 17:54 1 If you have rules {2,3}->1, {2,3}->10 and, say, {2,3}->2 the value at mat2,3 will be 1+10+2; "TreatRepeatedEntries" means to add up the value if position entries are repeated. The default behavior of SparseArray is to not do that.

TreatRepeatedEntries"->1 activates that feature - with this you can write very efficient code. – ruebenko Nov 20 at 18:07 1 That's very useful to know, thanks. A shame it's undocumented though... – acl Nov 20 at 18:14 Why are these kinds of things undocumented?

You showed SparseArray properties before, and I believe I asked if there were other similar useful undocumented features, but this was not forthcoming. Since these are not in the official documentation, but you are willing to divulge them, would you please consider putting together an index of these hidden but powerful methods? – Mr.Wizard Nov 21 at 1:32 1 Id did show this on several occasions, at least once on MathGroup: forums.wolfram.

Com/mathgroup/archive/2010/Dec/msg00335.html. – ruebenko Nov 21 at 10:44.

Here's my approach: data = RandomReal5, {500000, 3}; (* 500k values *) zvalues = dataAll, 3; epsilon = 1*^-10; (* prevent 101 index *) (* rescale and round (x,y) coordinates to index pairs in the 1..100 range *) indexes = 1 + Floor(1 - epsilon) 100 RescaledataAll, {1, 2}; (* approach 1: create bin-matrix first, then fill up elements by adding zvalues *) res1 = Module {result = ConstantArray0, {100, 100}}, Do AddToresult##, zvaluesi & @@ indexesi, {i, Lengthindexes} ; result ; // Timing (* approach 2: gather zvalues by indexes, add them up, convert them to a matrix *) res2 = Module{gb = GatherByTranspose{indexes, zvalues}, First}, SparseArraygbAll, 1, 1 -> (Total /@ gbAll, All, 2) ; // Timing res1 == res2 These two approaches (res1 & res2) can handle 100k and 200k elements per second, respectively, on this machine. Is this sufficiently fast, or do you need to run this whole program in a loop?

– Mr.Wizard Nov 18 at 8:56 I fixed Part. With AddTo, it just avoids some parenthesis. I tend to use this form when I put Set inside a Function, but I don't have strong feelings about the style.

– Szabolcs Nov 18 at 9:02 This appears to be much faster than what I did, but can you easily handle irregular bins? (I used Nearest for this reason.) – Mr.Wizard Nov 18 at 9:08 Total /@ gbAll, All, 2 is equivalent to TotalgbAll, All, 2, {2}. – ruebenko Nov 180 at 17:26.

Which is perfect for a problem like this one. Data = RandomReal100, {75, 3}; bins = Range0, 100, 20; binMiddles = (Most@bins + Rest@bins)/2; nearest = NearestbinMiddles; SelectEquivalents data , TagElement -> ({First@nearest#1, First@nearest#2} &) , TransformElement -> (#3 &) , TransformResults -> (Total#2 &) , TagPattern -> FlattenOuterList, binMiddles, binMiddles, 1 , FinalFunction -> (PartitionFlatten# /. {} -> 0, LengthbinMiddles &) If you would want to group according to more than two dimensions you could use in FinalFunction this function to give to the list result the desired dimension (I don't remember where I found it).

InverseFlattenl_,dimensions_:= FoldPartition#, #2 &, l, MostReversedimensions.

As I mentioned on Mr.Wizard's solution, you'll want to use NearestFunctions instead of reinvoking Nearest everytime. In other words, calculate Nearestxbins and Nearestybins first, and you get a non-trivial speed-up. – rcollyer Nov 18 at 14:01 Also, why TransformResults -> (Append#1, Total#2 &) instead of TransformResults -> ({#1, Total#2} &)?

– rcollyer Nov 18 at 14:03 2 Because #1 is a list and I want a "plotable" result. – Faysal Aberkane Nov 18 at 14:36 Absolutely. I should have seen that myself.

– rcollyer Nov 18 at 14:44 The OP is talking about lists of bin boundaries. You are using lists of bin midpoints. – Sjoerd C.De Vries Nov 18 at 21:42.

I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.

Related Questions