Iterating through a scipy.sparse vector (or matrix)?


I'm wondering what the best way is to iterate over the nonzero entries of sparse matrices with scipy.sparse. For example, if I do the following:

    from scipy.sparse import lil_matrix
    x = lil_matrix( (20,1) )
    x[13,0] = 1
    x[15,0] = 2

    c = 0
    for i in x:
        print c, i
        c = c+1

the output is

    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13   (0, 0)  1.0
    14
    15   (0, 0)  2.0
    16
    17
    18
    19

so it appears the iterator is touching every element, not just the nonzero entries.

I've had a look at the API (docs.scipy.org/doc/scipy/reference/gener...) and searched around a bit, but I can't seem to find a solution that works. Thanks in advance for your help.

python scipy sparse

asked Nov 30 '10 at 21:46 by RandomGuy

Edit: bbtrb's method (using coo_matrix) is much faster than my original suggestion, using nonzero. Sven Marnach's suggestion to use itertools.izip also improves the speed. Current fastest is using_tocoo_izip:

    import scipy.sparse
    import random
    import itertools

    def using_nonzero(x):
        rows,cols = x.nonzero()
        for row,col in zip(rows,cols):
            ((row,col), x[row,col])

    def using_coo(x):
        cx = scipy.sparse.coo_matrix(x)
        for i,j,v in zip(cx.row, cx.col, cx.data):
            (i,j,v)

    def using_tocoo(x):
        cx = x.tocoo()
        for i,j,v in zip(cx.row, cx.col, cx.data):
            (i,j,v)

    def using_tocoo_izip(x):
        cx = x.tocoo()
        for i,j,v in itertools.izip(cx.row, cx.col, cx.data):
            (i,j,v)

    N = 200
    x = scipy.sparse.lil_matrix( (N,N) )
    for _ in xrange(N):
        x[random.randint(0,N-1), random.randint(0,N-1)] = random.randint(1,100)

which yields these timeit results:

    % python -mtimeit -s'import test' 'test.using_tocoo_izip(test.x)'
    1000 loops, best of 3: 670 usec per loop
    % python -mtimeit -s'import test' 'test.using_tocoo(test.x)'
    1000 loops, best of 3: 706 usec per loop
    % python -mtimeit -s'import test' 'test.using_coo(test.x)'
    1000 loops, best of 3: 802 usec per loop
    % python -mtimeit -s'import test' 'test.using_nonzero(test.x)'
    100 loops, best of 3: 5.25 msec per loop

Obviously it's better. – Kabie Nov 30 '10 at 21:59

How about using izip() instead of zip()? Should be faster for big matrices. – Sven Marnach Dec 1 '10 at 12:47

@Sven Marnach: Thanks; indeed that is faster. – unutbu Dec 1 '10 at 13:44

Nice, didn't know about izip(). Actually I'm a bit surprised that tocoo() is faster than the coo_matrix() constructor... – bbtrb Dec 1 '10 at 14:43
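For readers on Python 3, itertools.izip no longer exists (the built-in zip is already lazy), so the fastest variant above reduces to plain zip over the tocoo() arrays. A minimal sketch of that variant, assuming SciPy is installed; the small example matrix mirrors the one in the question:

```python
import scipy.sparse

def iter_nonzero(x):
    # Convert once to COO, then walk the parallel row/col/data arrays.
    # In Python 3 the built-in zip is lazy, so itertools.izip is unnecessary.
    cx = x.tocoo()
    return [(int(i), int(j), float(v)) for i, j, v in zip(cx.row, cx.col, cx.data)]

x = scipy.sparse.lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

entries = iter_nonzero(x)
print(entries)
```

The conversion cost is paid once per matrix, after which iteration is just a walk over three flat NumPy arrays.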

The fastest way should be by converting to a coo_matrix:

    cx = scipy.sparse.coo_matrix(x)

    for i,j,v in zip(cx.row, cx.col, cx.data):
        print "(%d, %d), %s" % (i,j,v)

You're right; for larger matrices this is much faster. – unutbu Dec 1 '10 at 3:27

Is it faster to convert and then iterate, or is this assuming that I can change my code to work with coo_matrix? – RandomGuy Dec 1 '10 at 16:16

@scandido: this depends on what you are going to achieve. coo_matrix is a very simple format, very fast to construct and access, but it might be ill-suited for other tasks. Here's an overview of the different matrix formats (scipy.org/SciPyPackages/Sparse), especially the section "constructing from scratch faster, with coo_matrix". – bbtrb Dec 1 '10 at 16:46
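On the question in the comments about why tocoo() benchmarks slightly faster than the coo_matrix() constructor: both routes produce identical triples, and my guess (not confirmed by the thread) is that tocoo() simply skips the generic constructor's input inspection. A small sketch checking that the two conversions agree, assuming SciPy is installed:

```python
import scipy.sparse

x = scipy.sparse.lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

# Two ways to reach COO form; they yield the same (row, col, value) triples.
via_ctor = scipy.sparse.coo_matrix(x)
via_tocoo = x.tocoo()

ctor_triples = sorted(zip(via_ctor.row, via_ctor.col, via_ctor.data))
tocoo_triples = sorted(zip(via_tocoo.row, via_tocoo.col, via_tocoo.data))

for i, j, v in tocoo_triples:
    print("(%d, %d), %s" % (i, j, v))
```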

I had the same problem, and actually, if your concern is only speed, the fastest way (more than one order of magnitude faster) is to convert the sparse matrix to a dense one (x.todense()) and iterate over the nonzero elements in the dense matrix. (Though, of course, this approach requires a lot more memory.)

I can't imagine the latter would be faster. But, of course, using a dense matrix will be much, much faster if you have enough memory. – RandomGuy Dec 29 '10 at 16:58

I guess it depends on the scenario and the kind of data. I've been doing some profiling on a script that iterates over matrices containing at least 50-100M boolean elements. When iterating, converting to dense and then iterating requires way less time than iterating using the 'best solution' from unutbu's answer. But of course the memory usage increases a lot. – Davide C Dec 29 '10 at 23:56
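The dense approach above can be sketched as follows, assuming NumPy and SciPy are installed. (The answer mentions x.todense(); the sketch uses toarray(), which returns a plain ndarray rather than a matrix and is simpler to index here. np.nonzero returns coordinates in row-major order, and the memory cost is the full rows * cols array.)

```python
import numpy as np
import scipy.sparse

x = scipy.sparse.lil_matrix((100, 100))
x[3, 7] = 1.5
x[42, 0] = 2.0

d = x.toarray()             # dense ndarray: O(rows * cols) memory
rows, cols = np.nonzero(d)  # coordinates of nonzero entries, row-major
entries = [(int(i), int(j), float(d[i, j])) for i, j in zip(rows, cols)]
print(entries)
```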

