![]() The len(indptr) = len(data) + 1 = len(indices) + 1 because for each row, we represent it with the start and end index (similar to how we index a list). The second row contains two values b, c which we then index from 1:3 and so on. In the below example, we see that the first row contains a single value a and so we index it with 0:1. This definition confuses me and I choose to interpret it this way: it tells us how many of the values are contained in each row. indptr: stands for index pointer and returns an array of row starts.In the diagram below, the first non-zero value occurs at row 0 column 5, hence 5 appears as the first value in the indices array, followed by 1 (row 1, column 1). indices: an array of column indices - starting from the first row (from left to right), we identify non-zero positions and return their indices in that row.data: the values of the non-zero values - these are the non-zero values that are stored within the sparse matrix.To efficiently represent a sparse matrix, CSR uses three numpy arrays to store some relevant information, including: compressed sparse column) is used for write-once-read-many tasks. Compressed Sparse Row (CSR)Įven though there are many types of sparse matrices in SciPy such as Dictionary of Keys (DOK) and List of List (LIL), I will only touch on Compressed Sparse Rows (CSR) because it is the most used and widely known format.ĬSR (and also CSC, a.k.a. of 7 runs, 10000 loops each) > 56Ĭlearly, the best performance in terms of both time and space is obtained when we store a sparse matrix with the sparse module. of 7 runs, 100 loops each) > 2880112 %timeit square(B_sparse) print(getsizeof(B_sparse)) > 187 µs ± 5.24 µs per loop (mean ± std. of 7 runs, 1 loop each) > 56 %timeit square(B_full) print(getsizeof(B_full)) > 2.18 ms ± 56.5 µs per loop (mean ± std. ![]() of 7 runs, 100 loops each) > 2880112 %timeit square(A_sparse) print(getsizeof(A_sparse)) > 409 ms ± 11.5 ms per loop (mean ± std. %timeit square(A_full) print(getsizeof(A_full)) > 6.91 ms ± 84.5 µs per loop (mean ± std. We then time these different matrices stored in these different forms and also how much memory they use. B_sparse = sparse.csc_matrix(B_full) # Create a square function to return the square of the matrix def square(A): return np.power(A, 2) B_full = np.diag(np.random.rand(600)) # Matrix 4: Store B_full as a sparse matrix. A_sparse = sparse.csc_matrix(A_full) # Matrix 3: Create a sparse matrix (stored as a full matrix). A_full = np.random.rand(600, 600) # Matrix 2: Store A_full as a sparse matrix (though it is dense). import numpy as np from scipy import sparse from sys import getsizeof # Matrix 1: Create a dense matrix (stored as a full matrix). using their row and column indices).īefore we dive into what CSR is, let’s compare the efficiency difference in using numpy arrays versus sparse matrices in both time and space complexity. The idea behind the implementation is simple: Instead of storing all values in a dense matrix, let’s just store the non-zero values in some format (e.g. In Python, sparse data structures are efficiently implemented in the scipy.sparse module, which are mostly based on Numpy arrays. How then do we represent these matrices? Introducing… SciPy’s Sparse Module ![]() We shall illustrate using an example below. In addition to space complexity, dense matrices exacerbate our runtimes as well. From a simple logic standpoint, it simply doesn’t make sense to store so many zeros!įrom a mathematical standpoint, if we have a 100,000 x 100,000 matrix, this will require us to have 100,000 x 100,000 x 8 = 80 GB worth of memory to store this matrix (since each double uses 8 bytes)! Time Complexity This is because a full array occupies a block of memory for each entry, so a n x m array requires n x m blocks of memory. When dealing with sparse matrices, storing them as a full matrix (from this point referred to as a dense matrix) is simply inefficient. To formalize these two constraints, they are known as time and space complexity (memory). Mac’s Activity Monitor (Source by Author)
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |