This is one of my university projects! 🤓
I was asked to create a C++ library for handling sparse matrices... using the standard and boost libraries only! 😱
This header-only library supports both row-wise and column-wise ordering, efficient matrix transposition, matrix-matrix and matrix-vector multiplication... and much more! You are free to explore the code! 🧑💻
Parallelization is achieved with the standard parallel algorithms and TBB (Intel's parallelization library), since MPI and OpenMP were not allowed for this challenge. 📈
The full assignment for this university project is available here.
⚠️ NOTE: Carefully read the LICENSE before modifying or redistributing any part of this code!
NOTE: the entire code base has been tested and compiled using mk modules. Please make sure you have mk modules enabled before proceeding!
In addition, the following modules are required:
module load gcc-glibc
module load boost
module load tbb
The code can be compiled through the usual Makefile, which generates the executable:
./main
The Makefile supports the following phony targets:
- `make`/`make all`: produces the basic executable, which loads only one Matrix-Market file (data/lnsp_131.mtx) and then performs a hard test on all the matrix functionalities. The `-march=native -O3` flags are used, and the `NDEBUG` macro is declared. For each test, a table is printed out, reporting a brief description of the test, how many times the test is performed and the average timings in microseconds. For further information, see the sections about the code base and the results;
- `make optimized_test`: produces an optimized version of the executable, meaning that the flags `-march=native -O3` are employed when compiling and the macro `NDEBUG` is declared, in order to disable all the checks performed by the `assert` statements included in the code. Moreover, this target also enables all the hard tests to correctly assess the program performance on the current machine. These tests are carried out on all the matrices located in data, plus an additional test on a tridiagonal matrix of size $10^6 \times 10^6$ (results are discussed in the corresponding section);
- `make test`: same as `optimized_test`, but it also performs some soft tests to ensure the basic functionalities. `NDEBUG` is not declared, meaning that all the `assert` statements are enabled throughout the code. This target is clearly not recommended for assessing the performance of the code, but it is a useful tool for advanced debugging;
- `make debug`: produces a quickly-compiled executable, which covers almost all the code (except for the hard tests). `NDEBUG` is not declared and optimizations are disabled (the `-O0 -g --coverage` flags are employed). Useful for basic and fast debugging, but not for performance evaluation;
- `make run`: starts the produced executable (equivalent to `./main`);
- `make doc`: creates the `html` documentation at doc/html/index.html using doxygen.
The code structure is analyzed below.
This source file contains just the main program, which launches all the soft/hard tests. In the first section, the basic functionalities are tested, including insertion of new entries into the matrix and the stream operator. The elements are inserted following a tridiagonal pattern, such that each $i$-th row appears like:
The second section will start all the soft tests, while the last section will perform more intensive tests to obtain sufficiently reliable timings.
This folder contains all the header files, used both to define the Matrix class and to perform tests on the different matrix formats. This is indeed a header-only code, which can be imported simply by copying this include folder. The header files containing the core functionalities are listed below:
- include/sparse_matrix.hpp: This file declares the `algebra::Matrix` class, which must be instantiated by providing the following template parameters:
  - `T`: the type of stored data, which must satisfy the requirements given by the concept `algebra::type_traits::numeric` (i.e. it can be either an integral type, a floating point or a `std::complex`);
  - `SO`: the selected storage order, which is set by the enum class `StorageOrder` (providing the options `ROWWISE` or `COLWISE`).
  This class can be constructed by providing the number of rows and columns (which, however, will be updated automatically if the user inserts new elements using `operator()` or loads the matrix from a Matrix-Market file by calling the `read_from` method). In addition, the matrix size can be changed dynamically through the `resize` method (out-of-range entries are discarded).

  Regarding the class structure, the `Matrix` class stores the uncompressed and compressed data as private `shared_ptr`s (the container types are discussed later). Shared pointers were chosen to allow the default copy-assignment (`operator=`) to perform a shallow copy of the matrix, by just copying the pointers to the data. In addition, this design is crucial for the `transpose` method implementation, which indeed returns a transpose view of the original matrix without deep-copying data. However, a deep copy can be enforced by calling the `deep_copy` method.

  Some of the `Matrix` methods support parallelization, achieved by means of standard algorithms combined with `tbb` modules. Parallelization can be enabled dynamically by calling the method `enable_parallel` (and disabled by `disable_parallel`). Below is a list of all the methods/friend functions allowing for parallel execution:
  - `compress`: compresses either to CSR format (if the selected storage order is `ROWWISE`) or to CSC format (only for `COLWISE`-ordered matrices);
  - `norm`: the norm can be specified through a template parameter of type `NormType`; thanks to the efficiency of the `transpose` method, the infinity-norm is implemented as the 1-norm applied to the transposed matrix;
  - `operator*`: parallelized only when multiplying by a `std::vector`.
The methods/friend functions supporting only sequential execution are instead:
  - `uncompress`: converts the matrix back to the COO-map format;
  - `operator*`: sequential in all the overloads for matrix-matrix multiplication.
In both the functions above, the limitation preventing parallelization lies in writing the COO data: the COO map (which is in fact a `boost::container::flat_map`) cannot be written by multiple threads concurrently.

Some private methods are then defined:
  - `iterate_over_comp_data`, `iterate_over_uncomp_data`, `iterate_over_elements`: these methods are used in most of the `Matrix` code to iterate over matrix data and to perform tasks on the elements (they are declared to improve readability and to define a standard way to loop over matrix elements);
  - `check_data_integrity`: enabled only when the `NDEBUG` macro is not declared; useful to check integrity after performing tasks that may alter/invalidate data (such as compression/uncompression, conversion from CSC to CSR, etc.).
Finally, the `operator*` overload for matrix-matrix multiplication receives only compressed formats in input and returns a COO matrix. Its implementation can then be analyzed case by case:
  - $\textbf{CSR} \cdot \textbf{CSR}$ and $\textbf{CSC} \cdot \textbf{CSR}$: these are the most straightforward cases and are implemented by the same operator overload; the multiplication is performed as $C_{ij} = \sum_k A_{ik} B_{kj}$, so the algorithm iterates over the non-zero elements of $A$, extracts the indices $i,k$ and uses $k$ to traverse $B$;
  - $\textbf{CSC} \cdot \textbf{CSC}$: thanks to the `transpose` method efficiency, this case is implemented as the transpose counterpart of the $\textbf{CSR} \cdot \textbf{CSR}$ case;
  - $\textbf{CSR} \cdot \textbf{CSC}$: $B$ is first converted from CSC into CSR format, so that the $\textbf{CSR} \cdot \textbf{CSR}$ code can be exploited again (clearly, this conversion has an extra memory requirement to store the converted matrix and a temporary vector for row pointers; further details inside the code).
- include/containers.hpp: a collection of containers, used to store compressed data, uncompressed data and pairs of indices:
  - `algebra::containers::index_array_type`: alias for `std::array<size_t, 2>`, i.e. an array used to store a pair of indices;
  - `algebra::containers::IndexPosition`: accessor to `index_array_type`. The key idea is that the ordering-major index (e.g. the row index for row-wise ordering) always refers to the first entry of `index_array_type`, ensuring the correct ordering in the COO map (i.e. there is no need to change the `std::less` operator for standard arrays, as it guarantees a lexicographical ordering). The row position in `index_array_type` is therefore either $0$ for row-wise ordering or $1$ for column-wise ordering, so that the ordering-major index is always located in the first position of `index_array_type`. This design is critical in the aforementioned `transpose` method implementation;
  - `algebra::containers::UncompressedDataMap`: struct inheriting from an ordered `boost::container::flat_map<index_array_type, T>`, used to store COO data as pairs of indices + value. NOTE: the choice of `flat_map` is essential for parallelization, since it is a random-access container. Moreover, by default, the underlying sequence container is a vector, making `flat_map` also contiguous. On the other hand, `std::map` cannot satisfy these requirements and cannot be accessed in parallel using standard algorithms and `tbb`;
  - `algebra::containers::CompressedDataStruct`: a struct collecting 3 vectors, used to store pointers, indices and values in compressed format.
- include/execution.hpp: a collection of functions taken from the standard algorithms; in addition, we can mention the `algebra::execution::atomic_add` function, which ensures that the addition operation is free from data races (i.e. threads cannot write concurrently to the same variable, thanks to the definition of a `std::atomic_ref`, provided by the `<atomic>` library). This function is called with the `std::memory_order_relaxed` option, which guarantees the atomicity of the operation but enforces no ordering constraint (indeed, the sum is commutative, so no ordering constraint on data writing is needed);
- include/enumerations.hpp: a collection of enumerations, used as options in the `Matrix` class;
- include/type_traits.hpp: a collection of type traits used in the `Matrix` class;
- include/util.hpp: a collection of utilities to fill matrices in different ways, extensively used to perform the soft tests; see the documentation of each function for further details;
- include/test.hpp: a collection of all the possible soft and hard tests, which are performed in src/main.cpp.
NOTE: Soft tests are designed especially to ensure the basic functionalities and to guarantee that the parallel execution is carried out without data races (for this purpose, 2 dense matrices are defined, whose size is chosen in such a way as to trigger data races; see the code for further details).
This folder contains all the headers needed to perform the soft tests and the hard tests:
- include/test/lambdas.hpp: a collection of lambda functions that invoke all the `Matrix` methods to be tested;
- include/test/soft_test.hpp: definition of the functions needed to run the soft tests;
- include/test/hard_test.hpp: definition of the functions needed to run the hard tests and take timings;
- include/test/table.hpp: prints out the hard test timings in a table.
The hard tests with timings are all carried out on the matrices located in the data folder, plus an additional test on a tridiagonal matrix of size $10^6 \times 10^6$:

- data/lnsp_131.mtx:

  | SIZE | NNZ ELEMENTS | REFERENCE |
  | --- | --- | --- |
  | 131 x 131 | 536 | here |

- data/chem_master1.mtx:

  | SIZE | NNZ ELEMENTS | REFERENCE |
  | --- | --- | --- |
  | 40401 x 40401 | 201201 | here |

- data/nasasrb.mtx: a real symmetric matrix from a shuttle rocket booster structural problem:

  | SIZE | NNZ ELEMENTS | REFERENCE |
  | --- | --- | --- |
  | 54870 x 54870 | 2677324 | here |
Let us first analyze the test on the NASA matrix. This matrix is large enough to clearly observe the performance difference between compressed and uncompressed formats and to assess the effectiveness of parallelization:
DESCRIPTION: Read from Matrix-Market file
N. TIMES: 1
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 4.814232e+06
COO (col-wise) 4.804494e+06
DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.167255e+04 3.032581e+04 -4.25207%
COO (col-wise) 3.244682e+04 3.025604e+04 -6.75191%
DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR 4.181365e+04
CSC 4.152029e+04
Both row-wise and column-wise orderings exhibit very similar timings in these two tests, and a certain speed-up can be appreciated in the parallel execution. However, this speed-up is lower than in the other results below, due to the atomic operation needed to increment the compressed vector of pointers without data races.
DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 1.800000e+00
COO (col-wise) 1.300000e+00
CSR 1.870000e+00
CSC 1.670000e+00
Because of the shared-pointer design, the transpose operation is very efficient, since it involves only the copy of a few shared pointers. Indeed, the cost of this operation is on the order of a microsecond, regardless of the matrix size.
DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.896490e+03 3.016360e+03 -22.5878%
COO (col-wise) 5.688280e+03 2.904200e+03 -48.9441%
CSR 2.581770e+03 3.170960e+03 +22.8212%
CSC 1.902620e+03 8.676200e+02 -54.3987%
1-norm value: 4.06713e+09
DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 5.670250e+03 2.887500e+03 -49.0763%
COO (col-wise) 3.977150e+03 3.010860e+03 -24.296%
CSR 1.901000e+03 8.673800e+02 -54.3724%
CSC 2.738480e+03 3.185260e+03 +16.3149%
Infinity-norm value: 4.06713e+09
DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.351290e+03 1.453054e+05 +4235.8%
COO (col-wise) 3.335940e+03 1.452259e+05 +4253.37%
CSR 2.337030e+03 8.679600e+02 -62.8606%
CSC 2.335280e+03 8.942000e+02 -61.7091%
Frobenius norm value: 1.84364e+10
In all the norm implementations, we can clearly observe a speed-up when switching from uncompressed to compressed formats, as expected.
Looking at the results for the compressed formats, it is clear that the 1-norm can be computed more efficiently on CSC than on CSR (due to how the matrix is traversed). Indeed, the timings for the infinity-norm are swapped with respect to those of the 1-norm (since the infinity-norm is just the transpose counterpart of the 1-norm).
In addition, the storage ordering does not alter the results for the Frobenius norm. However, the code slows down when performing parallel execution with the COO format: this behaviour occurs because all the threads write atomically to the same accumulator; for compressed formats, on the other hand, each thread is assigned a row (or column), so it can write to a separate accumulator without competing with the other threads, resulting in a clear speed-up.
DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 6.949310e+03 3.019950e+03 -56.5432%
COO (col-wise) 3.976550e+03 3.148680e+03 -20.8188%
CSR 2.332420e+03 1.301130e+03 -44.2155%
CSC 2.931810e+03 3.226100e+03 +10.0378%
The matrix-vector multiplication is tested by multiplying the matrix by a randomly-filled vector. As expected, we can observe a clear speed-up when using compressed formats. The minimum time is reached by the CSR format in parallel mode, while CSC slows down in parallel execution: this is due to the atomic accumulation on the resulting vector, which causes thread contention in the CSC case (in the CSR case, on the other hand, each thread is assigned a row, so it writes to separate entries of the resulting vector without competing with other threads).
DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR - CSR 4.368534e+06
CSC - CSC 4.367381e+06
CSR - CSC 4.400043e+06
CSC - CSR 8.818197e+07
Finally, matrix-matrix multiplication timings are taken by multiplying the test matrix by itself. The first three cases exhibit very similar timings, while the $\textbf{CSC} \cdot \textbf{CSR}$ case is much slower: the entries of the result are not generated in lexicographical order, so the insertion hint cannot be exploited and a costly lookup in the flat_map is performed at each iteration.
For completeness, the full test output is reported below. As expected, we can notice that parallelization is not convenient for relatively small matrices, resulting in a slowdown of the algorithm. It is also interesting to notice that, for data/lnsp_131.mtx, reading from the Matrix-Market file is faster with column-wise ordering than with row-wise ordering; indeed, entries are ordered column-wise in this particular file, and the code provides a hint when inserting new entries into the COO map, in order to emplace them at the back when possible. This hint speeds up reading from file, provided that the entries in the Matrix-Market file are row-wise or column-wise ordered.
------------------- PERFORMANCE EVALUTATION -------------------
--- Starting new test on 'data/lnsp_131.mtx' ---
Loading matrix...
Number of rows: 131
Number of column: 131
Number of non-zero elements: 536
DESCRIPTION: Read from Matrix-Market file
N. TIMES: 1
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 1.121000e+03
COO (col-wise) 9.460000e+02
DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.530000e+00 3.220000e+01 +812.181%
COO (col-wise) 1.950000e+00 1.111000e+01 +469.744%
DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR 3.020000e+00
CSC 2.230000e+00
DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 7.000000e-01
COO (col-wise) 7.600000e-01
CSR 7.200000e-01
CSC 7.700000e-01
DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 1.030000e+00 1.908000e+01 +1752.43%
COO (col-wise) 1.250000e+00 1.666000e+01 +1232.8%
CSR 1.270000e+00 1.633000e+01 +1185.83%
CSC 1.060000e+00 1.551000e+01 +1363.21%
1-norm value: 6.32241e+09
DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 1.170000e+00 1.827000e+01 +1461.54%
COO (col-wise) 1.100000e+00 1.992000e+01 +1710.91%
CSR 1.080000e+00 1.568000e+01 +1351.85%
CSC 1.310000e+00 1.668000e+01 +1173.28%
Infinity-norm value: 1.81886e+10
DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 1.010000e+00 1.773000e+01 +1655.45%
COO (col-wise) 1.000000e+00 2.445000e+01 +2345%
CSR 1.160000e+00 1.296000e+01 +1017.24%
CSC 1.010000e+00 1.234000e+01 +1121.78%
Frobenius norm value: 1.51533e+10
DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 1.400000e+00 1.187000e+01 +747.857%
COO (col-wise) 9.600000e-01 1.128000e+01 +1075%
CSR 1.280000e+00 7.750000e+00 +505.469%
CSC 1.160000e+00 9.980000e+00 +760.345%
DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR - CSR 1.108000e+02
CSC - CSC 9.140000e+01
CSR - CSC 1.000000e+02
CSC - CSR 2.140000e+02
--- Starting new test on 'data/chem_master1.mtx' ---
Loading matrix...
Number of rows: 40401
Number of column: 40401
Number of non-zero elements: 201201
DESCRIPTION: Read from Matrix-Market file
N. TIMES: 1
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 1.730960e+05
COO (col-wise) 1.527610e+05
DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 5.996200e+02 6.102500e+02 +1.77279%
COO (col-wise) 5.909400e+02 5.940600e+02 +0.527972%
DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR 7.004700e+02
CSC 6.825900e+02
DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 1.420000e+00
COO (col-wise) 1.580000e+00
CSR 2.460000e+00
CSC 2.180000e+00
DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.042500e+02 3.962700e+02 +30.2449%
COO (col-wise) 3.071000e+02 3.613300e+02 +17.6587%
CSR 2.351700e+02 3.723200e+02 +58.3195%
CSC 1.321300e+02 1.928900e+02 +45.985%
1-norm value: 4419.59
DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.054400e+02 3.564300e+02 +16.6939%
COO (col-wise) 3.063500e+02 3.515800e+02 +14.7642%
CSR 1.394400e+02 1.878200e+02 +34.6959%
CSC 2.305600e+02 4.197100e+02 +82.0394%
Infinity-norm value: 4429.59
DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 2.455000e+02 1.211270e+04 +4833.89%
COO (col-wise) 2.709900e+02 1.193771e+04 +4305.22%
CSR 2.479400e+02 1.537200e+02 -38.0011%
CSC 2.484600e+02 2.048100e+02 -17.5682%
Frobenius norm value: 413911
DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.190200e+02 3.265200e+02 +2.35095%
COO (col-wise) 3.088500e+02 3.772600e+02 +22.1499%
CSR 2.054600e+02 2.091500e+02 +1.79597%
CSC 2.543700e+02 3.914200e+02 +53.8782%
DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR - CSR 4.876480e+04
CSC - CSC 4.852520e+04
CSR - CSC 4.937780e+04
CSC - CSR 1.598874e+05
--- Starting new test on 'data/nasasrb.mtx' ---
Loading matrix...
Number of rows: 54870
Number of column: 54870
Number of non-zero elements: 2677324
DESCRIPTION: Read from Matrix-Market file
N. TIMES: 1
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 4.814232e+06
COO (col-wise) 4.804494e+06
DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.167255e+04 3.032581e+04 -4.25207%
COO (col-wise) 3.244682e+04 3.025604e+04 -6.75191%
DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR 4.181365e+04
CSC 4.152029e+04
DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 1.800000e+00
COO (col-wise) 1.300000e+00
CSR 1.870000e+00
CSC 1.670000e+00
DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.896490e+03 3.016360e+03 -22.5878%
COO (col-wise) 5.688280e+03 2.904200e+03 -48.9441%
CSR 2.581770e+03 3.170960e+03 +22.8212%
CSC 1.902620e+03 8.676200e+02 -54.3987%
1-norm value: 4.06713e+09
DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 5.670250e+03 2.887500e+03 -49.0763%
COO (col-wise) 3.977150e+03 3.010860e+03 -24.296%
CSR 1.901000e+03 8.673800e+02 -54.3724%
CSC 2.738480e+03 3.185260e+03 +16.3149%
Infinity-norm value: 4.06713e+09
DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.351290e+03 1.453054e+05 +4235.8%
COO (col-wise) 3.335940e+03 1.452259e+05 +4253.37%
CSR 2.337030e+03 8.679600e+02 -62.8606%
CSC 2.335280e+03 8.942000e+02 -61.7091%
Frobenius norm value: 1.84364e+10
DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 6.949310e+03 3.019950e+03 -56.5432%
COO (col-wise) 3.976550e+03 3.148680e+03 -20.8188%
CSR 2.332420e+03 1.301130e+03 -44.2155%
CSC 2.931810e+03 3.226100e+03 +10.0378%
DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR - CSR 4.368534e+06
CSC - CSC 4.367381e+06
CSR - CSC 4.400043e+06
CSC - CSR 8.818197e+07
--- Starting new test on tridiagonal matrices of size 1000000x1000000 ---
DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 4.074254e+04 3.865133e+04 -5.13274%
COO (col-wise) 4.071763e+04 3.868196e+04 -4.99948%
DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR 4.743600e+04
CSC 4.727395e+04
DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
COO (row-wise) 1.780000e+00
COO (col-wise) 1.470000e+00
CSR 1.900000e+00
CSC 1.670000e+00
DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 5.726010e+03 4.065750e+03 -28.9951%
COO (col-wise) 5.573470e+03 4.057330e+03 -27.2028%
CSR 6.303390e+03 5.754590e+03 -8.70643%
CSC 4.899710e+03 3.833220e+03 -21.7664%
1-norm value: 6
DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 5.583940e+03 4.071460e+03 -27.0863%
COO (col-wise) 5.774800e+03 4.053750e+03 -29.8028%
CSR 4.917550e+03 3.846310e+03 -21.784%
CSC 6.445630e+03 5.715610e+03 -11.3258%
Infinity-norm value: 6
DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 3.726070e+03 1.838877e+05 +4835.17%
COO (col-wise) 3.719930e+03 1.838017e+05 +4841%
CSR 4.576360e+03 3.813310e+03 -16.6737%
CSC 4.577570e+03 3.831620e+03 -16.2958%
Frobenius norm value: 3741.66
DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL PARALLEL VARIATION
COO (row-wise) 5.505850e+03 3.952120e+03 -28.2196%
COO (col-wise) 5.504320e+03 3.941120e+03 -28.3995%
CSR 5.298580e+03 4.397780e+03 -17.0008%
CSC 5.977870e+03 5.534490e+03 -7.41702%
DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT SEQUENTIAL
CSR - CSR 5.196310e+05
CSC - CSC 5.183504e+05
CSR - CSC 5.474888e+05
CSC - CSR 4.897638e+05