Sparse Matrix Challenge

This is one of my university projects! 🤓
I was asked to create a C++ library for handling sparse matrices... using the standard and boost libraries only! 😱
This header-only library supports both row-wise and column-wise ordering, efficient matrix transposition, matrix-matrix and matrix-vector multiplication... and much more! You are free to explore the code! 🧑‍💻
Parallelization is achieved with standard algorithms and tbb (Intel's parallelization library), since MPI and OpenMP were not allowed during this challenge. 📈
The full assignment for this university project is available here.

⚠️ NOTE: Carefully read the LICENSE before modifying or redistributing any part of this code!

Table of contents

How to compile

Dependencies

NOTE: the entire code base has been tested and compiled using mk modules. Please make sure you have mk modules enabled before proceeding! In addition, the following modules are required:

module load gcc-glibc
module load boost
module load tbb

Makefile

The code can be compiled through the provided Makefile, which generates the executable:

./main

The Makefile supports the following phony targets:

  • make / make all: produces the basic executable, which loads only one Matrix-Market file (data/lnsp_131.mtx) and then performs a hard test on all the matrix functionalities. The -march=native -O3 flags are used and the NDEBUG macro is declared. For each test, a table is printed out, reporting a brief description of the test, how many times the test is performed and the average timings in microseconds. For further information, see the sections about the code base and the results;
  • make optimized_test: produces an optimized version of the executable, meaning that the flags -march=native -O3 are employed when compiling and the macro NDEBUG is declared, in order to disable all the checks performed by the assert statements included in the code. Moreover, this target also enables all the hard tests to correctly assess the program performance on the current machine. These tests are carried out on all the matrices located in data, plus an additional test on a tridiagonal matrix of size $1e6$ x $1e6$ (results are discussed in the corresponding section);
  • make test: same as optimized_test, but it also performs some soft tests to ensure basic functionalities. NDEBUG is not declared, meaning that all the assert statements are enabled throughout the code. This target is clearly not recommended for assessing the performance of the code, but it is a useful tool for advanced debugging;
  • make debug: produces a fast executable, which covers almost all the code (except for the hard tests). NDEBUG is not declared and optimizations are all disabled (the -O0 -g --coverage flags are employed). Useful for basic and fast debugging, but not for performance evaluation;
  • make run: start the produced executable (equivalent to ./main);
  • make doc: create html documentation at doc/html/index.html using doxygen.

Code base

The code structure is analyzed below.

The source file src/main.cpp contains just the main program, which launches all the soft/hard tests. In the first section, the basic functionalities are tested, including the insertion of new entries into the matrix and the stream operator. The elements are inserted following a tridiagonal pattern, such that each i-th row appears like: $\text{row}_i = {1,2,3}$.
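As a stdlib-only sketch of this test pattern (the library actually stores entries in a boost::container::flat_map and inserts them through Matrix::operator(); the names below are illustrative):

```cpp
#include <array>
#include <cstddef>
#include <map>

// Stand-in for the COO map used by the library: keys are (row, col)
// index pairs, values are the stored entries. std::map replaces
// boost::container::flat_map so the sketch needs only the standard library.
using coo_map = std::map<std::array<std::size_t, 2>, double>;

// Fill an n x n matrix with the tridiagonal test pattern described above:
// every row i holds {1, 2, 3} on the sub-, main and super-diagonal.
coo_map make_tridiagonal(std::size_t n) {
    coo_map m;
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0)     m[{i, i - 1}] = 1.0; // sub-diagonal
        m[{i, i}] = 2.0;                    // main diagonal
        if (i + 1 < n) m[{i, i + 1}] = 3.0; // super-diagonal
    }
    return m;
}
```

In the library itself, the same pattern would be produced through operator() calls on an algebra::Matrix.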

The second section will start all the soft tests, while the last section will perform more intensive tests to obtain sufficiently reliable timings.

The include folder contains all the header files, used both to define the Matrix class and to perform tests on different matrix formats. Since this is a header-only code, it can be imported simply by copying this include folder. The header files containing the core functionalities are listed below:

  • include/sparse_matrix.hpp: This file declares the algebra::Matrix class, which must be instantiated by providing the following template parameters:

    • T: the type of stored data, which must satisfy the requirements given by the concept algebra::type_traits::numeric (i.e. it can be either an integral type, a floating point or a std::complex);
    • SO: the selected storage order, which is set by the enum class StorageOrder (providing the options ROWWISE or COLWISE).

    This class can be constructed by providing the number of rows and columns (which, however, will be updated automatically if the user inserts new elements using operator() or loads the matrix from a Matrix-Market file by calling the read_from method). In addition, the matrix size can be changed dynamically through the resize method (out-of-range entries are discarded).

    Regarding the class structure, the Matrix class stores the uncompressed and compressed data as private shared_ptrs (the container types are discussed later). Shared pointers were chosen to allow the default copy-assignment (operator=) to perform a shallow copy of the matrix, by just copying the pointers to the data. In addition, this design is crucial for the transpose method implementation, which indeed returns a transposed view of the original matrix without deep-copying data. However, a deep copy can be enforced by calling the deep_copy method.

    Some of the Matrix methods support parallelization, achieved by means of standard algorithms combined with tbb modules. Parallelization can be enabled dynamically by calling the method enable_parallel (and disabled by disable_parallel). Below is a list of all the methods/friend functions allowing for parallel execution:

    • compress: compresses either to CSR format (if the selected storage order is ROWWISE) or to CSC format (for COLWISE ordered matrices);
    • norm: the norm can be specified through a template parameter of type NormType - thanks to the efficiency of the transpose method, the infinity-norm is implemented as the 1-norm applied to the transposed matrix;
    • operator*: parallelized only when multiplying by a std::vector.

    Instead, the methods/friend functions supporting only sequential execution are:

    • uncompress: converts the matrix back to the COO-map format;
    • operator*: sequential in all the overloads for matrix-matrix multiplication.

    In both functions above, the limitation preventing parallelization lies in writing the COO data: the COO map (a boost::container::flat_map) cannot be written by multiple threads concurrently.

    Some private methods are then defined:

    • iterate_over_comp_data, iterate_over_uncomp_data, iterate_over_elements: these methods are used in most of the Matrix code to iterate over matrix data and to perform tasks on it (they are declared to improve readability and to define a standard way to loop over matrix elements);
    • check_data_integrity: enabled only when the NDEBUG macro is not declared, useful to check data integrity after performing tasks that may alter/invalidate it (such as compression/uncompression, conversion from CSC to CSR, etc.).

    Finally, the operator* overload for matrix-matrix multiplication accepts only compressed formats in input and returns a COO matrix. Its implementation can then be analyzed case-by-case:

    • $\textbf{CSR} \cdot \textbf{CSR}$ and $\textbf{CSC} \cdot \textbf{CSR}$: these are the most straightforward cases and are implemented by the same operator overload; clearly, the multiplication is performed as $C_{ij} = \sum_k A_{ik} B_{kj}$, so the algorithm iterates over the non-zero elements of $A$, extracts the indices $i,k$ and uses $k$ to traverse $B$;
    • $\textbf{CSC} \cdot \textbf{CSC}$: thanks to the transpose method efficiency, this case is implemented as the transpose counterpart of the $\textbf{CSR} \cdot \textbf{CSR}$ case;
    • $\textbf{CSR} \cdot \textbf{CSC}$: $B$ is first converted from CSC into CSR format, so that the $\textbf{CSR} \cdot \textbf{CSR}$ code can be reused (clearly, this conversion has an extra memory requirement to store the converted matrix and a temporary vector for row pointers; further details inside the code).
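The CSR · CSR scheme above can be sketched with the standard library alone (the struct and names are illustrative, not the library's CompressedDataStruct; a std::map stands in for the COO flat_map that receives the result):

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <vector>

// Minimal CSR container mirroring the three vectors described in the text:
// row pointers, column indices and values.
struct Csr {
    std::vector<std::size_t> ptr;  // size rows + 1
    std::vector<std::size_t> idx;  // column index of each non-zero
    std::vector<double>      val;  // non-zero values
};

// CSR * CSR following the scheme in the text: iterate the non-zeros A_ik
// of A, use k to walk row k of B, and accumulate C_ij += A_ik * B_kj
// into a COO map.
std::map<std::array<std::size_t, 2>, double>
csr_multiply(const Csr& A, const Csr& B) {
    std::map<std::array<std::size_t, 2>, double> C;
    for (std::size_t i = 0; i + 1 < A.ptr.size(); ++i)
        for (std::size_t a = A.ptr[i]; a < A.ptr[i + 1]; ++a) {
            std::size_t k = A.idx[a];
            for (std::size_t b = B.ptr[k]; b < B.ptr[k + 1]; ++b)
                C[{i, B.idx[b]}] += A.val[a] * B.val[b];
        }
    return C;
}
```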
  • include/containers.hpp: a collection of containers, used to store compressed data, uncompressed data and pairs of indices:

    • algebra::containers::index_array_type: alias for std::array<size_t, 2>, i.e. it is an array used to store a pair of indices;

    • algebra::containers::IndexPosition: accessor to index_array_type - the key idea is that the ordering-major index (e.g. the row index for row-wise ordering) always maps to the first entry of index_array_type, ensuring the correct ordering in the COO map (i.e. there is no need to change the std::less operator for standard arrays, as it already guarantees lexicographical ordering). The row position in index_array_type is therefore either $0$ for row-wise ordering or $1$ for column-wise ordering. This design is critical in the aforementioned transpose method implementation;

    • algebra::containers::UncompressedDataMap: struct inheriting from an ordered boost::container::flat_map<index_array_type, T>, used to store COO data as pair of indices + value.

      NOTE: the choice of flat_map is essential in the parallelization, since it is a random-access container. Moreover, by default, the underlying sequence container is a vector, making flat_map also contiguous. On the other hand, the std::map cannot satisfy these requirements and cannot be accessed in parallel using standard algorithms and tbb;

    • algebra::containers::CompressedDataStruct: a struct collecting 3 vectors, used to store pointers, indices and values in compressed format.

  • include/execution.hpp: a collection of functions taken from the standard algorithms. In addition, we can mention the algebra::execution::atomic_add function, which ensures that the addition operation is free from data races (i.e. threads cannot write concurrently to the same variable, thanks to a std::atomic_ref from the <atomic> header). This function is called with the std::memory_order_relaxed option, which guarantees atomicity but enforces no ordering constraint (the sum is commutative, so no ordering constraint on data writing is needed);

  • include/enumerations.hpp: a collection of enumerations, used as options in Matrix class;

  • include/type_traits.hpp: a collection of type traits used in Matrix class;

  • include/util.hpp: a collection of utilities to fill matrices in different ways, extensively used to perform soft tests; see the documentation of each function for further details;

  • include/test.hpp: a collection of all possible soft and hard tests, which are performed in src/main.cpp.

    NOTE: Soft tests are designed especially to ensure basic functionalities and to guarantee that the parallel execution is carried out without data races (for this purpose, 2 dense matrices are defined whose size is chosen in such a way to trigger data races; see the code for further details);
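The race-free accumulation idea behind algebra::execution::atomic_add can be sketched with the standard library alone. Note that the real code wraps an existing variable in std::atomic_ref (C++20), while this sketch uses a plain std::atomic accumulator so it also compiles as C++17; names are illustrative:

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Each thread adds into a shared accumulator with std::memory_order_relaxed:
// the operation is atomic (no data race), but no ordering is enforced --
// which is fine, since addition is commutative.
long parallel_sum(const std::vector<long>& data, unsigned n_threads) {
    std::atomic<long> acc{0};
    std::vector<std::thread> pool;
    const std::size_t chunk = data.size() / n_threads + 1;
    for (unsigned t = 0; t < n_threads; ++t)
        pool.emplace_back([&, t] {
            const std::size_t lo = t * chunk;
            const std::size_t hi = std::min(data.size(), lo + chunk);
            for (std::size_t i = lo; i < hi; ++i)
                acc.fetch_add(data[i], std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return acc.load();
}
```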


Results analysis

The hard tests with timings are all carried out on the matrices located in the data folder, plus an additional test on a tridiagonal matrix of size $1e6$ x $1e6$. The CPU model on which these tests are performed is an AMD Ryzen 5 4500H (3GHz, 6 cores, 12 threads). The chosen Matrix-Market files are:

Let us first analyze the test on the NASA matrix (data/nasasrb.mtx). This matrix is large enough to clearly observe the performance difference between compressed and uncompressed formats and to assess the effectiveness of parallelization:

DESCRIPTION: Read from Matrix-Market file
N. TIMES: 1
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        4.814232e+06
COO (col-wise)        4.804494e+06

DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.167255e+04     3.032581e+04        -4.25207%
COO (col-wise)        3.244682e+04     3.025604e+04        -6.75191%

DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR                   4.181365e+04
CSC                   4.152029e+04

Both row-wise and column-wise orderings exhibit very similar timings in these 2 tests, and a certain speed-up can be appreciated in the parallel execution. However, this speed-up is lower compared to the other results below, due to the atomic operation needed to increment the compressed vector of pointers without data races.

DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        1.800000e+00
COO (col-wise)        1.300000e+00
CSR                   1.870000e+00
CSC                   1.670000e+00

Thanks to the shared-pointer design, the transpose operation is very efficient, since it involves only the copy of a shared pointer. Indeed, the cost of this operation is on the order of $\sim 1\ \mu s$.
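A toy sketch of this view semantics, assuming nothing about the library's real internals beyond the shared-pointer idea (the struct and its members are purely illustrative):

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Why transpose is essentially free: the matrix owns its entries through a
// shared_ptr, so a transposed matrix is just another handle to the same
// storage, with the meaning of the two indices swapped on access.
struct View {
    std::shared_ptr<std::vector<double>> data; // shared, never deep-copied
    std::size_t cols;       // row length of the stored (row-major) layout
    bool transposed = false;

    double at(std::size_t i, std::size_t j) const {
        if (transposed) std::swap(i, j);   // reinterpret indices, no data moved
        return (*data)[i * cols + j];
    }
    View transpose() const {               // O(1): copy a pointer, flip a flag
        View t = *this;
        t.transposed = !t.transposed;
        return t;
    }
};
```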

DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.896490e+03     3.016360e+03        -22.5878%
COO (col-wise)        5.688280e+03     2.904200e+03        -48.9441%
CSR                   2.581770e+03     3.170960e+03        +22.8212%
CSC                   1.902620e+03     8.676200e+02        -54.3987%
1-norm value: 4.06713e+09

DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        5.670250e+03     2.887500e+03        -49.0763%
COO (col-wise)        3.977150e+03     3.010860e+03         -24.296%
CSR                   1.901000e+03     8.673800e+02        -54.3724%
CSC                   2.738480e+03     3.185260e+03        +16.3149%
Infinity-norm value: 4.06713e+09

DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.351290e+03     1.453054e+05         +4235.8%
COO (col-wise)        3.335940e+03     1.452259e+05        +4253.37%
CSR                   2.337030e+03     8.679600e+02        -62.8606%
CSC                   2.335280e+03     8.942000e+02        -61.7091%
Frobenius norm value: 1.84364e+10

In all the norm implementations, we can clearly observe a speed-up when switching from uncompressed to compressed formats, as expected.

Looking at the results in compressed formats, it is clear that the 1-norm can be computed more efficiently on CSC than on CSR (due to the way the matrix is traversed). Indeed, the timings for the infinity-norm are swapped with respect to those of the 1-norm (since the infinity-norm is just the transpose counterpart of the 1-norm).
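The CSC advantage can be sketched with a stdlib-only 1-norm over the compressed arrays (field names are illustrative, not the library's CompressedDataStruct):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Why the 1-norm favours CSC: ||A||_1 = max_j sum_i |a_ij|, and in CSC the
// entries of column j sit contiguously between ptr[j] and ptr[j+1], so the
// norm is a single linear pass with one accumulator per column. By the same
// token ||A||_inf is the 1-norm of A^T, which is what the library exploits.
double one_norm_csc(const std::vector<std::size_t>& ptr,
                    const std::vector<double>& val) {
    double best = 0.0;
    for (std::size_t j = 0; j + 1 < ptr.size(); ++j) {
        double col = 0.0;
        for (std::size_t a = ptr[j]; a < ptr[j + 1]; ++a)
            col += std::abs(val[a]);
        if (col > best) best = col;
    }
    return best;
}
```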

In addition, the storage ordering does not alter the results on the Frobenius norm. However, the code slows down when performing parallel execution in COO format: this behaviour occurs because all the threads write atomically to the same accumulator in COO format; for the compressed formats, on the other hand, each thread is assigned a row (or column), so it can write to a separate accumulator without competing with the other threads, resulting in a roughly $-60\%$ variation in timings.

DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        6.949310e+03     3.019950e+03        -56.5432%
COO (col-wise)        3.976550e+03     3.148680e+03        -20.8188%
CSR                   2.332420e+03     1.301130e+03        -44.2155%
CSC                   2.931810e+03     3.226100e+03        +10.0378%

The matrix-vector multiplication is tested by multiplying the matrix by a randomly-filled vector. As expected, we can observe a clear speed-up when using compressed formats. The minimum time is reached by the CSR format in parallel mode, while CSC slows down in parallel execution: this is due to the atomic accumulation on the resulting vector, which causes thread contention in the CSC case (for CSR, on the other hand, each thread is assigned a row, so it writes to separate entries of the resulting vector without competing with other threads).
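A stdlib-only sketch of the CSR kernel shows why its rows parallelize cleanly (argument names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// CSR mat-vec, y = A*x: row i of y depends only on the non-zeros of row i
// of A, so in a parallel version each thread can own a disjoint block of
// rows and write its entries of y with no synchronisation -- which is why
// the CSR parallel timing above is the fastest. With CSC the same loop
// scatters into y, forcing the atomic accumulation that slows it down.
std::vector<double> csr_spmv(const std::vector<std::size_t>& ptr,
                             const std::vector<std::size_t>& idx,
                             const std::vector<double>& val,
                             const std::vector<double>& x) {
    std::vector<double> y(ptr.size() - 1, 0.0);
    for (std::size_t i = 0; i + 1 < ptr.size(); ++i)  // parallelizable over i
        for (std::size_t a = ptr[i]; a < ptr[i + 1]; ++a)
            y[i] += val[a] * x[idx[a]];
    return y;
}
```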

DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR - CSR             4.368534e+06
CSC - CSC             4.367381e+06
CSR - CSC             4.400043e+06
CSC - CSR             8.818197e+07

Finally, matrix-matrix multiplication timings are taken by multiplying the test matrix by itself. The first three tests exhibit very similar timings, proving that the $\textbf{CSC}\rightarrow\textbf{CSR}$ conversion performed in the $\textbf{CSR} \cdot \textbf{CSC}$ case does not affect the global performance. The $\textbf{CSC} \cdot \textbf{CSR}$ case is the slowest one, due to the non-optimal insertion into the output matrix (which is in COO format, so a new insertion into a flat_map is performed at each iteration).

Results

For completeness, the full test output is reported below. As expected, we can notice that parallelization is not convenient for relatively small matrices, resulting in a slowdown of the algorithm. Finally, it is interesting to notice that, for data/lnsp_131.mtx, reading from the Matrix-Market file is faster with the column-wise ordering than with the row-wise ordering: the entries are ordered column-wise in this particular file, and the code passes a hint when inserting new entries into the COO map, in order to emplace them at the back if possible. This hint speeds up reading from file, provided that the entries in the Matrix-Market file are row-wise or column-wise ordered.
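The hinted insertion can be sketched as follows. The library applies this to a boost::container::flat_map; a std::map (same hinted interface) is used here so the sketch is stdlib-only, and the function name is illustrative:

```cpp
#include <array>
#include <cstddef>
#include <map>

// When file entries arrive already sorted in the map's key order, passing
// end() as a hint lets each emplace append in amortised constant time
// instead of searching for the position from scratch.
using coo_map = std::map<std::array<std::size_t, 2>, double>;

void insert_sorted(coo_map& m, std::size_t i, std::size_t j, double v) {
    // hint: the new key is expected to be >= every key already stored
    m.emplace_hint(m.end(), std::array<std::size_t, 2>{i, j}, v);
}
```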

------------------- PERFORMANCE EVALUTATION -------------------
--- Starting new test on 'data/lnsp_131.mtx' ---
Loading matrix...
Number of rows:              131
Number of column:            131
Number of non-zero elements: 536

DESCRIPTION: Read from Matrix-Market file
N. TIMES: 1
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        1.121000e+03
COO (col-wise)        9.460000e+02

DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.530000e+00     3.220000e+01        +812.181%
COO (col-wise)        1.950000e+00     1.111000e+01        +469.744%

DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR                   3.020000e+00
CSC                   2.230000e+00

DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        7.000000e-01
COO (col-wise)        7.600000e-01
CSR                   7.200000e-01
CSC                   7.700000e-01

DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        1.030000e+00     1.908000e+01        +1752.43%
COO (col-wise)        1.250000e+00     1.666000e+01         +1232.8%
CSR                   1.270000e+00     1.633000e+01        +1185.83%
CSC                   1.060000e+00     1.551000e+01        +1363.21%
1-norm value: 6.32241e+09

DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        1.170000e+00     1.827000e+01        +1461.54%
COO (col-wise)        1.100000e+00     1.992000e+01        +1710.91%
CSR                   1.080000e+00     1.568000e+01        +1351.85%
CSC                   1.310000e+00     1.668000e+01        +1173.28%
Infinity-norm value: 1.81886e+10

DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        1.010000e+00     1.773000e+01        +1655.45%
COO (col-wise)        1.000000e+00     2.445000e+01           +2345%
CSR                   1.160000e+00     1.296000e+01        +1017.24%
CSC                   1.010000e+00     1.234000e+01        +1121.78%
Frobenius norm value: 1.51533e+10

DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        1.400000e+00     1.187000e+01        +747.857%
COO (col-wise)        9.600000e-01     1.128000e+01           +1075%
CSR                   1.280000e+00     7.750000e+00        +505.469%
CSC                   1.160000e+00     9.980000e+00        +760.345%

DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR - CSR             1.108000e+02
CSC - CSC             9.140000e+01
CSR - CSC             1.000000e+02
CSC - CSR             2.140000e+02

--- Starting new test on 'data/chem_master1.mtx' ---
Loading matrix...
Number of rows:              40401
Number of column:            40401
Number of non-zero elements: 201201

DESCRIPTION: Read from Matrix-Market file
N. TIMES: 1
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        1.730960e+05
COO (col-wise)        1.527610e+05

DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        5.996200e+02     6.102500e+02        +1.77279%
COO (col-wise)        5.909400e+02     5.940600e+02       +0.527972%

DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR                   7.004700e+02
CSC                   6.825900e+02

DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        1.420000e+00
COO (col-wise)        1.580000e+00
CSR                   2.460000e+00
CSC                   2.180000e+00

DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.042500e+02     3.962700e+02        +30.2449%
COO (col-wise)        3.071000e+02     3.613300e+02        +17.6587%
CSR                   2.351700e+02     3.723200e+02        +58.3195%
CSC                   1.321300e+02     1.928900e+02         +45.985%
1-norm value: 4419.59

DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.054400e+02     3.564300e+02        +16.6939%
COO (col-wise)        3.063500e+02     3.515800e+02        +14.7642%
CSR                   1.394400e+02     1.878200e+02        +34.6959%
CSC                   2.305600e+02     4.197100e+02        +82.0394%
Infinity-norm value: 4429.59

DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        2.455000e+02     1.211270e+04        +4833.89%
COO (col-wise)        2.709900e+02     1.193771e+04        +4305.22%
CSR                   2.479400e+02     1.537200e+02        -38.0011%
CSC                   2.484600e+02     2.048100e+02        -17.5682%
Frobenius norm value: 413911

DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.190200e+02     3.265200e+02        +2.35095%
COO (col-wise)        3.088500e+02     3.772600e+02        +22.1499%
CSR                   2.054600e+02     2.091500e+02        +1.79597%
CSC                   2.543700e+02     3.914200e+02        +53.8782%

DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR - CSR             4.876480e+04
CSC - CSC             4.852520e+04
CSR - CSC             4.937780e+04
CSC - CSR             1.598874e+05

--- Starting new test on 'data/nasasrb.mtx' ---
Loading matrix...
Number of rows:              54870
Number of column:            54870
Number of non-zero elements: 2677324

DESCRIPTION: Read from Matrix-Market file
N. TIMES: 1
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        4.814232e+06
COO (col-wise)        4.804494e+06

DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.167255e+04     3.032581e+04        -4.25207%
COO (col-wise)        3.244682e+04     3.025604e+04        -6.75191%

DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR                   4.181365e+04
CSC                   4.152029e+04

DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        1.800000e+00
COO (col-wise)        1.300000e+00
CSR                   1.870000e+00
CSC                   1.670000e+00

DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.896490e+03     3.016360e+03        -22.5878%
COO (col-wise)        5.688280e+03     2.904200e+03        -48.9441%
CSR                   2.581770e+03     3.170960e+03        +22.8212%
CSC                   1.902620e+03     8.676200e+02        -54.3987%
1-norm value: 4.06713e+09

DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        5.670250e+03     2.887500e+03        -49.0763%
COO (col-wise)        3.977150e+03     3.010860e+03         -24.296%
CSR                   1.901000e+03     8.673800e+02        -54.3724%
CSC                   2.738480e+03     3.185260e+03        +16.3149%
Infinity-norm value: 4.06713e+09

DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.351290e+03     1.453054e+05         +4235.8%
COO (col-wise)        3.335940e+03     1.452259e+05        +4253.37%
CSR                   2.337030e+03     8.679600e+02        -62.8606%
CSC                   2.335280e+03     8.942000e+02        -61.7091%
Frobenius norm value: 1.84364e+10

DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        6.949310e+03     3.019950e+03        -56.5432%
COO (col-wise)        3.976550e+03     3.148680e+03        -20.8188%
CSR                   2.332420e+03     1.301130e+03        -44.2155%
CSC                   2.931810e+03     3.226100e+03        +10.0378%

DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR - CSR             4.368534e+06
CSC - CSC             4.367381e+06
CSR - CSC             4.400043e+06
CSC - CSR             8.818197e+07

--- Starting new test on tridiagonal matrices of size 1000000x1000000 ---
DESCRIPTION: Compression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        4.074254e+04     3.865133e+04        -5.13274%
COO (col-wise)        4.071763e+04     3.868196e+04        -4.99948%

DESCRIPTION: Uncompression
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR                   4.743600e+04
CSC                   4.727395e+04

DESCRIPTION: Transpose matrix
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
COO (row-wise)        1.780000e+00
COO (col-wise)        1.470000e+00
CSR                   1.900000e+00
CSC                   1.670000e+00

DESCRIPTION: 1-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        5.726010e+03     4.065750e+03        -28.9951%
COO (col-wise)        5.573470e+03     4.057330e+03        -27.2028%
CSR                   6.303390e+03     5.754590e+03        -8.70643%
CSC                   4.899710e+03     3.833220e+03        -21.7664%
1-norm value: 6

DESCRIPTION: Infinity-norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        5.583940e+03     4.071460e+03        -27.0863%
COO (col-wise)        5.774800e+03     4.053750e+03        -29.8028%
CSR                   4.917550e+03     3.846310e+03         -21.784%
CSC                   6.445630e+03     5.715610e+03        -11.3258%
Infinity-norm value: 6

DESCRIPTION: Frobenius norm
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        3.726070e+03     1.838877e+05        +4835.17%
COO (col-wise)        3.719930e+03     1.838017e+05           +4841%
CSR                   4.576360e+03     3.813310e+03        -16.6737%
CSC                   4.577570e+03     3.831620e+03        -16.2958%
Frobenius norm value: 3741.66

DESCRIPTION: Matrix-vector multiplication
N. TIMES: 100
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL         PARALLEL        VARIATION
COO (row-wise)        5.505850e+03     3.952120e+03        -28.2196%
COO (col-wise)        5.504320e+03     3.941120e+03        -28.3995%
CSR                   5.298580e+03     4.397780e+03        -17.0008%
CSC                   5.977870e+03     5.534490e+03        -7.41702%

DESCRIPTION: Matrix-matrix multiplication
N. TIMES: 5
AVERAGE TIMINGS (microsec):
FORMAT                  SEQUENTIAL
CSR - CSR             5.196310e+05
CSC - CSC             5.183504e+05
CSR - CSC             5.474888e+05
CSC - CSR             4.897638e+05
