# C++ - OpenBLAS Matrix Multiplication

Matrices are used throughout computer science, but many operations on them are slow, especially useful ones such as matrix multiplication, where the complexity of the naive algorithm reaches $O(n^3)$. There are algorithms with better asymptotic complexity, but in practice much larger gains come from fully utilizing the computer's hardware.
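
To make the cost concrete, here is a minimal sketch of the naive algorithm for row-major matrices. The function name `naive_multiply` and its signature are made up for illustration and are not part of OpenBLAS; the three nested loops perform `m * n * k` multiply-adds, which is cubic for square matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not an OpenBLAS function).
// Computes C (m x n) = A (m x k) * B (k x n), all stored row-major.
std::vector<double> naive_multiply(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   std::size_t m, std::size_t k, std::size_t n)
{
    // The buffers must hold exactly the stated number of elements.
    assert(A.size() == m * k);
    assert(B.size() == k * n);

    std::vector<double> C(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {          // rows of A
        for (std::size_t j = 0; j < n; ++j) {      // columns of B
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p) {  // shared inner dimension
                sum += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = sum;
        }
    }
    return C;
}
```

Each output element depends only on one row of A and one column of B, which is the property that optimized libraries exploit.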

Each element of the result can be computed independently of the others, which means the work can be parallelized across multiple CPU cores, or even moved to a GPU if you want the best performance you can get. Sometimes the CPU alone is enough, though: it avoids expensive copies between CPU and GPU memory and can still reach speedups of up to 10 times. This is where OpenBLAS comes in.

## What is OpenBLAS?

OpenBLAS is an open source implementation of the BLAS (Basic Linear Algebra Subprograms) standard. It provides standard building blocks for vector and matrix operations, such as multiplication, on real and complex numbers.

## How does it work?

The best way to squeeze the most performance out of a CPU is to go to the lowest level available to the developer: assembly. Writing in assembly usually ties the code to a specific CPU architecture, but it also gives access to CPU-specific features such as AVX (Advanced Vector Extensions) that provide a substantial boost. In OpenBLAS the algorithms are therefore rewritten for specific CPU families so that each performs at its best, and it does this very well.

## OpenBLAS Matrix Multiplication example

    #include <cassert>
    #include <cblas.h>

    // Computes AB = op(A) * op(B) + beta * AB for row-major matrices,
    // where op(X) is X transposed if the matching flag is set, else X.
    void matrix_multiplication(double *A, int A_width, int A_height,
                               double *B, int B_width, int B_height,
                               double *AB, bool tA, bool tB, double beta)
    {
        // Effective dimensions after the optional transposition.
        // New names avoid redeclaring (and clobbering) the parameters.
        int A_rows = tA ? A_width  : A_height;
        int A_cols = tA ? A_height : A_width;

        int B_rows = tB ? B_width  : B_height;
        int B_cols = tB ? B_height : B_width;

        int m = A_rows;
        int n = B_cols;
        int k = A_cols;

        // The inner dimensions must match.
        assert(A_cols == B_rows);

        // Leading dimension = row length of each matrix as stored in memory.
        int lda = tA ? m : k;
        int ldb = tB ? k : n;

    #define TRANSPOSE(X) ((X) ? CblasTrans : CblasNoTrans)

        // http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html
        cblas_dgemm(CblasRowMajor,
                    TRANSPOSE(tA), TRANSPOSE(tB),
                    m, n, k, 1.0,
                    A, lda,
                    B, ldb,
                    beta,
                    AB, n);

    #undef TRANSPOSE
    }
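
When wiring up a BLAS call it helps to have something slow but obviously correct to compare against. The sketch below is a plain C++ reference that computes the same quantity as the call above, under the assumption of row-major storage and a scaling factor of 1.0 on the product. The name `reference_gemm` is hypothetical and not part of OpenBLAS; `m`, `n` and `k` are the dimensions after transposition.

```cpp
#include <cassert>

// Hypothetical reference helper (not an OpenBLAS function).
// Computes C = op(A) * op(B) + beta * C with plain loops, row-major storage.
// op(A) is m x k and op(B) is k x n; C is m x n.
// Unlike BLAS, this sketch always reads C, so initialize C even when beta == 0.
void reference_gemm(const double *A, const double *B, double *C,
                    int m, int n, int k, bool tA, bool tB, double beta)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) {
                // Stored A is k x m when transposed, m x k otherwise.
                double a = tA ? A[p * m + i] : A[i * k + p];
                // Stored B is n x k when transposed, k x n otherwise.
                double b = tB ? B[j * k + p] : B[p * n + j];
                sum += a * b;
            }
            C[i * n + j] = sum + beta * C[i * n + j];
        }
    }
}
```

Comparing its output with the OpenBLAS result on a few small matrices, including the transposed cases, is a quick way to catch a wrong leading dimension or swapped width and height.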


### Explanation

The function parameters and the first part of the code should be self-explanatory. The function takes three matrices as parameters, A, B and AB, where AB receives the matrix product of A and B. The dimensions of both input matrices are provided, along with flags saying whether A, B or both should be transposed. Additionally, a beta factor is provided, which scales the existing contents of the output matrix before the product is added to it. In short, the formula:
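
Written out under the standard BLAS definition of `dgemm`, and with the factor on the product fixed to 1.0 as in the call above, the update applied to the output matrix is:

$$
AB \leftarrow 1.0 \cdot \mathrm{op}(A)\,\mathrm{op}(B) + \beta \cdot AB,
\qquad
\mathrm{op}(X) =
\begin{cases}
X^{T} & \text{if the transpose flag for } X \text{ is set} \\
X & \text{otherwise}
\end{cases}
$$

With $\beta = 0$ the previous contents of AB are discarded, and with $\beta = 1$ the product is accumulated onto them.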