[Update] | openblas – Pickpeup

openblas: คุณกำลังดูกระทู้

Improving R Perfomance by installing optimized BLAS/LAPACK libraries

July 25th, 2018

Carlos Santillan

Principal Software Engineer

Bentley Systems Inc.

https://www.linkedin.com/in/carlos-santillan/

csantill@gmail.com

https://github.com/csantill/RPerformanceWBLAS

We frequently hear that R is slow, if your code makes heavy usage of vector/matrix operations you will see significant performance improvements.

The precompiled R distribution that is downloaded from CRAN makes use of the reference BLAS/LAPACK implementation for linear algebra operations, These implementations are built to be stable and cross platform compatible but are not optimized for performance. Most R programmers use these default libraries and are unaware that highly optimized libraries are available and switching to these can have a significant perfomance improvement. We will review the steps to install highly optimized libraries and benchmark their performance.

Base Libraries

BLAS

The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations. The Level 1 BLAS perform scalar, vector and vector-vector operations, the Level 2 BLAS perform matrix-vector operations, and the Level 3 BLAS perform matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high quality linear algebra software

LAPACK

LAPACK is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.

You can use sessionInfo() in R to see which version of BLAS and LAPACK your R session is using

sessionInfo()

Here is the output when using the default BLAS/LAPACK libraries on linux.

## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.4  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] tools_3.4.4     htmltools_0.3.6 yaml_2.1.19     Rcpp_0.12.17   
##  [9] stringi_1.2.3   rmarkdown_1.10  knitr_1.20      stringr_1.3.1  
## [13] digest_0.6.15   evaluate_0.10.1

Alternative libraries

There are several highly optimized libraries that can be used instead of the default base libraries. These libraries are optimized to take advantage of the hardware they are run on, and can be significatantly faster than the base implementation (operations such as Matrix multiplications may be over 40 times faster)

OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version

AtlasBLAS

The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance.

GotoBLAS

GotoBLAS and GotoBLAS2 are open source implementations of the BLAS (Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types. GotoBLAS was developed by Kazushige Goto at the Texas Advanced Computing Center (UT). As of 2003, it was used in seven of the world’s ten fastest supercomputers.

READ  [NEW] 激ロック ラウドロック ポータルサイト | 風とロック配布場所 - Pickpeup

Intel MKL

Intel Math Kernel Library

Features highly optimized, threaded, and vectorized math functions that maximize performance on each processor family Uses industry-standard C and Fortran APIs for compatibility with popular BLAS, LAPACK, and FFTW functions—no code changes required

The Intel MKL used to a commercial product, free for student use, paid license for other uses, it looks like it is now free

Free for all, no royalties, no restrictions on company or project size, access to current & older versions of libraries (only current version for Intel® MPI), Forum Support.

The above mentioned libraries are optimized for perfomance and are used in production.

GotoBLAS at one time ran on 7 of the top 10 supercomputers in the world, ATLASBLAS is used for Matlab (for some platforms), Matlab uses Veclib on Mac.

Installing the libraries

The following steps have been tested on Ubuntu 18.04 “Bionic Beaver”, but they should work on debian distributions. We can use apt-cache search to see what libblas debian packages are available to install

apt-cache search libblas
csantill@gawain:/usr/bin$ apt-cache search libblas
libblas-dev - Basic Linear Algebra Subroutines 3, static library
libblas3 - Basic Linear Algebra Reference implementations, shared library
libatlas-base-dev - Automatically Tuned Linear Algebra Software, generic static
libatlas3-base - Automatically Tuned Linear Algebra Software, generic shared
libblas-test - Basic Linear Algebra Subroutines 3, testing programs
libblasr - tools for aligning PacBio reads to target sequences
libblasr-dev - tools for aligning PacBio reads to target sequences (development files)
libopenblas-base - Optimized BLAS (linear algebra) library (shared library)
libopenblas-dev - Optimized BLAS (linear algebra) library (development files)

Install OpenBLAS and AtlasBLAS wit the following command

$ sudo apt-get install libopenblas-base libatlas3-base

Installing the Intel MKL

The following script form “Adding Intel MKL easily via a simple script” provides a simple way to install the Intel MKL

sudo script.sh
#!/bin/bash
##
## cf https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-apt-repo

cd /tmp
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB

## all products:
#sudo wget https://apt.repos.intel.com/setup/intelproducts.list -O /etc/apt/sources.list.d/intelproducts.list
## just MKL
sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
## other (TBB, DAAL, MPI, ...) listed on page

apt-get update
apt-get install intel-mkl-64bit-2018.2-046   ## wants 500+mb :-/  installs to 1.8 gb :-/

## update alternatives
update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so     libblas.so-x86_64-linux-gnu      /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3   libblas.so.3-x86_64-linux-gnu    /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so   liblapack.so-x86_64-linux-gnu    /opt/intel/mkl/lib/intel64/libmkl_rt.so 50
update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 liblapack.so.3-x86_64-linux-gnu  /opt/intel/mkl/lib/intel64/libmkl_rt.so 50

echo "/opt/intel/lib/intel64"     >  /etc/ld.so.conf.d/mkl.conf
echo "/opt/intel/mkl/lib/intel64" >> /etc/ld.so.conf.d/mkl.conf
ldconfig

echo "MKL_THREADING_LAYER=GNU" >> /etc/environment

Microsoft R Open

R Open (formerly Revolution R Open) is an enhanced distribution of R.

The current version, Microsoft R Open 3.5.0, is based on (and 100% compatible with) R-3.5.0, has full compatibility with all packages, scripts and applications that work with R. It includes additional capabilities for improved performance, reproducibility, and support for Windows, Linux-based and OS X platforms.

(Windows and Linux version use Intel MKL libraries, OS X uses Accelerate ML)

https://mran.microsoft.com/open

Selecting the default BLAS/LAPACK

Once you have installed the different BLAS/LAPACK libraries you can use update alternatives to select the default library

Note: You will have to to restart R/R Studio for the change to take effect, you can use sessionInfo() to confirm that you are using the library you want

Change default BLAS

Run update-alternatives to select one of the installed Alternatives

sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
csantill@gawain:~$ sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
There are 4 choices for the alternative libblas.so.3-x86_64-linux-gnu (providing /usr/lib/x86_64-linux-gnu/libblas.so.3).

  Selection    Path                                             Priority   Status
------------------------------------------------------------
  0            /opt/intel/mkl/lib/intel64/libmkl_rt.so           50        auto mode
* 1            /opt/intel/mkl/lib/intel64/libmkl_rt.so           50        manual mode
  2            /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3      35        manual mode
  3            /usr/lib/x86_64-linux-gnu/blas/libblas.so.3       10        manual mode
  4            /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3   40        manual mode

Press <enter> to keep the current choice[*], or type selection number: 

Change default LAPACK

sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu
csantill@gawain:~$ sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu
There are 4 choices for the alternative liblapack.so.3-x86_64-linux-gnu (providing /usr/lib/x86_64-linux-gnu/liblapack.so.3).

  Selection    Path                                               Priority   Status
------------------------------------------------------------
  0            /opt/intel/mkl/lib/intel64/libmkl_rt.so             50        auto mode
* 1            /opt/intel/mkl/lib/intel64/libmkl_rt.so             50        manual mode
  2            /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3      35        manual mode
  3            /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3     10        manual mode
  4            /usr/lib/x86_64-linux-gnu/openblas/liblapack.so.3   40        manual mode

Press <enter> to keep the current choice[*], or type selection number: 

BenchMarks

I ran the R Benchmark 2.5 scripts under each of the configurations. The accompanying article Why is R slow? some explanations and MKL/OpenBLAS setup to try to fix this was one of the references I used.

READ  [Update] BoA NO.1 歌詞 | boa 日本語 - Pickpeup

The script I used can be found

https://github.com/pachamaltese/r-with-intel-mkl/blob/master/00-benchmark-scripts/1-r-benchmark-25.R

And can be run from

source("1-r-benchmark-25.R")

As can be seen from the Benchmark results OpenBLAS is about 2.85 times faster than the default BLAS and MKL is about 3.25 times faster, also AtlasBLAS is about twice as fast as the default. With some operations such as Matrix Cross Product being over 20 times faster.

R Benchmark 2.5

Number of times each test is run: 3

BLAS
OpenBLAS
MKL
ATLASBLAS

A) Creation, transp., deformation of 2.5kx2.5k matrix
1.424
1.411
1.611
1.371

B) 2.4Kx2.4K normal distributed random matrix ^1k
1.028
1.030
1.030
1.024

C) Sorting of 7M random values
1.423
1.433
1.346
1.441

D) 2.8Kx2.8K cross-product matrix (b = a’ * a)
20.687
0.789
0.577
3.756

E) Linear regr. over a 3Kx3K matrix (c = a  b’)
9.898
0.561
0.322
1.939

Trimmed geom. mean (2 extremes eliminated)
2.717
1.047
0.928
1.565

BLAS
OpenBLAS
MKL
ATLASBLAS

F) FFT over 2,400,000 random values
0.575
0.535
0.570
0.534

G) Eigenvalues of a 640×640 random matrix
1.470
1.028
0.480
0.921

H) Determinant of a 2500×2500 random matrix
6.727
0.456
0.322
1.873

I) Cholesky decomposition of a 3000×3000 matrix
8.314
0.444
0.321
1.617

J) Inverse of a 1600×1600 random matrix
6.069
0.493
0.296
1.603

Trimmed geom. mean (2 extremes eliminated)
3.915
0.494
0.368
1.336

BLAS
OpenBLAS
MKL
ATLASBLAS

K) 3,500,000 Fibonacci numbers calculation (vector calc)
1.069
1.035
1.006
1.029

L) Creation of a 3000×3000 Hilbert matrix (matrix calc)
0.530
0.497
0.510
0.493

N) Grand common divisors of 400,000 pairs (recursion)
0.502
0.500
0.579
0.478

M) Creation of a 500×500 Toeplitz matrix (loops)
0.580
0.540
0.086
0.542

O) Escoufier’s method on a 45×45 matrix (mixed)
0.636
0.469
0.507
0.538

Trimmed geom. mean (2 extremes eliminated)
0.580
0.512
0.531
0.524

BLAS
OpenBLAS
MKL
ATLASBLAS

Total time for all 15 tests
60.932
11.222
9.563
19.158

Overall mean (sum of I, II and III trimmed means/3)
1.834
0.642
0.566
1.031

Results

The Intel MKL (Microsoft R open)) are the clear winners for the used configuration. They are faster in most operations than OpenBLAS or ATLAS, It should be noted that while the MKL performs faster when running on Intel CPUs, OpenBLAS is typically faster on AMD CPUs.

Mac

Apple provides the Accelarate Framework which includes optimized BLAS libraries (veclib). The R installation downloaded from CRAN ships with both the vecLib and reference BLAS libraries (The reference BLAS is used by default)

The Mac OSX R FAQ 10.5 Which BLAS is used and how can it be changed? describes the steps to change the default BLAS to use the vecLib.

The following article R: Use faster vecLib


## 
# use faster vecLib library
cd /Library/Frameworks/R.framework/Resources/lib
ln -sf  /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/libBLAS.dylib libRblas.dylib

# return to default settings
cd /Library/Frameworks/R.framework/Resources/lib
ln -sf libRblas.0.dylib libRblas.dylib

sessionInfo() should return something similar to below if you are using the Accelerate Framework

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

Benchmarks Mac OSX

Benchmark 2.5

Number of times each test is run: 3

Matrix calculation

BLAS
vecLib

Creation, transp., deformation of a 2500×2500 matrix
0.911666666666668
0.907666666666665

2400×2400 normal distributed random matrix ^1000
0.173000000000002
0.172666666666668

Sorting of 7,000,000 random values
0.717000000000004
0.715333333333334

2800×2800 cross-product matrix (b = a’ * a)
13.115
0.472000000000001

Linear regr. over a 3000×3000 matrix (c = a  b’)
6.31833333333333
0.429

Trimmed geom. mean (2 extremes eliminated)
1.60442438940899
0.525173235045746

  1. Matrix functions

BLAS
vecLib

FFT over 2,400,000 random values
0.424000000000007
0.469000000000001

Eigenvalues of a 640×640 random matrix
0.904666666666671
0.479333333333332

Determinant of a 2500×2500 random matrix
4.507
0.379333333333335

Cholesky decomposition of a 3000×3000 matrix
5.22133333333333
0.406666666666666

Inverse of a 1600×1600 random matrix
4.00333333333333
1.433

Trimmed geom. mean (2 extremes eliminated)
2.53668164510219
0.45048778200099

  1. Programmation

BLAS
vecLib

3,500,000 Fibonacci numbers calculation (vector calc)
0.320000000000003
0.322666666666665

Creation of a 3000×3000 Hilbert matrix (matrix calc)
0.336999999999989
0.349000000000004

Grand common divisors of 400,000 pairs (recursion)
0.284666666666662
0.280000000000001

Creation of a 500×500 Toeplitz matrix (loops)
0.36099999999999
0.359666666666665

Escoufier’s method on a 45×45 matrix (mixed)
0.401999999999987
0.278000000000006

Trimmed geom. mean (2 extremes eliminated)
0.33891882626425
0.315921503050365

BLAS
vecLib

Total time for all 15 tests
38
7.45333333333334

READ  We love Muzik -DAFO (official Audio) | müzik | En son Tay şarkıları

Overall mean (sum of I, II and III trimmed means/3)
1.11316695385059
0.421232232758123

Windows

While it is possible to use alternative BLAS and LAPACK libraries on Windows, I could not found a source of updated precompiled binaries I found it easier just to use Microsoft R Open that uses the Intel MKL

Paralell Development

OpenBLAS and the Intel MKL (and R Open) take advantage of multiple cores and multithreading, For some operations they will use multiple threads. This is great if your operation is serial in nature but it could interfer if you are trying to perform parallel operations (such as parallel foreach, parapply, etc.)

You should disable multi threading to avoid this. Fortunatley most of the performance from using these libraries comes from vectorized math not multi threading (see the following article for more info Edge cases in using the Intel MKL and parallel programming, while this is about MKL the same applies to openBLAS )

with openBLAS you can disable multihreading by using

openblas_set_num_threads(1) 

on R open

setMKLthreads(1)

if you are using MKL then you should set environment variable MKL_NUM_THREADS ( you can set this in your .profile )

export MKL_NUM_THREADS=1


wien2k 18.2 installtion wiith gfortran and openblas updated video


This is the updated video for wien2k installtion some of the command and text used in installtion process is mentioned below.
Run the following commands one by one.
\”Must install openblas and compile \”openblas\” with g fortran after gfortran installtion\”
For installtion of openblas watch the video give on the side bar.
other wise use the following commands
cd openblas
make FC=gfortran
sudo make install
after finishing openblas run the following commands one by one run the gfortran command before installing openblas because for openblas compilation you will need gfortran
sudo aptget install update
sudo aptget install gfortran
sudo aptget install csh
sudo aptget install tcsh
sudo aptget install buildessential
sudo aptget install liblapackdev
sudo aptget install libblasdev

\”After these command must install openblas and compile with gfortran by using following commands
cd openblas
make FC=gfortran
sudo make install
after this follow the procedure for setting libraries\”
2.
tar xvf WIEN2k_18.2.tar (skip this if you downloaded files separately)
gunzip .gz
chmod +x ./expand_lapw
./expand_lapw
./siteconfig_lapw
If the word too long appear than
change the word csh to tcsh in \”siteconfig\” file in wien2k directory
./siteconfig_lapw
system: choose LG gfortran+gotolib
compiler: gfortran and cc

lblas llapack lpthread
L Linker Flags: $(FOPT) L../SRC_lib
P Preprocessor flags ‘DParallel’
R R_LIBS (LAPACK+BLAS): lblas llapack lpthread
set the above R_libs (import step)
Recommended options for system linuxgfortran are:
Compiler options: ffreeform O2 ffreelinelengthnone
Linker Flags: $(FOPT) L../SRC_lib
Preprocessor flags: ‘DParallel’
R_LIB (LAPACK+BLAS): lblas llapack lpthread
Current settings:
O Compiler options: ffreeform O2 ffreelinelengthnone
L Linker Flags: $(FOPT) L../SRC_lib
P Preprocessor flags ‘DParallel’
R R_LIBS (LAPACK+BLAS): lblas llapack lpthread

X LIBX options:
PO Parallel options
S Save and Quit
Q Quit and abandon changes
To change an item select option.
Selection:

….


3.
./userconfig_lapw


4. source ~/.bashrc
5. w2web p 7890 or port of choice
6. then go to http://localhost:7890/ in firefox or other browser
remember, the first time you do w2web, just press enter. Don’t change the default user name and password. change them after you first set it up
wien2k installtion wien2k18.2 wien2k2018installtion wien2kcompilation dft abinitio densityfunctionalthoery

นอกจากการดูบทความนี้แล้ว คุณยังสามารถดูข้อมูลที่เป็นประโยชน์อื่นๆ อีกมากมายที่เราให้ไว้ที่นี่: ดูเพิ่มเติม

wien2k 18.2 installtion wiith gfortran and openblas updated video

WIEN2k (gfortran + gcc + OpenBlas + OpenMP)


WIEN2k (gfortran + gcc + OpenBlas + OpenMP)

Usage of Maths and Scientific Libraries – (among those covered: BLAS, LAPACK)


Speaker: Dr. Jussi Enkovaara (CSC)
\”Prace Conference 2014\”,
Partnership for Advanced Computing in Europe,
Tel Aviv University,
12.2.14

Usage of Maths and Scientific Libraries - (among those covered: BLAS, LAPACK)

WIEN2k (gfortran and gcc compile, using default wien2k library)


x lapw h
This compile can calculate any sample with k point parallelization.

WIEN2k (gfortran and gcc compile, using default wien2k library)

43.91.40.blas aaj bhi open blas hoga


43.91.40.blas aaj bhi open blas hoga

นอกจากการดูบทความนี้แล้ว คุณยังสามารถดูข้อมูลที่เป็นประโยชน์อื่นๆ อีกมากมายที่เราให้ไว้ที่นี่: ดูวิธีอื่นๆMusic of Turkey

ขอบคุณมากสำหรับการดูหัวข้อโพสต์ openblas

Leave a Comment