SPLASH2 LU (2)
Applied batched checks to the inner-loop daxpy function
5.3x speedup with 8 PEs
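A minimal sketch of what batching the checks in daxpy could look like. The slide does not say what a "check" is or how it is implemented, so `check_range` and `check_count` are hypothetical stand-ins (assumed here to be a per-access validity check that only counts calls), and the loop body assumes the usual `a[i] += alpha * b[i]` form of daxpy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter standing in for the cost of access checks. */
static long check_count = 0;

/* Hypothetical access check (an assumption, not from the slide):
   a real system would verify that [p, p + n) is safe to access.
   This stand-in only validates the pointer and counts the call. */
static void check_range(const double *p, size_t n) {
    assert(p != NULL);
    (void)n;
    check_count++;
}

/* Unbatched version: two checks per loop iteration. */
static void daxpy_checked(double *a, const double *b, size_t n, double alpha) {
    for (size_t i = 0; i < n; i++) {
        check_range(&b[i], 1);
        check_range(&a[i], 1);
        a[i] += alpha * b[i];
    }
}

/* Batched version: one check per array, hoisted out of the loop,
   so the inner loop runs with no check overhead. */
static void daxpy_batched(double *a, const double *b, size_t n, double alpha) {
    check_range(b, n);
    check_range(a, n);
    for (size_t i = 0; i < n; i++) {
        a[i] += alpha * b[i];
    }
}
```

With n elements, the unbatched loop makes 2n check calls and the batched one makes 2, regardless of n. How much of the reported speedup comes from this is not something the sketch can show.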