Bug#921207: Octave GEMM error on large matrices due to OpenMP thread race condition


Mo Zhou
Package: octave
Version: 4.4.1-4
Severity: grave
X-Debbugs-CC: [hidden email], [hidden email]

Dear octave maintainer,

I received a surprising bug report [1] claiming that MKL returns wrong
results for matrix multiplication. However, further investigation
suggests that the problem is very likely a threading bug involving
Octave. OpenMP is the prime suspect, based on the following
experimental results.

=======================================================================

Please reproduce the problem with the following octave script:

921193.m

```
x = 1:100000;              % row vector of length N = 100000
y = [x; x] * [x; x]';      % 2x2 matrix; every entry equals sum(x.^2)
disp(reshape(y, 1, 4))
```

The correct result is
   333338333350000   333338333350000   333338333350000   333338333350000
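
This value can be verified by hand: every entry of y equals sum(x.^2),
which has the closed form N*(N+1)*(2*N+1)/6; for N = 100000 that is
333338333350000, below 2^53 and therefore exactly representable in
double precision. A quick check:

```
# Sanity check: every entry of y equals sum(x.^2) = N*(N+1)*(2*N+1)/6.
# For N = 100000 this stays below 2^53, so doubles hold it exactly.
octave -q --eval 'N = 100000; printf("%.0f\n", N*(N+1)*(2*N+1)/6)'
# prints 333338333350000
```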

However, Octave sometimes yields random results, which is a possible
symptom of a race condition between threads.

------------------------ MKL ------------------------------------------

libblas.so, libblas.so.3 = libmkl_rt.so

$ octave 921193.m
   1.1033e+15   1.1033e+15   1.1038e+15   1.1038e+15

$ MKL_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

$ MKL_NUM_THREADS=2 octave 921193.m
   642428448891136   642428448891136   642428448891136   642428448891136

$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=intel octave 921193.m
   489859913436624   489859913436624   488279025495504   488279025495504

$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=gnu octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=tbb octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

$ MKL_NUM_THREADS=2 MKL_THREADING_LAYER=sequential octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

The default threading layer used by libmkl_rt is "intel", i.e.
libiomp5. It seems that Octave does not work correctly with libiomp5.
In contrast, the gnu (gomp), tbb, and sequential threading layers are
not affected.
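
Until the underlying clash is addressed, the results above suggest a
workaround sketch: pin libmkl_rt to a threading layer that does not
load libiomp5.

```
# Workaround sketch, based on the results above: force libmkl_rt onto
# a threading layer that avoids libiomp5.
export MKL_THREADING_LAYER=gnu         # share Octave's libgomp
# or: export MKL_THREADING_LAYER=tbb
# or: export MKL_THREADING_LAYER=sequential
octave 921193.m
```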

----------------------- Netlib ------------------------------------------

libblas.so, libblas.so.3 = netlib blas

$ octave 921193.m
   824104476280848   824104476280848   828286951663952   828286951663952
$ octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000
$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

Netlib BLAS has no multi-threaded implementation, so this result
indicates that Octave's own multi-threading functionality is
questionable.
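
One caveat worth checking here (see further down the thread): on
Debian, BLAS and LAPACK are selected by separate alternatives, so a
"Netlib" setup may still pull in MKL through liblapack.so.3. A sketch
for inspecting both; the alternative names below are assumptions and
vary by architecture:

```
# Sketch: BLAS and LAPACK are separate alternatives on Debian, so check
# both -- liblapack.so.3 may still point at libmkl_rt even when
# libblas.so.3 does not.  Names are assumptions (x86_64 shown).
update-alternatives --display libblas.so.3-x86_64-linux-gnu
update-alternatives --display liblapack.so.3-x86_64-linux-gnu
```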

---------------------- BLIS (openmp) ------------------------------------

libblas.so, libblas.so.3 = blis (openmp). Please note that BLIS uses a
single thread by default, even when compiled with the openmp/pthread
threading model.

$ octave 921193.m
   757875851200128   757875851200128   796692912410048   796692912410048

$ BLIS_NUM_THREADS=1 octave 921193.m
   531523819460688   531523819460688   543945552290544   543945552290544

$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

Again, Octave's multi-threading functionality looks questionable.

------------------- OpenBLAS ---------------------------------------

libblas.so, libblas.so.3 = openblas

$ octave 921193.m
   907773323384928   907773323384928   925793789579664   925793789579664

$ octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

$ OPENBLAS_NUM_THREADS=1 octave 921193.m
   737565604371072   737565604371072   741382086552384   741382086552384

$ OMP_NUM_THREADS=1 octave 921193.m
   333338333350000   333338333350000   333338333350000   333338333350000

=========================================================================

According to the experimental results above, I believe the BLAS
libraries themselves are innocent.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921193


Bug#921207: Octave GEMM error on large matrices due to OpenMP thread race condition

Mo Zhou
control: severity -1 important

Hi Sébastien and Sylvestre,

On Sun, Feb 03, 2019 at 10:16:05AM +0100, Sébastien Villemot wrote:
> Control: tags -1 unreproducible
>
> Dear Lumin,
>
> I've tried to reproduce the problem with Netlib BLAS, OpenBLAS and
> BLIS, but without success (I did not try with MKL since I don't want
> such a large binary blob on my system).

I tried to reproduce this issue in a Docker container. It seems that
the problem only occurs after libmkl-rt has been installed. This struck
me as very strange, so I did some further research:

root@6c3d05276fb0:~/x# cat x.sh
for i in $(seq 10); do octave -q --no-gui a.m ; done

root@6c3d05276fb0:~/x# OMP_NUM_THREADS=2 MKL_THREADING_LAYER=intel MKL_NUM_THREADS=1 sh x.sh
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000

root@6c3d05276fb0:~/x# OMP_NUM_THREADS=2 MKL_THREADING_LAYER=intel MKL_NUM_THREADS=2 sh x.sh
   641731270638496   641731270638496   641348657394560   641348657394560
   484510470256176   484510470256176   485846751162000   485846751162000
   640975146516736   640975146516736   641915530512672   641915530512672
   646390298635168   646390298635168   646390298635168   646390298635168
   541379330715152   541379330715152   546317613096592   546317613096592
   495137794802960   495137794802960   497281139942736   497281139942736
   418469161038160   418469161038160   417908282300720   417908282300720
   550962358819680   550962358819680   555512823424000   555512823424000
   447356352104528   447356352104528   452646448263472   452646448263472
   401738136831792   401738136831792   405814754050064   405814754050064

root@6c3d05276fb0:~/x# ln -sr /usr/lib/llvm-7/lib/libgomp.so /usr/lib/llvm-7/lib/libgomp.so.1

root@6c3d05276fb0:~/x# OMP_NUM_THREADS=2 MKL_THREADING_LAYER=intel MKL_NUM_THREADS=2 LD_LIBRARY_PATH=/usr/lib/llvm-7/lib/ sh x.sh
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000
   333338333350000   333338333350000   333338333350000   333338333350000

It turns out that the incorrect matrix product is the result of a
gomp + iomp library clash: Octave is linked against the GNU OpenMP
runtime (libgomp), while libmkl_rt.so loads the Intel/LLVM OpenMP
runtime (libiomp5) by default.

@Sylvestre, do you have any idea how to avoid the gomp/iomp clash?
Even though people should know not to mix gomp and iomp, the matrix
product error happens silently, without any warning...
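
For anyone trying to confirm the clash on their own system, a small
diagnostic sketch (the octave-cli path is an assumption; MKL_VERBOSE
is a documented MKL switch):

```
# Sketch: confirm the two OpenMP runtimes in play.
# Octave is built with -fopenmp, so libgomp shows up as a (transitive)
# dependency of octave-cli:
ldd /usr/bin/octave-cli | grep -i omp
# libmkl_rt loads its threading layer (libiomp5 by default) with
# dlopen() at runtime, so it never appears in ldd output; MKL_VERBOSE=1
# should instead report the active layer (e.g. intel_thread):
MKL_VERBOSE=1 octave -q 921193.m 2>&1 | head
```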

> Basically you're suggesting that Octave's basic matrix multiplication
> functionality is utterly broken, without anybody else noticing. This is
> highly unlikely.
>
> Did you try to reproduce the problem on a pristine sid chroot, or
> another system?

I think any BLAS implementation using iomp would end up with this
error; unfortunately, MKL is the only one that does.

I confirm that this problem is reproducible, as long as gomp and iomp
are made to clash.

OpenBLAS/Netlib/BLIS are innocent, even though I saw errors with them:
LAPACK still pointed to libmkl_rt, which eventually led to the
gomp-iomp clash again. (I finally found the answer.)
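
So to genuinely test OpenBLAS/Netlib/BLIS in isolation, both
alternatives have to be switched away from MKL first. A sketch, with
the same caveat that the alternative names are assumptions:

```
# Sketch: switch *both* alternatives away from libmkl_rt, otherwise
# LAPACK keeps dragging libiomp5 into the process.
update-alternatives --config libblas.so.3-x86_64-linux-gnu
update-alternatives --config liblapack.so.3-x86_64-linux-gnu
```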

So... how do I fix such a gomp-iomp clash?
(I guess it's not fixable.)


Bug#921207: Octave GEMM error on large matrices due to OpenMP thread race condition

Mike Miller
On Sun, Feb 03, 2019 at 12:07:20 +0000, Mo Zhou wrote:
> control: severity -1 important

Severity minor because it is only caused by installation of an unrelated
package from non-free?

> I tried to reproduce this issue in a Docker container. It seems that
> the problem only occurs after libmkl-rt has been installed. This
> struck me as very strange, so I did some further research:
[…]
> So... how do I fix such a gomp-iomp clash?
> (I guess it's not fixable.)

I found this related issue: https://stackoverflow.com/q/25986091/384593

Because Octave is built with '-fopenmp', it unconditionally links with
libgomp, which in turn makes it incompatible with libiomp5. It seems to
me the most reasonable solution is a README.Debian note warning users
not to use Octave with libiomp5, or else to expect undefined behavior.
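
For reference, a sketch of what such a README.Debian note could
recommend, based only on the workarounds surfaced in this thread (the
llvm-7 paths and the script name are placeholders):

```
# (a) Pin MKL to a threading layer that avoids libiomp5:
export MKL_THREADING_LAYER=gnu          # or tbb, or sequential

# (b) Or run both Octave and MKL on LLVM's OpenMP runtime, which also
#     exports the GOMP_* ABI -- the trick shown earlier in this thread.
#     The llvm-7 paths are assumptions:
ln -sr /usr/lib/llvm-7/lib/libgomp.so /usr/lib/llvm-7/lib/libgomp.so.1
LD_LIBRARY_PATH=/usr/lib/llvm-7/lib octave script.m
```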

--
mike


Bug#921207: Octave GEMM error on large matrices due to OpenMP thread race condition

Mo Zhou
Hi Sébastien,

On Sun, Feb 03, 2019 at 12:07:20PM +0000, Mo Zhou wrote:
> It turns out that the incorrect matrix product is the result of a
> gomp + iomp library clash: Octave is linked against the GNU OpenMP
> runtime (libgomp), while libmkl_rt.so loads the Intel/LLVM OpenMP
> runtime (libiomp5) by default.

I got in touch with the MKL team, and they confirmed that the
iomp+gomp mixture is a very common error among users. They plan to
change the loading mechanism of libmkl_rt in the 2020 product line to
avoid the iomp+gomp clash (which sounds like yet more magic).

So let's keep this bug open for both MKL and Octave for a while, in
case other users come across similar errors. Maybe this bug will be
fixed in late 2019 (MKL 2019 was released in late 2018).


Bug#921207: Octave GEMM error on large matrices due to OpenMP thread race condition

Sébastien Villemot
Control: retitle -1 octave: gives wrong results when used with MKL

On Friday, 1 March 2019 at 01:41 +0000, Mo Zhou wrote:

> On Sun, Feb 03, 2019 at 12:07:20PM +0000, Mo Zhou wrote:
> > It turns out that the incorrect matrix product is the result of a
> > gomp + iomp library clash: Octave is linked against the GNU OpenMP
> > runtime (libgomp), while libmkl_rt.so loads the Intel/LLVM OpenMP
> > runtime (libiomp5) by default.
>
> I got in touch with the MKL team, and they confirmed that the
> iomp+gomp mixture is a very common error among users. They plan to
> change the loading mechanism of libmkl_rt in the 2020 product line
> to avoid the iomp+gomp clash (which sounds like yet more magic).
>
> So let's keep this bug open for both MKL and Octave for a while, in
> case other users come across similar errors. Maybe this bug will be
> fixed in late 2019 (MKL 2019 was released in late 2018).
Sounds good, thanks for the heads up.

--
⢀⣴⠾⠻⢶⣦⠀  Sébastien Villemot
⣾⠁⢠⠒⠀⣿⡁  Debian Developer
⢿⡄⠘⠷⠚⠋⠀  http://sebastien.villemot.name
⠈⠳⣄⠀⠀⠀⠀  http://www.debian.org
