MAINT: Refactor partial load workaround for Clang #24461
Merged
charris merged 1 commit into numpy:main on Sep 5, 2023
c1c965a to b3334d6
b3334d6 to bf5a750
Clang exhibits aggressive optimization behavior when the `-ftrapping-math` flag is not fully supported, starting at the `-O1` optimization level. When a vector register is partially loaded for an operation that requires the remaining lanes to hold specific values (e.g., division, where the unused lanes need non-zero integers to avoid raising a divide-by-zero floating-point exception), Clang's optimizer recognizes that the full register is not needed for the store operation, since only the loaded lanes are written back. Consequently, it optimizes out the step that fills the remaining elements with non-zero integers. As a solution, we apply the `volatile` keyword to the returned register, followed by a symmetric-operand operation such as `or` (OR-ing the register with its volatile copy, which leaves the value unchanged), to inform the compiler that the full vector is needed. This refactor moves the workaround from the source files into the universal intrinsic headers, which also guarantees that it is applied by all kernels. Furthermore, the workaround is disabled when the `-ftrapping-math` flag is fully supported by the Clang compiler. This patch also enables the `-ftrapping-math` flag for clang-cl and suppresses floating-point exception warnings.
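To see why the fill value matters, here is a minimal scalar sketch of a partial load, a lane-wise division over every lane, and a partial store. It assumes a hypothetical 4-lane `vec4f` type and hypothetical helpers `load_till`, `div_vec`, `store_till` and `divide_tail`; these are not NumPy's universal-intrinsic API, and the sketch only illustrates the data flow rather than reproducing the Clang miscompilation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-lane float "register", for illustration only; NumPy's
   kernels use universal-intrinsic vector types instead. */
#define LANES 4
typedef struct { float lane[LANES]; } vec4f;

/* Hypothetical partial load: copy the first n elements of ptr and fill
   the remaining lanes with `fill`. Memory past ptr[n - 1] is never read. */
static vec4f load_till(const float *ptr, size_t n, float fill)
{
    assert(n <= LANES);
    vec4f v;
    for (size_t i = 0; i < LANES; ++i) {
        v.lane[i] = (i < n) ? ptr[i] : fill;
    }
    return v;
}

/* Lane-wise division over ALL lanes, including the unused tail lanes,
   just as a SIMD divide instruction would do. */
static vec4f div_vec(vec4f a, vec4f b)
{
    vec4f r;
    for (size_t i = 0; i < LANES; ++i) {
        r.lane[i] = a.lane[i] / b.lane[i];
    }
    return r;
}

/* Hypothetical partial store: only the first n lanes reach memory. */
static void store_till(float *ptr, size_t n, vec4f v)
{
    assert(n <= LANES);
    for (size_t i = 0; i < n; ++i) {
        ptr[i] = v.lane[i];
    }
}

/* Divide a tail of n (< LANES) elements. Filling the unused lanes with
   1.0f makes them compute 1/1 instead of x/0, so no divide-by-zero
   exception is raised for results that are never stored. */
static void divide_tail(const float *a, const float *b, float *out, size_t n)
{
    vec4f va = load_till(a, n, 1.0f);
    vec4f vb = load_till(b, n, 1.0f);
    store_till(out, n, div_vec(va, vb));
}
```

Because `store_till` writes only the first `n` lanes, the fill lanes never reach memory. An optimizer that also treats the division as free of side effects can therefore conclude that the fill is dead and drop it, which is the assumption that stops holding once floating-point exceptions are observable.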
bf5a750 to 83cec53
Member: Thanks Sayed.
Clang exhibits aggressive optimization behavior when the `-ftrapping-math` flag is not fully supported, starting at the `-O1` optimization level. When partially loading a vector register for operations that
require filling the remaining lanes with specific values (e.g., divide operations that need non-zero
integers in the unused lanes to avoid a divide-by-zero floating-point exception), Clang's optimizer recognizes that the full register
is unnecessary for the store operation. Consequently, it optimizes out the step that fills
the remaining elements with non-zero integers.
As a solution, we apply the `volatile` keyword to the returned vector, followed by a symmetric-operand operation such as
`or`, to inform the compiler that the full vector is needed. This refactor moves the workaround from the source files into the universal intrinsic headers,
which also guarantees that it is applied by all kernels. Furthermore, the workaround is disabled when the
`-ftrapping-math` flag is fully supported by the Clang compiler. This patch also enables the
`-ftrapping-math` flag for clang-cl, which is required to enable SIMD optimization of operations such as log/exp/sin/cos, and suppresses floating-point exception warnings.
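The `volatile` plus symmetric `or` guard can be sketched in portable C as follows. This is a scalar illustration under stated assumptions: `vec4u`, `load_till_guarded`, `divide_tail_u32` and the `GUARD_PARTIAL_LOAD` switch are hypothetical names, not NumPy's actual macros or intrinsics; the real workaround operates on SIMD registers inside the universal intrinsic headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch: define as 0 to mimic a compiler whose
   -ftrapping-math support makes the guard unnecessary. */
#ifndef GUARD_PARTIAL_LOAD
#define GUARD_PARTIAL_LOAD 1
#endif

#define LANES 4
typedef struct { uint32_t lane[LANES]; } vec4u;

/* Hypothetical guarded partial load: first n lanes from ptr, the rest
   set to `fill`. */
static vec4u load_till_guarded(const uint32_t *ptr, size_t n, uint32_t fill)
{
    assert(n <= LANES);
    vec4u ret;
    for (size_t i = 0; i < LANES; ++i) {
        ret.lane[i] = (i < n) ? ptr[i] : fill;
    }
#if GUARD_PARTIAL_LOAD
    /* Every access to a volatile object must actually happen, so the
       compiler has to materialize all lanes, fill lanes included. */
    volatile uint32_t workaround[LANES];
    for (size_t i = 0; i < LANES; ++i) {
        workaround[i] = ret.lane[i];
    }
    /* Symmetric OR: x | x == x, so the values are unchanged, but the
       result now depends on volatile reads that cannot be elided. */
    for (size_t i = 0; i < LANES; ++i) {
        ret.lane[i] = workaround[i] | ret.lane[i];
    }
#endif
    return ret;
}

/* Integer division of a tail of n (< LANES) elements; the divisor's
   unused lanes are filled with 1 so no lane ever divides by zero. */
static void divide_tail_u32(const uint32_t *a, const uint32_t *b,
                            uint32_t *out, size_t n)
{
    vec4u va = load_till_guarded(a, n, 1u);
    vec4u vb = load_till_guarded(b, n, 1u);
    vec4u q;
    for (size_t i = 0; i < LANES; ++i) {
        q.lane[i] = va.lane[i] / vb.lane[i];
    }
    for (size_t i = 0; i < n; ++i) {  /* partial store */
        out[i] = q.lane[i];
    }
}
```

Since `x | x == x`, the guard never changes the loaded values; it only costs a store and a reload of the register, which is why it is reasonable to compile it out when the compiler honors `-ftrapping-math`.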