-
Notifications
You must be signed in to change notification settings - Fork 825
Implementation of GroupSharedLimit to allow increased GroupSharedMemory #7871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
damyanp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some superficial things that'll need to be addressed at some point, I'm afraid I can't really comment on the actual details of how this works.
I suspect that the whole thing needs to be run through clang-format as well.
| } | ||
| } | ||
|
|
||
| void hlsl::SetShaderProps(PSVRuntimeInfo4 *pInfo4, const DxilModule &DM) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| void hlsl::SetShaderProps(PSVRuntimeInfo4 *pInfo4, const DxilModule &DM) { | |
| void hlsl::SetShaderProps(PSVRuntimeInfo4 *Info4, const DxilModule &DM) { |
For new code we generally follow the LLVM Coding Standards. This includes no-p-prefix.
| PSVRuntimeInfo1 *pInfo1, PSVRuntimeInfo2 *pInfo2, | ||
| PSVRuntimeInfo3 *pInfo3, uint8_t ShaderKind, | ||
| const char *EntryName, const char *Comment) { | ||
| PSVRuntimeInfo3 *pInfo3, PSVRuntimeInfo4 *pInfo4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO in this case following the pInfoN pattern is appropriate.
| if (SpecifiedTGSMSize > 0) { | ||
| MaxSize = SpecifiedTGSMSize; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (SpecifiedTGSMSize > 0) { | |
| MaxSize = SpecifiedTGSMSize; | |
| } | |
| if (SpecifiedTGSMSize > 0) | |
| MaxSize = SpecifiedTGSMSize; |
LLVM coding standards say to omit braces here.
Something's also up with the formatting. Did the format-checker spot it? Anyway, clang-format should fix this for you.
|
|
||
| case DxilMDHelper::kDxilGroupSharedLimitTag: { | ||
| DXASSERT(props.IsCS(), "else invalid shader kind"); | ||
| props.groupSharedLimitBytes = ConstMDToUint32(MDO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set m_bExtraMetadata if shader model is not 6.10 or higher. This is how validation catches newer metadata used in prior shader models where it's unsupported.
if (!m_pSM->IsSMAtLeast(6, 10))
m_bExtraMetadata = true;
| hlsl::SetShaderProps((PSVRuntimeInfo0 *)&DMPSV, DM); | ||
| hlsl::SetShaderProps((PSVRuntimeInfo1 *)&DMPSV, DM); | ||
| hlsl::SetShaderProps((PSVRuntimeInfo2 *)&DMPSV, DM); | ||
| hlsl::SetShaderProps((PSVRuntimeInfo4 *)&DMPSV, DM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, part of the purpose of setting the DMPSV props is to check the props (with memcmp) below, which is how Mismatched is set. This sets the props to DMPSV but doesn't check that they match the props loaded from the container.
The function doesn't take PSV3 because it is checked in PSVContentVerifier::Verify (this one requires the string table indirection).
For PSV4, memcmp would be no good because of the PSV3 string index indirection, so it should probably just check the property directly, which requires adding PSVRuntimeInfo4 *PSV4 to PSVContentVerifier::VerifyEntryProperties parameters. I'll submit suggestions too.
| void PSVContentVerifier::VerifyEntryProperties(const ShaderModel *SM, | ||
| PSVRuntimeInfo0 *PSV0, | ||
| PSVRuntimeInfo1 *PSV1, | ||
| PSVRuntimeInfo2 *PSV2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should have PSV3 and PSV4 pointers as well.
| PSVRuntimeInfo2 *PSV2) { | |
| PSVRuntimeInfo2 *PSV2, | |
| PSVRuntimeInfo4 *PSV4) { |
| else | ||
| Mismatched = memcmp(PSV0, &DMPSV, sizeof(PSVRuntimeInfo0)) != 0; | ||
|
|
||
| if (Mismatched) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (Mismatched) { | |
| if (PSV4 && PSV4->GroupSharedLimit != DMPSV.GroupSharedLimit) | |
| Mismatched = true; | |
| if (Mismatched) { |
| // Only valid for SM6.10+ | ||
| if (!SM->IsSM610Plus()) { | ||
| unsigned DiagID = Diags.getCustomDiagID( | ||
| DiagnosticsEngine::Error, "attribute GroupSharedLimit only valid for " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we prefer to place these diagnostics earlier, in SemaHLSL.cpp (probably under DiagnoseEntry and DiagnoseEntryAttrAllowedOnStage) (in spite of the fact that many existing diagnostics were placed here in the past). Note that this allows for diagnosing library entry points too.
| if (SM->IsMS()) { // Fallback to default limits | ||
| funcProps->groupSharedLimitBytes = DXIL::kMaxMSSMSize; // 28k For MS | ||
| } else if (SM->IsAS() || SM->IsCS()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think relying on SM->IsMS() will work for library targets. I think you want to use isMS (and so on) instead. That can cover the node case as well.
| if (SM->IsMS()) { // Fallback to default limits | |
| funcProps->groupSharedLimitBytes = DXIL::kMaxMSSMSize; // 28k For MS | |
| } else if (SM->IsAS() || SM->IsCS()) { | |
| if (isMS) { // Fallback to default limits | |
| funcProps->groupSharedLimitBytes = DXIL::kMaxMSSMSize; // 28k For MS | |
| } else if (isAS || isCS || isNode) { |
|
|
||
| // Check if the entry function has attribute to override TGSM size. | ||
| if (M.HasDxilEntryProps(M.GetEntryFunction())) { | ||
| DxilEntryProps &EntryProps = M.GetDxilEntryProps(M.GetEntryFunction()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this code only checks non-library shaders, it seems we are missing library TGSM size validation. So, the node case is completely unchecked.
However, we do calculate and store GroupSharedBytesUsed in RDAT shader info structs. This could be used for runtime checks against device capabilities instead of a maximum size.
It looks to me like we have a bit of a design gap here. Wouldn't it be better to report the maximum size used (like GroupSharedBytesUsed) to the runtime for the runtime check?
For the compilation, we can check against the max size attribute to prevent accidental over-allocation, and for validation, we can check against the size reported in PSV0 or RDAT, as well as a version-based check (SM 6.10 required larger than default max).
Then you don't need max size in PSV0 or RDAT, just reported usage. The attribute would just drive a compile-time override of default maximum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed an issue on the proposal for this here: microsoft/hlsl-specs#761
Add a new HLSL attribute for Compute, Amp and Mesh shaders: GroupSharedLimit.
This is used to limit the amount of group shared memory a shader is allowed to statically declare, and validation will fail if the limit is exceeded.
There is no upper limit on this attribute, and it is expected that shader writers set the limit as the lowest common denominator for their target hardware and software use case (typically 48k or 64k for modern GPUs).
If no attribute is declared the existing 32k limit is used to be compatible with existing shaders.
Extends the PSV structures to include the selected limit so that runtime validation can reject the shader if it exceeds the device support.