Fix Default/OEM encoding behavior PowerShell Core #3467
daxian-dbw merged 8 commits into PowerShell:master from
Conversation
36ab1e2 to 9580b4a
I was able to build locally with
{
uint oemCP = NativeMethods.GetOEMCP();
encoding = System.Text.Encoding.GetEncoding((int)oemCP);
encoding = ClrFacade.GetOEMEncoding();
The NativeMethods.GetOEMCP in this file can be removed if it's no longer used.
It is used in ClrFacade.GetOEMEncoding
ClrFacade has its own copy of GetOEMEncoding, so the one in FileSystemProvider.cs can be removed.
#if CORECLR

#if UNUX // PowerShell Core on Unix
s_defaultEncoding = Encoding.GetEncoding(65001);
Why not use Encoding.UTF8? Isn't code page 65001 the same as UTF-8?
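For reference, a quick standalone check (an illustrative snippet, not part of the PR) shows that code page 65001 and Encoding.UTF8 do refer to the same encoding:

using System;
using System.Text;

class CodePageCheck
{
    static void Main()
    {
        // Encoding.UTF8 reports code page 65001 ...
        Console.WriteLine(Encoding.UTF8.CodePage);              // 65001
        // ... and looking the encoding up by that code page yields UTF-8.
        Console.WriteLine(Encoding.GetEncoding(65001).WebName); // utf-8
    }
}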
#if UNUX // PowerShell Core on Unix
s_defaultEncoding = Encoding.GetEncoding(65001);
#else // PowerShell Core on Windows
uint aCp = NativeMethods.GetACP();
Maybe rename the variable to currentAnsiCp? It helps the reader know what GetACP returns.
// We will revisit this if it causes any failures when running tests on Core PS.
s_defaultEncoding = Encoding.GetEncoding(28591);
#else
#if CORECLR
The #if/#else pattern used in other places is:
#if UNIX
// unix behavior
#elif CORECLR
// win-core behavior
#else
// win-full behavior
#endif
Could you please follow the same?
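Applied to the default-encoding initialization under discussion, that pattern would look roughly like the following sketch (it reuses names seen elsewhere in this diff - NativeMethods.GetACP, s_defaultEncoding, CodePagesEncodingProvider - and is not the final merged code):

#if UNIX
            // PowerShell Core on Unix: UTF-8
            s_defaultEncoding = Encoding.UTF8;
#elif CORECLR
            // PowerShell Core on Windows: legacy code pages require an explicit provider
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            uint currentAnsiCp = NativeMethods.GetACP();
            s_defaultEncoding = Encoding.GetEncoding((int)currentAnsiCp);
#else
            // Windows PowerShell (full CLR)
            s_defaultEncoding = Encoding.Default;
#endif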
uint aCp = NativeMethods.GetACP();
if (s_oemEncoding == null)
{
Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
Would it be better to separate this into a method to reduce code duplication?
internal const string QueryDosDeviceDllName = "api-ms-win-core-file-l1-1-0.dll"; /*1*/
internal const string CreateSymbolicLinkDllName = "api-ms-win-core-file-l2-1-0.dll"; /*2*/
internal const string GetOEMCPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*3*/
internal const string GetACPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*3*/
The /*3*/ at the end is important to track if there is a mismatch. Please add this API to the end of the list and use an incremented number.
internal const string QueryDosDeviceDllName = "kernel32.dll"; /*1*/
internal const string CreateSymbolicLinkDllName = "kernel32.dll"; /*2*/
internal const string GetOEMCPDllName = "kernel32.dll"; /*3*/
internal const string GetACPDllName = "kernel32.dll"; /*3*/
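For context, these DllName constants are consumed by the corresponding P/Invoke declarations. A minimal sketch of what the GetACP/GetOEMCP imports look like (the Win32 signatures are real - both take no arguments and return a UINT - but the class shape shown here and the assumption that the constants live in a PinvokeDllNames class are illustrative, not the exact source):

using System.Runtime.InteropServices;

internal static class NativeMethods
{
    // Returns the current ANSI (default Windows) code page identifier.
    [DllImport(PinvokeDllNames.GetACPDllName)]
    internal static extern uint GetACP();

    // Returns the current OEM code page identifier.
    [DllImport(PinvokeDllNames.GetOEMCPDllName)]
    internal static extern uint GetOEMCP();
}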
{
#if CORECLR // The OEM code page '437' is not supported by CoreCLR.
// Use the default encoding (ISO-8859-1, CodePage 28591) as the OEM encoding in OneCore powershell.
#if CORECLR
Same here regarding the pattern.
uint oemCp = NativeMethods.GetOEMCP();
if (s_defaultEncoding == null)
{
Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
Same here regarding the code duplication.
Why are you using
<PackageReference Include="System.ServiceModel.Security" Version="4.3.0" />
<PackageReference Include="System.ServiceProcess.ServiceController" Version="4.3.0" />
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.3.0" />
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.0.11" />
Why change it to 4.0.11? I can't find such a version on nuget.org.
With 4.0.11 I could compile locally on my computer. Now I changed it to 4.0.1, but it failed too.
EncodingRegisterProvider()

uint currentAnsiCp = NativeMethods.GetACP();
s_defaultEncoding = Encoding.GetEncoding((int)currentAnsiCp);
Encoding.Default is coming back in netstandard2.0; are we going to change to it at that point?
We have an RFC for that. No, for now we only restore the "Windows PowerShell" behavior.
/// <summary>
/// Facade for Encoding.Default
/// Facade for getting Encoding.Default
This depends on what we do when Encoding.Default comes back in netstandard2.0. If we are not going to switch to Encoding.Default in PowerShell Core, then the comment should be "Facade for getting the default encoding".
uint currentAnsiCp = NativeMethods.GetACP();
s_defaultEncoding = Encoding.GetEncoding((int)currentAnsiCp);

Please remove the extra blank line.
#else // Windows PowerShell
uint oemCp = NativeMethods.GetOEMCP();
s_oemEncoding = Encoding.GetEncoding((int)oemCp);
#endif
I didn't notice the duplicate code between coreclr and fullclr. In this case, maybe the following is better?
#if UNIX
s_oemEncoding = GetDefaultEncoding();
#else
#if CORECLR
EncodingRegisterProvider();
#endif
uint oemCp = NativeMethods.GetOEMCP();
s_oemEncoding = Encoding.GetEncoding((int)oemCp);
#endif
I tried it - it is difficult to read. At compile time there is no duplicate code anyway. So maybe leave it as is?
}
return s_oemEncoding;
}

Remove the extra line, or move private static volatile Encoding s_defaultEncoding; here as well.
private static volatile Encoding s_oemEncoding;

private static EncodingRegisterProvider()
This method should be enclosed in #if CORECLR, as it won't build for the full CLR.
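Concretely, the helper would be compiled only for CoreCLR, roughly like this (a sketch of the suggested change, not the merged code):

#if CORECLR
        private static void EncodingRegisterProvider()
        {
            // CodePagesEncodingProvider only exists on .NET Core; the full framework
            // has the legacy code pages built in, so this helper is not needed there.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        }
#endif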
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
}
{
if (s_defaultEncoding == null && s_oemEncoding == null)
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
It's possible this line will be called more than once when there are multiple runspaces. Will it fail on the second call?
At first I put the call in GetDefaultEncoding and GetOEMEncoding without the "if", and the second call went fine.
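In other words, a repeated registration appears to be harmless. A minimal standalone repro of that observation (illustrative only; it needs the System.Text.Encoding.CodePages package referenced):

using System;
using System.Text;

class RegisterTwice
{
    static void Main()
    {
        // Register the code-pages provider twice, as two runspaces might do.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // If the second registration threw, we would never reach this line.
        Console.WriteLine(Encoding.GetEncoding(437).CodePage); // 437
    }
}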
@daxian-dbw
@JamesWTruher @mklement0, you have been closely following this issue (and related ones); could you please take a look at the fix?
s_defaultEncoding = Encoding.GetEncoding(28591);
#else
#if UNUX // PowerShell Core on Unix
s_defaultEncoding = Encoding.UTF8;
I think it's a mistake to default to a UTF-8 encoding with BOM.
While .NET Core, at least currently, also reports the same BOM-based encoding - unofficially in Encoding.Default (not part of the contract yet) and officially in Encoding.GetDefaultEncoding(0) (and I consider that misguided too) - at least it doesn't govern the actual default behavior in the sense of what happens if no encoding is specified. The methods of type System.IO.File default to BOM-less UTF-8 - both in .NET Framework since its inception and in .NET Core.
I think it's imperative that we use a BOM-less UTF-8 encoding in the Unix world (and, I would argue, even make that the default on Windows (Core) as well, with opt-in to legacy behavior).
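For clarity, the distinction being argued here is about the encoding's preamble (BOM). A small illustrative snippet (the UTF8Encoding constructor parameter is the documented way to get a BOM-less UTF-8 encoding):

using System;
using System.Text;

class BomCheck
{
    static void Main()
    {
        // Encoding.UTF8 emits a BOM: its preamble is EF BB BF (3 bytes).
        Console.WriteLine(Encoding.UTF8.GetPreamble().Length);   // 3

        // BOM-less UTF-8: pass false for encoderShouldEmitUTF8Identifier.
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        Console.WriteLine(utf8NoBom.GetPreamble().Length);       // 0
    }
}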
<PackageReference Include="System.ServiceModel.Security" Version="4.3.0" />
<PackageReference Include="System.ServiceProcess.ServiceController" Version="4.3.0" />
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.3.0" />
<PackageReference Include="System.Text.Encoding.CodePages" Version="4.0.1" />
Why not use 4.3.0? The type System.Text.CodePagesEncodingProvider is available in that package.
You need to update SMA.csproj to add this package reference since that's the project depending on it. Then you can remove this entry from Microsoft.PowerShell.SDK.csproj.
Many thanks! That solved the problem and PowerShell now builds fine.
But currently Travis fails in the test phase with:
Import-Module : Unable to load DLL 'api-ms-win-core-localization-l1-2-0.dll'
Could you please comment?
#if UNUX
As Steve pointed out, this is why 😄
@daxian-dbw, @JamesWTruher: I know this debate is cumbersome, but I think it's a very important one to have, and I don't think we're ready to move ahead yet - I've just added further thoughts here.
// We will revisit this if it causes any failures when running tests on Core PS.
s_defaultEncoding = Encoding.GetEncoding(28591);
#else
#if UNUX // PowerShell Core on Unix
internal const string CreateToolhelp32SnapshotDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*120*/
internal const string Process32FirstDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*121*/
internal const string Process32NextDllName = "api-ms-win-core-toolhelp-l1-1-0"; /*122*/
internal const string GetACPDllName = "api-ms-win-core-localization-l1-2-0.dll"; /*123*/
The API set matches the API on my laptop. I don't have access to a Nano VM at the moment to verify it there. On Nano, it must match the exporting API set exactly. You can verify it with dumpbin.exe /exports <path to api set>
I still haven't run dumpbin.exe /exports.
The MSDN docs say:
- Nano: api-ms-win-core-localization-l1-2-2.dll https://msdn.microsoft.com/en-us/library/mt588480%28v=vs.85%29.aspx?#api-ms-win-core-localization-l1-2-2.dll
- Desktop: api-ms-win-core-localization-l1-2-1.dll https://msdn.microsoft.com/en-us/library/windows/desktop/dn933214(v=vs.85).aspx
Which version should we use?
GetACP is in api-ms-win-core-localization-l1-2-0.dll in the runtime.win7-x64.Microsoft.NETCore.Windows.ApiSets dump.
But it seems Nano uses runtime.win10-x64.Microsoft.NETCore.Windows.ApiSets.
It is correct "as-is" here. No change needed.
Thanks, that's clear.
Is Nano still in question?
No. Nano Server contains the 1-2-0 version of the API set.
3a8d68b to 0f04279
@daxian-dbw @SteveL-MSFT CI has passed - many thanks! Sorry for the many typos (making the PR late on a weekend night was a bad idea of mine). Please continue the code review. @mklement0 Can you build the branch and check it against our expectations? If not, could you please attach files with your tests?
@iSazonov: I built on my macOS 10.12 machine and I can confirm that the following ad-hoc tests return true on macOS (and, judging by the source, therefore all Unix platforms), as desired:

# Setup: *create* a BOM-less UTF-8 file.
'ö' | Set-Content -NoNewline /tmp/$pid.txt

# Tests: Both should output $True

# Compare the raw bytes of the new file to the UTF-8 encoding of 'ö' (0xc3 0xb6)
# With the current alpha17, this would return $False, because Set-Content creates an
# ISO-8859-1 file.
$null -eq (Compare-Object (Get-Content -Encoding Byte -Raw /tmp/$pid.txt) (0xc3, 0xb6))

# See if the BOM-less UTF-8 file is *read* correctly.
'ö' -eq (Get-Content -Raw /tmp/$pid.txt)

(I wasn't able to run the tests - the )
@mklement0 Thanks for the confirmation! It means the code is a step in the right direction! All tests have passed on CI. I believe the "New-PSSession basic test" failure is unrelated too, and you can ignore that error in the context of this PR (or try to find the root of the error: copy-paste the test code into your session and look at $error[0] - I think you don't have some package installed).
I added a comment about tests in the first post.
Great idea, but I suggest, to avoid ambiguity, that you explicitly mention in your comment that this PR implements BOM-less UTF-8, especially given that your comment links to the .NET Core
To add some more fodder for upcoming tests, here's a simple one that tests whether PowerShell itself correctly reads BOM-less UTF-8 source code (such as scripts) by default in PS Core on Unix:

# Setup:
# Create a no-BOM UTF-8 file with the following literal content:
#   'ö'
# which, when executed as a PS script, should echo 'ö' back, IF the script
# was correctly decoded from UTF-8 by PS.
# UTF-8 bytes: 0x27 (single quote), 0xc3 0xb6 (encoding of 'ö', U+00F6), 0x27 (single quote)
[byte[]] (0x27, 0xc3, 0xb6, 0x27) | Set-Content -Encoding Byte /tmp/$PID.ps1

# Test: See if the 'ö' is echoed back correctly.
# Should return $True
'ö' -eq (& /tmp/$PID.ps1)
@mklement0 I corrected the first post regarding BOM-less; in the code it is obvious.
@iSazonov Are you going to add tests to this PR?
I added this to the first post:
I am confused. A partial test is meaningless. A full test also does not make sense until the Encoding RFC is approved. What is the compromise?
@iSazonov The code changes look good to me. If the corresponding tests come later, then we should keep that work item tracked in an issue. Is #3248 considered fixed by this PR? If not, the test work item should be tracked in the first post of that issue, similar to how you are tracking
@daxian-dbw Tracking issue: #3488.
@iSazonov Thank you! I will merge this PR later today.
Based on the conclusion of the discussion in #3248.

- OEM
We need an internal fix because .NET Core does not have the "OEM" concept.
The fix should give PowerShell Core the following behavior:
on Windows - the same as Windows PowerShell, based on GetOEMCP() and the legacy encodings (System.Text.Encoding.CodePages)
on Unix - the same as the default encoding in PowerShell Core

- Default
We need an internal fix because the CoreFX default for all platforms is UTF-8.
Until the Encoding RFC is completed we should restore the behavior of Windows PowerShell.
The fix should give PowerShell Core the following behavior:
on Windows - the same as Windows PowerShell, based on GetACP() and the legacy encodings (System.Text.Encoding.CodePages)
on Unix - UTF-8 without BOM (we expect CoreFX to do the same)
(A consolidated sketch of the resulting facade methods follows after this list.)

- Tests
We don't currently have a full set of encoding tests (only for redirections). I believe we should create them during the upcoming RFC implementation, not now.
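As referenced in the Default item above, here is a rough consolidated sketch of the two facade methods. It is an approximation reconstructed from this conversation (NativeMethods, the #if symbols, and the field names are taken from the quoted diffs), not the exact merged source:

using System.Text;

internal static class ClrFacade
{
    private static volatile Encoding s_defaultEncoding;
    private static volatile Encoding s_oemEncoding;

    /// <summary>Facade for getting the default encoding.</summary>
    internal static Encoding GetDefaultEncoding()
    {
        if (s_defaultEncoding == null)
        {
#if UNIX        // PowerShell Core on Unix: UTF-8 without BOM
            s_defaultEncoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
#elif CORECLR   // PowerShell Core on Windows: ANSI code page via the code-pages provider
            EncodingRegisterProvider();
            uint currentAnsiCp = NativeMethods.GetACP();
            s_defaultEncoding = Encoding.GetEncoding((int)currentAnsiCp);
#else           // Windows PowerShell
            s_defaultEncoding = Encoding.Default;
#endif
        }
        return s_defaultEncoding;
    }

    /// <summary>Facade for getting the OEM encoding.</summary>
    internal static Encoding GetOEMEncoding()
    {
        if (s_oemEncoding == null)
        {
#if UNIX        // No OEM concept on Unix: reuse the default encoding
            s_oemEncoding = GetDefaultEncoding();
#else
#if CORECLR
            EncodingRegisterProvider();
#endif
            uint oemCp = NativeMethods.GetOEMCP();
            s_oemEncoding = Encoding.GetEncoding((int)oemCp);
#endif
        }
        return s_oemEncoding;
    }

#if CORECLR
    private static void EncodingRegisterProvider()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }
#endif
}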