aspnetcore/unicode/how-to-update.txt

95 lines
4.4 KiB
Plaintext

This document contains instructions for updating the Unicode data set used by
the WebEncoders project.
1) Download the latest UnicodeData.txt and Blocks.txt from the Unicode
Consortium web site. These files are normally found under
http://www.unicode.org/Public/X.Y.Z/ucd/, where X.Y.Z is the version of the
Unicode specification of interest. Replace the UnicodeData.txt and
Blocks.txt files in this folder with the files you downloaded.
2) Update unicode-copyright.txt in this folder with the following information:
- The exact URLs where you downloaded UnicodeData.txt and Blocks.txt.
- The date on which you downloaded these two files.
- The Unicode copyright and permission notice, if it has changed. The latest
copyright and permission notice can be found at the bottom of
http://www.unicode.org/copyright.html.
3) Open the Generators solution and run the DefinedCharListGenerator project.
Running this will drop a file unicode-defined-chars.bin into the output
folder. Move this file into the following directory, overwriting the
existing file in that directory:
src\Microsoft.Framework.WebEncoders.Core\compiler\resources
4) Open the Generators solution and run the UnicodeTablesGenerator project.
Running this will drop two files UnicodeRanges.generated.txt and
UnicodeRangesTests.generated.txt into the output folder.
5) Open UnicodeRanges.generated.txt in your favorite text editor. You'll see
that the file contains all of the parsed Unicode block information in
ascending code point order. Manually REMOVE the following blocks from this
text file and re-save it.
- High Surrogates (U+D800..U+DB7F)
- High Private Use Surrogates (U+DB80..U+DBFF)
- Low Surrogates (U+DC00..U+DFFF)
- Private Use Area (U+E000..U+F8FF)
6) Open src\Microsoft.Framework.WebEncoders.Core\UnicodeRanges.generated.cs in
your IDE. Delete everything within the partial class definition and replace
it with the contents of UnicodeRanges.generated.txt. (Remember to remove
the blocks mentioned in the previous step, otherwise unit tests will fail.)
Open src\Microsoft.Framework.WebEncoders.Core\UnicodeRanges.cs in your IDE.
Update the doc comment at the top of the class to reflect the appropriate
version of the Unicode specification.
7) Open UnicodeRangesTests.generated.txt in your favorite text editor. Just
like in the previous .txt file, you'll need to remove the [InlineData]
lines which map to the Unicode blocks which were manually removed.
See step (5) for the list of which blocks must be removed. Then re-save
this file.
8) Open test\Microsoft.Framework.WebEncoders.Tests\UnicodeRangesTests.cs in
your IDE. Delete all of the [InlineData] attributes on the Range_Unicode
test, then paste the contents of UnicodeRangesTests.generated.txt in
to restore the new [InlineData] list.
IMPORTANT: Don't delete the [Theory] attribute on this method!
9) Open test\Microsoft.Framework.WebEncoders.Tests\UnicodeHelpersTests.cs in
your IDE. Scroll to the bottom of the ReadListOfDefinedCharacters method,
and you'll see a section where the test special-cases CJK Ideographs and
Hangul Syllables. As more characters are added to the Unicode specification
the list of valid CJK Ideographs and Hangul Syllables can grow, so make sure
these match up with the relevant lines in UnicodeData.txt. For instance, at
the time of this writing UnicodeData.txt lists the valid Hangul Syllable
character range as follows:
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
If necessary, update the logic in the ReadListOfDefinedCharacters method to
account for any changes to these lines in UnicodeData.txt.
That's it! Run the unit tests and everything should be good to go. If you find
any stray comments throughout the code base that reference a specific version
of the Unicode specification, go ahead and update them so that they correctly
reflect the version you just submitted.
To recap, the files you should check in are:
src\Microsoft.Framework.WebEncoders.Core\compiler\resources\
unicode-defined-chars.bin
src\Microsoft.Framework.WebEncoders.Core\
UnicodeRanges.cs
UnicodeRanges.generated.cs
test\Microsoft.Framework.WebEncoders.Tests\
UnicodeHelpersTests.cs (if necessary, see step 9)
UnicodeRangesTests.cs
unicode\
Blocks.txt
unicode-copyright.txt
UnicodeData.txt