95 lines
4.4 KiB
Plaintext
This document contains instructions for updating the Unicode data set used by
the WebEncoders project.

1) Download the latest UnicodeData.txt and Blocks.txt from the Unicode
   Consortium web site. These files are normally found under
   http://www.unicode.org/Public/X.Y.Z/ucd/, where X.Y.Z is the version of the
   Unicode specification of interest. Replace the UnicodeData.txt and
   Blocks.txt files in this folder with the files you downloaded.

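The download in step 1 could be scripted roughly as follows. This is only a
sketch: the helper names ucd_urls and download_ucd_files are hypothetical, and
it assumes the two files still live under the URL pattern given in step 1.

```python
import os
from urllib.request import urlopen

# URL pattern from step 1; {version} stands for X.Y.Z.
UCD_URL_TEMPLATE = "http://www.unicode.org/Public/{version}/ucd/{name}"
UCD_FILES = ("UnicodeData.txt", "Blocks.txt")


def ucd_urls(version):
    """Return {file name: download URL} for the given Unicode version."""
    return {name: UCD_URL_TEMPLATE.format(version=version, name=name)
            for name in UCD_FILES}


def download_ucd_files(version, dest_dir="."):
    """Fetch both files into dest_dir, overwriting any existing copies."""
    for name, url in ucd_urls(version).items():
        with urlopen(url) as response:
            data = response.read()
        with open(os.path.join(dest_dir, name), "wb") as out:
            out.write(data)
        # Print the exact URL so it can be recorded in unicode-copyright.txt.
        print("downloaded", url)
```

Calling download_ucd_files("8.0.0") from this folder, for example, would
replace both files in place. The printed URLs are the ones step 2 asks you to
record.
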
2) Update unicode-copyright.txt in this folder with the following information:
   - The exact URLs where you downloaded UnicodeData.txt and Blocks.txt.
   - The date on which you downloaded these two files.
   - The Unicode copyright and permission notice, if it has changed. The latest
     copyright and permission notice can be found at the bottom of
     http://www.unicode.org/copyright.html.

3) Open the Generators solution and run the DefinedCharListGenerator project.
   Running this will drop a file unicode-defined-chars.bin into the output
   folder. Move this file into the following directory, overwriting the
   existing file in that directory:
     src\Microsoft.Framework.WebEncoders.Core\compiler\resources

4) Open the Generators solution and run the UnicodeTablesGenerator project.
   Running this will drop two files UnicodeRanges.generated.txt and
   UnicodeRangesTests.generated.txt into the output folder.

5) Open UnicodeRanges.generated.txt in your favorite text editor. You'll see
   that the file contains all of the parsed Unicode block information in
   ascending code point order. Manually REMOVE the following blocks from this
   text file and re-save it.
   - High Surrogates (U+D800..U+DB7F)
   - High Private Use Surrogates (U+DB80..U+DBFF)
   - Low Surrogates (U+DC00..U+DFFF)
   - Private Use Area (U+E000..U+F8FF)

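As a cross-check for step 5, the sketch below splits the blocks listed in
Blocks.txt into those to keep and those to remove. It reads Blocks.txt (whose
entries look like "D800..DB7F; High Surrogates") rather than
UnicodeRanges.generated.txt, because the layout of the generated file is not
described here. The function names are hypothetical.

```python
import re

# Ranges that step 5 says to remove, keyed by (first, last) code point.
EXCLUDED_BLOCKS = {
    (0xD800, 0xDB7F): "High Surrogates",
    (0xDB80, 0xDBFF): "High Private Use Surrogates",
    (0xDC00, 0xDFFF): "Low Surrogates",
    (0xE000, 0xF8FF): "Private Use Area",
}

# Matches a Blocks.txt entry such as "0000..007F; Basic Latin".
_BLOCK_LINE = re.compile(r"^([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+);\s*(.+?)\s*$")


def parse_blocks(text):
    """Return a list of (first, last, name) tuples from Blocks.txt content."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _BLOCK_LINE.match(line)
        if match:
            first, last, name = match.groups()
            blocks.append((int(first, 16), int(last, 16), name))
    return blocks


def split_blocks(blocks):
    """Split blocks into (kept, removed) according to EXCLUDED_BLOCKS."""
    kept = [b for b in blocks if (b[0], b[1]) not in EXCLUDED_BLOCKS]
    removed = [b for b in blocks if (b[0], b[1]) in EXCLUDED_BLOCKS]
    return kept, removed
```

If fewer than four blocks come back in the removed list for a new
Blocks.txt, a range has probably changed upstream and the list in step 5
should be re-checked by hand.
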
6) Open src\Microsoft.Framework.WebEncoders.Core\UnicodeRanges.generated.cs in
   your IDE. Delete everything within the partial class definition and replace
   it with the contents of UnicodeRanges.generated.txt. (Remember to remove
   the blocks mentioned in the previous step; otherwise the unit tests will
   fail.)

   Open src\Microsoft.Framework.WebEncoders.Core\UnicodeRanges.cs in your IDE.
   Update the doc comment at the top of the class to reflect the appropriate
   version of the Unicode specification.

7) Open UnicodeRangesTests.generated.txt in your favorite text editor. Just
   like in the previous .txt file, you'll need to remove the [InlineData]
   lines that map to the Unicode blocks you removed manually. See step (5)
   for the list of blocks that must be removed. Then re-save this file.

8) Open test\Microsoft.Framework.WebEncoders.Tests\UnicodeRangesTests.cs in
   your IDE. Delete all of the [InlineData] attributes on the Range_Unicode
   test, then paste in the contents of UnicodeRangesTests.generated.txt to
   restore the new [InlineData] list.

   IMPORTANT: Don't delete the [Theory] attribute on this method!

9) Open test\Microsoft.Framework.WebEncoders.Tests\UnicodeHelpersTests.cs in
   your IDE. Scroll to the bottom of the ReadListOfDefinedCharacters method,
   and you'll see a section where the test special-cases CJK Ideographs and
   Hangul Syllables. As more characters are added to the Unicode
   specification, the list of valid CJK Ideographs and Hangul Syllables can
   grow, so make sure these match up with the relevant lines in
   UnicodeData.txt. For instance, at the time of this writing UnicodeData.txt
   lists the valid Hangul Syllable character range as follows:

     AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
     D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;

   If necessary, update the logic in the ReadListOfDefinedCharacters method to
   account for any changes to these lines in UnicodeData.txt.

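To compare the special-cased ranges in step 9 against a new UnicodeData.txt,
the "First"/"Last" marker pairs can be pulled out with a short script. The
sketch below assumes only the semicolon-separated layout shown above;
find_ranges is a hypothetical helper and is not part of the test project.

```python
def find_ranges(unicode_data_text):
    """Map each '<Label, First>' / '<Label, Last>' pair to (first, last)."""
    ranges = {}
    pending = {}  # label -> first code point, waiting for its Last line
    for line in unicode_data_text.splitlines():
        fields = line.split(";")
        if len(fields) < 2:
            continue
        code_point, name = fields[0], fields[1]
        if name.startswith("<") and name.endswith(", First>"):
            label = name[1:-len(", First>")]
            pending[label] = int(code_point, 16)
        elif name.startswith("<") and name.endswith(", Last>"):
            label = name[1:-len(", Last>")]
            if label in pending:
                ranges[label] = (pending.pop(label), int(code_point, 16))
    return ranges
```

Run against the whole file, this should list the Hangul Syllable range along
with the CJK Ideograph ranges, which can then be checked by eye against the
values hard-coded in ReadListOfDefinedCharacters.
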
That's it! Run the unit tests and everything should be good to go. If you find
any stray comments throughout the code base that reference a specific version
of the Unicode specification, go ahead and update them so that they correctly
reflect the version you just submitted.

To recap, the files you should check in are:

src\Microsoft.Framework.WebEncoders.Core\compiler\resources\
   unicode-defined-chars.bin

src\Microsoft.Framework.WebEncoders.Core\
   UnicodeRanges.cs
   UnicodeRanges.generated.cs

test\Microsoft.Framework.WebEncoders.Tests\
   UnicodeHelpersTests.cs (if necessary, see step 9)
   UnicodeRangesTests.cs

unicode\
   Blocks.txt
   unicode-copyright.txt
   UnicodeData.txt

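Before checking in, the recap list can be turned into a quick existence
check. This is a sketch only: missing_files is a hypothetical helper, and it
assumes all four folders above are given relative to one root directory,
which this document does not state explicitly.

```python
from pathlib import Path

# Paths from the recap list, written with forward slashes for portability.
EXPECTED_FILES = [
    "src/Microsoft.Framework.WebEncoders.Core/compiler/resources/unicode-defined-chars.bin",
    "src/Microsoft.Framework.WebEncoders.Core/UnicodeRanges.cs",
    "src/Microsoft.Framework.WebEncoders.Core/UnicodeRanges.generated.cs",
    "test/Microsoft.Framework.WebEncoders.Tests/UnicodeHelpersTests.cs",
    "test/Microsoft.Framework.WebEncoders.Tests/UnicodeRangesTests.cs",
    "unicode/Blocks.txt",
    "unicode/unicode-copyright.txt",
    "unicode/UnicodeData.txt",
]


def missing_files(root):
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]
```

An empty result only shows that the files exist. It does not show that they
were actually regenerated, so the unit tests remain the real check.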