This document contains instructions for updating the Unicode data set used by
the WebEncoders project.

1) Download the latest UnicodeData.txt and Blocks.txt from the Unicode
   Consortium web site. These files are normally found under
   http://www.unicode.org/Public/X.Y.Z/ucd/, where X.Y.Z is the version of the
   Unicode specification of interest. Replace the UnicodeData.txt and
   Blocks.txt files in this folder with the files you downloaded.

2) Update unicode-copyright.txt in this folder with the following information:
   - The exact URLs where you downloaded UnicodeData.txt and Blocks.txt.
   - The date on which you downloaded these two files.
   - The Unicode copyright and permission notice, if it has changed. The latest
     copyright and permission notice can be found at the bottom of
	 http://www.unicode.org/copyright.html.

3) Open the Generators solution and run the DefinedCharListGenerator project.
   Running this will drop a file unicode-defined-chars.bin into the output
   folder. Move this file into the following directory, overwriting the
   existing file in that directory:
   src\Microsoft.Extensions.WebEncoders.Core\compiler\resources

4) Open the Generators solution and run the UnicodeTablesGenerator project.
   Running this will drop two files UnicodeRanges.generated.txt and
   UnicodeRangesTests.generated.txt into the output folder.

5) Open UnicodeRanges.generated.txt in your favorite text editor. You'll see
   that the file contains all of the parsed Unicode block information in
   ascending code point order. Manually REMOVE the following blocks from this
   text file and re-save it.
   - High Surrogates (U+D800..U+DB7F)
   - High Private Use Surrogates (U+DB80..U+DBFF)
   - Low Surrogates (U+DC00..U+DFFF)
   - Private Use Area (U+E000..U+F8FF)

6) Open src\Microsoft.Extensions.WebEncoders.Core\UnicodeRanges.generated.cs in
   your IDE. Delete everything within the partial class definition and replace
   it with the contents of UnicodeRanges.generated.txt. (Remember to remove
   the blocks mentioned in the previous step, otherwise unit tests will fail.)

   Open src\Microsoft.Extensions.WebEncoders.Core\UnicodeRanges.cs in your IDE.
   Update the doc comment at the top of the class to reflect the appropriate
   version of the Unicode specification.

7) Open UnicodeRangesTests.generated.txt in your favorite text editor. Just
   like in the previous .txt file, you'll need to remove the [InlineData]
   lines which map to the Unicode blocks which were manually removed.
   See step (5) for the list of which blocks must be removed. Then re-save
   this file.

8) Open test\Microsoft.Extensions.WebEncoders.Tests\UnicodeRangesTests.cs in
   your IDE. Delete all of the [InlineData] attributes on the Range_Unicode
   test, then paste the contents of UnicodeRangesTests.generated.txt in
   to restore the new [InlineData] list.

   IMPORTANT: Don't delete the [Theory] attribute on this method!

9) Open test\Microsoft.Extensions.WebEncoders.Tests\UnicodeHelpersTests.cs in
   your IDE. Scroll to the bottom of the ReadListOfDefinedCharacters method,
   and you'll see a section where the test special-cases CJK Ideographs and
   Hangul Syllables. As more characters are added to the Unicode specification
   the list of valid CJK Ideographs and Hangul Syllables can grow, so make sure
   these match up with the relevant lines in UnicodeData.txt. For instance, at
   the time of this writing UnicodeData.txt lists the valid Hangul Syllable
   character range as follows:

     AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
     D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;

   If necessary, update the logic in the ReadListOfDefinedCharacters method to
   account for any changes to these lines in UnicodeData.txt.

That's it! Run the unit tests and everything should be good to go. If you find
any stray comments throughout the code base that reference a specific version
of the Unicode specification, go ahead and update them so that they correctly
reflect the version you just submitted.

To recap, the files you should check in are:

src\Microsoft.Extensions.WebEncoders.Core\compiler\resources\
  unicode-defined-chars.bin

src\Microsoft.Extensions.WebEncoders.Core\
  UnicodeRanges.cs
  UnicodeRanges.generated.cs

test\Microsoft.Extensions.WebEncoders.Tests\
  UnicodeHelpersTests.cs (if necessary, see step 9)
  UnicodeRangesTests.cs

unicode\
  Blocks.txt
  unicode-copyright.txt
  UnicodeData.txt
