Documentation

UnicodeBasic

General API #

As a general rule, for a given a Unicode property called Unicode_Property, for example:

Unicode general categories are encoded using the type GC. This type has a boolean algebra structure with inclusion , meet/intersection &&&, join/union ||| and complement ~~~. The relation is provided to check whether a character belongs to a given category. For example, c ∈ (GC.L &&& ~~~GC.Lt) ||| GC.Z checks whether c is a either a non-titlecase letter or a separator.

Name #

@[inline]

Get character name

When the Unicode property Name is empty, a unique code label is returned as recommended in Unicode Standard, section 4.8. These labels start with '<' (U+003C) and end with '>' (U+003E) so they are distinguishable from actual name values.

Unicode property: Name

Equations
Instances For

    Script #

    @[inline]

    Get character script

    Unicode property: Script

    Equations
    Instances For
      @[inline]

      Get script name

      Returns none if the script code is unassigned.

      Unicode property: Script

      Equations
      Instances For

        Bidirectional Properties #

        @[inline]

        Get character bidirectional class

        Unicode property: Bidi_Class

        Equations
        Instances For
          @[inline]

          Check if bidirectional mirrored character

          Unicode property: Bidi_Mirrored

          Equations
          Instances For
            @[inline]

            Check if bidirectional control character

            Unicode property: Bidi_Control

            Equations
            • One or more equations did not get rendered due to their size.
            Instances For

              General Category #

              @[inline]
              def Unicode.getGC (char : Char) :

              Get character general category

              Caveat: This function never returns a derived general category. Use Unicode.isInGeneralCategory to check whether a character belongs to a general category (derived or not).

              Unicode property: General_Category

              Equations
              Instances For
                @[implicit_reducible]
                Equations
                @[implicit_reducible]
                instance Unicode.instDecidableMemCharGC (char : Char) (cat : GC) :
                Decidable (char cat)
                Equations
                @[reducible, inline]

                Check if letter character (L)

                This is a derived category (L = Lu | Ll | Lt | Lm | Lo).

                Unicode Property: General_Category=L

                Equations
                Instances For
                  @[reducible, inline]

                  Check if lowercase letter character (Ll)

                  Unicode Property: General_Category=Ll

                  Equations
                  Instances For
                    @[reducible, inline]

                    Check if titlecase letter character (Lt)

                    Unicode Property: General_Category=Lt

                    Equations
                    Instances For
                      @[reducible, inline]

                      Check if uppercase letter character (Lu)

                      Unicode Property: General_Category=Lu

                      Equations
                      Instances For
                        @[reducible, inline]

                        Check if cased letter character (LC)

                        This is a derived category (L = Lu | Ll | Lt).

                        Unicode Property: General_Category=LC

                        Equations
                        Instances For
                          @[reducible, inline]

                          Check if modifier letter character (Lm)

                          Unicode Property: General_Category=Lm

                          Equations
                          Instances For
                            @[reducible, inline]

                            Check if other letter character (Lo)

                            Unicode Property: General_Category=Lo

                            Equations
                            Instances For
                              @[reducible, inline]

                              Check if mark character (M)

                              This is a derived category (M = Mn | Mc | Me).

                              Unicode Property: General_Category=M

                              Equations
                              Instances For
                                @[reducible, inline]

                                Check if nonspacing combining mark character (Mn)

                                Unicode Property: General_Category=Mn

                                Equations
                                Instances For
                                  @[reducible, inline]

                                  Check if spacing combining mark character (Mc)

                                  Unicode Property: General_Category=Mc

                                  Equations
                                  Instances For
                                    @[reducible, inline]

                                    Check if enclosing combining mark character (Me)

                                    Unicode Property: General_Category=Me

                                    Equations
                                    Instances For
                                      @[reducible, inline]

                                      Check if number character (N)

                                      This is a derived category (N = Nd | Nl | No).

                                      Unicode Property: General_Category=N

                                      Equations
                                      Instances For
                                        @[reducible, inline]

                                        Check if decimal number character (Nd)

                                        Unicode Property: General_Category=Nd

                                        Equations
                                        Instances For
                                          @[reducible, inline]

                                          Check if letter number character (Nl)

                                          Unicode Property: General_Category=Nl

                                          Equations
                                          Instances For
                                            @[reducible, inline]

                                            Check if other number character (No)

                                            Unicode Property: General_Category=No

                                            Equations
                                            Instances For
                                              @[reducible, inline]

                                              Check if punctuation character (P)

                                              This is a derived category (P = Pc | Pd | Ps | Pe | Pi | Pf | Po).

                                              Unicode Property: General_Category=P

                                              Equations
                                              Instances For
                                                @[reducible, inline]

                                                Check if connector punctuation character (Pc)

                                                Unicode Property: General_Category=Pc

                                                Equations
                                                Instances For
                                                  @[reducible, inline]

                                                  Check if dash punctuation character (Pd)

                                                  Unicode Property: General_Category=Pd

                                                  Equations
                                                  Instances For
                                                    @[reducible, inline]

                                                    Check if grouping punctuation character (PG)

                                                    This is a derived category (PG = Ps | Pe).

                                                    Unicode Property: General_Category=PG

                                                    Equations
                                                    Instances For
                                                      @[reducible, inline]

                                                      Check if open punctuation character (Ps)

                                                      Unicode Property: General_Category=Ps

                                                      Equations
                                                      Instances For
                                                        @[reducible, inline]

                                                        Check if close punctuation character (Pe)

                                                        Unicode Property: General_Category=Pe

                                                        Equations
                                                        Instances For
                                                          @[reducible, inline]

                                                          Check if quoting punctuation character (PQ)

                                                          This is a derived category (PQ = Pi | Pf).

                                                          Unicode Property: General_Category=PQ

                                                          Equations
                                                          Instances For
                                                            @[reducible, inline]

                                                            Check if initial punctuation character (Pi)

                                                            Unicode Property: General_Category=Pi

                                                            Equations
                                                            Instances For
                                                              @[reducible, inline]

                                                              Check if initial punctuation character (Pf)

                                                              Unicode Property: General_Category=Pf

                                                              Equations
                                                              Instances For
                                                                @[reducible, inline]

                                                                Check if other punctuation character (Po)

                                                                Unicode Property: General_Category=Po

                                                                Equations
                                                                Instances For
                                                                  @[reducible, inline]

                                                                  Check if symbol character (S)

                                                                  This is a derived category (S = Sm | Sc | Sk | So).

                                                                  Unicode Property: General_Category=S

                                                                  Equations
                                                                  Instances For
                                                                    @[reducible, inline]

                                                                    Check if math symbol character (Sm)

                                                                    Unicode Property: General_Category=Sm

                                                                    Equations
                                                                    Instances For
                                                                      @[reducible, inline]

                                                                      Check if currency symbol character (Sc)

                                                                      Unicode Property: General_Category=Sc

                                                                      Equations
                                                                      Instances For
                                                                        @[reducible, inline]

                                                                        Check if modifier symbol character (Sk)

                                                                        Unicode Property: General_Category=Sk

                                                                        Equations
                                                                        Instances For
                                                                          @[reducible, inline]

                                                                          Check if other symbol character (So)

                                                                          Unicode Property: General_Category=So

                                                                          Equations
                                                                          Instances For
                                                                            @[reducible, inline]

                                                                            Check if separator character (Z)

                                                                            This is a derived property (Z = Zs | Zl | Zp).

                                                                            Unicode Property: General_Category=Z

                                                                            Equations
                                                                            Instances For
                                                                              @[reducible, inline]

                                                                              Check if space separator character (Zs)

                                                                              Unicode Property: General_Category=Zs

                                                                              Equations
                                                                              Instances For
                                                                                @[reducible, inline]

                                                                                Check if line separator character (Zl)

                                                                                Unicode Property: General_Category=Zl

                                                                                Equations
                                                                                Instances For
                                                                                  @[reducible, inline]

                                                                                  Check if paragraph separator character (Zp)

                                                                                  Unicode Property: General_Category=Zp

                                                                                  Equations
                                                                                  Instances For
                                                                                    @[reducible, inline]

                                                                                    Check if other character (C)

                                                                                    This is a derived category (C = Cc | Cf | Cs | Co | Cn).

                                                                                    Unicode Property: General_Category=C

                                                                                    Equations
                                                                                    Instances For
                                                                                      @[reducible, inline]

                                                                                      Check if control character (Cc)

                                                                                      Unicode Property: General_Category=Cc

                                                                                      Equations
                                                                                      Instances For
                                                                                        @[reducible, inline]

                                                                                        Check if format character (Cf)

                                                                                        Unicode Property: General_Category=Cf

                                                                                        Equations
                                                                                        Instances For
                                                                                          @[reducible, inline]

                                                                                          Check if surrogate character (Cs)

                                                                                          Does not actually occur since Lean does not regard surrogate code points as characters.

                                                                                          Unicode Property: General_Category=Cs

                                                                                          Equations
                                                                                          Instances For
                                                                                            @[reducible, inline]

                                                                                            Check if private use character (Co)

                                                                                            Unicode Property: General_Category=Co

                                                                                            Equations
                                                                                            Instances For
                                                                                              @[reducible, inline]

                                                                                              Check if unassigned character (Cn)

                                                                                              Unicode Property: General_Category=Cn

                                                                                              Equations
                                                                                              Instances For

                                                                                                Case Type and Mapping #

                                                                                                @[inline]

                                                                                                Check if lowercase letter character

                                                                                                Generated by General_Category=Ll | Other_Lowercase.

                                                                                                Unicode property: Lowercase

                                                                                                Equations
                                                                                                Instances For
                                                                                                  @[inline]

                                                                                                  Check if uppercase letter character

                                                                                                  Generated by General_Category=Lu | Other_Uppercase.

                                                                                                  Unicode property: Uppercase

                                                                                                  Equations
                                                                                                  Instances For
                                                                                                    @[inline]
                                                                                                    def Unicode.isCased (char : Char) :

                                                                                                    Check if cased letter character

                                                                                                    Generated by General_Category=LC | Other_Lowercase | Other_Uppercase.

                                                                                                    Unicode property: Cased

                                                                                                    Equations
                                                                                                    Instances For
                                                                                                      @[inline]

                                                                                                      Check if character is ignorable for casing purposes

                                                                                                      Generated from general categories Lm, Mn, Me, Sk, Cf, and word break properties MidLetter, MidNumLet, Single_Quote.

                                                                                                      Unicode property: Case_Ignorable

                                                                                                      Equations
                                                                                                      Instances For
                                                                                                        @[inline]

                                                                                                        Map character to lowercase

                                                                                                        This function does not handle the case where the lowercase mapping would consist of more than one character.

                                                                                                        Unicode property: Simple_Lowercase_Mapping

                                                                                                        Equations
                                                                                                        • One or more equations did not get rendered due to their size.
                                                                                                        Instances For
                                                                                                          @[inline]

                                                                                                          Map character to uppercase

                                                                                                          This function does not handle the case where the uppercase mapping would consist of more than one character.

                                                                                                          Unicode property: Simple_Uppercase_Mapping

                                                                                                          Equations
                                                                                                          • One or more equations did not get rendered due to their size.
                                                                                                          Instances For
                                                                                                            @[inline]

                                                                                                            Map character to titlecase

                                                                                                            This function does not handle the case where the titlecase mapping would consist of more than one character.

                                                                                                            Unicode property: Simple_Titlecase_Mapping

                                                                                                            Equations
                                                                                                            • One or more equations did not get rendered due to their size.
                                                                                                            Instances For

                                                                                                              Decomposition Type and Mapping #

                                                                                                              Get canonical combining class of character

                                                                                                              Unicode property: Canonical_Combining_Class

                                                                                                              Equations
                                                                                                              Instances For

                                                                                                                Get canonical decomposition of character (NFD)

                                                                                                                Unicode properties: Decomposition_Mapping Decomposition_Type=Canonical

                                                                                                                Equations
                                                                                                                • One or more equations did not get rendered due to their size.
                                                                                                                Instances For

                                                                                                                  Get decomposition mapping of a character

                                                                                                                  This is used in normalization to canonical decomposition (NFD) and compatibility decomposition (NFKD).

                                                                                                                  Unicode properties: Decomposition_Type Decomposition_Mapping

                                                                                                                  Equations
                                                                                                                  Instances For

                                                                                                                    Numeric Type and Value #

                                                                                                                    @[inline]

                                                                                                                    Check if character represents a numerical value

                                                                                                                    Unicode property: Numeric_Type=Numeric

                                                                                                                    Equations
                                                                                                                    • One or more equations did not get rendered due to their size.
                                                                                                                    Instances For
                                                                                                                      @[inline]
                                                                                                                      def Unicode.isDigit (char : Char) :

                                                                                                                      Check if character represents a digit in base 10

                                                                                                                      Unicode property: Numeric_Type=Digit

                                                                                                                      Equations
                                                                                                                      • One or more equations did not get rendered due to their size.
                                                                                                                      Instances For
                                                                                                                        @[inline]
                                                                                                                        def Unicode.getDigit? (char : Char) :
                                                                                                                        Option (Fin 10)

                                                                                                                        Get value of digit

                                                                                                                        Unicode properties: Numeric_Type=Digit Numeric_Value

                                                                                                                        Equations
                                                                                                                        • One or more equations did not get rendered due to their size.
                                                                                                                        Instances For
                                                                                                                          @[inline]

                                                                                                                          Check if character represents a decimal digit

                                                                                                                          For this property, a character must be part of a contiguous sequence representing the ten decimal digits in order from 0 to 9.

                                                                                                                          Unicode property: Numeric_Type=Decimal

                                                                                                                          Equations
                                                                                                                          • One or more equations did not get rendered due to their size.
                                                                                                                          Instances For
                                                                                                                            @[inline]

                                                                                                                            Get decimal digit range

                                                                                                                            If the character is part of a contiguous sequence representing the ten decimal digits in order from 0 to 9, this function returns the first and last characters from this range.

                                                                                                                            Unicode properties: Numeric_Type=Decimal Numeric_Value

                                                                                                                            Equations
                                                                                                                            • One or more equations did not get rendered due to their size.
                                                                                                                            Instances For
                                                                                                                              @[inline]

                                                                                                                              Check if character represents a hexadecimal digit

                                                                                                                              Unicode property: Hex_Digit

                                                                                                                              Equations
                                                                                                                              • One or more equations did not get rendered due to their size.
                                                                                                                              Instances For
                                                                                                                                @[inline]
                                                                                                                                def Unicode.getHexDigit? (char : Char) :
                                                                                                                                Option (Fin 16)

                                                                                                                                Get value of a hexadecimal digit

                                                                                                                                Unicode property: Hex_Digit

                                                                                                                                Equations
                                                                                                                                • One or more equations did not get rendered due to their size.
                                                                                                                                Instances For

                                                                                                                                  Other Properties #

                                                                                                                                  @[inline]

                                                                                                                                  Check if noncharacter

                                                                                                                                  Unicode property: Noncharacter_Code_Point

                                                                                                                                  Equations
                                                                                                                                  Instances For
                                                                                                                                    @[inline]

                                                                                                                                    Check if ignorable character

                                                                                                                                    Unicode property: Default_Ignorable_Code_Point

                                                                                                                                    Equations
                                                                                                                                    Instances For
                                                                                                                                      @[inline]

                                                                                                                                      Check if white space character

                                                                                                                                      Unicode property: White_Space

                                                                                                                                      Equations
                                                                                                                                      Instances For
                                                                                                                                        @[inline]
                                                                                                                                        def Unicode.isMath (char : Char) :

                                                                                                                                        Check if mathematical symbol character

                                                                                                                                        Generated by GeneralCategory=Sm | Other_Math.

                                                                                                                                        Unicode property: Math

                                                                                                                                        Equations
                                                                                                                                        Instances For
                                                                                                                                          @[inline]

                                                                                                                                          Check if alphabetic character

                                                                                                                                          Generated by GeneralCategory=L | GeneralCategory=Nl | Other_Alphabetic.

                                                                                                                                          Unicode property: Alphabetic

                                                                                                                                          Equations
                                                                                                                                          Instances For
                                                                                                                                            @[reducible, inline]
                                                                                                                                            abbrev Unicode.isAlpha (char : Char) :

                                                                                                                                            Check if alphabetic character

                                                                                                                                            Generated by GeneralCategory=L | GeneralCategory=Nl | Other_Alphabetic.

                                                                                                                                            Unicode property: Alphabetic

                                                                                                                                            Equations
                                                                                                                                            Instances For