string – What's the difference between str.isdigit, isnumeric and isdecimal in Python?

By definition, isdecimal() ⊆ isdigit() ⊆ isnumeric(). That is, if a string is decimal, then it is also digit and numeric.

Therefore, testing a string s with these three methods can give only 4 combinations of results.

+-------------+-----------+-------------+----------------------------------+
| isdecimal() | isdigit() | isnumeric() |          Example                 |
+-------------+-----------+-------------+----------------------------------+
|    True     |    True   |    True     | "038", "੦੩੮", "０３８"           |
|    False    |    True   |    True     | "⁰³⁸", "🄀⒊⒏", "⓪③⑧"          |
|    False    |    False  |    True     | "↉⅛⅘", "ⅠⅢⅧ", "⑩⑬㊿", "壹貳參"  |
|    False    |    False  |    False    | "abc", "38.0", "-38"             |
+-------------+-----------+-------------+----------------------------------+
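
To see which row of the table a given string falls into, you can simply call all three methods on it. Here is a minimal sketch; the helper name `classify` is made up for illustration, and the samples use escape sequences so they don't depend on your font.

```python
def classify(s):
    """Return the tuple (isdecimal, isdigit, isnumeric) for the string s."""
    return s.isdecimal(), s.isdigit(), s.isnumeric()


# One sample per row of the table above.
samples = [
    "038",                 # ASCII digits
    "\u2070\u00b3\u2078",  # SUPERSCRIPT ZERO, THREE, EIGHT
    "\u2160\u2162\u2167",  # ROMAN NUMERAL ONE, THREE, EIGHT
    "38.0",                # the full stop is not a numeric character
]

for s in samples:
    print(repr(s), classify(s))
```

Because each method requires every character to qualify, a single sign or decimal point is enough to make all three return False.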

1. Some examples of characters for which isdecimal()==True

(thus isdigit()==True and isnumeric()==True)

0123456789  DIGIT ZERO~NINE
٠١٢٣٤٥٦٧٨٩  ARABIC-INDIC DIGIT ZERO~NINE
०१२३४५६७८९  DEVANAGARI DIGIT ZERO~NINE
০১২৩৪৫৬৭৮৯  BENGALI DIGIT ZERO~NINE
੦੧੨੩੪੫੬੭੮੯  GURMUKHI DIGIT ZERO~NINE
૦૧૨૩૪૫૬૭૮૯  GUJARATI DIGIT ZERO~NINE
୦୧୨୩୪୫୬୭୮୯  ORIYA DIGIT ZERO~NINE
௦௧௨௩௪௫௬௭௮௯  TAMIL DIGIT ZERO~NINE
౦౧౨౩౪౫౬౭౮౯  TELUGU DIGIT ZERO~NINE
೦೧೨೩೪೫೬೭೮೯  KANNADA DIGIT ZERO~NINE
൦൧൨൩൪൫൬൭൮൯  MALAYALAM DIGIT ZERO~NINE
๐๑๒๓๔๕๖๗๘๙  THAI DIGIT ZERO~NINE
໐໑໒໓໔໕໖໗໘໙  LAO DIGIT ZERO~NINE
༠༡༢༣༤༥༦༧༨༩  TIBETAN DIGIT ZERO~NINE
၀၁၂၃၄၅၆၇၈၉  MYANMAR DIGIT ZERO~NINE
០១២៣៤៥៦៧៨៩  KHMER DIGIT ZERO~NINE
０１２３４５６７８９  FULLWIDTH DIGIT ZERO~NINE
𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗  MATHEMATICAL BOLD DIGIT ZERO~NINE
𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡  MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO~NINE
𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫  MATHEMATICAL SANS-SERIF DIGIT ZERO~NINE
𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵  MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO~NINE
𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿  MATHEMATICAL MONOSPACE DIGIT ZERO~NINE
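
One practical consequence is that int() will parse strings made of these characters, not only ASCII 0-9. A small sketch to illustrate (the variable names are just for this example):

```python
import unicodedata

arabic_indic = "\u0663\u0668"     # ARABIC-INDIC DIGIT THREE, EIGHT
fullwidth = "\uff10\uff13\uff18"  # FULLWIDTH DIGIT ZERO, THREE, EIGHT

for s in (arabic_indic, fullwidth):
    # isdecimal() is True for both, and int() reads them as base-10 numbers.
    print(s.isdecimal(), int(s))

# unicodedata.decimal() returns the value of a single decimal character.
print(unicodedata.decimal("\u0e59"))  # THAI DIGIT NINE
```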

2. Some examples of characters for which isdecimal()==False but isdigit()==True

(thus isnumeric()==True)

⁰¹²³⁴⁵⁶⁷⁸⁹  SUPERSCRIPT ZERO~NINE
₀₁₂₃₄₅₆₇₈₉  SUBSCRIPT ZERO~NINE
🄀⒈⒉⒊⒋⒌⒍⒎⒏⒐  DIGIT ZERO~NINE FULL STOP
🄁🄂🄃🄄🄅🄆🄇🄈🄉🄊  DIGIT ZERO~NINE COMMA
⓪①②③④⑤⑥⑦⑧⑨  CIRCLED DIGIT ZERO~NINE
⓿❶❷❸❹❺❻❼❽❾  NEGATIVE CIRCLED DIGIT ZERO~NINE
⑴⑵⑶⑷⑸⑹⑺⑻⑼  PARENTHESIZED DIGIT ONE~NINE
➀➁➂➃➄➅➆➇➈  DINGBAT CIRCLED SANS-SERIF DIGIT ONE~NINE
⓵⓶⓷⓸⓹⓺⓻⓼⓽  DOUBLE CIRCLED DIGIT ONE~NINE
➊➋➌➍➎➏➐➑➒  DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE~NINE
፩፪፫፬፭፮፯፰፱  ETHIOPIC DIGIT ONE~NINE
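
This group is where isdigit() can be misleading: the characters carry a digit value, but int() does not accept them. A short sketch, using SUPERSCRIPT THREE as an arbitrary sample:

```python
import unicodedata

superscript_three = "\u00b3"  # SUPERSCRIPT THREE

print(superscript_three.isdigit())    # True
print(superscript_three.isdecimal())  # False

# The digit value is still available through unicodedata.
print(unicodedata.digit(superscript_three))  # 3

# int() only understands decimal characters, so this raises ValueError.
try:
    int(superscript_three)
    parsed = True
except ValueError:
    parsed = False
print(parsed)  # False
```

So if the goal is to decide whether int(s) will succeed for an unsigned string, isdecimal() is the closer match of the three.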

3. Some examples of characters for which isdecimal()==False and isdigit()==False but isnumeric()==True

½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉  VULGAR FRACTION
৴৵৶৷৸৹  BENGALI CURRENCY NUMERATOR
௰௱௲  TAMIL NUMBER TEN, ONE HUNDRED, ONE THOUSAND
౸౹౺౻౼౽౾  TELUGU FRACTION DIGIT
൰൱൲൳൴൵  MALAYALAM NUMBER, MALAYALAM FRACTION
༳༪༫༬༭༮༯༰༱༲  TIBETAN DIGIT HALF ZERO~NINE
፲፳፴፵፶፷፸፹፺፻፼  ETHIOPIC NUMBER TEN~NINETY, HUNDRED, TEN THOUSAND
៰៱៲៳៴៵៶៷៸៹  KHMER SYMBOL LEK ATTAK
ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯ  ROMAN NUMERAL
ⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿ  SMALL ROMAN NUMERAL
ↀↁↂↅↆ  ROMAN NUMERAL
⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳㉑㉒㉓㉔㉕㉖㉗㉘㉙㉚㉛㉜㉝㉞㉟㊱㊲㊳㊴㊵㊶㊷㊸㊹㊺㊻㊼㊽㊾㊿  CIRCLED NUMBER TEN~FIFTY
㉈㉉㉊㉋㉌㉍㉎㉏  CIRCLED NUMBER TEN~EIGHTY ON BLACK SQUARE
⑽⑾⑿⒀⒁⒂⒃⒄⒅⒆⒇  PARENTHESIZED NUMBER TEN~TWENTY
⒑⒒⒓⒔⒕⒖⒗⒘⒙⒚⒛  NUMBER TEN~TWENTY FULL STOP
⓫⓬⓭⓮⓯⓰⓱⓲⓳⓴  NEGATIVE CIRCLED NUMBER ELEVEN~TWENTY
⓾➉❿➓  various styles of CIRCLED NUMBER TEN
🄌  DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
〇  IDEOGRAPHIC NUMBER ZERO
〡〢〣〤〥〦〧〨〩〸〹〺  HANGZHOU NUMERAL ONE~TEN, TWENTY, THIRTY
㆒㆓㆔㆕  IDEOGRAPHIC ANNOTATION ONE~FOUR MARK
㈠㈡㈢㈣㈤㈥㈦㈧㈨㈩  PARENTHESIZED IDEOGRAPH ONE~TEN
㊀㊁㊂㊃㊄㊅㊆㊇㊈㊉  CIRCLED IDEOGRAPH ONE~TEN
一二三四五六七八九十壹貳參肆伍陸柒捌玖拾零百千萬億兆弐貮贰㒃㭍漆什㐅陌阡佰仟万亿幺兩㠪亖卄卅卌廾廿  CJK UNIFIED IDEOGRAPH
參拾兩零六陸什  CJK COMPATIBILITY IDEOGRAPH
U+10107~U+10118  AEGEAN NUMBER ONE~NINE, TEN~NINETY
U+10119~U+1012A  AEGEAN NUMBER ONE~NINE HUNDRED, ONE~NINE THOUSAND
U+1012B~U+10133  AEGEAN NUMBER TEN~NINETY THOUSAND
U+10140 onward  GREEK ACROPHONIC ATTIC
U+1D360~U+1D368  COUNTING ROD UNIT DIGIT ONE~NINE
U+1D369~U+1D371  COUNTING ROD TENS DIGIT ONE~NINE
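
Characters in this last group have a numeric value but no digit value, which you can read with unicodedata.numeric(). A minimal sketch with three arbitrarily chosen samples:

```python
import unicodedata

# VULGAR FRACTION ONE HALF, ROMAN NUMERAL EIGHT, CJK ideograph for five
for ch in ("\u00bd", "\u2167", "\u4e94"):
    print(ch, ch.isnumeric(), ch.isdigit(), unicodedata.numeric(ch))

# unicodedata.digit() takes an optional default that is returned
# when the character has no digit value.
print(unicodedata.digit("\u00bd", None))
```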

It's mostly about Unicode classifications. Here are some examples to show the discrepancies:

>>> def spam(s):
...     for attr in 'isnumeric', 'isdecimal', 'isdigit':
...         print(attr, getattr(s, attr)())
...
>>> spam('½')
isnumeric True
isdecimal False
isdigit False
>>> spam('³')
isnumeric True
isdecimal False
isdigit True

The specific behaviour is described in the official Python documentation for these string methods.

Script to find all of them:

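import sys
import unicodedata
from collections import defaultdict

# Maps each (isnumeric, isdecimal, isdigit) result tuple to its characters.
d = defaultdict(list)
for i in range(sys.maxunicode + 1):
    s = chr(i)
    t = s.isnumeric(), s.isdecimal(), s.isdigit()
    # Two distinct values in the tuple means the three methods disagree.
    if len(set(t)) == 2:
        try:
            name = unicodedata.name(s)
        except ValueError:
            # Fall back to a placeholder for characters without a name.
            name = f'codepoint{i}'
        print(s, name)
        d[t].append(s)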

The Python documentation notes the difference between the three methods.

str.isdigit

Return true if all characters in the string are digits and there is at least one character, false otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.

str.isnumeric

Return true if all characters in the string are numeric characters, and there is at least one character, false otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.

str.isdecimal

Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those that can be used to form numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT ZERO. Formally a decimal character is a character in the Unicode General Category “Nd”.
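
The formal definitions quoted above can be spot-checked against the unicodedata module. The following is only a sketch on a handful of sample characters chosen for illustration, not an exhaustive check:

```python
import unicodedata

samples = {
    "\u0660": "ARABIC-INDIC DIGIT ZERO",
    "\u00b2": "SUPERSCRIPT TWO",
    "\u2155": "VULGAR FRACTION ONE FIFTH",
    "a": "LATIN SMALL LETTER A",
}

for ch, label in samples.items():
    # The general category is "Nd" for decimal digits and "No" for other numbers.
    category = unicodedata.category(ch)
    print(f"{label}: category={category}, isdecimal={ch.isdecimal()}, "
          f"isdigit={ch.isdigit()}, isnumeric={ch.isnumeric()}")
```

In this sample only the Arabic-Indic digit is in category Nd. The superscript and the fraction are both in category No, yet they differ in isdigit(), because that distinction comes from the Numeric_Type property rather than the general category.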


As @Wim said, the main difference between the three methods is the way they handle specific Unicode characters.
