string – What's the difference between str.isdigit, isnumeric and isdecimal in python?
By definition, isdecimal() ⊆ isdigit() ⊆ isnumeric(). That is, if a string is decimal, then it'll also be digit and numeric.
Therefore, if you test a string s with those three methods, there are only 4 possible combinations of results.
+-------------+-----------+-------------+----------------------------------+
| isdecimal() | isdigit() | isnumeric() | Example                          |
+-------------+-----------+-------------+----------------------------------+
| True        | True      | True        | 038, ੦੩੮, ０３８                    |
| False       | True      | True        | ⁰³⁸, ⒊⒏, ⓪③⑧                     |
| False       | False     | True        | ↉⅛⅘, ⅠⅢⅧ, ⑩⑬㊿, 壹貳參               |
| False       | False     | False       | abc, 38.0, -38                   |
+-------------+-----------+-------------+----------------------------------+
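The table can be reproduced by calling all three methods on one example per row and printing the results in the table's column order. This is a minimal sketch; `classify` is a hypothetical helper name, not part of Python. It also shows why `38.0` and `-38` land in the last row: each method checks every character, so a single `.` or `-` makes all three False.

```python
def classify(s):
    """Return (isdecimal, isdigit, isnumeric), in the same order as the table."""
    return s.isdecimal(), s.isdigit(), s.isnumeric()

# One example string per table row, taken from the table.
examples = ['038', '⁰³⁸', '↉⅛⅘', '38.0']

for s in examples:
    print(s, classify(s))

# A non-numeric character anywhere in the string makes every method return False.
print('-38', classify('-38'))
```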
1. Some examples of characters with isdecimal()==True (thus also isdigit()==True and isnumeric()==True)
0123456789 DIGIT ZERO~NINE
٠١٢٣٤٥٦٧٨٩ ARABIC-INDIC DIGIT ZERO~NINE
०१२३४५६७८९ DEVANAGARI DIGIT ZERO~NINE
০১২৩৪৫৬৭৮৯ BENGALI DIGIT ZERO~NINE
੦੧੨੩੪੫੬੭੮੯ GURMUKHI DIGIT ZERO~NINE
૦૧૨૩૪૫૬૭૮૯ GUJARATI DIGIT ZERO~NINE
୦୧୨୩୪୫୬୭୮୯ ORIYA DIGIT ZERO~NINE
௦௧௨௩௪௫௬௭௮௯ TAMIL DIGIT ZERO~NINE
౦౧౨౩౪౫౬౭౮౯ TELUGU DIGIT ZERO~NINE
೦೧೨೩೪೫೬೭೮೯ KANNADA DIGIT ZERO~NINE
൦൧൨൩൪൫൬൭൮൯ MALAYALAM DIGIT ZERO~NINE
๐๑๒๓๔๕๖๗๘๙ THAI DIGIT ZERO~NINE
໐໑໒໓໔໕໖໗໘໙ LAO DIGIT ZERO~NINE
༠༡༢༣༤༥༦༧༨༩ TIBETAN DIGIT ZERO~NINE
၀၁၂၃၄၅၆၇၈၉ MYANMAR DIGIT ZERO~NINE
០១២៣៤៥៦៧៨៩ KHMER DIGIT ZERO~NINE
０１２３４５６７８９ FULLWIDTH DIGIT ZERO~NINE
𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗 MATHEMATICAL BOLD DIGIT ZERO~NINE
𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡 MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO~NINE
𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫 MATHEMATICAL SANS-SERIF DIGIT ZERO~NINE
𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO~NINE
𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 MATHEMATICAL MONOSPACE DIGIT ZERO~NINE
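One practical consequence: as far as I know, `int()` accepts any characters for which `isdecimal()` is True, so strings written with these digits parse as ordinary base-10 integers, and `unicodedata.decimal()` gives each character's value. A small sketch; `parse_decimal` is a hypothetical helper, and the sample strings are picked from the list above.

```python
import unicodedata

def parse_decimal(s):
    """Parse a string made only of Unicode decimal digits; return None otherwise."""
    if not s.isdecimal():
        return None
    return int(s)

for s in ['038', '੦੩੮', '٠٣٨', '０３８']:
    # unicodedata.decimal() returns the 0-9 value of each decimal character.
    print(s, parse_decimal(s), [unicodedata.decimal(ch) for ch in s])
```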
2. Some examples of characters with isdecimal()==False but isdigit()==True (thus isnumeric()==True)
⁰¹²³⁴⁵⁶⁷⁸⁹ SUPERSCRIPT ZERO~NINE
₀₁₂₃₄₅₆₇₈₉ SUBSCRIPT ZERO~NINE
🄀⒈⒉⒊⒋⒌⒍⒎⒏⒐ DIGIT ZERO~NINE FULL STOP
🄁🄂🄃🄄🄅🄆🄇🄈🄉🄊 DIGIT ZERO~NINE COMMA
⓪①②③④⑤⑥⑦⑧⑨ CIRCLED DIGIT ZERO~NINE
⓿❶❷❸❹❺❻❼❽❾ NEGATIVE CIRCLED DIGIT ZERO~NINE
⑴⑵⑶⑷⑸⑹⑺⑻⑼ PARENTHESIZED DIGIT ONE~NINE
➀➁➂➃➄➅➆➇➈ DINGBAT CIRCLED SANS-SERIF DIGIT ONE~NINE
⓵⓶⓷⓸⓹⓺⓻⓼⓽ DOUBLE CIRCLED DIGIT ONE~NINE
➊➋➌➍➎➏➐➑➒ DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE~NINE
፩፪፫፬፭፮፯፰፱ ETHIOPIC DIGIT ONE~NINE
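For characters in this group, the `unicodedata` module reports a digit value but no decimal value, which is also why `int()` rejects them. A sketch; `digit_info` is a hypothetical helper, and the `None` default argument of `unicodedata.decimal()` / `unicodedata.digit()` is used so that a missing value returns `None` instead of raising.

```python
import unicodedata

def digit_info(ch):
    """Return (decimal value or None, digit value or None) for one character."""
    return unicodedata.decimal(ch, None), unicodedata.digit(ch, None)

for ch in '³₈⒊⓪❶፩':
    print(ch, unicodedata.name(ch), digit_info(ch))

try:
    int('³⁸')
except ValueError as exc:
    # Superscripts are digits but not decimals, so int() refuses them.
    print('int() failed:', exc)
```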
3. Some examples of characters with isdecimal()==False and isdigit()==False but isnumeric()==True
½⅓¼⅕⅙⅐⅛⅑⅒⅔¾⅖⅗⅘⅚⅜⅝⅞⅟↉ VULGAR FRACTION
৴৵৶৷৸৹ BENGALI CURRENCY NUMERATOR
௰௱௲ TAMIL NUMBER TEN, ONE HUNDRED, ONE THOUSAND
౸౹౺౻౼౽౾ TELUGU FRACTION DIGIT
൰൱൲൳൴൵ MALAYALAM NUMBER, MALAYALAM FRACTION
༳༪༫༬༭༮༯༰༱༲ TIBETAN DIGIT HALF ZERO~NINE
፲፳፴፵፶፷፸፹፺፻፼ ETHIOPIC NUMBER TEN~NINETY, HUNDRED, TEN THOUSAND
៰៱៲៳៴៵៶៷៸៹ KHMER SYMBOL LEK ATTAK
ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯ ROMAN NUMERAL
ⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿ SMALL ROMAN NUMERAL
ↀↁↂↅↆ ROMAN NUMERAL
⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳㉑㉒㉓㉔㉕㉖㉗㉘㉙㉚㉛㉜㉝㉞㉟㊱㊲㊳㊴㊵㊶㊷㊸㊹㊺㊻㊼㊽㊾㊿ CIRCLED NUMBER TEN~FIFTY
㉈㉉㉊㉋㉌㉍㉎㉏ CIRCLED NUMBER TEN~EIGHTY ON BLACK SQUARE
⑽⑾⑿⒀⒁⒂⒃⒄⒅⒆⒇ PARENTHESIZED NUMBER TEN~TWENTY
⒑⒒⒓⒔⒕⒖⒗⒘⒙⒚⒛ NUMBER TEN~TWENTY FULL STOP
⓫⓬⓭⓮⓯⓰⓱⓲⓳⓴ NEGATIVE CIRCLED NUMBER ELEVEN~TWENTY
⓾➉❿➓ various styles of CIRCLED NUMBER TEN
🄌 DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO
〇 IDEOGRAPHIC NUMBER ZERO
〡〢〣〤〥〦〧〨〩〸〹〺 HANGZHOU NUMERAL ONE~NINE, TEN, TWENTY, THIRTY
㆒㆓㆔㆕ IDEOGRAPHIC ANNOTATION ONE~FOUR MARK
㈠㈡㈢㈣㈤㈥㈦㈧㈨㈩ PARENTHESIZED IDEOGRAPH ONE~TEN
㊀㊁㊂㊃㊄㊅㊆㊇㊈㊉ CIRCLED IDEOGRAPH ONE~TEN
一二三四五六七八九十壹貳參肆伍陸柒捌玖拾零百千萬億兆弐貮贰㒃㭍漆什㐅陌阡佰仟万亿幺兩㠪亖卄卅卌廾廿 CJK UNIFIED IDEOGRAPH
參拾兩零六陸什 CJK COMPATIBILITY IDEOGRAPH
𐄇𐄈𐄉𐄊𐄋𐄌𐄍𐄎𐄏𐄐𐄑𐄒𐄓𐄔𐄕𐄖𐄗𐄘 AEGEAN NUMBER ONE~NINE, TEN~NINETY
𐄙𐄚𐄛𐄜𐄝𐄞𐄟𐄠𐄡𐄢𐄣𐄤𐄥𐄦𐄧𐄨𐄩𐄪 AEGEAN NUMBER ONE~NINE HUNDRED, ONE~NINE THOUSAND
𐄫𐄬𐄭𐄮𐄯𐄰𐄱𐄲𐄳 AEGEAN NUMBER TEN~NINETY THOUSAND
GREEK ACROPHONIC ATTIC
𝍠𝍡𝍢𝍣𝍤𝍥𝍦𝍧𝍨 COUNTING ROD UNIT DIGIT ONE~NINE
𝍩𝍪𝍫𝍬𝍭𝍮𝍯𝍰𝍱 COUNTING ROD TENS DIGIT ONE~NINE
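In this group only `unicodedata.numeric()` returns a value, and that value may be a fraction or a number larger than 9. Note also that `isnumeric()` being True does not mean `float()` can parse the string. A sketch using characters from the list above:

```python
import unicodedata

for ch in '½⅘Ⅻⅿ㊿壹〸':
    # numeric() returns a float; digit() has no value for these characters.
    print(ch, unicodedata.numeric(ch), unicodedata.digit(ch, None))

try:
    float('½')
except ValueError as exc:
    # isnumeric() is True for '½', but float() only understands decimal digits.
    print('float() failed:', exc)
```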
It's mostly about Unicode classifications. Here are some examples that show the discrepancies:
>>> def spam(s):
...     for attr in 'isnumeric', 'isdecimal', 'isdigit':
...         print(attr, getattr(s, attr)())
...
>>> spam('½')
isnumeric True
isdecimal False
isdigit False
>>> spam('³')
isnumeric True
isdecimal False
isdigit True
The specific behaviour is described in the official Python documentation.
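A script to find all of them (every character for which the three methods do not all agree):
import sys
import unicodedata
from collections import defaultdict

# Map each (isnumeric, isdecimal, isdigit) result tuple to the characters producing it.
d = defaultdict(list)
for i in range(sys.maxunicode + 1):
    s = chr(i)
    t = s.isnumeric(), s.isdecimal(), s.isdigit()
    # Keep only characters where the three methods disagree.
    if len(set(t)) == 2:
        try:
            name = unicodedata.name(s)
        except ValueError:
            # Some code points have no name in the Unicode database.
            name = f'codepoint{i}'
        print(s, name)
        d[t].append(s)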
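Building on that script, here is a minimal sketch that tallies how many code points fall into each combination of results. Because of the subset relation isdecimal() ⊆ isdigit() ⊆ isnumeric(), only the four combinations from the table should appear. The exact counts depend on the Unicode version bundled with your Python, so none are claimed here.

```python
import sys
from collections import Counter

# Count (isdecimal, isdigit, isnumeric) over every code point.
counts = Counter(
    (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for ch in map(chr, range(sys.maxunicode + 1))
)

# Sorting tuples of bools in reverse gives the same row order as the table.
for combo, n in sorted(counts.items(), reverse=True):
    print(combo, n)
```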
The Python documentation notes the difference between the three methods.
str.isdigit
Return true if all characters in the string are digits and there is at least one character, false otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
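The definition has two parts worth checking: the string must be non-empty, and every character must be a digit. The Kharosthi case can also be tried directly. A sketch; choosing U+10A40 KHAROSHTHI DIGIT ONE as the example character is my own assumption about which "Kharosthi number" the docs mean.

```python
import unicodedata

kharosthi_one = '\U00010A40'
print(unicodedata.name(kharosthi_one))               # KHAROSHTHI DIGIT ONE
print(kharosthi_one.isdigit(), kharosthi_one.isdecimal())

print(''.isdigit())       # empty string: there is no character, so False
print('3³'.isdigit())     # '3' is decimal and '³' is a digit: every character qualifies
print('3³'.isdecimal())   # '³' is not decimal, so the whole string fails
```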
str.isnumeric
Return true if all characters in the string are numeric characters, and there is at least one character, false otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.
str.isdecimal
Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those that can be used to form numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT ZERO. Formally a decimal character is a character in the Unicode General Category “Nd”.
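The "Formally, ..." sentences can be checked against the `unicodedata` module. This sketch assumes that `unicodedata.decimal()` has a value only for Numeric_Type=Decimal, `unicodedata.digit()` also for Digit, and `unicodedata.numeric()` also for Numeric; `numeric_type` and `expected_methods` are hypothetical helper names. It also prints the General Category, so U+0660 can be seen to be "Nd".

```python
import unicodedata

def numeric_type(ch):
    """Guess a character's Numeric_Type from which unicodedata lookups succeed.

    Assumption: decimal() has a value only for Decimal, digit() also for
    Digit, and numeric() also for Numeric.
    """
    if unicodedata.decimal(ch, None) is not None:
        return 'Decimal'
    if unicodedata.digit(ch, None) is not None:
        return 'Digit'
    if unicodedata.numeric(ch, None) is not None:
        return 'Numeric'
    return None

def expected_methods(ch):
    """(isdecimal, isdigit, isnumeric) as predicted by the documented definitions."""
    nt = numeric_type(ch)
    return (
        nt == 'Decimal',
        nt in ('Decimal', 'Digit'),
        nt in ('Decimal', 'Digit', 'Numeric'),
    )

for ch in ['3', '\u0660', '\u00b3', '\u2155', '\u4e00', 'a']:
    actual = (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    print(f'U+{ord(ch):04X}', unicodedata.category(ch), numeric_type(ch),
          actual == expected_methods(ch))
```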
Like @Wim said, the main difference between the three methods is the way they handle specific unicode characters.