utf8dump
Description
utf8dump interprets stdin as a UTF-8 stream and dumps the codepoint and, optionally, the name of each character to stdout.
For example, the title of Pike & Thompson's UTF-8 paper would be displayed as:
00048 H
00065 e
0006c l
0006c l
0006f o
00020  
00057 W
0006f o
00072 r
0006c l
00064 d
0000a .
0006f o
00072 r
0000a .
0039a Κ
003b1 α
003bb λ
003b7 η
003bc μ
003ad έ
003c1 ρ
003b1 α
00020  
003ba κ
003cc ό
003c3 σ
003bc μ
003b5 ε
0000a .
0006f o
00072 r
0000a .
03053 こ
03093 ん
0306b に
03061 ち
0306f は
00020  
04e16 世
0754c 界
0000a .
or, when invoked with the -c option, as:
00048 H LATIN CAPITAL LETTER H
00065 e LATIN SMALL LETTER E
0006c l LATIN SMALL LETTER L
0006c l LATIN SMALL LETTER L
0006f o LATIN SMALL LETTER O
00020   SPACE
00057 W LATIN CAPITAL LETTER W
0006f o LATIN SMALL LETTER O
00072 r LATIN SMALL LETTER R
0006c l LATIN SMALL LETTER L
00064 d LATIN SMALL LETTER D
0000a . LINE FEED
0006f o LATIN SMALL LETTER O
00072 r LATIN SMALL LETTER R
0000a . LINE FEED
0039a Κ GREEK CAPITAL LETTER KAPPA
003b1 α GREEK SMALL LETTER ALPHA
003bb λ GREEK SMALL LETTER LAMDA
003b7 η GREEK SMALL LETTER ETA
003bc μ GREEK SMALL LETTER MU
003ad έ GREEK SMALL LETTER EPSILON WITH TONOS
003c1 ρ GREEK SMALL LETTER RHO
003b1 α GREEK SMALL LETTER ALPHA
00020   SPACE
003ba κ GREEK SMALL LETTER KAPPA
003cc ό GREEK SMALL LETTER OMICRON WITH TONOS
003c3 σ GREEK SMALL LETTER SIGMA
003bc μ GREEK SMALL LETTER MU
003b5 ε GREEK SMALL LETTER EPSILON
0000a . LINE FEED
0006f o LATIN SMALL LETTER O
00072 r LATIN SMALL LETTER R
0000a . LINE FEED
03053 こ HIRAGANA LETTER KO
03093 ん HIRAGANA LETTER N
0306b に HIRAGANA LETTER NI
03061 ち HIRAGANA LETTER TI
0306f は HIRAGANA LETTER HA
00020   SPACE
04e16 世 CJK UNIFIED IDEOGRAPH-4E16
0754c 界 CJK UNIFIED IDEOGRAPH-754C
0000a . LINE FEED
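The output format shown above can be approximated in a few lines. The following is a rough Python sketch, not the Perl script offered below. The function names `dump_line` and `utf8dump`, the `CONTROL_NAMES` table, and the rule of printing "." for every non-printable character are assumptions made for illustration, inferred from the sample output only.

```python
import unicodedata

# Python's unicodedata has no names for control characters, so this sketch
# supplies a few by hand (an assumption, chosen to match the sample output).
CONTROL_NAMES = {
    0x09: "CHARACTER TABULATION",
    0x0A: "LINE FEED",
    0x0D: "CARRIAGE RETURN",
}


def dump_line(ch, with_names=False):
    """Format one character as '<5 hex digits> <char> [<name>]'."""
    cp = ord(ch)
    # Non-printable characters (e.g. newline) are shown as "." so that
    # every character occupies exactly one output line.
    shown = ch if ch.isprintable() else "."
    line = f"{cp:05x} {shown}"
    if with_names:
        # Fall back to the hand-written table, then to an empty name.
        name = unicodedata.name(ch, CONTROL_NAMES.get(cp, ""))
        line += f" {name}"
    return line


def utf8dump(data, with_names=False):
    """Decode UTF-8 bytes and return one formatted line per character."""
    return [dump_line(ch, with_names) for ch in data.decode("utf-8")]
```

To mimic the command-line tool, one could pass `sys.stdin.buffer.read()` as `data`, set `with_names` when `-c` is given, and print each returned line. The five-digit padding matters for characters outside the Basic Multilingual Plane: U+1F600, for instance, needs five hex digits.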
Source-Code and History
2023-06-26 | utf8dump (Perl source code) | 561B | The current version — uses 5 digits for codepoints (I'm finally giving in to the existence of emojis). |
2007-10-30 | utf8dump (Perl source code) | 561B | The first version, which used 4-digit codepoints. |
Binary distributions
None at the moment. It's just a single tiny Perl script.