Difference between revisions of "Unicode"

From Linuxintro
imported>ThorstenStaerk

Revision as of 08:59, 22 December 2009

Understanding

Every text file has an encoding. To display the file correctly, you must know whether each character is stored in one byte, in two bytes, or in a varying number of bytes. Unicode assigns a unique number (code point) to the characters of practically every writing system in the world.

Here is some practice: Store a file containing

hellö world

in file.txt. Do:

tweedleburg:~ # cat >file.txt
hellö world
tweedleburg:~ # cat file.txt
hellö world
tweedleburg:~ # hexdump -C file.txt
00000000  68 65 6c 6c c3 b6 20 77  6f 72 6c 64 0a           |hell.. world.|
0000000d

This means that every plain ASCII character has been stored in 1 byte, while the umlaut ö has been stored in 2 bytes (c3 b6). That is Unicode's UTF-8 encoding, in which characters have mixed byte lengths.
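The mixed byte length can be checked with a small sketch. It assumes a POSIX shell with printf, wc and tr. The file name sample.txt is hypothetical. The text is written with explicit octal escapes (\303\266 is ö in UTF-8), so the result does not depend on the terminal's encoding. In UTF-8, every byte of a multi-byte character except the first lies in the range 0x80 to 0xBF, so deleting those bytes leaves exactly one byte per character.

```shell
# Write "hellö world" plus a newline as explicit UTF-8 bytes.
printf 'hell\303\266 world\n' > sample.txt

# Total number of bytes in the file.
byte_count=$(wc -c < sample.txt | tr -d ' ')

# Delete UTF-8 continuation bytes (octal 200-277 = 0x80-0xBF);
# what remains is one byte per character.
char_count=$(LC_ALL=C tr -d '\200-\277' < sample.txt | wc -c | tr -d ' ')

echo "bytes: $byte_count, characters: $char_count"
```

The one-byte difference between the two counts comes from the ö, the only character in the file that needs a second byte.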

Doing

Convert a file to UTF-8

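# Convert the contents of <file> from ISO-8859-1 (Latin-1) to UTF-8, in place:
recode latin1..u8 <file>
# Convert file names (not file contents) from ISO-8859-1 to UTF-8, recursively:
convmv -f iso-8859-1 -t utf8 -r --notest <file>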
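Where recode is not installed, iconv can perform the same kind of content conversion. This is a sketch under assumptions, not the method this page prescribes. iconv is a swapped-in tool, and the file names latin1.txt and utf8.txt are hypothetical. Unlike an in-place conversion, iconv writes to standard output, so the result goes into a second file.

```shell
# Create a Latin-1 sample: \366 is ö in ISO-8859-1 (a single byte).
printf 'hell\366 world\n' > latin1.txt

# Convert the contents to UTF-8 and store them in a new file.
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > utf8.txt

# Dump the converted bytes as one hex string to verify the result.
converted_hex=$(od -An -tx1 utf8.txt | tr -d ' \n')
echo "$converted_hex"
```

In the converted file the single byte f6 has become the two-byte sequence c3 b6, matching the hexdump shown in the Understanding section.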

Unicode text editor:

  • yudit

Configuration

For PHP: set the key default_charset in /etc/php5/apache2/php.ini.
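To see which value is currently set, the key can be looked up with grep. The sketch below runs against a temporary stand-in file so that it works on any machine. The value "UTF-8" and the sample file content are assumptions for illustration. On a real system, point ini_file at the php.ini path given above.

```shell
# Hypothetical stand-in for php.ini, so the sketch runs anywhere.
ini_file=$(mktemp)
printf '%s\n' '; sample fragment (assumed content)' 'default_charset = "UTF-8"' > "$ini_file"

# Print the active (not commented out) default_charset line.
charset_line=$(grep -E '^[[:space:]]*default_charset[[:space:]]*=' "$ini_file")
echo "$charset_line"

# Remove the temporary file again.
rm -f "$ini_file"
```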