Latest revision as of 20:00, 1 January 2012

Understanding

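Every text file has an encoding. To display the file correctly, you must know whether each character occupies one byte, two bytes, or a variable number of bytes. Unicode aims to assign a code point to every character of every writing system; an encoding such as UTF-8 then defines how those code points are stored as bytes.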
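To illustrate why the encoding has to be known, here is a minimal Python sketch that decodes the same two bytes under two different assumptions. The bytes 0xC3 0xB6 are the UTF-8 form of ö; the variable names are only illustrative.

```python
# The same two bytes, interpreted under two different encodings.
raw = bytes([0xC3, 0xB6])

as_utf8 = raw.decode("utf-8")         # one character: U+00F6 (o with diaeresis)
as_latin1 = raw.decode("iso-8859-1")  # two characters: U+00C3 and U+00B6

# Print the code points rather than the characters themselves, so the
# output does not depend on the terminal's own encoding.
print(len(as_utf8), [hex(ord(c)) for c in as_utf8])      # 1 ['0xf6']
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])  # 2 ['0xc3', '0xb6']
```

Read as ISO-8859-1, the umlaut turns into two unrelated characters, which is the typical symptom of a wrongly guessed encoding.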

Here is some practice: Store a file containing

hellö world

in file.txt. Do:

tweedleburg:~ # cat >file.txt
hellö world
tweedleburg:~ # cat file.txt
hellö world
tweedleburg:~ # hexdump -C file.txt
00000000  68 65 6c 6c c3 b6 20 77  6f 72 6c 64 0a           |hell.. world.|
0000000d

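This means that every ASCII character has been stored in 1 byte, while the umlaut ö has been stored in 2 bytes (c3 b6). That is Unicode's UTF-8 encoding, a variable-length encoding in which a character takes between 1 and 4 bytes.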
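The byte counts from the hexdump can be reproduced with a short Python sketch. It assumes the file content shown above, including the trailing newline; the names text, encoded, hex_dump and sizes are illustrative.

```python
# Same content as file.txt; \u00f6 is the umlaut o.
text = "hell\u00f6 world\n"
encoded = text.encode("utf-8")

# Hex view comparable to the hexdump -C output.
hex_dump = " ".join(f"{b:02x}" for b in encoded)
print(hex_dump)      # 68 65 6c 6c c3 b6 20 77 6f 72 6c 64 0a
print(len(encoded))  # 13, i.e. 0xd, the final offset shown by hexdump

# Number of UTF-8 bytes needed for each distinct character.
sizes = {ch: len(ch.encode("utf-8")) for ch in text}
for ch, size in sizes.items():
    print(f"U+{ord(ch):04X} -> {size} byte(s)")
```

Only U+00F6 is reported with 2 bytes; every other character in the sample is ASCII and takes 1 byte.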

Doing

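Convert file names from ISO-8859-1 to UTF-8 (convmv renames files and leaves their contents untouched):

convmv -f iso-8859-1 -t utf8 -r --notest <file>

Convert a file's contents from ISO-8859-1 (Latin-1) to UTF-8, in place:

recode latin1..u8 <file>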
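Where recode is not available, the content conversion can be approximated in Python. This is a minimal sketch, not a replacement for the tool: the function name latin1_to_utf8 is made up for this example, and the demo writes its own temporary sample file.

```python
import tempfile
from pathlib import Path


def latin1_to_utf8(path):
    """Rewrite the file at `path` in place, from ISO-8859-1 to UTF-8."""
    p = Path(path)
    # Work on raw bytes so that line endings are passed through unchanged.
    text = p.read_bytes().decode("iso-8859-1")
    p.write_bytes(text.encode("utf-8"))


# Demo on a throwaway file containing the Latin-1 form of the sample text.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "file.txt"
    sample.write_bytes(b"hell\xf6 world\n")  # 0xf6 is the umlaut o in Latin-1
    latin1_to_utf8(sample)
    result = sample.read_bytes()

print(result)  # b'hell\xc3\xb6 world\n'
```

Every byte sequence is valid ISO-8859-1, so this sketch cannot detect a file that is already UTF-8; running it twice on the same file would garble the umlauts.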

Unicode text editor:

yudit

Configuration

For PHP: set the key default_charset in /etc/php5/apache2/php.ini.

For SquirrelMail: set default_charset to UTF8 in config.php and config_default.php.

@@ Line 1: / Line 1: @@
---------------------------------------------------------------------------------
+= Understanding =
-*** Convert file to UTF-8 ***
-convmv -f iso-8859-1 -t utf8 -r --notest <datei>
-recode latin1..u8 <datei>
-yudit
-Programming [[html2mediawiki]] showed some severe problems if you are using sites that contain umlauts like &auml; or &ouml;. So I [http://wiki.linuxquestions.org/wiki/UniCode deep-dived into unicode] programming and want you to be able to use my findings.
 Clearly, [http://www.joelonsoftware.com/articles/Unicode.html every text file has an encoding], that means, you must know if two bytes form one character to display, one byte, or the characters have mixed byte length. [http://en.wikipedia.org/wiki/Unicode Unicode] defines every character in the world.
@@ Line 20: / Line 12: @@
   00000000  68 65 6c 6c c3 b6 20 77  6f 72 6c 64 0a           |hell.. world.|
   0000000d
-This means, every "normal" character has been stored in 1 byte, every umlaut in 2 bytes. That is unicode's [http://en.wikipedia.org/wiki/UTF-8 UTF-8 encoding]
+This means, every "normal" character has been stored in 1 byte, every umlaut in 2 bytes. That is unicode's [http://en.wikipedia.org/wiki/UTF-8 UTF-8 encoding].
+= Doing =
+Convert a file to UTF-8
+ convmv -f iso-8859-1 -t utf8 -r --notest <datei>
+ recode latin1..u8 <datei>
+Unicode text editor:
+* yudit
+= Configuration =
+For [[php]]: /etc/php5/apache2/php.ini, key default_charset.
-To show what Qt understands when it reads UTF8, we store a file with the content
+For [[squirrelmail]]: set default_charset in config.php and config_default.php to UTF8.
- &uuml;
-nothing else. The following code outputs the code:
- QFile inputfile(args->url(0).fileName());
- inputfile.open(QIODevice::ReadOnly);
- inputfilecontent = inputfile.read(inputfile.bytesAvailable());
- kDebug() << "inputfilecontent.data()[0]"<<(byte)inputfilecontent.data()[0];
- kDebug() << "inputfilecontent.data()[1]"<<(byte)inputfilecontent.data()[1];
-For little endian systems, &uuml; UTF8 encoded delivers
+= See also =
+* [http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256&utf8=0x&unicodeinhtml=hex&htmlent=1 Unicode character table]
+* [http://www.joelonsoftware.com/articles/Unicode.html Joel on UniCode]
-http://www.joelonsoftware.com/articles/Unicode.html
+[[Category:Concept]]

Difference between revisions of "Unicode"
