Diff for /wikisrc/unicode.mdwn between versions 1.1 and 1.2

version 1.1, 2011/11/21 03:22:58 version 1.2, 2012/02/05 07:14:36
Line 1 Line 1
How to use wide-range characters (Unicode, encoded as UTF-8) in NetBSD.

[![Just to show off: this is what UTF-8 encoded spam looks like ;-)][3]][4]

   [3]: /images/200px-Unicoded-spam.png
   [4]: /images/Unicoded-spam.png (Just to show off: this is what UTF-8 encoded spam looks like ;-))
   
   
Just to show off: this is what UTF-8 encoded spam looks like ;-)
   
**Contents**

[[!toc levels=3]]

# Introduction

This is all about Unicode on NetBSD.
   
# Note on wscons

wscons doesn't support UTF-8. You'll need **X11** and a proper **X terminal emulator** for any of this to be useful, or you get character mash for lunch! Only the [ASCII][40] part of Unicode, namely the **first 128 characters, will work** in your wscons console, because they are encoded identically in UTF-8 and in the ISO-8859 character sets:

   [40]: http://de.wikipedia.org/wiki/ASCII-Tabelle (http://de.wikipedia.org/wiki/ASCII-Tabelle)

     !"#$%&'()*+,-./0123456789:;<=>?
    @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
    `abcdefghijklmnopqrstuvwxyz{|}~
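
To see the overlap for yourself, here is a minimal sketch that compares the bytes of a character before and after conversion from ISO-8859-1 to UTF-8. It assumes `iconv` and `od` are available and that your iconv accepts the encoding name `ISO-8859-1`; the helper name `hexbytes` is made up for this example.

```shell
# hexbytes: print stdin as a run of lowercase hex digits (hypothetical helper).
hexbytes() {
    od -An -tx1 | tr -d ' \n'
}

# 'A' is ASCII: one byte, identical in ISO-8859-1 and UTF-8.
ascii_latin1=$(printf 'A' | hexbytes)
ascii_utf8=$(printf 'A' | iconv -f ISO-8859-1 -t UTF-8 | hexbytes)

# Octal 344 is a-umlaut in ISO-8859-1; UTF-8 needs two bytes for it.
umlaut_latin1=$(printf '\344' | hexbytes)
umlaut_utf8=$(printf '\344' | iconv -f ISO-8859-1 -t UTF-8 | hexbytes)

echo "A:        $ascii_latin1 -> $ascii_utf8"
echo "a-umlaut: $umlaut_latin1 -> $umlaut_utf8"
```

The ASCII character keeps its single byte (41), while the umlaut changes from one byte (e4) to two (c3a4), which is why only the ASCII range survives on a console that does not decode UTF-8.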
          
   
# Note on uwscons

Unofficial patches for the 3.0 release can be found here: [ftp://tink.ims.ac.jp/pub/NetBSD/uwscons/][41]

   [41]: ftp://tink.ims.ac.jp/pub/NetBSD/uwscons/ (ftp://tink.ims.ac.jp/pub/NetBSD/uwscons/)
   
# pkgsrc

  * To build packages against the wide-character ncurses library wherever they support it, add this to /etc/mk.conf:

        PKG_DEFAULT_OPTIONS+= ncursesw
          
   
# Soup up a shell

## ksh

  * Works.

        chsh -s /bin/ksh

## mksh

  * This is an OpenBSD-based Korn shell; it works pretty well compared to pdksh.

        cd /usr/pkgsrc/shells/mksh
        make install clean
        chsh -s /usr/pkg/bin/mksh
          
   
## zsh

  * Note: the stable 4.2.x versions won't work. UTF-8 support in the Z shell has been enabled by default since 4.3.2.

        cd /usr/pkgsrc/shells/zsh-current
        make install clean
        chsh -s /usr/pkg/bin/zsh

## tcsh

  * Works out of the box.

        cd /usr/pkgsrc/shells/tcsh
        make install clean
        chsh -s /usr/pkg/bin/tcsh

## bash

  * Works out of the box.

        cd /usr/pkgsrc/shells/bash
        make install clean
        chsh -s /usr/pkg/bin/bash
          
   
## Shell environment

  * Set the variables LANG and LC_CTYPE in your shell configuration file:

        export LANG="en_US.UTF-8"
        export LC_CTYPE="en_US.UTF-8"
        export LC_ALL=""

or, if you have a C-style shell:

    setenv LANG "en_US.UTF-8"
    setenv LC_CTYPE "en_US.UTF-8"
    setenv LC_ALL ""

The other locale variables should be left untouched, so that they keep their default of "C" and don't confuse programs. Locales other than en_US probably won't work too well, since the fonts aren't in the base system yet, but you can of course install them and try your luck.

The result should look like this:

    % locale
    LANG="en_US.UTF-8"
    LC_CTYPE="en_US.UTF-8"
    LC_COLLATE="C"
    LC_TIME="C"
    LC_NUMERIC="C"
    LC_MONETARY="C"
    LC_MESSAGES="en_US.UTF-8"
    LC_ALL=""
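
As a quick sanity check from a script, something like the following sketch can confirm that the active locale really uses UTF-8. It relies on `locale charmap`, which prints the character encoding of the current locale; the helper name `charmap_is_utf8` and the accepted spellings are assumptions made for this example.

```shell
# charmap_is_utf8: succeed if the given charmap name denotes UTF-8.
# (Hypothetical helper; the name is not from any standard tool.)
charmap_is_utf8() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# "locale charmap" prints the character encoding of the current locale.
# Fall back to an empty string if the command is missing or fails.
current=$(locale charmap 2>/dev/null || true)

if charmap_is_utf8 "$current"; then
    echo "UTF-8 locale active"
else
    echo "not a UTF-8 locale (charmap: ${current:-unknown})"
fi
```

Run it in a fresh terminal after editing your shell configuration file, so that the new variables are actually in effect.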
          
   
# X Terminal emulators

## xterm

  * Versions 239 and later work well with the default "fixed" font.
  * Also works with the TrueType DejaVu Mono font.
  * Appears to have trouble with some other fonts, such as Bitstream Vera Sans Mono, despite that font being more complete than DejaVu.

## gnome-terminal

  * Awesome, and works great with the TrueType Bitstream Vera Sans Mono or DejaVu Mono fonts.
  * Somewhat bloated, considering the dependencies.

## urxvt

  * Recommended.

        cd /usr/pkgsrc/x11/rxvt-unicode
        make install clean

## uxterm

  * Works, as the 'u' might suggest, but last time I checked it sucked. Anyone?

## aterm

  * Doesn't work and probably never will.

## Eterm

  * Doesn't work either. Last time I checked, the author was too busy with real life.
   
# Utilities

## less

  * Set the shell environment variable LESSCHARSET to "utf-8".
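
For a Bourne-style shell (sh, ksh, mksh, bash, zsh), a minimal sketch of that setting looks like this; which startup file it belongs in depends on your shell.

```shell
# Tell less to treat its input and output as UTF-8.
# csh/tcsh equivalent: setenv LESSCHARSET "utf-8"
LESSCHARSET="utf-8"
export LESSCHARSET
```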
   
## screen

  * .screenrc

        defutf8 on
        encoding UTF-8

## lynx

  * .lynxrc

        character_set=UNICODE (UTF-8)

Or change "Display character set" in the options menu.

## irssi

    /set recode_autodetect_utf8 yes
    /set recode_fallback iso-8859-1  (or whatever seems fit)
    /set recode_out_default_charset UTF-8
    /set term_charset UTF-8
    /save

## silc-client

    /set term_type utf-8
    /save

Then restart the client.
   
## vi

  * NetBSD's vi is based on nvi. It doesn't support wide-range characters as of version 1.79nb16 from 10/23/96, which is the one in -current 4.99.15 and all earlier releases.

## nvi

  * pkgsrc's nvi (v1.81.5) is supposed to work with wide-range characters after some tweaks.

(XXX)

## vim

  * .vimrc

        set encoding=utf-8
        set fileencoding=utf-8

## emacs

  * .emacs

        ; === Set character encoding ===
        (setq locale-coding-system 'utf-8)
        (set-terminal-coding-system 'utf-8)
        (set-keyboard-coding-system 'utf-8)
        (set-selection-coding-system 'utf-8)
        (prefer-coding-system 'utf-8)

This one gives you umlauts:

    ; === Make ä, ö, ü, ß work ===
    (set-language-environment 'german)
   
## mutt

  * mutt should work with all of the above. If it doesn't, put something like this in your .muttrc:

        set charset="utf-8:iso-8859-1"

If you haven't already set it in PKG_DEFAULT_OPTIONS, you may also add this to mk.conf:

    PKG_OPTIONS.mutt+= ncursesw
          
   
# Servers

## Apache2

  * /usr/pkg/etc/httpd/httpd.conf

        AddDefaultCharset UTF-8
          
   
# Converting files

  * If you have files containing non-ASCII ISO-8859 characters, your system will now assume they are UTF-8 encoded. They are not, so the characters in these files will be misinterpreted and tools that read them will start breaking. Use iconv, which is part of the base system, to convert them:

        iconv -f iso8859-1 -t utf-8 file >file.new
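
For more than a handful of files, the same command can be wrapped in a small helper. The following is only a sketch: the function name `convert_to_utf8` is made up, it assumes the files really are ISO-8859-1, and it assumes your iconv exits with an error on input that is not valid UTF-8 (which is how it decides what to skip).

```shell
# convert_to_utf8: convert one file from ISO-8859-1 to UTF-8 in place,
# skipping files that already decode cleanly as UTF-8.
# (Hypothetical helper name; the .new suffix follows the command above.)
convert_to_utf8() {
    file=$1
    # A file that passes a UTF-8 to UTF-8 round trip is either plain
    # ASCII or already UTF-8, so leave it alone.
    if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
        echo "skip (already UTF-8 or plain ASCII): $file"
        return 0
    fi
    # Write to a temporary name first so a failure never clobbers the original.
    if iconv -f ISO-8859-1 -t UTF-8 "$file" >"$file.new"; then
        mv "$file.new" "$file"
        echo "converted: $file"
    else
        rm -f "$file.new"
        echo "failed: $file" >&2
        return 1
    fi
}
```

It could then be called in a loop, for example `for f in ./*.txt; do convert_to_utf8 "$f"; done`. Keep a backup until you have checked the results, since the skip test is a heuristic and not a proof of the original encoding.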
          
   
# Filesystems

  * Be careful with special characters in filenames: they will look garbled when you access them from a non-Unicode environment.
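
To find out which existing filenames would be affected, a sketch along these lines lists every name containing bytes outside printable ASCII. The function name `list_non_ascii_names` is made up for this example, and the pattern assumes your find matches names byte by byte in the C locale.

```shell
# list_non_ascii_names: print paths under a directory whose names contain
# bytes outside printable ASCII (space through tilde).
# (Hypothetical helper name.)
list_non_ascii_names() {
    # LC_ALL=C makes find treat names as raw bytes, so the bracket
    # expression [! -~] matches any byte that is not printable ASCII.
    LC_ALL=C find "${1:-.}" -name '*[! -~]*' -print
}
```

Called without an argument it searches the current directory; pass a path, as in `list_non_ascii_names /home`, to search elsewhere.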
   
