File:  [NetBSD Developer Wiki] / wikisrc / Attic / unicode.mdwn
Revision 1.1
Mon Nov 21 03:22:58 2011 UTC (8 years, 11 months ago) by mspo
Branches: MAIN
CVS tags: HEAD
finish importing the pages from my findings in the pkgsrc.se wiki

How to use Unicode characters, encoded as UTF-8, on NetBSD.

[![UTF-8 encoded spam][3]][4]

   [3]: /images/200px-Unicoded-spam.png
   [4]: /images/Unicoded-spam.png "Just to show off: this is what UTF-8 encoded spam looks like ;-)"

Just to show off: this is what UTF-8 encoded spam looks like ;-)

**Contents**

[[!toc levels=3]]

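# Introduction

This page collects the settings needed to work with Unicode text, encoded as UTF-8, on NetBSD: the console, pkgsrc, shells, X terminal emulators, common utilities, servers, file conversion and filenames.
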
# Note on wscons

wscons doesn't support UTF-8. You'll need **X11** and a proper **X terminal emulator** for this to be of any use, or you get character mash for lunch! Only the [ASCII][40] part of Unicode, namely the **first 128 characters, will work** in your wscons console, because they are encoded identically in UTF-8 and in the ISO-8859 character sets. The printable ones are:

   [40]: http://de.wikipedia.org/wiki/ASCII-Tabelle (http://de.wikipedia.org/wiki/ASCII-Tabelle)

     !"#$%&'()*+,-./0123456789:;<=>?
    @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
    `abcdefghijklmnopqrstuvwxyz{|}~

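To find out whether a given file would survive on a wscons console, one option is to check that it contains nothing but 7-bit ASCII bytes. The following is a minimal sketch in POSIX sh; the function name `is_ascii` is made up for this example and is not part of NetBSD.

```sh
# is_ascii FILE: succeed if FILE contains only 7-bit ASCII bytes.
# (Hypothetical helper, not a NetBSD tool.)
is_ascii() {
    # Delete every byte in the range 0-127; whatever is left is non-ASCII.
    rest=$(LC_ALL=C tr -d '\000-\177' <"$1" | wc -c | tr -d ' ')
    [ "$rest" -eq 0 ]
}

# Possible use:
#   is_ascii notes.txt || echo "notes.txt needs a UTF-8 capable terminal"
```

Running `tr` in the C locale makes it work on bytes rather than characters, so the check does not depend on the locale of the calling shell.
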
# Note on uwscons

Unofficial uwscons patches for the NetBSD 3.0 release can be found here: [ftp://tink.ims.ac.jp/pub/NetBSD/uwscons/][41]

   [41]: ftp://tink.ims.ac.jp/pub/NetBSD/uwscons/ (ftp://tink.ims.ac.jp/pub/NetBSD/uwscons/)

# pkgsrc

  * To build packages that support it against the wide-character ncurses library, add to /etc/mk.conf:

        PKG_DEFAULT_OPTIONS+= ncursesw

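If the option is added from a script, it helps to avoid appending the same line twice. A small sketch in POSIX sh, assuming write access to the file; `add_mk_line` is a name invented for this example.

```sh
# add_mk_line FILE LINE: append LINE to FILE unless an identical line
# is already present. (Hypothetical helper for illustration.)
add_mk_line() {
    touch "$1" || return 1
    # -x: match whole lines only, -F: treat LINE as a fixed string.
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >>"$1"
}

# Possible use, as root:
#   add_mk_line /etc/mk.conf 'PKG_DEFAULT_OPTIONS+= ncursesw'
```

Packages that were built before the option was set need to be rebuilt to pick it up.
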
# Soup up a shell

## ksh

  * Works.

        chsh -s /bin/ksh

## mksh

  * A Korn shell derived from OpenBSD's ksh; it works pretty well compared to pdksh.

        cd /usr/pkgsrc/shells/mksh
        make install clean
        chsh -s /usr/pkg/bin/mksh

## zsh

  * Note: the stable version 4.2.x won't work. UTF-8 support in the Z shell has been enabled by default since 4.3.2.

        cd /usr/pkgsrc/shells/zsh-current
        make install clean
        chsh -s /usr/pkg/bin/zsh

## tcsh

  * Works out of the box.

        cd /usr/pkgsrc/shells/tcsh
        make install clean
        chsh -s /usr/pkg/bin/tcsh

## bash

  * Works out of the box.

        cd /usr/pkgsrc/shells/bash
        make install clean
        chsh -s /usr/pkg/bin/bash

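On most systems `chsh` refuses, for ordinary users, a shell that is not listed in `/etc/shells`. Whether the pkgsrc package has already registered the shell there is an assumption worth checking first. A minimal sketch; `shell_is_listed` is a hypothetical helper name.

```sh
# shell_is_listed SHELL [FILE]: succeed if SHELL appears as a whole line
# in FILE (default /etc/shells). Hypothetical helper for illustration.
shell_is_listed() {
    grep -qxF -- "$1" "${2:-/etc/shells}"
}

# Possible use:
#   shell_is_listed /usr/pkg/bin/mksh && chsh -s /usr/pkg/bin/mksh
```
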
## Shell environment

  * Set the variables LANG and LC_CTYPE in your shell configuration file and leave LC_ALL empty:

        export LANG="en_US.UTF-8"
        export LC_CTYPE="en_US.UTF-8"
        export LC_ALL=""

or, if you use a csh-style shell:

    setenv LANG "en_US.UTF-8"
    setenv LC_CTYPE "en_US.UTF-8"
    setenv LC_ALL ""

Leave the other locale variables untouched (they default to "C") so as not to confuse programs. Locales other than en_US probably won't work too well, since the fonts aren't in the base system yet, but you can install them and try your luck, of course.

The result should look like this:

    % locale
    LANG="en_US.UTF-8"
    LC_CTYPE="en_US.UTF-8"
    LC_COLLATE="C"
    LC_TIME="C"
    LC_NUMERIC="C"
    LC_MESSAGES="en_US.UTF-8"
    LC_ALL=""

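The variables above are consulted in a fixed order for the character type: LC_ALL first, then LC_CTYPE, then LANG, with an empty value counting as unset. A sketch of that lookup in POSIX sh, useful in startup scripts that should only enable UTF-8 features when the locale asks for them; `is_utf8_locale` is an invented name, and the check only looks at the variable text, not at whether the locale is installed.

```sh
# is_utf8_locale: succeed if the effective character-type locale names UTF-8.
# (Hypothetical helper; it inspects the variables only.)
is_utf8_locale() {
    # Precedence: LC_ALL, then LC_CTYPE, then LANG. Empty counts as unset.
    effective=${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}
    case $effective in
        *.UTF-8|*.utf-8|*.UTF8|*.utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use:
#   is_utf8_locale && echo "UTF-8 locale selected"
```

This also shows why LC_ALL is left empty: any non-empty value there would override both LC_CTYPE and LANG. Where available, `locale charmap` reports the character set the system actually selected.
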
# X Terminal emulators

## xterm

  * Versions 239 and later work well with the default "fixed" font.
  * Also works with the TTF DejaVu Mono font.
  * Appears to have trouble with some other fonts, such as Bitstream Vera Sans Mono, despite this font being more complete than DejaVu.

## gnome-terminal

  * Awesome, and works great with the TTF Bitstream Vera Sans Mono or DejaVu Mono fonts.
  * Somewhat bloated considering the dependencies.

## urxvt

  * Recommended.

        cd /usr/pkgsrc/x11/rxvt-unicode
        make install clean

## uxterm

  * Works, as the 'u' might suggest, but last time I checked it sucked. Anyone?

## aterm

  * Doesn't work and probably never will.

## Eterm

  * Doesn't work either. Last time I checked, the author was too busy with real life.

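A quick way to judge any of the terminal emulators above is to print a few non-ASCII characters and look at the result. The sketch below emits the German letters ä, ö, ü and ß as raw UTF-8 bytes, written as octal escapes so the script itself stays pure ASCII; the function name `utf8_sample` is made up for this example.

```sh
# utf8_sample: print the four characters a-umlaut, o-umlaut, u-umlaut and
# sharp s as UTF-8 (two bytes each), followed by a newline.
utf8_sample() {
    printf '\303\244\303\266\303\274\303\237\n'
}

utf8_sample
```

A terminal that handles UTF-8 shows four characters. One that assumes ISO-8859-1 typically shows about eight, with each letter replaced by a pair such as "Ã¤".
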
# Utilities

## less

  * Set the shell environment variable LESSCHARSET to "utf-8".

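For a Bourne-style shell, the setting could look like this in the shell configuration file (a sketch; csh-style shells would use `setenv LESSCHARSET utf-8` instead):

```sh
# Tell less to treat its input and the terminal as UTF-8.
LESSCHARSET=utf-8
export LESSCHARSET
```
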
## screen

  * .screenrc

        defutf8 on
        encoding UTF-8

## lynx

  * .lynxrc

        character_set=UNICODE (UTF-8)

Or change "Display character set" in the options menu.

## irssi

    /set recode_autodetect_utf8 yes
    /set recode_fallback iso-8859-1
    /set recode_out_default_charset UTF-8
    /set term_charset UTF-8
    /save

Replace iso-8859-1 with whatever fallback character set seems fit.

## silc-client

    /set term_type utf-8
    /save

Then restart the client.

## vi

  * NetBSD's vi is based on nvi. It doesn't support multibyte characters as of version 1.79nb16 from 1996-10-23, which is the one in -current 4.99.15 and in all earlier releases.

## nvi

  * pkgsrc's nvi (1.81.5) is supposed to work with multibyte characters after some tweaks.

(XXX)

## vim

  * .vimrc

        set encoding=utf-8
        set fileencoding=utf-8

## emacs

  * .emacs

        ; === Set character encoding ===
        (setq locale-coding-system 'utf-8)
        (set-terminal-coding-system 'utf-8)
        (set-keyboard-coding-system 'utf-8)
        (set-selection-coding-system 'utf-8)
        (prefer-coding-system 'utf-8)

This one gives you umlauts:

    ; === Make ä, ö, ü, ß work ===
    (set-language-environment 'german)

## mutt

  * mutt should work with all of the above. If it doesn't, put something like this in your .muttrc (the charset variable takes a single character set, the one your terminal uses):

        set charset="utf-8"

If you haven't already set it in PKG_DEFAULT_OPTIONS, you may also add to mk.conf:

    PKG_OPTIONS.mutt+= ncursesw

# Servers

## Apache2

  * /usr/pkg/etc/httpd/httpd.conf

        AddDefaultCharset UTF-8

# Converting files

  * If you have files containing non-ASCII ISO-8859 characters, your system will now assume they are UTF-8. They're not, though, so the characters in these files will be misinterpreted and tools that read them will start breaking. Use iconv, which is part of the base system, to convert them:

        iconv -f iso8859-1 -t utf-8 file >file.new

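Running the conversion on a file that is already UTF-8 would double-encode it, so it can be worth testing first. The sketch below relies on iconv failing when asked to read invalid UTF-8, and it assumes the legacy files are ISO-8859-1, which nothing in the file itself can confirm. `to_utf8` is a hypothetical name, and the output goes to `FILE.new` as in the command above.

```sh
# to_utf8 FILE: if FILE is not valid UTF-8, write a converted copy to
# FILE.new, assuming the original is ISO-8859-1. (Hypothetical helper.)
to_utf8() {
    # Valid UTF-8 (including plain ASCII) passes through iconv unchanged,
    # so a successful run means nothing needs converting.
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        return 0
    fi
    # Overwrites any existing FILE.new.
    iconv -f ISO-8859-1 -t UTF-8 "$1" >"$1.new"
}

# Possible use:
#   to_utf8 README && [ -f README.new ] && mv README.new README
```

Writing to a separate file keeps the original intact until the result has been inspected.
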
# Filesystems

  * Be careful with special characters in filenames, as they'll look weird when you try to access them from a non-Unicode environment.

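To get an idea of how many filenames are affected, one option is to list every path that contains a byte outside printable ASCII. A minimal sketch, assuming no filename contains a newline; `list_non_ascii_names` is an invented name.

```sh
# list_non_ascii_names DIR: print every path under DIR that contains a
# byte outside the printable ASCII range (space through tilde).
# (Hypothetical helper; it also flags control characters such as tabs.)
list_non_ascii_names() {
    # The C locale makes grep compare single bytes, so each byte of a
    # multibyte UTF-8 character falls outside the range and matches.
    find "$1" -print | LC_ALL=C grep '[^ -~]'
}

# Possible use:
#   list_non_ascii_names "$HOME"
```

The function exits with a non-zero status when nothing is found, which is grep's usual behaviour for "no match".
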
