File:  [NetBSD Developer Wiki] / wikisrc / Attic / unicode.mdwn
Revision 1.6
Sat Apr 27 14:21:51 2019 UTC (18 months ago) by sevan
Branches: MAIN
CVS tags: HEAD
Igone note on old version of zsh, add more markup

How to use Unicode (UTF-8) characters on NetBSD.

**Contents**

[[!toc levels=3]]

#  Introduction

This is all about Unicode on NetBSD.

#  Note on wscons

wscons doesn't support UTF-8. You'll need **X11** and a proper **X terminal emulator** for any of this to be of use, or you'll get character mash for lunch! Only the [ASCII](http://de.wikipedia.org/wiki/ASCII-Tabelle) part of Unicode, namely the **first 128 characters, will work** in your wscons console. These characters are encoded identically in UTF-8 and in the ISO-8859 character sets. The printable ones are:

     !"#$%&'()*+,-./0123456789:;<=>?
    @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
    `abcdefghijklmnopqrstuvwxyz{|}~

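Only 7-bit ASCII is safe on the wscons console, so it can help to check a file for other bytes before viewing it there. The following is a minimal POSIX sh sketch and is not part of NetBSD. `count_non_ascii` is a hypothetical helper name. It deletes every byte in the range 0x00-0x7F with tr(1) and counts whatever is left:

```sh
# count_non_ascii [FILE...]: hypothetical helper, not a NetBSD utility.
# Prints the number of bytes outside 7-bit ASCII (0x00-0x7F) in the named
# files, or in standard input if no file is given. A result of 0 means
# plain ASCII, which looks the same in UTF-8 and ISO-8859-x.
count_non_ascii() {
    # LC_ALL=C makes tr treat its input as raw bytes; wc -c counts what
    # survives, and the final tr strips the padding some wc versions add.
    cat -- "$@" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' \t'
}
```

For example, `count_non_ascii notes.txt` prints `0` for a pure ASCII file. In UTF-8 text, each "ä" contributes two bytes.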
#  pkgsrc

To make packages that support it use the wide-character version of the ncurses library, add this to `/etc/mk.conf`:

    PKG_DEFAULT_OPTIONS+= ncursesw

#  Soup up a shell

##  ksh

Works.

    chsh -s /bin/ksh

##  mksh

The MirBSD Korn shell is derived from OpenBSD's Korn shell and works pretty well compared to pdksh.

    cd /usr/pkgsrc/shells/mksh
    make install clean
    chsh -s /usr/pkg/bin/mksh

##  zsh

UTF-8 support in the Z shell has been enabled by default since version 4.3.2.

    cd /usr/pkgsrc/shells/zsh
    make install clean
    chsh -s /usr/pkg/bin/zsh

##  tcsh

Works out of the box.

    cd /usr/pkgsrc/shells/tcsh
    make install clean
    chsh -s /usr/pkg/bin/tcsh

##  bash

Works out of the box.

    cd /usr/pkgsrc/shells/bash
    make install clean
    chsh -s /usr/pkg/bin/bash

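You can check whether a Bourne-style shell from the list above (ksh, mksh, zsh, bash) counts characters rather than bytes under your current locale with a quick probe. This is only a sketch, and `utf8_length_probe` is a made-up name. The string "Käse" is 4 characters but 5 bytes in UTF-8. A multibyte-aware shell running in a UTF-8 locale should therefore report 4. A byte-oriented shell, or any shell in the "`C`" locale, typically reports 5. tcsh uses a different syntax and is not covered.

```sh
# utf8_length_probe: hypothetical helper for Bourne-style shells.
utf8_length_probe() {
    # Build "Käse" from octal escapes so this snippet itself stays ASCII;
    # \303\244 is the two-byte UTF-8 encoding of "ä".
    s=$(printf 'K\303\244se')
    # ${#s} is the length of $s as this shell measures it.
    printf '%s\n' "${#s}"
}
```

Run `utf8_length_probe` after setting up the shell environment as described in the next section. An output of 4 suggests the shell is handling UTF-8 as characters.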
##  Shell environment

Set the variables `LANG` and `LC_CTYPE` in your shell configuration file:

    export LANG="en_US.UTF-8"
    export LC_CTYPE="en_US.UTF-8"
    export LC_ALL=""

or, if you use a C-style shell:

    setenv LANG "en_US.UTF-8"
    setenv LC_CTYPE "en_US.UTF-8"
    setenv LC_ALL ""

Leave the other locale variables untouched (they default to "`C`") so as not to confuse programs. Locales other than `en_US` probably won't work too well, since the fonts aren't in the base system yet. You can of course install them and try your luck.

The result should look like this:

    % locale
    LANG="en_US.UTF-8"
    LC_CTYPE="en_US.UTF-8"
    LC_COLLATE="C"
    LC_TIME="C"
    LC_NUMERIC="C"
    LC_MONETARY="C"
    LC_MESSAGES="en_US.UTF-8"
    LC_ALL=""

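Setting `LC_ALL` to the empty string works because POSIX treats an empty locale variable as if it were unset. A non-empty `LC_ALL` overrides everything. Otherwise `LC_CTYPE` applies, then `LANG`, and finally the "`C`" locale.

The sketch below mimics that lookup for the character-type category. The function names are hypothetical. The sketch only inspects the variables, so it cannot tell whether the named locale is actually installed; `locale -a` lists the installed ones.

```sh
# effective_ctype: hypothetical helper. Prints the locale name that the
# LC_CTYPE category would use, following the POSIX precedence:
# non-empty LC_ALL, then non-empty LC_CTYPE, then non-empty LANG, then C.
effective_ctype() {
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}

# ctype_is_utf8: hypothetical helper. Exit status 0 if the effective
# LC_CTYPE name ends in a UTF-8 codeset suffix, 1 otherwise.
ctype_is_utf8() {
    case $(effective_ctype) in
        *.UTF-8|*.utf-8|*.UTF8|*.utf8) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `ctype_is_utf8 && echo "UTF-8 ready"` in a startup file would confirm that the settings above took effect.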
#  X Terminal emulators

##  xterm

  * Versions 239 and later work well with the default "fixed" font.
  * Also works with the TrueType DejaVu Sans Mono font.
  * Appears to have trouble with some other fonts, such as Bitstream Vera Sans Mono. Note that Vera covers fewer characters than DejaVu, which is derived from it.

##  gnome-terminal

  * Awesome, and works great with the TrueType Bitstream Vera Sans Mono or DejaVu Sans Mono fonts.
  * Somewhat bloated, considering its dependencies.

##  urxvt

Recommended.

    cd /usr/pkgsrc/x11/rxvt-unicode
    make install clean

##  uxterm

  * Works, as the 'u' might suggest, but the last time I checked it performed poorly. Anyone?

##  aterm

  * Doesn't work and probably never will.

##  Eterm

  * Doesn't work either. The last time I checked, the author was too busy with real life.

#  Utilities

##  less

  * Set the shell environment variable `LESSCHARSET` to "`utf-8`".

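To make a setting like this permanent, the `export` line has to end up in your shell's startup file. The minimal sketch below appends a line only if the file does not already contain it, so running it twice is harmless. `append_once` is a hypothetical name. Which startup file to use depends on your shell.

```sh
# append_once FILE LINE: hypothetical helper. Appends LINE to FILE unless
# FILE already contains exactly that line; creates FILE if it is missing.
append_once() {
    touch -- "$1" || return 1
    # -q: no output, -x: match the whole line, -F: LINE is a fixed string.
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >>"$1"
}
```

For example, `append_once ~/.profile 'export LESSCHARSET=utf-8'` does this for sh-style shells. For a C-style shell, use `append_once ~/.cshrc 'setenv LESSCHARSET utf-8'`.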
##  screen

`.screenrc`:

    defutf8 on
    encoding UTF-8

##  lynx

`.lynxrc`:

    character_set=UNICODE (UTF-8)

Or change "Display character set" in the options menu.

##  irssi

    /set recode_autodetect_utf8 yes
    /set recode_fallback iso-8859-1
    /set recode_out_default_charset UTF-8
    /set term_charset UTF-8
    /save

Replace `iso-8859-1` in the `recode_fallback` setting with whatever charset seems fit.

##  silc-client

    /set term_type utf-8
    /save

Then restart the client.

##  vi

  * NetBSD's vi is based on nvi. It doesn't support wide characters as of version 1.79nb16 (dated 10/23/96), which is the version in NetBSD-current 4.99.15 and in all earlier releases.

##  nvi

  * The nvi in pkgsrc (v1.81.5) is supposed to work with wide characters after some tweaks.

(XXX)

##  vim

`.vimrc`:

    set encoding=utf-8
    set fileencoding=utf-8

##  emacs

`.emacs`:

    ; === Set character encoding ===
    (setq locale-coding-system 'utf-8)
    (set-terminal-coding-system 'utf-8)
    (set-keyboard-coding-system 'utf-8)
    (set-selection-coding-system 'utf-8)
    (prefer-coding-system 'utf-8)

This one gives you umlauts:

    ; === Make ä, ö, ü, ß work ===
    (set-language-environment 'german)

##  mutt

mutt should work with all of the above. If it doesn't, put something like this in your `.muttrc`:

    set charset="utf-8:iso-8859-1"

If you haven't already set it in `PKG_DEFAULT_OPTIONS`, you may also add this to `/etc/mk.conf`:

    PKG_OPTIONS.mutt+= ncursesw

#  Servers

##  Apache2

`/usr/pkg/etc/httpd/httpd.conf`:

    AddDefaultCharset UTF-8

#  Converting files

Once UTF-8 is your locale's character set, your system will assume that text files contain UTF-8. Files with non-ASCII ISO-8859 characters don't. Those characters will be misinterpreted, and tools that process these files will start breaking. Convert such files with iconv(1), which is part of the base system:

    iconv -f iso8859-1 -t utf-8 file >file.new

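For more than a handful of files, a small wrapper helps. The sketch below assumes the originals are ISO-8859-1; adjust the `-f` charset otherwise. It uses a common heuristic: a file that converts cleanly from UTF-8 to UTF-8 is taken to be valid UTF-8 already (this includes pure ASCII) and is left alone. `latin1_to_utf8` is a hypothetical name.

```sh
# latin1_to_utf8 FILE...: hypothetical helper.
# Re-encodes each FILE from ISO-8859-1 to UTF-8 in place, skipping files
# that are already valid UTF-8 (including pure ASCII).
latin1_to_utf8() {
    for f in "$@"; do
        # iconv exits non-zero when the input is not valid UTF-8.
        if iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            echo "skipping $f: already valid UTF-8" >&2
            continue
        fi
        tmp=$(mktemp "$f.XXXXXX") || return 1
        if iconv -f ISO-8859-1 -t UTF-8 "$f" >"$tmp"; then
            # Copy back with cat so FILE keeps its owner and permissions.
            cat "$tmp" >"$f"
        else
            echo "conversion failed: $f" >&2
        fi
        rm -f "$tmp"
    done
}
```

Usage example: `latin1_to_utf8 *.txt`. The heuristic can misfire on the rare ISO-8859-1 file whose bytes happen to form valid UTF-8, so keep backups.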
#  Filesystems

  * Be careful with special characters in filenames, as they'll look weird when you try to access them from a non-Unicode environment.

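To find out which names under a directory would be affected, the sketch below lists every path that contains a byte outside 7-bit ASCII. `find_non_ascii_names` is a hypothetical name. The loop reads one path per line, so paths containing newline characters are not handled.

```sh
# find_non_ascii_names [DIR]: hypothetical helper.
# Lists every path under DIR (default: the current directory) whose name
# contains at least one byte outside 7-bit ASCII.
find_non_ascii_names() {
    find "${1:-.}" -print | while IFS= read -r path; do
        # Delete all ASCII bytes; anything left over is non-ASCII.
        rest=$(printf '%s' "$path" | LC_ALL=C tr -d '\000-\177')
        if [ -n "$rest" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

For example, run `find_non_ascii_names ~` before switching between a UTF-8 X session and the wscons console.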
