Annotation of wikisrc/unicode.mdwn, revision 1.6

1.2       schmonz     1: How to use Unicode characters, encoded as UTF-8, on NetBSD.
                      2: 
                      3: **Contents**
                      4: 
                      5: [[!toc levels=3]]
                      6: 
                      7: #  Introduction 
                      8: 
                      9: This is all about Unicode on NetBSD. 
                     10: 
                     11: #  Note on wscons 
                     12: 
1.4       sevan      13: wscons doesn't support UTF-8, so you'll need **X11** and a proper **X terminal emulator** for any of this to be useful; otherwise you get character mash for lunch! Only the [ASCII](http://de.wikipedia.org/wiki/ASCII-Tabelle) part of Unicode, namely the **first 128 characters**, will work in your wscons console, because those characters are encoded identically in UTF-8 and in the ISO-8859 character sets:
1.2       schmonz    14:     
1.4       sevan      15:      !"#$%&'()*+,-./0123456789:;<=>?     
                     16:     @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
                     17:     `abcdefghijklmnopqrstuvwxyz{|}~
1.2       schmonz    18:     
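
To check whether a file will display correctly on a wscons console, you can test that it contains nothing outside that 7-bit range. This is only a minimal sketch; the function name `is_ascii` is made up for this example and is not a NetBSD tool.

```shell
# is_ascii FILE: succeed if FILE contains only 7-bit ASCII bytes.
# (Hypothetical helper name, chosen for illustration.)
is_ascii() {
    # Delete every byte in the range 0-127 and count what is left over.
    # LC_ALL=C makes tr operate on single bytes, not multibyte characters.
    leftover=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
    [ "$leftover" -eq 0 ]
}
```

Used as `is_ascii notes.txt || echo "notes.txt has non-ASCII bytes"`, it flags files that need X11 to be read properly.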
                     19: 
                     20: #  pkgsrc 
                     21: 
1.6     ! sevan      22: To build packages against the wide-character ncurses library wherever they support it, add the following to `/etc/mk.conf`:
1.2       schmonz    23:     
                     24:       PKG_DEFAULT_OPTIONS+= ncursesw
                     25:     
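
If you script your setup, the line can be appended only when it is missing, so that repeated runs don't pile up duplicates. A small sketch, assuming a helper named `add_mk_line` (hypothetical, not part of pkgsrc):

```shell
# add_mk_line FILE LINE: append LINE to FILE unless an identical line
# is already present. (Hypothetical helper, for illustration only.)
add_mk_line() {
    # grep: -q quiet, -x match the whole line, -F fixed string (no regex).
    grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (run as root):
#   add_mk_line /etc/mk.conf 'PKG_DEFAULT_OPTIONS+= ncursesw'
```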
                     26: 
                     27: #  Soup up a shell 
                     28: 
                     29: ##  ksh 
                     30: 
1.4       sevan      31: Works. 
1.2       schmonz    32:     
                     33:       chsh -s /bin/ksh
                     34:     
                     35: 
                     36: ##  mksh 
                     37: 
1.4       sevan      38: This is the MirBSD Korn shell, which is derived from OpenBSD's ksh. It works pretty well compared to pdksh.
1.2       schmonz    39:     
                     40:        cd /usr/pkgsrc/shells/mksh
                     41:        make install clean
                     42:        chsh -s /usr/pkg/bin/mksh
                     43:     
                     44: 
                     45: ##  zsh 
                     46: 
1.6     ! sevan      47: UTF-8 support in the Z shell has been enabled by default since version 4.3.2.
1.2       schmonz    48:     
1.3       snj        49:        cd /usr/pkgsrc/shells/zsh
1.2       schmonz    50:        make install clean
                     51:        chsh -s /usr/pkg/bin/zsh
                     52:     
                     53: 
                     54: ##  tcsh 
                     55: 
1.4       sevan      56: Works out of the box. 
1.2       schmonz    57:     
                     58:        cd /usr/pkgsrc/shells/tcsh
                     59:        make install clean
                     60:        chsh -s /usr/pkg/bin/tcsh
                     61:     
                     62: 
                     63: ##  bash 
                     64: 
1.4       sevan      65: Works out of the box. 
1.2       schmonz    66:     
                     67:        cd /usr/pkgsrc/shells/bash
                     68:        make install clean
                     69:        chsh -s /usr/pkg/bin/bash
                     70:     
                     71: 
                     72: ##  Shell environment 
                     73: 
1.6     ! sevan      74: Set the variables `LANG` and `LC_CTYPE` in your shell configuration file:
1.2       schmonz    75:     
                     76:        export LANG="en_US.UTF-8"
                     77:        export LC_CTYPE="en_US.UTF-8"
                     78:        export LC_ALL=""
                     79:     
                     80: 
                     81: or, if you use a csh-style shell:
                     82:     
                     83:        setenv LANG "en_US.UTF-8"
                     84:        setenv LC_CTYPE "en_US.UTF-8"
                     85:        setenv LC_ALL ""
                     86:     
                     87: 
1.6     ! sevan      88: Leave the other locale variables untouched, so that they keep their default of "`C`" and don't confuse programs. Locales other than `en_US` probably won't work too well, since the fonts aren't in the base system yet, but you can of course install them and try your luck.
1.2       schmonz    89: 
                     90: The result should look like this:
                     91:     
                     92:        % locale
                     93:        LANG="en_US.UTF-8"
                     94:        LC_CTYPE="en_US.UTF-8"
                     95:        LC_COLLATE="C"
                     96:        LC_TIME="C"
                     97:        LC_NUMERIC="C"
                     98:        LC_MONETARY="C"
                     99:        LC_MESSAGES="en_US.UTF-8"
                    100:        LC_ALL=""
                    101:     
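
The precedence among these variables is why `LC_ALL` is kept empty: POSIX gives `LC_ALL` priority over `LC_CTYPE`, which in turn overrides `LANG`, and an empty value counts as unset. The lookup can be sketched as follows; `effective_ctype` and `uses_utf8` are hypothetical helper names, and the codeset test is a simple pattern match, not a query of the C library.

```shell
# effective_ctype: print the locale name that governs character handling,
# following the POSIX precedence LC_ALL > LC_CTYPE > LANG > "C".
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# uses_utf8: succeed if that locale name mentions a UTF-8 codeset.
uses_utf8() {
    case $(effective_ctype) in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the settings above, `uses_utf8 && echo ok` prints `ok`; setting `LC_ALL=C` afterwards makes it fail, because `LC_ALL` overrides everything else.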
                    102: 
                    103: #  X Terminal emulators 
                    104: 
                    105: ##  xterm 
                    106: 
                    107:   * Versions 239 and later work well with the default "fixed" font.
                    108:   * Also works with the TrueType DejaVu Mono font.
                    109:   * Appears to have trouble with some other fonts, such as Bitstream Vera Sans Mono, despite that font being more complete than DejaVu.
                    110: 
                    111: ##  gnome-terminal 
                    112: 
                    113:   * Awesome and works great with the ttf Bitstream Vera Sans Mono or DejaVu Mono. 
                    114:   * Somewhat bloated considering the dependencies. 
                    115: 
                    116: ##  urxvt 
                    117: 
1.4       sevan     118: Recommended. Install it from pkgsrc:
1.2       schmonz   119:     
                    120:        cd /usr/pkgsrc/x11/rxvt-unicode
                    121:        make install clean
                    122:     
                    123: 
                    124: ##  uxterm 
                    125: 
                    126:   * Works, as the 'u' might suggest, but last time I checked it sucked. Anyone? 
                    127: 
                    128: ##  aterm 
                    129: 
                    130:   * Doesn't work and probably never will. 
                    131: 
                    132: ##  Eterm 
                    133: 
                    134:   * Doesn't work either. Last time I checked, the author was too busy with real life.
                    135: 
                    136: #  Utilities 
                    137: 
                    138: ##  less 
                    139: 
1.5       sevan     140:   * Set the shell environment variable `LESSCHARSET` to "`utf-8`". 
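
In an sh-style shell (ksh, mksh, zsh, bash) this could look like the following lines in the shell configuration file; csh-style shells would use `setenv LESSCHARSET utf-8` instead.

```shell
# Tell less to treat its input and the terminal as UTF-8.
LESSCHARSET=utf-8
export LESSCHARSET
```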
1.2       schmonz   141: 
                    142: ##  screen 
                    143: 
1.4       sevan     144: `.screenrc` 
1.2       schmonz   145:     
                    146:        defutf8 on
                    147:        encoding UTF-8
                    148:     
                    149: 
                    150: ##  lynx 
                    151: 
1.4       sevan     152: `.lynxrc`
1.2       schmonz   153:     
                    154:        character_set=UNICODE (UTF-8)
                    155:     
                    156: 
                    157: Or change "Display character set" in the options menu. 
                    158: 
                    159: ##  irssi 
                    160:     
                    161:        /set recode_autodetect_utf8 yes
                    162:        /set recode_fallback iso-8859-1  (or whatever seems fit)
                    163:        /set recode_out_default_charset UTF-8          
                    164:        /set term_charset UTF-8           
                    165:        /save        
                    166:     
                    167: 
                    168: ##  silc-client 
                    169:     
                    170:        /set term_type utf-8
                    171:        /save
                    172:     
                    173: 
                    174: Then restart the client.
                    175: 
                    176: ##  vi 
                    177: 
                    178:   * NetBSD's vi is based on nvi. It doesn't support wide characters as of version 1.79nb16 from 10/23/96, which is the version in -current (4.99.15) and all earlier releases.
                    179: 
                    180: ##  nvi 
                    181: 
                    182:   * pkgsrc's nvi (v1.81.5) is supposed to work with wide characters after some tweaks.
                    183: 
                    184: (XXX) 
                    185: 
                    186: ##  vim 
                    187: 
1.4       sevan     188: `.vimrc`
1.2       schmonz   189:     
                    190:        set encoding=utf-8           
                    191:        set fileencoding=utf-8
                    192:     
                    193: 
                    194: ##  emacs 
                    195: 
1.4       sevan     196: `.emacs`
1.2       schmonz   197:     
                    198:        ; === Set character encoding ===
                    199:        (setq locale-coding-system 'utf-8)
                    200:        (set-terminal-coding-system 'utf-8)
                    201:        (set-keyboard-coding-system 'utf-8)
                    202:        (set-selection-coding-system 'utf-8)
                    203:        (prefer-coding-system 'utf-8)
                    204:     
                    205: 
                    206: This one gives you umlauts: 
                    207:     
                    208:        ; === Make ä, ö, ü, ß work ===
                    209:        (set-language-environment 'german)
                    210:     
                    211: 
                    212: ##  mutt 
                    213: 
1.4       sevan     214: mutt should work once all of the above is in place. If it doesn't, put something like this in your `.muttrc`:
1.2       schmonz   215:     
                    216:       set charset="utf-8:iso-8859-1"
                    217:     
                    218: 
                    219: If you haven't already set it in `PKG_DEFAULT_OPTIONS`, you can also add this to `/etc/mk.conf`:
                    220:     
                    221:       PKG_OPTIONS.mutt+= ncursesw
                    222:     
                    223: 
                    224: #  Servers 
                    225: 
                    226: ##  Apache2 
                    227: 
1.4       sevan     228: `/usr/pkg/etc/httpd/httpd.conf`
1.2       schmonz   229:     
                    230:       AddDefaultCharset UTF-8
                    231:     
                    232: 
                    233: #  Converting files 
                    234: 
1.4       sevan     235: If you have files containing non-ASCII ISO-8859 characters, your system will now assume that they are UTF-8. They aren't, so the characters in these files will be misinterpreted and tools that use them will start breaking. Use `iconv`, which is part of the base system, to convert them:
1.2       schmonz   236:     
                    237:        iconv -f iso8859-1 -t utf-8 file >file.new
                    238:     
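
Running that conversion twice on the same file garbles it, because bytes that are already UTF-8 get read as ISO-8859-1 a second time. One way to guard against this is to test first whether the file already decodes as UTF-8. The sketch below does so; the function name `to_utf8` is hypothetical, and it assumes that anything which is not valid UTF-8 is ISO-8859-1.

```shell
# to_utf8 FILE: write a UTF-8 copy of FILE to FILE.new.
# Assumption: input that is not already valid UTF-8 is ISO-8859-1;
# change the -f argument for other legacy encodings.
to_utf8() {
    if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        # The input decodes cleanly as UTF-8 (pure ASCII does too),
        # so copy it unchanged instead of encoding it twice.
        cp "$1" "$1.new"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.new"
    fi
}
```

For a whole directory, something like `for f in *.txt; do to_utf8 "$f"; done` applies it to each file; review the `.new` files before replacing the originals.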
                    239: 
                    240: #  Filesystems 
                    241: 
                    242:   * Be careful with special characters in filenames, as they'll look weird when you try to access them from a non-Unicode environment.
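
To find out which files are affected, you can list the names that contain anything other than printable ASCII. A minimal sketch, with `non_ascii_names` as a hypothetical helper name:

```shell
# non_ascii_names DIR: list entries under DIR whose names contain bytes
# outside printable ASCII (space through tilde).
non_ascii_names() {
    # LC_ALL=C makes the bracket expression match single bytes, so every
    # byte of a multibyte UTF-8 sequence falls outside the range.
    LC_ALL=C find "$1" -name '*[! -~]*' -print
}
```

Running `non_ascii_names "$HOME"` prints one path per line, or nothing if all names are plain ASCII.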
                    243: 
