How to use wide-range characters a.k.a. UTF-8 in NetBSD.

Contents

  1. Introduction
  2. Note on wscons
  3. pkgsrc
  4. Soup up a shell
    1. ksh
    2. mksh
    3. zsh
    4. tcsh
    5. bash
    6. Shell environment
  5. X Terminal emulators
    1. xterm
    2. gnome-terminal
    3. urxvt
    4. uxterm
    5. aterm
    6. Eterm
  6. Utilities
    1. less
    2. screen
    3. lynx
    4. irssi
    5. silc-client
    6. vi
    7. nvi
    8. vim
    9. emacs
    10. mutt
  7. Servers
    1. Apache2
  8. Converting files
  9. Filesystems

Introduction

This is all about Unicode on NetBSD.

Note on wscons

wscons doesn't support UTF-8. You'll need X11 and a proper X terminal emulator for this to be of any use; otherwise non-ASCII characters show up as garbage. Only the ASCII part of Unicode, namely the first 128 characters, will work on your wscons console, because these characters are encoded identically in UTF-8 and in the ISO-8859 character sets:

   !"#$%&'()*+,-./0123456789:;<=>?
   @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
   `abcdefghijklmnopqrstuvwxyz{|}~
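
The overlap can be checked from any shell. The following is a minimal sketch using only printf, od and tr; the helper name hex_bytes is made up for this example. An ASCII character is the same single byte in both encodings, while a character such as ä is one byte in ISO-8859-1 and two bytes in UTF-8.

```shell
# hex_bytes is a hypothetical helper: print the bytes read from stdin as hex.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# 'A' is the same single byte in ASCII, ISO-8859-1 and UTF-8.
printf 'A' | hex_bytes

# 'ä' as ISO-8859-1 (one byte, octal 344) ...
printf '\344' | hex_bytes

# ... and as UTF-8 (two bytes, octal 303 244).
printf '\303\244' | hex_bytes
```

The three calls print 41, e4 and c3a4 respectively.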

pkgsrc

To build packages against the wide-character version of the ncurses library, where they support it, add the following to /etc/mk.conf:

  PKG_DEFAULT_OPTIONS+= ncursesw
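
If you script your setup, the line can be added without creating duplicates. This is only a sketch; the helper name add_mk_line is hypothetical and not part of pkgsrc.

```shell
# add_mk_line is a hypothetical helper: append a line to a file unless
# an identical line is already present.
add_mk_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

As root, it would be called like `add_mk_line /etc/mk.conf 'PKG_DEFAULT_OPTIONS+= ncursesw'`. Running it a second time leaves the file unchanged.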

Soup up a shell

ksh

Works.

  chsh -s /bin/ksh

mksh

This is a Korn shell based on OpenBSD's ksh. It works pretty well compared to pdksh.

   cd /usr/pkgsrc/shells/mksh
   make install clean
   chsh -s /usr/pkg/bin/mksh

zsh

UTF-8 in the Z shell is enabled by default since 4.3.2.

   cd /usr/pkgsrc/shells/zsh
   make install clean
   chsh -s /usr/pkg/bin/zsh

tcsh

Works out of the box.

   cd /usr/pkgsrc/shells/tcsh
   make install clean
   chsh -s /usr/pkg/bin/tcsh

bash

Works out of the box.

   cd /usr/pkgsrc/shells/bash
   make install clean
   chsh -s /usr/pkg/bin/bash

Shell environment

Set the variables LANG and LC_CTYPE in your shell configuration file

   export LANG="en_US.UTF-8"
   export LC_CTYPE="en_US.UTF-8"
   export LC_ALL=""

or if you have a C-style shell

   setenv LANG "en_US.UTF-8"
   setenv LC_CTYPE "en_US.UTF-8"
   setenv LC_ALL ""

The other locale variables should be left untouched (they default to "C") so as not to confuse programs. Locales other than en_US probably won't work too well, since the fonts aren't in the base system yet, but you can of course install them and try your luck.

The result should look like

   % locale
   LANG="en_US.UTF-8"
   LC_CTYPE="en_US.UTF-8"
   LC_COLLATE="C"
   LC_TIME="C"
   LC_NUMERIC="C"
   LC_MONETARY="C"
   LC_MESSAGES="en_US.UTF-8"
   LC_ALL=""
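
A shell startup file or script can test for a UTF-8 locale before doing anything that depends on it. The sketch below looks only at the variable names, using the usual precedence of LC_ALL, then LC_CTYPE, then LANG. The function names effective_ctype and is_utf8_locale are made up for this example, and locale names with an @modifier suffix are not handled.

```shell
# effective_ctype is a hypothetical helper: print the locale name that
# applies to LC_CTYPE. Precedence: LC_ALL, then LC_CTYPE, then LANG,
# falling back to "C" when none of them is set to a non-empty value.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# is_utf8_locale succeeds if that locale name ends in a UTF-8 codeset.
is_utf8_locale() {
    case $(effective_ctype) in
        *.UTF-8|*.utf-8|*.UTF8|*.utf8) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_utf8_locale || echo "warning: not a UTF-8 locale" >&2` prints a warning when the settings above have not taken effect.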

X Terminal emulators

xterm

gnome-terminal

urxvt

Recommended.

   cd /usr/pkgsrc/x11/rxvt-unicode
   make install clean

uxterm

aterm

Eterm

Utilities

less

screen

.screenrc

   defutf8 on
   encoding UTF-8

lynx

.lynxrc

   character_set=UNICODE (UTF-8)

Or change "Display character set" in the options menu.

irssi

   /set recode_autodetect_utf8 yes
   /set recode_fallback iso-8859-1  (or whatever seems fit)
   /set recode_out_default_charset UTF-8          
   /set term_charset UTF-8           
   /save        

silc-client

   /set term_type utf-8
   /save

and restart.

vi

nvi

(XXX)

vim

.vimrc

   set encoding=utf-8           
   set fileencoding=utf-8

emacs

.emacs

   ; === Set character encoding ===
   (setq locale-coding-system 'utf-8)
   (set-terminal-coding-system 'utf-8)
   (set-keyboard-coding-system 'utf-8)
   (set-selection-coding-system 'utf-8)
   (prefer-coding-system 'utf-8)

This one gives you umlauts:

   ; === Make ä, ö, ü, ß work ===
   (set-language-environment 'german)

mutt

mutt should work with all of the above. If it doesn't, put something like this in your .muttrc:

  set charset="utf-8:iso-8859-1"

If you haven't set it in PKG_DEFAULT_OPTIONS already, you may also add to mk.conf

  PKG_OPTIONS.mutt+= ncursesw

Servers

Apache2

/usr/pkg/etc/httpd/httpd.conf

  AddDefaultCharset UTF-8

Converting files

If you have files containing non-ASCII ISO-8859 characters, your system will now assume that they are UTF-8 encoded. They are not, so the characters in these files will be misinterpreted and tools that read them will start breaking. Use iconv, which is part of the base system, to convert them.

   iconv -f iso8859-1 -t utf-8 file >file.new
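
Running the conversion twice on the same file would garble it, so a script may want to check first whether a file is already valid UTF-8. The sketch below assumes that iconv exits with an error when asked to read invalid UTF-8, and that any file which fails that check is ISO-8859-1. The helper name to_utf8 is hypothetical.

```shell
# to_utf8 is a hypothetical helper: write a UTF-8 copy of file $1 to $2.
# Assumption: a file that is not valid UTF-8 is encoded in ISO-8859-1.
to_utf8() {
    src=$1
    dst=$2
    # Converting UTF-8 to UTF-8 fails on byte sequences that are not
    # valid UTF-8, which makes it usable as a validity check.
    if iconv -f UTF-8 -t UTF-8 "$src" >/dev/null 2>&1; then
        cp "$src" "$dst"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$src" >"$dst"
    fi
}
```

Usage would be `to_utf8 file file.new`. Pure ASCII files pass the check and are simply copied.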

Filesystems
