How to use wide-range characters a.k.a. UTF-8 in NetBSD.
Introduction
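This page describes how to set up a Unicode (UTF-8) environment on NetBSD: the shell, X terminal emulators, common utilities and servers.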
Note on wscons
wscons doesn't support UTF-8. You'll need X11 and a proper X terminal emulator for this to be of any use, or you get character mash for lunch! Only the ASCII part of Unicode, namely the first 128 characters, will work on the wscons console, because these are encoded identically in UTF-8 and in the ISO-8859 character sets:
!"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~
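To see the overlap for yourself, here is a small sketch using iconv from the base system. The sample string and the variable names are made up for illustration, and a POSIX shell is assumed.

```shell
# Sketch: ASCII text survives an ISO-8859-1 -> UTF-8 conversion unchanged.
ascii='Hello, NetBSD! {|}~'
converted=$(printf '%s' "$ascii" | iconv -f ISO-8859-1 -t UTF-8)

if [ "$ascii" = "$converted" ]; then
    echo "ASCII text is identical in both encodings"
fi

# Byte 0xE4 (octal 344) is "a umlaut" in ISO-8859-1.
# It is one byte there, but two bytes (0xC3 0xA4) once converted to UTF-8.
latin1_len=$(printf '\344' | wc -c | tr -d ' ')
utf8_len=$(printf '\344' | iconv -f ISO-8859-1 -t UTF-8 | wc -c | tr -d ' ')
echo "ISO-8859-1: $latin1_len byte(s), UTF-8: $utf8_len byte(s)"
```

Anything outside the ASCII range changes its byte representation, which is why the console shows garbage for it.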
pkgsrc
To build packages that support it against the wide-character ncurses library, add this to /etc/mk.conf:
PKG_DEFAULT_OPTIONS+= ncursesw
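If you script your setup, the line can be appended only when it is not already there. This is just a sketch: the helper name add_mk_line is hypothetical, and you need write permission on the file you pass to it.

```shell
# Hypothetical helper: append a line to a file unless the exact line exists.
# $1 = path to the mk.conf file, $2 = line to add
add_mk_line() {
    # -F fixed string, -x whole line, -q quiet; a missing file counts as "absent"
    if ! grep -qxF -e "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >>"$1"
    fi
}

# Example use (as root):
#   add_mk_line /etc/mk.conf 'PKG_DEFAULT_OPTIONS+= ncursesw'
```

Running it twice leaves a single copy of the line, so it is safe to keep in a setup script.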
Soup up a shell
ksh
Works.
chsh -s /bin/ksh
mksh
This one is the MirBSD Korn shell, derived from OpenBSD's ksh. It works pretty well compared to pdksh.
cd /usr/pkgsrc/shells/mksh
make install clean
chsh -s /usr/pkg/bin/mksh
zsh
UTF-8 in the Z shell is enabled by default since 4.3.2.
cd /usr/pkgsrc/shells/zsh
make install clean
chsh -s /usr/pkg/bin/zsh
tcsh
Works out of the box.
cd /usr/pkgsrc/shells/tcsh
make install clean
chsh -s /usr/pkg/bin/tcsh
bash
Works out of the box.
cd /usr/pkgsrc/shells/bash
make install clean
chsh -s /usr/pkg/bin/bash
Shell environment
Set the LANG variable in your shell configuration file:
export LANG="en_US.UTF-8"
or, if you use a C-style shell (csh, tcsh):
setenv LANG "en_US.UTF-8"
The result should look like
% locale
LANG="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="C"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=""
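A quick way to check the setting from a script is sketched below. It relies on the standard precedence of the locale variables (LC_ALL over LC_CTYPE over LANG). The function name uses_utf8 is made up for this example, and the accepted spellings of the codeset suffix are an assumption that may need adjusting for your system.

```shell
# Return success if the given locale name selects the UTF-8 codeset.
# The list of spellings (UTF-8, utf8, optional @modifier) is an assumption.
uses_utf8() {
    case "$1" in
        *.UTF-8|*.UTF-8@*|*.utf8|*.utf8@*) return 0 ;;
        *) return 1 ;;
    esac
}

# LC_ALL overrides LC_CTYPE, which overrides LANG; unset or empty means "C".
effective=${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}

if uses_utf8 "$effective"; then
    echo "character handling uses UTF-8 ($effective)"
else
    echo "character handling does not use UTF-8 ($effective)"
fi
```

If the second message appears although LANG is set, look for an LC_ALL or LC_CTYPE setting that overrides it.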
X Terminal emulators
xterm
- Versions 239 and later work well with the default "fixed" font.
- Also works with the TrueType DejaVu Mono font.
- Appears to have trouble with some other fonts, such as Bitstream Vera Sans Mono, despite this font being more complete than DejaVu.
gnome-terminal
- Awesome and works great with the ttf Bitstream Vera Sans Mono or DejaVu Mono.
- Somewhat bloated considering the dependencies.
urxvt
- Recommended.
cd /usr/pkgsrc/x11/rxvt-unicode
make install clean
uxterm
- Works, as the 'u' might suggest, but last time I checked it sucked. Anyone?
aterm
- Doesn't work and probably never will.
Eterm
- Doesn't work either. Last time I checked, the author was too busy with real life.
Utilities
less
- Set the shell environment variable LESSCHARSET to "utf-8".
screen
.screenrc
defutf8 on
encoding UTF-8
lynx
.lynxrc
character_set=UNICODE (UTF-8)
Or change "Display character set" in the options menu.
irssi
/set recode_autodetect_utf8 yes
/set recode_fallback iso-8859-1 (or whatever seems fit)
/set recode_out_default_charset UTF-8
/set term_charset UTF-8
/save
silc-client
/set term_type utf-8
/save
and restart.
vi
- NetBSD's vi is based on nvi. It doesn't support wide-range characters as of version 1.79nb16 from 10/23/96, which is the version in -current (4.99.15) and in all earlier releases.
nvi
pkgsrc's nvi (v1.81.6) works with wide-range characters if built with the wide-curses option, e.g. by adding to mk.conf:
PKG_OPTIONS.nvi+= wide-curses
vim
.vimrc
set encoding=utf-8
set fileencoding=utf-8
emacs
.emacs
; === Set character encoding ===
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8)
This one gives you umlauts:
; === Make ä, ö, ü, ß work ===
(set-language-environment 'german)
mutt
mutt should work with all of the above. If it doesn't, put something like this in your .muttrc:
set charset="utf-8:iso-8859-1"
If you haven't set it in PKG_DEFAULT_OPTIONS already, you may also add to mk.conf
PKG_OPTIONS.mutt+= ncursesw
Servers
Apache2
/usr/pkg/etc/httpd/httpd.conf
AddDefaultCharset UTF-8
Converting files
If you have files containing non-ASCII ISO-8859 characters, your system will now assume that they are UTF-8. They are not, so the characters in these files will be misinterpreted and tools that read them may start breaking. Use iconv, which is part of the base system, to convert them:
iconv -f iso8859-1 -t utf-8 file >file.new
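When there are many files, some of which may already be UTF-8, converting them a second time would garble them. The sketch below skips files that already decode as valid UTF-8 and converts the rest in place. The function name to_utf8 is made up, and it assumes that everything that is not valid UTF-8 is ISO-8859-1, which you should verify for your own files.

```shell
# Hypothetical helper: convert one file to UTF-8 in place, unless it already is.
# Assumes any file that is not valid UTF-8 is encoded in ISO-8859-1.
to_utf8() {
    f=$1
    # iconv fails on byte sequences that are not valid UTF-8;
    # pure ASCII files pass this check and are left alone.
    if iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
        return 0
    fi
    # Write to a temporary name first so a failed conversion keeps the original.
    iconv -f ISO-8859-1 -t UTF-8 "$f" >"$f.new" && mv "$f.new" "$f"
}

# Example use:
#   for f in *.txt; do to_utf8 "$f"; done
```

The validity check is a heuristic: an ISO-8859-1 file can, in rare cases, happen to be valid UTF-8 as well. Note also that the converted file is a new file, so ownership and permissions may differ from the original.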
Filesystems
- Be careful with special characters in filenames, as they'll look weird when you try to access them from a non-Unicode environment.
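To find out which filenames are affected, something like the following sketch can help. It lists names containing any byte outside printable ASCII. The function name non_ascii_names is made up, and the pattern relies on find matching byte-wise in the C locale.

```shell
# Hypothetical helper: list files below a directory whose names contain
# bytes outside the printable ASCII range (space through tilde).
non_ascii_names() {
    # LC_ALL=C makes the bracket range " -~" compare plain byte values.
    LC_ALL=C find "$1" -name '*[! -~]*'
}

# Example use:
#   non_ascii_names "$HOME"
```

The output is printed as raw bytes, so it will only look right in a terminal that uses the same encoding as the filenames.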