20 April 2014

I use Emacs on Windows 7 and want to copy and paste Chinese characters correctly between Emacs and other programs. The settings that matter are the language environment and the coding systems. The following snippet sets the default language environment to English and uses the utf-8 coding system wherever possible, while keeping the clipboard coding system as utf-16le, the encoding used by the Windows kernel:

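Characters in the Windows operating system kernel are represented as UTF-16 little-endian, so characters stored in 4 bytes can be handled and displayed correctly. [1]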

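;; Language environment: English, with utf-8 preferred wherever possible.
(set-locale-environment "English")
(set-language-environment 'English)
(prefer-coding-system 'utf-8)
;; File names containing Chinese characters need gbk on this system.
(set-file-name-coding-system 'gbk)
(set-buffer-file-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
;; `set-clipboard-coding-system' is an alias of `set-selection-coding-system',
;; so the utf-16le call below is the one that takes effect.
(set-selection-coding-system 'utf-8)
(set-clipboard-coding-system 'utf-16le)
(set-w32-system-coding-system 'utf-8)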
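
To check that the settings took effect, the current values can be inspected after the init file has loaded. This is only a minimal sketch: the keyword labels in the list are made up for readability, while the variables and functions are standard Emacs ones.

```elisp
;; Evaluate in *scratch* with C-j, or with M-: after startup.
;; The keywords are just labels chosen for this sketch.
(list :language-environment current-language-environment
      :selection selection-coding-system   ; clipboard
      :file-name file-name-coding-system
      :keyboard  (keyboard-coding-system)
      :terminal  (terminal-coding-system)
      :locale    locale-coding-system)
```

With the settings above, the :selection entry should read utf-16le and the :file-name entry gbk.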

One benefit of using utf-16le rather than chinese-gbk or chinese-gb18030 as the clipboard coding system is that special characters from French and German are copied and pasted correctly to and from Emacs. With chinese-gbk or chinese-gb18030, only Chinese characters are encoded correctly; characters such as ß and ê are not displayed as expected when copied from a web browser.
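
A quick way to see whether a given character fits a coding system is `encode-coding-char`, which returns the encoded bytes as a string, or nil when the coding system has no representation for the character. The snippet below is an illustrative check, not part of the configuration; the sample characters are my own choice.

```elisp
;; Print how each sample character encodes under gbk and utf-16le.
;; A nil result means the coding system cannot represent the character.
(dolist (coding '(gbk utf-16le))
  (dolist (ch '(?中 ?ß))
    (message "%c in %s: %S" ch coding (encode-coding-char ch coding))))
```

The results appear in the *Messages* buffer, one line per character and coding system.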

Also note that the file name coding system is set to gbk, which is needed to display file names that contain Chinese characters. Oddly, on this system Windows file names come through as gbk rather than utf-16le. Tracking this down was a painful experience.
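
If the same init file is also loaded on other operating systems, the gbk file name setting could be limited to Windows. This is a sketch under that assumption; the post itself only covers Windows 7.

```elisp
;; Assumption: the init file is shared with non-Windows machines,
;; where gbk would likely be the wrong choice for file names.
(when (eq system-type 'windows-nt)
  (set-file-name-coding-system 'gbk))
```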