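MantisBT - VCMI
ID: 0001420 | Project: VCMI | Category: Other | View Status: public | Date Submitted: 2013-08-26 02:49 | Last Update: 2023-04-11 09:21
Reporter: acme_pjz
Assigned To: Ivan
Priority: normal | Severity: feature | Reproducibility: N/A
Status: resolved | Resolution: fixed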
0001420: Read data file and map file with other text encodings
Is it possible to read data files and map files using other, configurable text encodings? In the current version the font files can already be configured, but VCMI still doesn't work with the Heroes3 Simplified Chinese edition, because its text encoding is CP936. If the text encoding were configurable, VCMI would support more language editions of Heroes3.
No tags attached.
Attached Files
H3Bitmap.7z (219,570 bytes) 2013-09-03 06:11
https://bugs.vcmi.eu/file_download.php?file_id=1432&type=bug

HZK.7z (427,903 bytes) 2013-09-03 06:11
https://bugs.vcmi.eu/file_download.php?file_id=1433&type=bug

vcmi-chinese.png (693,206 bytes) 2013-09-08 15:06
https://bugs.vcmi.eu/file_download.php?file_id=1465&type=bug

1.jpg (54,280 bytes) 2013-09-11 05:30
https://bugs.vcmi.eu/file_download.php?file_id=1479&type=bug

1-1.jpg (74,040 bytes) 2013-10-03 04:57
https://bugs.vcmi.eu/file_download.php?file_id=1532&type=bug

1-2.jpg (65,890 bytes) 2013-10-03 04:58
https://bugs.vcmi.eu/file_download.php?file_id=1533&type=bug
Issue History
2013-08-26 02:49 | acme_pjz | New Issue
2013-08-26 09:33 | Ivan | Note Added: 0003895
2013-08-26 10:06 | Ivan | Assigned To | => Ivan
2013-08-26 10:06 | Ivan | Status | new => feedback
2013-09-03 06:11 | acme_pjz | File Added: H3Bitmap.7z
2013-09-03 06:11 | acme_pjz | File Added: HZK.7z
2013-09-03 06:15 | acme_pjz | Note Added: 0003921
2013-09-03 06:15 | acme_pjz | Status | feedback => assigned
2013-09-03 10:18 | Ivan | Note Added: 0003922
2013-09-06 16:00 | acme_pjz | Note Added: 0003935
2013-09-06 16:28 | Ivan | Note Added: 0003936
2013-09-08 15:06 | Ivan | File Added: vcmi-chinese.png
2013-09-08 15:08 | Ivan | Note Added: 0003961
2013-09-08 16:16 | Tow | Note Added: 0003967
2013-09-09 05:18 | acme_pjz | Note Added: 0003982
2013-09-09 11:15 | Ivan | Note Added: 0003986
2013-09-11 05:30 | acme_pjz | File Added: 1.jpg
2013-09-11 05:31 | acme_pjz | Note Added: 0003995
2013-10-02 05:38 | acme_pjz | Note Added: 0004059
2013-10-02 16:27 | Ivan | Note Added: 0004060
2013-10-03 04:57 | acme_pjz | File Added: 1-1.jpg
2013-10-03 04:58 | acme_pjz | File Added: 1-2.jpg
2013-10-03 05:15 | acme_pjz | Note Added: 0004061
2013-10-03 11:59 | Ivan | Note Added: 0004062
2013-10-03 14:29 | Ivan | Note Edited: 0004062
2013-10-04 07:34 | acme_pjz | Note Added: 0004063
2013-10-04 13:19 | Ivan | Note Added: 0004064
2013-10-04 13:48 | acme_pjz | Note Edited: 0004063
2013-10-04 14:20 | acme_pjz | Note Edited: 0004063
2013-10-04 15:14 | Ivan | Note Added: 0004065
2013-10-19 13:31 | Jolly Wing | Note Added: 0004086
2013-10-20 10:09 | Ivan | Note Added: 0004091
2017-07-22 14:16 | SXX | Note Added: 0007163
2022-12-17 14:07 | Ivan | Assigned To | Ivan =>
2023-04-11 09:21 | Ivan | Status | assigned => resolved
2023-04-11 09:21 | Ivan | Resolution | open => fixed
2023-04-11 09:21 | Ivan | Assigned To | => Ivan

Notes
(0003895)
Ivan   
2013-08-26 09:33   
Hi. This is the first time I've heard of an H3 version that uses a multi-byte encoding. Can you upload the fonts and text files from your version of H3?

Normally they can be found in the file Data/H3bitmap.lod, which can be opened with MMArchive (downloadable here: http://wogarchive.ru/download.php?id=119).

Unpack all files with the *.txt and *.fnt extensions and upload them here.
(0003921)
acme_pjz   
2013-09-03 06:15   
I've uploaded the text files. They are all in CP936 (a.k.a. GBK) encoding. I don't think the font files in H3bitmap.lod contain Chinese characters: I found some files named HZK** in the root directory of H3, and I believe they are the Simplified Chinese font files.
(0003922)
Ivan   
2013-09-03 10:18   
Thanks. Yes, these 3 files are indeed Chinese fonts, and the encoding seems to match the GBK character tables I found.
I'll add support for these fonts soon; it looks simple enough.

For reference:

These files consist of a sequence of bitmaps, 10x10, 12x12 or 24x24 pixels in size, at 1 bit per pixel, with each row padded to a full byte (a 10x10 image actually uses 2x10 bytes per character). There is no header or anything like that.
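The layout described above can be turned into a short C++ sketch for locating a glyph and reading one of its pixels. The function names are hypothetical, not VCMI code, and the bit order (rows stored top to bottom, most significant bit as the leftmost pixel) is an assumption that the note does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for HZK-style bitmap fonts: no header, 1 bit per
// pixel, each row padded to a whole number of bytes.
// Assumption: rows are stored top to bottom, MSB = leftmost pixel.

// Bytes per row for a glyph that is `size` pixels wide.
constexpr std::size_t rowBytes(std::size_t size) { return (size + 7) / 8; }

// Bytes per glyph: 10x10 -> 2 * 10 = 20, 12x12 -> 24, 24x24 -> 72.
constexpr std::size_t glyphBytes(std::size_t size) { return rowBytes(size) * size; }

// Returns true if pixel (x, y) of glyph number `index` is set.
bool glyphPixel(const std::vector<std::uint8_t> &file, std::size_t size,
                std::size_t index, std::size_t x, std::size_t y)
{
    assert(x < size && y < size);
    const std::size_t offset = index * glyphBytes(size) + y * rowBytes(size) + x / 8;
    if (offset >= file.size())
        return false; // glyph index lies past the end of the font file
    return ((file[offset] >> (7 - x % 8)) & 1) != 0;
}
```

Because there is no header, the number of glyphs in a file is simply its size divided by glyphBytes(size).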

The font files contain ONLY Chinese characters; ASCII symbols should be taken from the English fonts.
Mapping of 2-byte GBK characters to image index:
index = (first - 0x80) * 0xA0 + (second - 0x40)
This may be a bit different: the file size indicates 8000+ characters, while GBK can fit 20000+ symbols.
(0003935)
acme_pjz   
2013-09-06 16:00   
I'll test it soon. BTW, IMO the text would need to be converted to UTF-8 or UTF-16 if SDL_ttf is used.
(0003936)
Ivan   
2013-09-06 16:28   
Proper Unicode support is more difficult to implement. Right now I'm leaning towards making VCMI work with GBK encoding directly, so no conversion is needed.

I also can't use SDL_ttf here, because the fonts you posted are bitmaps, while SDL_ttf works only with TrueType vector fonts. That's not really a problem, since the original H3 fonts are also bitmaps and are already supported (including other languages such as Russian).
(0003961)
Ivan   
2013-09-08 15:08   
It seems to be working; I'll commit my changes soon.

Does everything in this image look OK?
http://bugs.vcmi.eu/file_download.php?file_id=1465&type=bug
(0003967)
Tow   
2013-09-08 16:16   
The text line in the message (the white line under the title text) seems to be clipped, missing several bottom pixel rows.
Same issue with labels next to the Quest Log and Dismiss buttons.
Other text seems fine.
(0003982)
acme_pjz   
2013-09-09 05:18   
Same as Tow said; also, there may be too little spacing between characters.

I also want to point out that your mapping formula is incorrect. It looks like the H3 Simplified Chinese edition only supports GB2312 encoding, i.e. GBK/1 and GBK/2 in the GBK standard. After some experiments I found that the correct mapping formula is

index = (first - 0xA1) * 94 + (second - 0xA1)

Be careful: it can map non-GB2312 characters into the valid range :| For non-GB2312 characters and other editions of H3 (possibly the Traditional Chinese edition, which may use other text encodings?), a call to iconv and SDL_ttf must be used (VCMI can be configured to use TTF vector fonts by changing config/fonts.json, right?)
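The formula and the warning above can be combined in a small C++17 sketch that rejects byte pairs outside the GB2312 range (0xA1..0xFE for both bytes) instead of silently mapping them onto some other glyph. The helper name is hypothetical; it illustrates the formula from this note, not VCMI's code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper: maps a GB2312 (EUC-CN) byte pair to a glyph index in a
// 94x94-grid font file, using index = (first - 0xA1) * 94 + (second - 0xA1).
// Both bytes must lie in 0xA1..0xFE. Anything else (e.g. GBK-only characters
// whose second byte is below 0xA1) is rejected instead of landing on a wrong
// but valid-looking index.
std::optional<std::size_t> gb2312GlyphIndex(unsigned char first, unsigned char second)
{
    const bool inRange = first >= 0xA1 && first <= 0xFE &&
                         second >= 0xA1 && second <= 0xFE;
    if (!inRange)
        return std::nullopt;
    return static_cast<std::size_t>(first - 0xA1) * 94 + (second - 0xA1);
}
```

For example, 中 (GB2312 bytes D6 D0) maps to (0xD6 − 0xA1) × 94 + (0xD0 − 0xA1) = 53 × 94 + 47 = 5029. If the file covers rows 0xA1 through 0xF7 (the last row GB2312 uses), it holds 87 × 94 = 8178 glyphs, which fits the "8000+ characters" estimated from the file size. The caller should still check the index against the file size.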

My second note: while HZK10 looks fine, the characters in HZK12 and HZK24H seem to be X-Y flipped!
(0003986)
Ivan   
2013-09-09 11:15   
Can you take a screenshot of the Chinese version and post it here for comparison? I compared several characters to what I see in a text editor and they looked fine.

- The missing line is already fixed.

- GBK vs GB2312: I actually ran into this issue already. After a quick search I found a discussion of Chinese support in Era, including a link to GBK fonts. Switching from GBK to GB2312 is possible, but since GBK already includes GB2312, I think supporting GBK would be more than enough.

- Unicode: I think I'll try to add some basic Unicode support (UTF-8, to be precise). This will allow using TTF fonts with any encoding, BUT the encoding must be selected manually: I haven't found any reliable way to detect it. This is especially problematic for the very similar Win1250...Win1252 encodings, which are used in the majority of H3 versions. This will solve the "what encoding to use" problem, but in my experience TTF font quality is a bit lower than that of the native bitmaps.
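For background on the UTF-8 plan: once text in the manually selected codepage has been mapped to Unicode code points (for example through a per-codepage lookup table, which is an assumption here), each code point is encoded as 1 to 4 bytes. A minimal, generic C++ sketch of that encoding step follows; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of code point `cp` to `out`.
// Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not valid input.
void appendUtf8(std::string &out, std::uint32_t cp)
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: covers most CJK characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Most Chinese characters fall in the 3-byte range; for example U+4E2D (中) becomes E4 B8 AD.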
(0003995)
acme_pjz   
2013-09-11 05:31   
Hi, I just uploaded a screenshot. It's taken from the Internet, but it represents the Chinese version well.
(0004059)
acme_pjz   
2013-10-02 05:38   
I tried VCMI 0.94, but it still doesn't work, with either GBK or GB2312. Do I need to put HZK10 and the other files into a specific directory?
(0004060)
Ivan   
2013-10-02 16:27   
Right now you'll need to install this "mod"; it provides some files necessary for Chinese support (fonts and a config file):
http://download.vcmi.eu/mods/repository/chinese%20fonts.zip
Download this file and unpack it into the Mods/ directory.

This mod can also be installed via the launcher.
(0004061)
acme_pjz   
2013-10-03 05:15   
Thanks, it works. But there are a few glitches: mojibake caused by incorrect character boundary detection (characters in green circles) and a string that is too wide (in cyan circles).

http://bugs.vcmi.eu/file_download.php?file_id=1532&type=bug

http://bugs.vcmi.eu/file_download.php?file_id=1533&type=bug

Could you fix them in a future version?
(0004062)
Ivan   
2013-10-03 11:59   
(edited on: 2013-10-03 14:29)
Sure, I'll fix them. Note that the time between our releases is around 3 months, so you'll have to wait a bit.

1) Too-wide string: I'll see what I can do; it seems that ASCII characters need special treatment.

2) What's wrong in the green circles? Missing characters? Incorrect line breaks? Something else? I don't know Chinese, so I have no idea how it should look.

(0004063)
acme_pjz   
2013-10-04 07:34   
(edited on: 2013-10-04 14:20)
1) How about falling back to the default ASCII font file for ASCII characters (with variable-width characters)? This is how the original H3 behaves.

2) The characters in the green circles are rendered incorrectly; this is called "乱码" in Chinese, and (I hope) mojibake in English (http://en.wikipedia.org/wiki/Mojibake). GBK is a 2-byte encoding, and I think the line-breaking code doesn't take this into account, so it breaks lines in the middle of a character.
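Both points come down to walking the string one character at a time instead of one byte at a time. A minimal C++ sketch, assuming GBK rules (a byte in 0x81..0xFE starts a two-byte character; anything else is a single byte) and a hypothetical width callback for the variable-width ASCII font plus a fixed width for the Chinese bitmap font:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Splits a GBK string into characters so that a line break never lands
// between the two bytes of a Chinese character. A lead byte in 0x81..0xFE
// starts a two-byte character; a truncated trailing lead byte stays single.
std::vector<std::string> splitGbk(const std::string &text)
{
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < text.size();) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        const std::size_t len =
            (lead >= 0x81 && lead <= 0xFE && i + 1 < text.size()) ? 2 : 1;
        chars.push_back(text.substr(i, len));
        i += len;
    }
    return chars;
}

// Width of a GBK string: single-byte characters use the variable-width ASCII
// font, two-byte characters use the fixed-width Chinese font.
// `asciiWidth` and `cjkWidth` are hypothetical stand-ins for real font objects.
int gbkStringWidth(const std::string &text,
                   const std::function<int(char)> &asciiWidth,
                   int cjkWidth)
{
    int width = 0;
    for (const std::string &ch : splitGbk(text))
        width += (ch.size() == 2) ? cjkWidth : asciiWidth(ch[0]);
    return width;
}
```

A line-breaking routine that only breaks between the pieces returned by splitGbk cannot cut a character in half, which is the mojibake described in point 2.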

By the way, are there any nightly builds for Windows (or Linux)?

[EDIT] Sorry, I have already found them in the forum.

(0004064)
Ivan   
2013-10-04 13:19   
1) I thought about this at first, but the Chinese fonts have a different height. OK, I'll try to find matching fonts; it shouldn't be hard to implement.

2) Thanks. Yes, that's probably a bug in the line-splitting code. I'll fix it.
(0004065)
Ivan   
2013-10-04 15:14   
>> By the way, Are there any nightly builds for Windows (or Linux)?
You can find nightly builds for Ubuntu here: https://launchpad.net/~vcmi/+archive/ppa
For technical reasons (multiple versions of Ubuntu) the launcher is disabled, but otherwise they should work.

"Nightly builds" for Windows are usually made some time before a release, so there won't be any Windows builds for around 2 months.
(0004086)
Jolly Wing   
2013-10-19 13:31   
Hi, Ivan.

I have made some small changes to the 0.9.3 source code to make VCMI support the game data archives from the Simplified Chinese version. I think that, with some improvements, such a modification could also support game data archives from CJK (Chinese, Japanese, Korean and other multi-byte language) versions.

When the engine reads map data and general text data, I convert the GBK-encoded strings into UTF-8 strings, then render them with SDL_ttf's TTF_RenderUTF8_* functions. It works!

The changes I made are as follows:

1. client/CMessage.cpp
in function CMessage::breakText(), line 153

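            // added by [email protected], 2013-08-28
            // A byte with the high bit set is the lead byte of a multi-byte UTF-8
            // sequence. This assumes every such character (e.g. Chinese) is 3 bytes
            // long: skip the two continuation bytes, then add the width of the last one.
#ifdef ZH_CN
            else if (static_cast<unsigned char>(text[z]) >= 0x80)
            {
                z += 2;
                lineLength += graphics->fonts[font]->getSymbolWidth(text[z]);
            }
#endif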

2. client/gui/Fonts.cpp
in function CTrueTypeFont::getStringWidth(), line 255

    // added by [email protected], 2013-08-28
    // If we are handling simplified chinese, it is a UTF8 string
#ifdef ZH_CN
    TTF_SizeUTF8(font.get(), data.c_str(), &width, NULL);
#else
    TTF_SizeText(font.get(), data.c_str(), &width, NULL);
#endif

3. client/gui/Fonts.cpp
in function CTrueTypeFont::renderText(), line 279

    if (blended)
    {
        // added by [email protected], 2013-09-28 Sat
        // If we are handling simplified chinese game data, it is a UTF8 string
#ifdef ZH_CN
        rendered = TTF_RenderUTF8_Blended(font.get(), data.c_str(), color);
#else
        rendered = TTF_RenderText_Blended(font.get(), data.c_str(), color);
#endif
    }
    else
    {
        // added by [email protected], 2013-09-28 Sat
        // If we are handling simplified chinese game data, it is a UTF8 string
#ifdef ZH_CN
        rendered = TTF_RenderUTF8_Solid(font.get(), data.c_str(), color);
#else
        rendered = TTF_RenderText_Solid(font.get(), data.c_str(), color);
#endif
    }

4. add lib/ConvertEncoding.cpp and lib/ConvertEncoding.h

The content of ConvertEncoding.h is:

#pragma once

// Converts src_string from src_enc to dest_enc using iconv. Returns a pointer
// to a static buffer that the next call overwrites (not thread-safe), or NULL
// on failure.
char * convert_enc(const char *src_enc, const char *dest_enc, const char * src_string);

The content of ConvertEncoding.cpp is:

/*
 * ConvertEncoding.cpp, for vcmi using CJK(China/Japan/Korea) data.
 *
 * Authors: Wu Jiqing ([email protected])
 *
 * License: GNU General Public License v2.0 or later
 *
 */
// added by jiqingwu([email protected])
// 2013-09-27 Fri
#include <stdio.h>
#include <iconv.h>
#include <string.h>

#include "ConvertEncoding.h"

// added by [email protected], 2013-09-27 Fri
char * convert_enc(const char *src_enc, const char *dest_enc, const char * src_string)
{
#define UTF8_STR_LEN 5000

    static char out_string[UTF8_STR_LEN];

    iconv_t c_pt = iconv_open(dest_enc, src_enc);
    if (c_pt == (iconv_t)-1)
    {
        printf("iconv open failed!\n");
        return NULL;
    }
    // iconv() updates size_t counters; casting int* to size_t* is undefined
    // behaviour and breaks on 64-bit systems.
    size_t in_len = strlen(src_string) + 1; // include the terminating NUL
    size_t out_len = UTF8_STR_LEN;
    char *sin = const_cast<char *>(src_string); // POSIX iconv() takes char **
    char *sout = out_string;
    size_t ret = iconv(c_pt, &sin, &in_len, &sout, &out_len);
    iconv_close(c_pt); // release the descriptor on every path
    if (ret == (size_t)-1)
    {
        // invalid input, or converted text longer than UTF8_STR_LEN bytes
        return NULL;
    }
    return out_string;
}

To compile ConvertEncoding.cpp into the library, add two lines to lib/CMakeLists.txt:

set(lib_SRCS
        ...
        ConvertEncoding.cpp
)

set(lib_HEADERS
        ...
        ConvertEncoding.h
)

5. lib/CGeneralTextHandler.cpp
Include "ConvertEncoding.h":

// added by jiqingwu([email protected])
// 2013-09-27 Fri
#include "ConvertEncoding.h"

in function CLegacyConfigParser::readString(), line 112

    // added by [email protected], 2013-09-27 Fri
    // convert gbk string to utf-8 string.
    // (For simplified Chinese game data, the string is GBK encoded)
#ifdef ZH_CN
    // "UTF-8" is the portable iconv name; keep the original string if conversion fails
    const char * utf8_str = convert_enc("GBK", "UTF-8", ret.c_str());
    return utf8_str ? std::string(utf8_str) : ret;
#else
    return ret;
#endif

6. lib/filesystem/CBinaryReader.cpp
Include "ConvertEncoding.h":

// added by <[email protected]>, 2013-09-28 Sat
#include "../ConvertEncoding.h"

in function CBinaryReader::readString(), line 95

    // added by [email protected], 2013-08-22
    // If we are handling chinese data, convert gbk string to utf-8 string.
#ifdef ZH_CN
    // "UTF-8" is the portable iconv name; keep the original string if conversion fails
    const char * utf8_str = convert_enc("GBK", "UTF-8", ret.c_str());
    return utf8_str ? std::string(utf8_str) : ret;
#else
    return ret;
#endif

7. Add the following line to ./CMakeLists.txt to enable support for Simplified Chinese game data:

add_definitions(-DZH_CN)

8. cmake, make, make install
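Step 1's breakText patch assumes that every byte with the high bit set starts a 3-byte UTF-8 sequence. That holds for most Chinese characters (U+0800..U+FFFF), but UTF-8 also has 2-byte sequences (e.g. accented Latin letters) and 4-byte ones. A more general approach, sketched here in C++ with hypothetical helper names rather than VCMI's actual code, is to read the sequence length from the lead byte:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Returns 1 for ASCII and, as a fallback, for invalid lead bytes
// (stray continuation bytes 0x80..0xBF or 0xF8..0xFF), so that a caller
// scanning a string always makes progress.
std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;           // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx (most CJK)
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 1;                            // not a valid lead byte
}

// Counts characters (code points) in a UTF-8 string. Only reads text[i]
// while i < size, so a truncated final sequence is counted once and the
// loop stops.
std::size_t utf8Length(const std::string &text)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++count)
        i += utf8SequenceLength(static_cast<unsigned char>(text[i]));
    return count;
}
```

In the patched branch, z could then be advanced by utf8SequenceLength(static_cast<unsigned char>(text[z])) - 1 instead of a fixed two bytes; this is only a sketch, since the surrounding loop is not shown in the note.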

To Play

Use the data from the Chinese version of Shadow of Death. Link the 'Data', 'Maps' and 'Mp3' directories into /usr/local/share/vcmi like this (you need root privileges):
cd /usr/local/share/vcmi
ln -s /Data/Dir/of/ChineseGame Data
ln -s /Maps/Dir/of/ChineseGame Maps
ln -s /Mp3/Dir/of/ChineseGame Mp3

To display Chinese characters in the game, you need to put a TrueType font that supports Chinese into /usr/local/share/vcmi/Data.

cp /chinese/font/path /usr/local/share/vcmi/Data

In addition, you need to edit /usr/local/share/vcmi/config/fonts.json and modify the trueType font section like this:

"trueType":
{
    "BIGFONT" : { "file" : "ChineseFont.ttf", "size" : 22, "blend" : true},
    "CALLI10R" : { "file" : "ChineseFont.ttf", "size" : 10, "blend" : true},
    "CREDITS" : { "file" : "ChineseFont.ttf", "size" : 28, "blend" : true},
    "HISCORE" : { "file" : "ChineseFont.ttf", "size" : 13, "blend" : true},
    "MEDFONT" : { "file" : "ChineseFont.ttf", "size" : 16, "blend" : true},
    "SMALFONT" : { "file" : "ChineseFont.ttf", "size" : 13, "blend" : true},
    "TIMES08R" : { "file" : "ChineseFont.ttf", "size" : 11, "blend" : true},
    "TINY" : { "file" : "ChineseFont.ttf", "size" : 11, "blend" : true},
    "VERD10B" : { "file" : "ChineseFont.ttf", "size" : 13, "blend" : true}
}
where ChineseFont.ttf is your TrueType font.

Then you can play the game:
$ vcmiclient
(0004091)
Ivan   
2013-10-20 10:09   
Thanks, I'll take a look at this. I don't like the idea of using #ifdefs to enable functionality, but you've tracked down every place that needs changes, and that will help.
(0007163)
SXX   
2017-07-22 14:16   
It would be interesting to know how this works now. Last year VCMI gained support for UTF in file paths:
https://github.com/vcmi/vcmi/pull/156