Notes |
|
(0005967)
|
AVS
|
2015-10-22 16:51
|
|
I also have this problem, but had no success in fixing it. Game crashes if you start game "too fast". |
|
|
|
|
|
|
@avs, looks like I've advanced in debugging of this crash.
Actually, it is vcmiserver which is responsible for this. It segfaults, and then vcmeclient segfaults because of it.
vcmiserver segfault trace is as follows:
rror: signal 11:
0 vcmiserver 0x000000010da93154 _Z17handleLinuxSignali + 164
1 libsystem_platform.dylib 0x00007fff8bc9c52a _sigtramp + 26
2 ??? 0x00007fff64e5e320 0x0 + 140734886175520
3 vcmiserver 0x000000010d9fc7ca _ZN12CGameHandler7newTurnEv + 2954
4 vcmiserver 0x000000010da021c4 _ZN12CGameHandler3runEb + 5348
5 vcmiserver 0x000000010da924f1 _ZN11CVCMIServer5startEv + 2049
6 vcmiserver 0x000000010da93b81 main + 2337
7 libdyld.dylib 0x00007fff932bc5ad start + 1
8 ??? 0x0000000000000002 0x0 + 2
I've managed to narrow down the faulty part.
It's CGameHandler::newTurn() (server/CGameHandler.cpp) -> sendAndApply(&sah); (near line 1320) -> CGameHandler::sendAndApply(CPackForClient * info) -> gs->apply(info); -> CGameState::apply(CPack *pack) (lib/CGameState.cpp, line 2161)
There is the following code:
ui16 typ = typeList.getTypeID(pack);
applierGs->apps[typ]->applyOnGS(this,pack);
typ is retuned equal to 0
applierGs->apps is std::map with keys from 137 to 227 (no zero key), so applierGs->apps[typ] makes nullptr and we crash.
So it looks like typeList.getTypeID(pack) should return something other than 0. |
|
|
(0005980)
|
AVS
|
2015-10-24 16:32
|
|
getTypeID returns 0 if type - SetAviableHeroes - not found, this is possible as random crash only if there is race condition: server sends packet before all packets are registered in type list. |
|
|
(0005981)
|
AVS
|
2015-10-24 16:40
|
|
But this is almost impossible - types are registered in CConnection constructor... |
|
|
(0005990)
|
vmarkovtsev
|
2015-10-24 19:35
(edited on: 2015-10-24 19:47) |
|
Damn strange: indeed, SetAvailableHeroes is registered in CGameState::CGameState and the same code which gets the type id returns 152 (the pid and "this" are the same as in apply()).
I will trace the point where it "forgets" about the type.
|
|
|
|
void CGameState::apply(CPack *pack)
{
SetAvailableHeroes sah;
ui16 typ = typeList.getTypeID(&sah);
fprintf(stderr, "%d %p\n", (int)typ, applierGs->apps[typ]);
fprintf(stderr, "%s %s\n", typeid(&sah).name(), typeid(sah).name());
fprintf(stderr, "%s %s\n", typeid(pack).name(), typeid(*pack).name());
typ = typeList.getTypeID(pack);
applierGs->apps[typ]->applyOnGS(this,pack);
}
outputs
152 0x7fd192f89c80
P18SetAvailableHeroes 18SetAvailableHeroes
P5CPack 18SetAvailableHeroes
and then segfaults. typeid() runtime problem? |
|
|
|
fprintf(stderr, "%p %s\n", typeList.getTypeInfo(&sah), typeList.getTypeInfo(&sah)->name());
fprintf(stderr, "%p %s\n", typeList.getTypeInfo(pack), typeList.getTypeInfo(pack)->name());
0x101361370 18SetAvailableHeroes
0x100cbc980 18SetAvailableHeroes
So this is the problem! typeid() returns different objects for the same type. |
|
|
(0005999)
|
AVS
|
2015-10-25 00:36
|
|
|
|
(0006000)
|
vmarkovtsev
|
2015-10-25 06:17
(edited on: 2015-10-25 06:25) |
|
Thanks, but it expectedly did not help (all the branch changes as well).
The problem is not with thread safety but rather with typeid() behavior. It *may* return different objects for the same actual type.
The only way to overcome this is to change the CTypeList.typeInfos mapping from std::map<const std::type_info *, TypeInfoPtr, TypeComparer> to std::map<std::string, TypeInfoPtr, TypeComparer>. The key should not be the address of typeid() result but rather typeid().name(). Please not that const char* key type will not do here.
|
|
|
|
@Povelitel, я не могу зарегистрироваться на форуме, код валидации не отображается. Пробовал в Chrome и Safari. |
|
|
|
@avs, as I suspected, the whole CTypeList design is erroneous.
Citing http://en.cppreference.com/w/cpp/language/typeid [^]
There is no guarantee that the same std::type_info instance will be returned by all evaluations of the typeid expression on the same type, although std::type_info::hash_code of those type_info objects would be identical, as would be their std::type_index.
So we *have* to change the mapping to use hash_code instead of raw pointers. |
|
|
|
Hm, there is TypeComparer to deal with exactly the same issue. It does not work for some reason... |
|
|
|
Yeah, right. I've conducted another experiment:
TypeComparer tc;
const auto& t1 = typeid(sah);
const auto& t2 = typeid(*pack);
fprintf(stderr, "%d %d\n", (int)tc(&t1, &t2), (int)tc(&t2, &t1));
it prints
0 1
but really should print 0 0 |
|
|
(0006010)
|
vmarkovtsev
|
2015-10-25 09:04
(edited on: 2015-10-25 09:06) |
|
fprintf(stderr, "%zu %zu\n", t1.hash_code(), t2.hash_code());
prints different numbers
4550461232 4543683504
This is crazy! Types really look different although the are the same.
|
|
|
(0006011)
|
Kantor
|
2015-10-25 09:24
|
|
@vmarkovtsev "я не могу зарегистрироваться на форуме, код валидации не отображается. Пробовал в Chrome и Safari."
hehe, sorry for off-topic, but you don't need to wait for any image, just type "VCMI" (without quotes)in the "Name of our project" field. |
|
|
(0006012)
|
vmarkovtsev
|
2015-10-25 09:25
(edited on: 2015-10-25 09:26) |
|
Another experiment:
const auto& tid = typeid(sah);
fprintf(stderr, "%s %p %zu\n", tid.name(), &tid, tid.hash_code());
after "SetAvailableHeroes sah;" in CGameHandler.cpp (original vcmiserver):
18SetAvailableHeroes 0x1026109a0 4334824368
while the previous code prints
0x102cc9370 0x1026109a0
4341610288 4334824368
So SetAvailableHeroes typeid() in the code belonging to libvcmi.dylib is totally different from the code belonging to vcmiserver itself. Dunno if it's because of some agressive sort of inlining or anything else.
|
|
|
|
@Kantor, great, I've managed to register, thanks %) Super powerful bot defense skill) |
|
|
|
Ok, I 've managed to fix this stuff with the following TypeComparer:
struct TypeComparer
{
bool operator()(const std::type_info *a, const std::type_info *b) const
{
return std::string(a->name()) < std::string(b->name());
}
};
not efficient, but it works. vcmiclient still segfaults, but server survives. |
|
|
|
Here is the new crash log (vcmi log was uploaded as crash2.txt):
Thread 10 Crashed:
0 libvcmi.dylib 0x000000010a8fe621 COSer::CPointerSaver<SaveGame>::savePtr(CSaverBase&, void const*) const + 17
1 libvcmi.dylib 0x000000010a84daf8 void COSer::savePointer<CPack const*>(CPack const* const&) + 776
2 libvcmi.dylib 0x000000010a83c6aa CConnection::sendPackToServer(CPack const&, PlayerColor, unsigned int) + 186
3 com.vcmi.vcmiclient 0x000000010a05a1b1 CClient::sendRequest(CPack const*, PlayerColor) + 1361
4 com.vcmi.vcmiclient 0x000000010a05aa38 CClient::save(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 184
5 com.vcmi.vcmiclient 0x000000010a174cce CPlayerInterface::yourTurn() + 1422
6 com.vcmi.vcmiclient 0x000000010a20f7b6 YourTurn::applyCl(CClient*) + 918
7 com.vcmi.vcmiclient 0x000000010a05a84b CClient::handlePack(CPack*) + 523
8 com.vcmi.vcmiclient 0x000000010a05a51a CClient::run() + 106
9 libboost_thread-mt.dylib 0x000000010af6e715 boost::(anonymous namespace)::thread_proxy(void*) + 53
10 libsystem_pthread.dylib 0x00007fff8b5319b1 _pthread_body + 131
11 libsystem_pthread.dylib 0x00007fff8b53192e _pthread_start + 168
12 libsystem_pthread.dylib 0x00007fff8b52f385 thread_start + 13 |
|
|
|
|
|
(0006022)
|
AVS
|
2015-10-25 13:02
|
|
I think we need to switch all keys to std::string - changing comaparer is not enough. |
|
|
|
The cause of the described crash is the failure of dynamic_cast.
CClient::save() makes SaveGame instance on the stack. The following works:
SaveGame save_game(fname);
auto ptr1 = dynamic_cast<CPackForClient*>(&save_game);
auto ptr2 = dynamic_cast<CPack*>(ptr1);
auto ptr3 = reinterpret_cast<CPack*>(ptr1);
fprintf(stderr, "CCLient.save %p %p %p\n", &save_game, ptr1, ptr2);
fprintf(stderr, "CCLient.save %p %p\n", ptr3, dynamic_cast<CPackForClient*>(ptr3));
CCLient.save 0x7000007bda80 0x7000007bda80 0x7000007bda80
CCLient.save 0x7000007bda80 0x7000007bda80
But when we get into libvcmi.dylib and invoke castToMostDerived(), it calls PointerCaster<CPack, CClientPack>::castRawPtr() and dynamic_cast inside fails (returns nullptr). This looks the same cause as the first crash: RTTI works very bad. I should try static linkage... |
|
|
|
Ok, I've made a fix and finally was able to launch the game. I added the fallback to static_cast in that place. Pull request is pending...
|
|
|
|
https://github.com/vcmi/vcmi/pull/129 [^]
Please note that making keys std::string is not necessary, so you should force push into the branch without 08e9246791c8feb9eb1ef4a5888c1b7e99774215 (git reset --hard HEAD^ && git push -f origin issue/2306) |
|
|
(0006042)
|
vmarkovtsev
|
2015-10-27 06:21
(edited on: 2015-10-27 06:29) |
|
Well, passing the first turn still is not successful.
I get the following:
Player 1 starting turn
Our heroes don't move anymore, exhaustive decomposition failed
hero Ìîíåð has 1936 MP left
Player 1 ends turn
Player 1 ended turn
Player 5 starting turn
Our heroes don't move anymore, exhaustive decomposition failed
Opening StupidAI
Loaded Stupid AI 0.1
Player 5 has to answer queries before attempting any further actions. Top query is A query of type 12CBattleQuery and qid=34 affecting players (5)!
System message: Server encountered a problem: Player 5 has to answer queries before attempting any further actions. Top query is A query of type 12CBattleQuery and qid=34 affecting players (5)!
Then vcmi hangs.
|
|
|
|
|
|
(0006044)
|
AVS
|
2015-10-27 14:25
|
|
>Opening StupidAI
>Loaded Stupid AI 0.1
I think this "stub" AI just not works. VCAI should be default. |
|
|
|
@avs, can you please point me out where to switch the default AI type? |
|
|
(0006048)
|
SXX
|
2015-10-27 15:49
(edited on: 2015-10-27 15:52) |
|
settings.json
So you just need to make it use "BattleAI" instead of "StupidAI".
My version of that file:
{
"adventure" : {
"enemySpeed" : 8,
"heroReminder" : false,
"heroSpeed" : 8,
"scrollSpeed" : 2
},
"battle" : {
"animationSpeed" : 0.4,
"cellBorders" : true,
"showQueue" : false
},
"general" : {
"encoding" : "CP1251",
"music" : 33,
"playerName" : "player"
},
"logging" : {
"console" : {
"colorMapping" : [
{
"color" : "magenta",
"domain" : "network",
"level" : "trace"
}
],
"threshold" : "debug"
},
"loggers" : [
{
"domain" : "global",
"level" : "trace"
},
{
"domain" : "ai",
"level" : "trace"
},
{
"domain" : "network",
"level" : "trace"
}
]
},
"server" : {
"neutralAI" : "BattleAI"
},
"video" : {
"showIntro" : false,
"spellbookAnimation" : false
}
}
|
|
|
|
Thanks! But nah, still no luck
Player 1 starting turn
Our heroes don't move anymore, exhaustive decomposition failed
hero Мариус has 1560 MP left
Player 1 ends turn
Player 1 ended turn
Player 5 starting turn
Our heroes don't move anymore, exhaustive decomposition failed
Opening BattleAI
Loaded Battle AI
Player 5 has to answer queries before attempting any further actions. Top query is A query of type 12CBattleQuery and qid=28S yastfefme cmteisnsga gpel:a ySeerrsv e(r5 e)n!coun
tered a problem: Player 5 has to answer queries before attempting any further actions. Top query is A query of type 12CBattleQuery and qid=28 affecting players (5)!
|
|
|
|
I guess, typeid() is used in AI sources, will look into them later |
|
|
|
We hang on
battleMadeAction.cond.wait(lock);
in CGameHandler::runBattle
What's interesting is that isBlockedByQueries() which reports the error receives MakeAction CPack. |
|
|
|
It hangs on other types of events, too. For example,
Failed to realize subgoal of type BUILD (greater goal type was BUILD), I will stop.
The error message was: BUILD has been realized as much as possible.
Looking into Роланд, MP=1500
Of all 6 destinations, object oid=489 seems nice
Роланд will try to visit Монстр at (8 9 1)
Moving hero Роланд to tile 8 9 1
8MoveHero
8MoveHero
8MoveHero
8MoveHero
8MoveHero
Adding query 28 - Blocking dialog query with 0 components - {Последователи}
Гноллы хотят прославиться и мечтают служить в вашей доблестной армии. Вы согласны их принять? . Total queries count: 1
I'll answer the query 28 giving the choice 1
Attempted answering query 28 - Blocking dialog query with 0 components - {Последователи}
Гноллы хотят прославиться и мечтают служить в вашей доблестной армии. Вы согласны их принять? . Request id=29. Waiting for results...
10QueryReply
Player 5 has to answer queries before attempting any further actions. Top query is A query of type 20CBlockingDialogQuery and qid=28 affecting players (5)!
System message: Server encountered a problem: Player 5 has to answer queries before attempting any further actions. Top query is A query of type 20CBlockingDialogQuery and qid=28 affecting players (5)!
Something went really wrong, failed to answer query 28: Blocking dialog query with 0 components - {Последователи}
Гноллы хотят прославиться и мечтают служить в вашей доблестной армии. Вы согласны их принять? |
|
|
|
bool CBattleQuery::blocksPack(const CPack *pack) const
{
return !dynamic_cast<const MakeAction*>(pack) && !dynamic_cast<const MakeCustomAction*>(pack);
}
I am starting to suspect something... |
|
|
|
Jesus, VCMI uses dynamic_cast everywhere, it's like walking on a mine field... |
|
|
|
^Lol this doesn't look like optimal design for sure. But no one touched this part of code in years. |
|
|
(0006056)
|
vmarkovtsev
|
2015-10-27 20:26
(edited on: 2015-10-27 20:27) |
|
Yes, fixing this part enabled to conduct the battle and end the turn.
So, to summarize. Every typeid() and dynamic_cast on any type from libvcmi.dylib in vcmiserver is broken. I have to workaround each case separately.
|
|
|
|
|
|
|
|
|
|
Unfortunately, specifying -fvisibility-ms-compat had no effect. I've linked libvcmi.dylib, vcmiserver and vcmiclient with that option to no avail. Returning to working around each dynamic_cast separately. |
|
|
|
|
|
(0006063)
|
vmarkovtsev
|
2015-10-31 13:44
(edited on: 2015-10-31 13:45) |
|
There is a rare (1-2 weeks) crash with the following info:
Application Specific Information:
terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >: boost: mutex lock failed in pthread_mutex_lock: Invalid argument
abort() called
Thread 12 Crashed:
0 libsystem_kernel.dylib 0x00007fff9964a0ae __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff931ec500 pthread_kill + 90
2 libsystem_c.dylib 0x00007fff8e17f37b abort + 129
3 libc++abi.dylib 0x00007fff9bc43f81 abort_message + 257
4 libc++abi.dylib 0x00007fff9bc69a2f default_terminate_handler() + 243
5 libobjc.A.dylib 0x00007fff9adcf0a2 _objc_terminate() + 124
6 libc++abi.dylib 0x00007fff9bc6719e std::__terminate(void (*)()) + 8
7 libc++abi.dylib 0x00007fff9bc67213 std::terminate() + 51
8 libVCAI.dylib 0x0000000115e0bbd3 boost::condition_variable::wait(boost::unique_lock<boost::mutex>&) + 163
9 libVCAI.dylib 0x0000000115e06a5b boost::barrier::wait() + 91
10 libVCAI.dylib 0x0000000115e017b9 boost::detail::thread_data<VCAI::requestActionASAP(std::__1::function<void ()>)::$_29>::run() + 153
11 libboost_thread-mt.dylib 0x000000010d085715 boost::(anonymous namespace)::thread_proxy(void*) + 53
12 libsystem_pthread.dylib 0x00007fff931e99b1 _pthread_body + 131
13 libsystem_pthread.dylib 0x00007fff931e992e _pthread_start + 168
14 libsystem_pthread.dylib 0x00007fff931e7385 thread_start + 13
VCMI stdout:
I'll answer the query 172 giving the choice 0
Attempted answering query 172 - Exchange between heroes Мирланда and Вистан. Request id=238. Waiting for results...
Removing query 172 - Exchange between heroes Мирланда and Вистан. Total queries count: 0
Hero Мирланда moved from 11 9 1 to 11 9 1. Returning 1.
hero Мирланда visited hero 611
Striving to goal of type CLEAR WAY TO (2 12 1) (Мирланда)
Considering goal CLEAR WAY TO (2 12 1) (Мирланда)
Considering goal GATHER ARMY (Мирланда)
Considering goal VISIT HERO Вистан, Хозяин зверей (Мирланда)
Trying to realize VISIT HERO Вистан, Хозяин зверей (Мирланда) (value 4.000)
Moving hero Мирланда to tile 10 10 1
Adding query 175 - Exchange between heroes Мирланда and Вистан. Total queries count: 1
libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >: boost: mutex lock failed in pthread_mutex_lock: Invalid argument
Abort trap: 6
Will look into it...
|
|
|
(0006064)
|
vmarkovtsev
|
2015-10-31 14:26
(edited on: 2015-10-31 14:27) |
|
This is not a dynamic_cast-specific bug, but rather an issue with boost::barrier on MacOSX. Here is the code of requestActionASAP
void VCAI::requestActionASAP(std::function<void()> whatToDo)
{
// static boost::mutex m;
// boost::unique_lock<boost::mutex> mylock(m);
fprintf(stderr, "before barrier\n");
boost::barrier b(2);
fprintf(stderr, "after barrier\n");
boost::thread newThread([&b,this,whatToDo]()
{
setThreadName("VCAI::requestActionASAP::helper");
SET_GLOBAL_STATE(this);
boost::shared_lock<boost::shared_mutex> gsLock(cb->getGsMutex());
b.wait();
whatToDo();
});
b.wait();
}
Relevant links:
[1] http://www.boost.org/doc/libs/1_55_0/boost/thread/barrier.hpp [^]
[2] http://stackoverflow.com/questions/24465533/implementing-boostbarrier-in-c11 [^]
|
|
|
|
The corresponding commit:
commit 13f26fc3cb2dfc0991a07bde77b4cfb60b91e4d1
Author: Michał W. Urbańczyk <[email protected]>
Date: Sun Mar 25 22:46:14 2012 +0000
* Client is able to await for answers for multiple queries at the same time
* Hackish solution allowing AI undertaking actions from event-handling thread
* Fixed crash when death stare or acid breath activated on stack that was just killed
* minor fixes |
|
|
|
Ok, I've removed barrier at all, since it doesn't influence on anything actually. Played for 2 months without any crashes.
Pull request 130 was updated.
Finally getting to the game after 2 weeks of debugging! Thanks to everyone! I think we can close this issue.
|
|
|
(0006067)
|
AVS
|
2015-10-31 22:22
|
|
At first it should be merged) |
|
|
(0006068)
|
AVS
|
2015-10-31 22:24
|
|
There are conflicts with recent changes |
|
|
(0006183)
|
AVS
|
2015-12-19 01:30
|
|
|