更新日期:2021-09-07
[TOC]
1、基本情况
最终堆栈位置:
1
2
3
XXX -[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:)
XXX -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:)
XXX -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:)
2、分析
2.1 堆栈分析
跟踪多条Crash堆栈
XXX堆栈 1:
1
2
3
4
5
6
7
8
9
10
9 XXX 0x0000000103bf4718 -[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:42)
11 XXX 0x0000000100721688 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
11 XXX 0x0000000100721688 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
21 XXX 0x00000001004434cc -[KSKVStorage writeObjectToFile:forKey:atomic:completion:] (KSKVStorage.m:0)
22 XXX 0x000000010326f550 -[KSLocalConfigManager internalRemoveLocalConfig:isGolbal:] (KSLocalConfigManager.m:273)
23 XXX 0x0000000103270330 +[KSLocalConfigManager removeLocalConfig:isGolbal:] (KSLocalConfigManager.m:0)
24 XXX 0x000000010326fa64 +[KSLocalConfigManager dailyCommonValueConfig:isGolbal:] (KSLocalConfigManager.m:0)
25 XXX 0x0000000100b55a30 -[KSKTVRoomViewController(InteractionGuide) checkIfShowGameOngoingView:] (KSKTVRoomViewController+InteractionGuide.m:0)
26 XXX 0x0000000100b54d38 -[KSKTVRoomViewController(InteractionGuide) showGameOngoingViewWhenEnterKTV:] (KSKTVRoomViewController+InteractionGuide.m:192)
37 XXX 0x00000001014ec7d4 main (main.m:37)
XXX堆栈2:
1
2
3
4
5
6
7
8
9
10
9 XXX 0x0000000103b94718 -[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:42)
11 XXX 0x00000001006c1688 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
11 XXX 0x00000001006c1688 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
21 XXX 0x00000001003e34cc -[KSKVStorage writeObjectToFile:forKey:atomic:completion:] (KSKVStorage.m:0)
22 XXX 0x000000010320ee7c -[KSLocalConfigManager internalSaveLocalConfig:value:isGolbal:] (KSLocalConfigManager.m:220)
23 XXX 0x000000010320fb10 +[KSLocalConfigManager saveLocalConfig:commonValue:isGolbal:] (KSLocalConfigManager.m:0)
24 XXX 0x0000000102152b94 -[KSSendGiftContainerView tabView:didChangeSelectIndex:] (KSSendGiftContainerView.m:1952)
25 XXX 0x000000010056235c -[KSTabView setSelectIndex:] (KSTabView.m:723)
28 XXX 0x00000001009ccbc4 -[KSBaseButton sendAction:to:forEvent:] (KSBaseButton.m:0)
46 XXX 0x000000010148c7d4 main (main.m:37)
2.2 代码分析
所有类型的堆栈最终都指向KSKVStorage writeObjectToFile,
这是一个存储控制类,writeObjectToFile是将对象归档后写到磁盘
最终堆栈指向
0 libsystem_kernel.dylib 0x000000018aa97014 ___pthread_kill + 8
1 libsystem_c.dylib 0x000000018aa0b400 _abort + 140
2 libsystem_malloc.dylib 0x000000018aadba5c _nano_free_definite_size + 92
3 libsystem_malloc.dylib 0x000000018aadd028 _nano_realloc + 648
4 libsystem_malloc.dylib 0x000000018aacf240 _malloc_zone_realloc + 180
5 Foundation 0x000000018c5aa3e4 -[NSOperation isAsynchronous] + 180
6 Foundation 0x000000018c53c2a0 __decodeInt64 + 452
可以认为是NSKeyedUnarchiver 与 NSKeyedArchiver 触发的对象 KSKitBaseModel 及 KSBaseModel 的 encodeWithCoder 方法,
触发encodeWithCoder:方法为自动序列化的标准做法如下:
- (void)encodeWithCoder:(NSCoder *)encoder {
Class cls = [self class];
while (cls != [NSObject class]) {
unsigned int numberOfIvars = 0;
Ivar* ivars = class_copyIvarList(cls, &numberOfIvars);
for(const Ivar* p = ivars; p < ivars+numberOfIvars; p++)
{
Ivar const ivar = *p;
const char *type = ivar_getTypeEncoding(ivar);
NSString *key = [NSString stringWithUTF8String:ivar_getName(ivar)];
if (key == nil){
continue;
}
if ([key length] == 0){
continue;
}
id value = [self valueForKey:key];
if (value) {
switch (type[0]) {
case _C_STRUCT_B: {
NSUInteger ivarSize = 0;
NSUInteger ivarAlignment = 0;
NSGetSizeAndAlignment(type, &ivarSize, &ivarAlignment);
NSData *data = [NSData dataWithBytes:(const char *)((__bridge void *)self) + ivar_getOffset(ivar)
length:ivarSize];
[encoder encodeObject:data forKey:key];
}
break;
default:
[encoder encodeObject:value
forKey:key];
break;
}
}
}
if (ivars) {
free(ivars);
}
cls = class_getSuperclass(cls);
}
}
上述方法通过 runtime机制 遍历当前类及其父类的属性,可以推断,在某个类或者某个继承关系中存在如下调用顺序时,触发了crash
1
2
3
[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:42)
[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
代码行数可能不准。
根据断点调试,可能发生在 某个对象的 的归档中:
2.3 问题定位
2.3.1 堆栈定位
再看堆栈,实现了自动序列化的KSBaseModel 及 KSKitBaseModel 作为全民K歌项目持久层对象的基类在自动序列化过程中发生crash:
0 libsystem_kernel.dylib 0x000000018aa97014 ___pthread_kill + 8
1 libsystem_c.dylib 0x000000018aa0b400 _abort + 140
2 libsystem_malloc.dylib 0x000000018aadba5c _nano_free_definite_size + 92
3 libsystem_malloc.dylib 0x000000018aadd028 _nano_realloc + 648
4 libsystem_malloc.dylib 0x000000018aacf240 _malloc_zone_realloc + 180
5 Foundation 0x000000018c5aa3e4 -[NSOperation isAsynchronous] + 180
6 Foundation 0x000000018c53c2a0 __decodeInt64 + 452
7 XXX 0x0000000103cab808 -[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:42)
8 Foundation 0x000000018c53c49c __decodeIntXXX+ 416
9 XXX 0x00000001037a0b94 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:52)
……
在分析了自动序列化代码未找到问题的情况下,看引发crash更直接的libsystem_malloc.dylib 通过对_nano_free_definite_size产生crash的调研发现bugly及微信团队在之前遇到过类似的情况:【腾讯Bugly干货分享】聊聊苹果的Bug - iOS 10 nano_free Crash
2.3.2 问题定位
通过分析,发生crash机型都是iOS10的设备, 问题可以初步断定存在于iOS10的设备上i一类堆栈为nano_free字样的Crash,
2.4 复现
2.4.1 构建复现条件
将全民K歌持久层代码简化后在新建工程中搭建复现环境。 模拟高频,高数据量持久化核心代码如下:
- (void)addOperation{
int count = 0;
while (count < 10000) {
[self.lock lock];
//通过两台设备(iOS10 iPhone7P | iOS14 iPhone12)控制变量法模拟测试结论如下:
//iOS10设备的问题:
//通过修改 count % particle 中的 particle的值可以控制需要序列化的数组对象modelData的数据量,
//实践证明,当 particle 值越大, modelData的数据量 越大,越容易发生crash
//解决方案,控制每次序列化 数据量可以规避此crash
//非iOS10设备未出现此问题
NSString *key = [NSString stringWithFormat:@"%d",count];
int particle = 5;
if (count % particle == 1) {
if (self.modelDataDict.allKeys.count > 0) {
[self.modelDataDict removeObjectForKey:[self.modelDataDict.allKeys objectAtIndex:0]];
}
} else {
KSModel *model = [[KSModel alloc] init];
[self.modelDataDict setObject:model forKey:key];
}
[self.lock unlock];
count ++;
[self addopera:[self golbalSaveConfigKey] from:0 data:[self.modelDataDict mutableCopy]];
NSLog(@"done index:%d datalen = %lu", count, (unsigned long)self.modelDataDict.allKeys.count);
}
}
- (void)addopera:(NSString *) key from:(int) type data:(id<NSCoding, NSObject>)obj{
__block NSData *value = nil;
[_diskOperationQueue addOperationWithBlock:^{
NSMutableString *str = [NSMutableString new];
@try {
value = [NSKeyedArchiver archivedDataWithRootObject:obj];
}
@catch (NSException *exception) {
NSLog(@"exception = %@ || from = %d " ,exception ,type);
}
NSString *filePath = [ViewController migrateFileNameForKey:[NSString stringWithFormat:@"%@_%@_a", str, key]];
NSError *error = nil;
[value writeToFile:filePath options:NSDataWritingAtomic error:&error];
}];
}
2.4.2 控制设备变量对比
通过控制设备以不同iOS版本号的设备进行测试,在iOS10 iPhone7P上跑上述代码 发生了Crash,Crash堆栈位置吻合:
在iOS14 iPhone12 未出现此crash
2.4.3 控制数据量对比
通过改变上述 particle 的值为 2 时,iOS10的设备不会发生crash,可以证明在所需序列化数据量较小时(至少说明在长度小于等于2的情况)不会发生crash,运行日志表明循环10000次序列化的情况:
2021-09-07 21:22:25.626353 PT-2[1187:35082] done index:9995 datalen = 2
2021-09-07 21:22:25.626385 PT-2[1187:35082] done index:9996 datalen = 1
2021-09-07 21:22:25.626456 PT-2[1187:35082] done index:9997 datalen = 2
2021-09-07 21:22:25.626520 PT-2[1187:35082] done index:9998 datalen = 1
2021-09-07 21:22:25.626584 PT-2[1187:35082] done index:9999 datalen = 2
2021-09-07 21:22:25.626610 PT-2[1187:35082] done index:10000 datalen = 1
而当particle超过一定数值及循环次数达到一定次数(也就是所需序列化的数组长度足够长时)就会发生crash,通过日志查看,当前复现条件数组长度达到5000左右时发生crash,与在全民K歌项目中的环境不同,可能这个长度会有差异:
2021-09-07 21:14:15.166295 PT-2[1087:32256] done index:8773 datalen = 5264
PT-2(1087,0x16dfe3000) malloc: *** error for object 0x173050000: Freeing unallocated pointer
*** set a breakpoint in malloc_error_break to debug
*** error for object 0x173050000: Freeing unallocated pointer
上述日志发生crash的条件:
循环次数为:8773
所需序列化数组数据长度为:5264
2.5 解决方案
通过上述调研与实践,问题发生在特定设备及特定条件下:
特定设备:iOS10.x
特定设备无法避免。
特定条件:序列化数据量大
通过全民K歌持久层代码走读,发现发生增删时,都要重新序列化全量的数据。由于上述复现条件的【控制数据量对比】
通过优化增量数据的序列化方案可以使得序列化数据量较小,从而规避此crash。
方案一:CoreData替代NSKeyedArchiver持久化方案
方案二:优化NSKeyedArchiver的数据量,每次只持久化增量数据,持久化不再通过NSMutableDictionary setObject之后再持久化,而是直接将需要持久化的对象加Key进行持久化:
- (void)addOperation{
int count = 0;
while (count < 10000) {
[self.lock lock];
NSString *key = [NSString stringWithFormat:@"%d",count];
KSModel *model = [[KSModel alloc] init];
[self addopera:[self golbalSaveConfigKeyWithKey:key] from:0 data:model];
[self.lock unlock];
count ++;
NSLog(@"done index:%d datalen = %lu", count, (unsigned long)self.modelDataDict.allKeys.count);
}
}
上述方案每次序列化只有一个元素,经验证在iOS10.X 10000次循环不会发生crash,但此方法牺牲了遍历的方便性,需要在反序列化时通过Key值从不同的存储文件中获取后再进行:
id obj = [NSKeyedUnarchiver unarchiveObjectWithData:data];
补充:也需确保序列化对象属性数据量不宜过大
3、其他
3.1 风险
此外,CRASH方法中存在API废弃风险
1
value = [NSKeyedArchiver archivedDataWithRootObject:obj];'archivedDataWithRootObject:' is deprecated: first deprecated in iOS 12.0 - Use +archivedDataWithRootObject:requiringSecureCoding:error: instead