iOS 10.x 内存Crash 排查 纪要

This is a subtitle

Posted by PaysonChen on September 13, 2021

更新日期:2021-09-07

[TOC]

1、基本情况

​ 最终堆栈位置:

1
2
3
XXX -[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:)
XXX -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:)
XXX -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:)

2、分析

2.1 堆栈分析

跟踪多条Crash堆栈

XXX堆栈 1:

1
2
3
4
5
6
7
8
9
10
9 XXX 0x0000000103bf4718 -[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:42)
11 XXX 0x0000000100721688 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
11 XXX 0x0000000100721688 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
21 XXX 0x00000001004434cc -[KSKVStorage writeObjectToFile:forKey:atomic:completion:] (KSKVStorage.m:0)
22 XXX 0x000000010326f550 -[KSLocalConfigManager internalRemoveLocalConfig:isGolbal:] (KSLocalConfigManager.m:273)
23 XXX 0x0000000103270330 +[KSLocalConfigManager removeLocalConfig:isGolbal:] (KSLocalConfigManager.m:0)
24 XXX 0x000000010326fa64 +[KSLocalConfigManager dailyCommonValueConfig:isGolbal:] (KSLocalConfigManager.m:0)
25 XXX 0x0000000100b55a30 -[KSKTVRoomViewController(InteractionGuide) checkIfShowGameOngoingView:] (KSKTVRoomViewController+InteractionGuide.m:0)
26 XXX 0x0000000100b54d38 -[KSKTVRoomViewController(InteractionGuide) showGameOngoingViewWhenEnterKTV:] (KSKTVRoomViewController+InteractionGuide.m:192)
37 XXX 0x00000001014ec7d4 main (main.m:37)

XXX堆栈2:

1
2
3
4
5
6
7
8
9
10
9 XXX 0x0000000103b94718 -[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:42)
11 XXX 0x00000001006c1688 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
11 XXX 0x00000001006c1688 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
21 XXX 0x00000001003e34cc -[KSKVStorage writeObjectToFile:forKey:atomic:completion:] (KSKVStorage.m:0)
22 XXX 0x000000010320ee7c -[KSLocalConfigManager internalSaveLocalConfig:value:isGolbal:] (KSLocalConfigManager.m:220)
23 XXX 0x000000010320fb10 +[KSLocalConfigManager saveLocalConfig:commonValue:isGolbal:] (KSLocalConfigManager.m:0)
24 XXX 0x0000000102152b94 -[KSSendGiftContainerView tabView:didChangeSelectIndex:] (KSSendGiftContainerView.m:1952)
25 XXX 0x000000010056235c -[KSTabView setSelectIndex:] (KSTabView.m:723)
28 XXX 0x00000001009ccbc4 -[KSBaseButton sendAction:to:forEvent:] (KSBaseButton.m:0)
46 XXX 0x000000010148c7d4 main (main.m:37)

2.2 代码分析

所有类型的堆栈最终都指向KSKVStorage writeObjectToFile,

这是一个存储控制类,writeObjectToFile是将对象归档后写到磁盘

最终堆栈指向

0 libsystem_kernel.dylib 0x000000018aa97014 ___pthread_kill + 8
1 libsystem_c.dylib 0x000000018aa0b400 _abort + 140
2 libsystem_malloc.dylib 0x000000018aadba5c _nano_free_definite_size + 92
3 libsystem_malloc.dylib 0x000000018aadd028 _nano_realloc + 648
4 libsystem_malloc.dylib 0x000000018aacf240 _malloc_zone_realloc + 180
5 Foundation 0x000000018c5aa3e4 -[NSOperation isAsynchronous] + 180
6 Foundation 0x000000018c53c2a0 __decodeInt64 + 452

可以认为是NSKeyedUnarchiver 与 NSKeyedArchiver 触发的对象 KSKitBaseModel 及 KSBaseModel 的 encodeWithCoder 方法,

触发encodeWithCoder:方法为自动序列化的标准做法如下:

- (void)encodeWithCoder:(NSCoder *)encoder {
	Class cls = [self class];
	while (cls != [NSObject class]) {
		unsigned int numberOfIvars = 0;
		Ivar* ivars = class_copyIvarList(cls, &numberOfIvars);
		for(const Ivar* p = ivars; p < ivars+numberOfIvars; p++)
		{
			Ivar const ivar = *p;
			const char *type = ivar_getTypeEncoding(ivar);
			NSString *key = [NSString stringWithUTF8String:ivar_getName(ivar)];
            if (key == nil){
                continue;
            }
            if ([key length] == 0){
                continue;
            }
            
			id value = [self valueForKey:key];
			if (value) {
				switch (type[0]) {
					case _C_STRUCT_B: {
						NSUInteger ivarSize = 0;
						NSUInteger ivarAlignment = 0;
						NSGetSizeAndAlignment(type, &ivarSize, &ivarAlignment);
						NSData *data = [NSData dataWithBytes:(const char *)((__bridge void *)self) + ivar_getOffset(ivar)
													  length:ivarSize];
						[encoder encodeObject:data forKey:key];
					}
						break;
					default:
						[encoder encodeObject:value
									   forKey:key];
						break;
				}
			}
		}
		if (ivars) {
			free(ivars);
		}
		
		cls = class_getSuperclass(cls);
	}
}

上述方法通过 runtime机制 遍历当前类及其父类的属性,可以推断,在某个类或者某个继承关系中存在如下调用顺序时,触发了crash

1
2
3
[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:42)
[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)
[KSBaseModel encodeWithCoder:] (KSBaseModel.m:50)

代码行数可能不准。

根据断点调试,可能发生在 某个对象的 的归档中:

2.3 问题定位

2.3.1 堆栈定位

​ 再看堆栈,实现了自动序列化的KSBaseModel 及 KSKitBaseModel 作为全民K歌项目持久层对象的基类在自动序列化过程中发生crash:

0 libsystem_kernel.dylib 0x000000018aa97014 ___pthread_kill + 8
1 libsystem_c.dylib 0x000000018aa0b400 _abort + 140
2 libsystem_malloc.dylib 0x000000018aadba5c _nano_free_definite_size + 92
3 libsystem_malloc.dylib 0x000000018aadd028 _nano_realloc + 648
4 libsystem_malloc.dylib 0x000000018aacf240 _malloc_zone_realloc + 180
5 Foundation 0x000000018c5aa3e4 -[NSOperation isAsynchronous] + 180
6 Foundation 0x000000018c53c2a0 __decodeInt64 + 452
7 XXX 0x0000000103cab808 -[KSKitBaseModel encodeWithCoder:] (KSKitBaseModel.m:42)
8 Foundation 0x000000018c53c49c __decodeIntXXX+ 416
9 XXX 0x00000001037a0b94 -[KSBaseModel encodeWithCoder:] (KSBaseModel.m:52)
……

​ 在分析了自动序列化代码未找到问题的情况下,看引发crash更直接的libsystem_malloc.dylib 通过对_nano_free_definite_size产生crash的调研发现bugly及微信团队在之前遇到过类似的情况:【腾讯Bugly干货分享】聊聊苹果的Bug - iOS 10 nano_free Crash

2.3.2 问题定位

​ 通过分析,发生crash机型都是iOS10的设备, 问题可以初步断定存在于iOS10的设备上i一类堆栈为nano_free字样的Crash

2.4 复现

2.4.1 构建复现条件

​ 将全民K歌持久层代码简化后在新建工程中搭建复现环境。 模拟高频,高数据量持久化核心代码如下:

- (void)addOperation{
    int count = 0;
    while (count < 10000) {
        [self.lock lock];
        //通过两台设备(iOS10 iPhone7P | iOS14 iPhone12)控制变量法模拟测试结论如下:
        //iOS10设备的问题:
        //通过修改 count % particle 中的 particle的值可以控制需要序列化的数组对象modelData的数据量,
        //实践证明,当 particle 值越大, modelData的数据量 越大,越容易发生crash
        //解决方案,控制每次序列化 数据量可以规避此crash
        //非iOS10设备未出现此问题
        NSString *key = [NSString stringWithFormat:@"%d",count];
        int particle = 5;
        if (count % particle == 1) {
            if (self.modelDataDict.allKeys.count > 0) {
                [self.modelDataDict removeObjectForKey:[self.modelDataDict.allKeys objectAtIndex:0]];
            }
            
        } else {
            KSModel *model = [[KSModel alloc] init];
            [self.modelDataDict setObject:model forKey:key];

        }
        [self.lock unlock];
        count ++;

        [self addopera:[self golbalSaveConfigKey] from:0 data:[self.modelDataDict mutableCopy]];

        NSLog(@"done index:%d datalen = %lu", count, (unsigned long)self.modelDataDict.allKeys.count);
    }
    
}

- (void)addopera:(NSString *) key from:(int) type data:(id<NSCoding, NSObject>)obj{
    
    __block NSData *value = nil;
    [_diskOperationQueue addOperationWithBlock:^{
        NSMutableString *str = [NSMutableString new];
        @try {
            value = [NSKeyedArchiver archivedDataWithRootObject:obj];
        }
        @catch (NSException *exception) {
            NSLog(@"exception = %@ ||  from = %d " ,exception ,type);
        }
        
        NSString *filePath = [ViewController migrateFileNameForKey:[NSString stringWithFormat:@"%@_%@_a", str, key]];
        NSError *error = nil;
        [value writeToFile:filePath options:NSDataWritingAtomic error:&error];
    }];
}


2.4.2 控制设备变量对比

​ 通过控制设备以不同iOS版本号的设备进行测试,在iOS10 iPhone7P上跑上述代码 发生了Crash,Crash堆栈位置吻合:

在iOS14 iPhone12 未出现此crash

2.4.3 控制数据量对比

​ 通过改变上述 particle 的值为 2 时,iOS10的设备不会发生crash,可以证明在所需序列化数据量较小时(至少说明在长度小于等于2的情况)不会发生crash,运行日志表明循环10000次序列化的情况:

2021-09-07 21:22:25.626353 PT-2[1187:35082] done index:9995 datalen = 2
2021-09-07 21:22:25.626385 PT-2[1187:35082] done index:9996 datalen = 1
2021-09-07 21:22:25.626456 PT-2[1187:35082] done index:9997 datalen = 2
2021-09-07 21:22:25.626520 PT-2[1187:35082] done index:9998 datalen = 1
2021-09-07 21:22:25.626584 PT-2[1187:35082] done index:9999 datalen = 2
2021-09-07 21:22:25.626610 PT-2[1187:35082] done index:10000 datalen = 1

​ 而当particle超过一定数值及循环次数达到一定次数(也就是所需序列化的数组长度足够长时)就会发生crash,通过日志查看,当前复现条件数组长度达到5000左右时发生crash,与在全民K歌项目中的环境不同,可能这个长度会有差异:

2021-09-07 21:14:15.166295 PT-2[1087:32256] done index:8773 datalen = 5264
PT-2(1087,0x16dfe3000) malloc: *** error for object 0x173050000: Freeing unallocated pointer
*** set a breakpoint in malloc_error_break to debug
*** error for object 0x173050000: Freeing unallocated pointer

上述日志发生crash的条件:

循环次数为:8773

所需序列化数组数据长度为:5264

2.5 解决方案

​ 通过上述调研与实践,问题发生在特定设备及特定条件下:

特定设备:iOS10.x

​ 特定设备无法避免。

特定条件:序列化数据量大

​ 通过全民K歌持久层代码走读,发现发生增删时,都要重新序列化全量的数据。由于上述复现条件的【控制数据量对比】

通过优化增量数据的序列化方案可以使得序列化数据量较小,从而规避此crash。

方案一:CoreData替代NSKeyedArchiver持久化方案

方案二:优化NSKeyedArchiver的数据量,每次只持久化增量数据,持久化不再通过NSMutableDictionary setObject之后再持久化,而是直接将需要持久化的对象加Key进行持久化:

- (void)addOperation{
    int count = 0;
    while (count < 10000) {
        [self.lock lock];
        NSString *key = [NSString stringWithFormat:@"%d",count];
				KSModel *model = [[KSModel alloc] init];
        [self addopera:[self golbalSaveConfigKeyWithKey:key] from:0 data:model];
        [self.lock unlock];
        count ++;
        NSLog(@"done index:%d datalen = %lu", count, (unsigned long)self.modelDataDict.allKeys.count);
    }
}


上述方案每次序列化只有一个元素,经验证在iOS10.X 10000次循环不会发生crash,但此方法牺牲了遍历的方便性,需要在反序列化时通过Key值从不同的存储文件中获取后再进行:

        id obj = [NSKeyedUnarchiver unarchiveObjectWithData:data];

补充:也需确保序列化对象属性数据量不宜过大

3、其他

3.1 风险

此外,CRASH方法中存在API废弃风险

1
value = [NSKeyedArchiver archivedDataWithRootObject:obj];'archivedDataWithRootObject:' is deprecated: first deprecated in iOS 12.0 - Use +archivedDataWithRootObject:requiringSecureCoding:error: instead