wlbf.github.io

MIT-6.828 2018 Memory Bug

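While working through lab3 of mit-6.828-2018 I ran into a very strange problem: after merging my already finished lab2 branch into the lab3 branch, the kernel panics at one particular point. In short, adding some perfectly correct code on top of the original lab3 code triggers a kernel panic. Debugging showed that the problem is here: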

    // Find out how much memory the machine has (npages & npages_basemem).
    i386_detect_memory();

    //////////////////////////////////////////////////////////////////////
    // create initial page directory.
    kern_pgdir = (pde_t *) boot_alloc(PGSIZE);

    // kern_pgdir = 0xf018e000
    memset(kern_pgdir, 0, PGSIZE); // <--- kern_pgdir is set to 0x0 in here.
    // kern_pgdir = 0x0

    //////////////////////////////////////////////////////////////////////
    // Recursively insert PD in itself as a page table, to form
    // a virtual page table at virtual address UVPT.
    // (For now, you don't have understand the greater purpose of the
    // following line.)

    // Permissions: kernel R, user R
    kern_pgdir[PDX(UVPT)] = PADDR(kern_pgdir) | PTE_U | PTE_P;

(gdb) p kern_pgdir
$1 = 0xf018e000
(gdb) p &kern_pgdir
$1 = 0xf018e00c

Debugging shows that memset overwrote kern_pgdir itself, which means the address returned by boot_alloc is wrong. Here is the code of boot_alloc:

// This simple physical memory allocator is used only while JOS is setting
// up its virtual memory system.  page_alloc() is the real allocator.
//
// If n>0, allocates enough pages of contiguous physical memory to hold 'n'
// bytes.  Doesn't initialize the memory.  Returns a kernel virtual address.
//
// If n==0, returns the address of the next free page without allocating
// anything.
//
// If we're out of memory, boot_alloc should panic.
// This function may ONLY be used during initialization,
// before the page_free_list list has been set up.
static void *
boot_alloc(uint32_t n)
{
    static char *nextfree;  // virtual address of next byte of free memory
    char *result;

    // Initialize nextfree if this is the first time.
    // 'end' is a magic symbol automatically generated by the linker,
    // which points to the end of the kernel's bss segment:
    // the first virtual address that the linker did *not* assign
    // to any kernel code or global variables.
    if (!nextfree) {
        extern char end[];
        nextfree = ROUNDUP((char *) end, PGSIZE);
    }

    // Allocate a chunk large enough to hold 'n' bytes, then update
    // nextfree.  Make sure nextfree is kept aligned
    // to a multiple of PGSIZE.
    result = nextfree;
    nextfree += ROUNDUP(n, PGSIZE);

    return (void *)result;
}

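When the kernel first starts running it uses boot_alloc to allocate memory, and the address boot_alloc returns is supposed to be free. In fact, the address returned by the first call, 0xf018e000, is lower than 0xf018e00c, the address of the statically allocated variable kern_pgdir. That is why the later memset overwrites kern_pgdir itself. Looking at boot_alloc, the first allocation takes its free address from the symbol end. As the comment says, end should sit at the very end of the kernel, that is, at the end of the bss section, marking where free memory begins after the kernel. That is not what actually happens, and running objdump on the kernel confirms what we saw while debugging.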

@: objdump -D obj/kern/kernel
...
Disassembly of section .bss:
...
...

f018e000 <end>:
f018e000:	00 00                	add    %al,(%eax)
	...

f018e004 <panicstr>:
f018e004:	00 00                	add    %al,(%eax)
	...

f018e008 <npages>:
f018e008:	00 00                	add    %al,(%eax)
	...

f018e00c <kern_pgdir>:
f018e00c:	00 00                	add    %al,(%eax)
	...

f018e010 <pages>:
f018e010:	00 00                	add    %al,(%eax)
	...

...

Linker script:

...
	.bss : {
		PROVIDE(edata = .);
		*(.bss)
		PROVIDE(end = .);
		BYTE(0)
	}
...

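Clearly end is in the wrong place within the bss section, and end is actually provided by the linker script. In other words, end did not land where it was expected during linking. For lack of time I did not dig into the deeper cause. What puzzled me is that I never touched the linker script and the toolchain was the same as before, so why did the problem show up only after merging code into lab3? Using objdump I looked at the bss section of the kernels built from the lab1, lab2 and original lab3 branches, and found that in all of them end is not at the end of the bss section, exactly as in the failing build. So why was there no problem before?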

@: objdump -D obj/kern/kernel
...
Disassembly of section .bss:
...
...
f017cfe0 <end>:
f017cfe0:	00 00                	add    %al,(%eax)
	...

f017cfe4 <panicstr>:
f017cfe4:	00 00                	add    %al,(%eax)
	...

f017cfe8 <npages>:
f017cfe8:	00 00                	add    %al,(%eax)
	...

f017cfec <kern_pgdir>:
f017cfec:	00 00                	add    %al,(%eax)
	...

f017cff0 <pages>:
f017cff0:	00 00                	add    %al,(%eax)
	...
...

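A careful look at boot_alloc gives the answer: to keep allocations aligned, end is rounded up to a multiple of the page size. Earlier builds worked only because end happened to be rounded up to a higher address. Taking the dump above as an example, boot_alloc starts allocating at ROUNDUP((char *) end, PGSIZE), that is 0xf017d000, which skips past the bss variables located after end. I hit the problem because, after my changes, the size of the text section changed and the addresses of the bss variables shifted with it, and end happened to land exactly on 0xf018e000. That address is already a multiple of the page size, so rounding up leaves it unchanged and the first allocation overwrites the bss variables that follow. That more or less explains the problem. I also found an official commit: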

commit a56269d4beefc7d0b3672180aa46c654cfb63af4 (mitedu/lab1)
Author: Jonathan Behrens <fintelia@gmail.com>
Date:   Tue Sep 4 14:10:42 2018 -0400

    Tweak kernel.ld linker script so edata and end are set correctly
    
    This change should hopefully resolve issues when compiling with newer versions
    of GCC.

So it seems that providing the magic symbol end during linking can be a rather tricky business. The environment in which I hit the problem:

Ubuntu-18.04
gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
GNU ld (GNU Binutils for Ubuntu) 2.30

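In the end my workaround was to install gcc-4.8. The bss variables get different addresses than with gcc-7.4, and as long as end is not a multiple of the page size and the bss section is not that large, rounding end up lands on higher, free memory, and I could happily get back to the assignments. (Pretending the problem is solved.)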