On Writing a Kernel

Friday, January 11, 2008

It's 2008 and my project goes on

It was May 2007 when I made my last post. That's a long time. What reminded me of this blog is that I am working on filesystems and just had an idea. Currently my project is on the verge of finishing off paging and memory management. I made good progress in the past few months, but I am still thinking about what to do about the filesystem. I have a good idea of how to write the VFS. I haven't written it yet because I want to make up my mind first about how I am going to handle files in general. Filesystems are very complex. There's no way I could do the OS and also get my hands dirty with another filesystem. A modern fs like ReiserFS is something like 50k - 80k lines of code?

Some good ideas though that come to my mind:

1) In Plan 9, files are used for stream management. Plan 9 streams (which had a modular design but weren't very successful) are used for device I/O and networking.
ioctl calls and sockets are replaced by two files named 'ctl' and 'data'. Now in Reiser4, it is suggested that if a lot of small files can be stored efficiently on the block device, one could implement file attributes as files. These would be accessed through a special path component, much like '.' and '..' (for example '....'), but using the conventional read/write calls. This is not a bad idea, and it would also make it possible to implement the Plan 9 'ctl' and 'data' files as file attributes of the device (e.g. an ethernet device or a TCP connection).

2) Memory is a stream of bytes, and a block device is a stream of bytes. Logically they are the same. To manage access to memory efficiently, we use virtual memory: the physical pages are managed in the background for us, as efficiently as possible, by maximising sharing. Yet we partition a block device and access it by its real address. Furthermore, this partitioning is so poorly hidden that it is exposed even to the user of the system through tools like fdisk. What I think would be really useful is for the filesystem itself to hide the partitioning and management of a set of disks from the user, and present just the simplicity of raw space. In that respect the idea of ZFS sounds really good. One downside, however, is that its I/O scheduling etc. is designed around disks, since it's more of an enterprise filesystem. A more interesting design would handle both solid state flash disks and disk drives transparently.
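To make the virtual-memory analogy in (2) concrete, here is a minimal sketch in C. It assumes a hypothetical pool of two disks split into fixed-size chunks. Like a page table, a map translates each logical chunk to a (disk, physical chunk) pair that is chosen on first use, so the user only ever sees one flat range of space and never a partition. The names (`struct pool`, `pool_translate`) and the allocation policy (most free space first) are illustrative assumptions, not how ZFS actually allocates.

```c
#include <assert.h>
#include <stdint.h>

#define NDISKS      2
#define MAX_CHUNKS  16          /* size of the logical space, in chunks */
#define UNMAPPED    UINT32_MAX

/* Hypothetical pool: each logical chunk maps to a (disk, physical chunk)
 * pair chosen lazily, the way a page table maps virtual pages. */
struct pool {
    uint32_t disk_chunks[NDISKS];   /* capacity of each disk, in chunks */
    uint32_t next_free[NDISKS];     /* simple bump allocator per disk */
    struct { uint32_t disk, phys; } map[MAX_CHUNKS];
};

static void pool_init(struct pool *p, uint32_t cap0, uint32_t cap1)
{
    p->disk_chunks[0] = cap0;
    p->disk_chunks[1] = cap1;
    p->next_free[0] = p->next_free[1] = 0;
    for (int i = 0; i < MAX_CHUNKS; i++)
        p->map[i].disk = UNMAPPED;  /* nothing is backed yet */
}

/* Translate a logical chunk, allocating backing space on first use.
 * Returns 0 on success, -1 if out of range or every disk is full. */
static int pool_translate(struct pool *p, uint32_t logical,
                          uint32_t *disk, uint32_t *phys)
{
    if (logical >= MAX_CHUNKS)
        return -1;
    if (p->map[logical].disk == UNMAPPED) {
        /* Assumed policy: place the chunk on the disk with most free space. */
        uint32_t best = 0, best_free = 0;
        for (uint32_t d = 0; d < NDISKS; d++) {
            uint32_t free_chunks = p->disk_chunks[d] - p->next_free[d];
            if (free_chunks > best_free) {
                best = d;
                best_free = free_chunks;
            }
        }
        if (best_free == 0)
            return -1;
        p->map[logical].disk = best;
        p->map[logical].phys = p->next_free[best]++;
    }
    *disk = p->map[logical].disk;
    *phys = p->map[logical].phys;
    return 0;
}
```

Because the mapping lives entirely inside the filesystem, a disk could later be added to the pool without the user ever repartitioning anything. A real design would also need to persist the map on disk, which this sketch leaves out.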

Anyway, what I am thinking is that idea (1) needs changes in existing VFS designs. Since I haven't even written mine yet, I might leave room for a special attribute accessor path component for future experimentation. My other idea is that I might design a filesystem that offers neither journalling protection nor good performance, but is dead simple and flexible: something that compresses data to reduce I/O, duplicates data for simple recovery, and allows flexible partitioning. Maybe I can get away with creating something simple and useful instead of putting all that effort into porting an existing huge filesystem. Plus I would have learned something ;-)
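Here is a minimal C sketch of what the "special attribute accessor path component" could look like at lookup time. It assumes the marker is the literal component `....` (the example from the Reiser4 idea above) and that paths are absolute. The lookup splits off the object path and the attribute name, and ordinary read/write calls would then be routed to the attribute. The function `split_attr_path` and its return codes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute accessor component, sitting next to "." and "..". */
#define ATTR_MARKER "/..../"

/* Split "/dev/eth0/..../ctl" into obj = "/dev/eth0" and attr = "ctl".
 * Returns 1 for an attribute path, 0 for an ordinary path (attr is set
 * to ""), and -1 if a buffer is too small or the attribute name is not
 * a single non-empty component. Expects absolute paths; attrsz >= 1. */
static int split_attr_path(const char *path, char *obj, size_t objsz,
                           char *attr, size_t attrsz)
{
    const char *m = strstr(path, ATTR_MARKER);

    if (m == NULL) {                    /* ordinary file, no attribute */
        if (strlen(path) >= objsz)
            return -1;
        strcpy(obj, path);
        attr[0] = '\0';
        return 0;
    }

    const char *name = m + strlen(ATTR_MARKER);
    size_t objlen = (size_t)(m - path);
    size_t attrlen = strlen(name);

    if (attrlen == 0 || strchr(name, '/') != NULL)
        return -1;
    if (objlen == 0)
        objlen = 1;                     /* "/..../x": the object is "/" */
    if (objlen >= objsz || attrlen >= attrsz)
        return -1;

    memcpy(obj, path, objlen);
    obj[objlen] = '\0';
    memcpy(attr, name, attrlen + 1);    /* copies the terminating NUL too */
    return 1;
}
```

With this in place, a Plan 9 style "ctl" file of an ethernet device could simply be `/dev/eth0/..../ctl`, opened and written like any other file.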

Saturday, May 12, 2007

stack corruption

I have added proper linking with the mock-up C library and the L4 library. I can also pass up to 7 arguments into the kernel over the stack. It's not a fast no-copy path, but since it works it is OK for now. I am now calling functions from the L4 library rather than hand-made system call jumps, and this revealed various problems. It turns out that I must save the userspace stack pointer even if the context switch occurs in kernel space. For example, I wasn't saving it for kernel calls, so a preempting process would overwrite the user stack pointer. I fixed that, then voluntary context switching showed the same problem. I should really save it upon context switches, rather than at other points (like syscall entry). The assembler for preemption/irq/context switching is fairly clean and compact so far, and I will be careful not to add much complexity.
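The bug can be reproduced in a small C model. It assumes an ARM target (the posts mention svc mode), where user mode has its own banked sp that svc or irq mode code does not see as its own stack pointer. A global stands in for that banked register, and `struct ktcb` with its `usr_sp` field is a hypothetical simplification. The buggy variant only recorded the user sp at syscall entry. The fixed variant saves it on every switch, whatever triggered the switch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the banked user-mode stack pointer register. */
static uintptr_t banked_usr_sp;

/* Hypothetical per-thread kernel control block. */
struct ktcb {
    uintptr_t usr_sp;   /* user stack pointer saved while switched out */
};

/* Buggy variant: the user sp was only recorded at syscall entry, so a
 * switch taken later (preemption inside the kernel, or a voluntary
 * switch) never records the value the outgoing thread is really using. */
static void switch_without_save(struct ktcb *prev, struct ktcb *next)
{
    (void)prev;
    banked_usr_sp = next->usr_sp;
}

/* Fixed variant: every switch saves the outgoing user sp and loads the
 * incoming one, so the save no longer depends on how we entered. */
static void context_switch(struct ktcb *prev, struct ktcb *next)
{
    prev->usr_sp = banked_usr_sp;
    banked_usr_sp = next->usr_sp;
}
```

In the buggy case, thread a's latest user sp is lost and it would resume on a stale stack. In the fixed case it survives a round trip through thread b.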

Another point is that this problem revealed itself through static reasoning rather than by examining the code at run-time. A trace tool might have helped, but for this kind of bug it seems you can only think through the logic in your code and work out what's wrong.

Another thing is that this project is quite rewarding. If I didn't do it, how else could I learn about virtual memory, caching, managing address spaces, etc.? There was no other way to go any further. Learning by practice is quite good. I think my git repository is quite neat as well. Normally you get educational OSes (like Minix) as a whole, and even though they are simple, one wouldn't easily know where to start writing another one. In my case, though, the git commits are really good examples of how to develop an OS step by step. So I think in the future I may at least publish it as a practical, educational approach to getting started with an OS.

Friday, April 13, 2007

quick note

Just a quick note: I've found an open source USB stack that mainly supports control transfers. The good thing is that it targets embedded devices and is well written in kernel style. I have also found an embedded filesystem library called efsl or something. It is also good because it supports FAT and has the basic open/close/read/write implemented. Even though they are dummy library calls, it is quite useful for now, and it only needs read_sector()/write_sector() calls at the low level. These two might be quite useful. The other thing is Nano-X, an embedded GUI library that only needs very basic framebuffer operations like DrawVertical() and DrawHoriz() implemented. After I enhance task support and system calls, if I can put all three together, that could make a great demo.
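Because the filesystem library only needs sector-level hooks, it can be tried on the host against a RAM disk before any real driver exists. The sketch below assumes the hook names and shapes given in the note above, `read_sector()` and `write_sector()` taking a sector number and a 512-byte buffer. The library's actual interface may use different names and signatures, so treat these as hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NSECTORS    64          /* a 32 KiB RAM disk for host testing */

static uint8_t ramdisk[NSECTORS][SECTOR_SIZE];

/* Hypothetical low-level hooks: the filesystem asks only for whole
 * sectors by number. Return 0 on success, -1 for a bad sector number. */
int read_sector(uint32_t lba, uint8_t *buf)
{
    if (lba >= NSECTORS)
        return -1;
    memcpy(buf, ramdisk[lba], SECTOR_SIZE);
    return 0;
}

int write_sector(uint32_t lba, const uint8_t *buf)
{
    if (lba >= NSECTORS)
        return -1;
    memcpy(ramdisk[lba], buf, SECTOR_SIZE);
    return 0;
}
```

Later, the same two functions could be re-pointed at a real block driver without touching the FAT code above them.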

Wednesday, April 11, 2007

A turning point

I thought I had posted more since my last entry. Essentially, I have both preemptive and voluntary scheduling implemented (the voluntary part through the thread_switch system call). Irqs are reentrant up to the limit of the one-page svc stack, and can preempt a process that is either in kernel or user context. So kernel preemption is there from the start.
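Here is one way to picture how the two kinds of scheduling fit together. The sketch assumes a hypothetical round-robin run queue of three threads with a two-tick timeslice. The timer irq preempts when the slice runs out, and a thread_switch-style call gives up the rest of the slice at once. The real L4 call can also name a destination thread, which this sketch leaves out.

```c
#include <assert.h>

#define NTHREADS  3
#define TIMESLICE 2             /* ticks a thread may run before preemption */

static int current;             /* index of the running thread */
static int ticks_left = TIMESLICE;

/* Round-robin: pick the next thread and give it a fresh timeslice. */
static void schedule(void)
{
    current = (current + 1) % NTHREADS;
    ticks_left = TIMESLICE;
}

/* Timer irq handler: preemptive path. It fires whether the thread was
 * in user or kernel context, which is what makes the kernel preemptible. */
static void timer_tick(void)
{
    if (--ticks_left == 0)
        schedule();
}

/* Voluntary path, in the spirit of thread_switch: yield immediately. */
static void thread_switch(void)
{
    schedule();
}
```

Both paths end up in the same `schedule()`, so the only difference between preemptive and voluntary switching is what triggers it.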

I've got this far; however, working late nights and weekends does not really help a lot. I don't get tired as such, but rather stressed. When you look back after spending most of your time on source code for even a few weeks, you feel a little weird, especially as a guy my age who is supposed to just socialise or do sports etc. Plus, it takes ages to push the project forward: essentially you do a week's full-time job in 3 to 4 weeks.

So the conclusion is that I should either stop or continue doing this full-time. One thing is for sure: stopping ain't an option. We'll see what happens in the next few weeks.

Friday, February 23, 2007

Scheduler + irq handling

The extremely basic scheduler and the irq handling (not so basic, in fact very clean in the high-level sources and somewhat complicated in the assembler) have been added, and it compiles now. How do I feel? I feel like vomiting source code at the moment. Especially assembler. Now I need to debug all this, but before that I'll take a break because it's my 25th birthday on Saturday.

Sunday, February 18, 2007

context switch

Still working on the context switching code. It's easy to design a simple system, but harder than I thought to design it properly. I allow irqs to preempt irqs, the kernel, and userspace, which means each of these contexts must be handled properly. I have resolved almost all of it; now I need to implement it. The only part I'm not yet sure about is whether I really need 2 contexts per process, one for the kernel and one for userspace. Because system calls preempt... I just realised, as I write this, that I don't need two contexts. System calls push the user context onto their stack and don't write to the ktcb. When they are preempted, the current context can be saved to the ktcb, whether it is in the system call or in userspace. So the rule of thumb is: if interrupted, save the previous context on the current stack; if a context switch will occur, pop it from the stack, save it to the ktcb context area, and load the next context. This seems to apply to all interruptions, i.e. irq->irq, usr->svc, svc->irq, usr->irq...
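Here is the rule of thumb as a small C model. Assumptions: a cut-down register set, and a per-thread kernel stack modelled as an array of saved frames. Every interruption pushes the interrupted context. A plain return pops it. A switch pops the top frame into the outgoing ktcb and hands back the incoming thread's context. All the names are hypothetical. Any deeper frames stay on the outgoing thread's own kernel stack, which is why a single ktcb context seems to be enough.

```c
#include <assert.h>

/* Cut-down register set; a real frame would hold all general
 * registers plus the saved status register. */
struct context {
    unsigned long pc, sp, psr;
};

struct ktcb {
    struct context ctx;         /* written only when switched out */
};

#define MAX_NEST 8              /* stands in for the one-page stack limit */
struct kstack {
    struct context frame[MAX_NEST];
    int depth;
};

/* Any interruption (usr->svc, usr->irq, svc->irq, irq->irq): push the
 * interrupted context onto the current kernel stack. */
static void on_interrupt(struct kstack *ks, const struct context *interrupted)
{
    assert(ks->depth < MAX_NEST);
    ks->frame[ks->depth++] = *interrupted;
}

/* Return without switching: pop the frame and resume it. */
static struct context on_return(struct kstack *ks)
{
    assert(ks->depth > 0);
    return ks->frame[--ks->depth];
}

/* Switch: pop the frame, save it to the outgoing ktcb, and return the
 * incoming thread's context for the caller to load. */
static struct context on_switch(struct kstack *ks, struct ktcb *prev,
                                const struct ktcb *next)
{
    prev->ctx = on_return(ks);
    return next->ctx;
}
```

For example, a timer irq that fires during a system call leaves the user frame on the thread's stack and saves only the in-kernel context into its ktcb.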
This is good fun.

Thursday, February 01, 2007

Stuff implemented

KIP is implemented. Timer and interrupt controller support has been added. 11 system calls can be called from userspace (they are not actually implemented yet). The nice thing is that I can use Insight + qemu for very rapid testing. Running a script starts Insight, gdb connects to qemu, and the loader runs and stops right after the MMU is turned on, before jumping from physical to virtual addresses. Then it loads the kernel image symbols and stops. So just by running a simple script you get to step through the kernel code with Insight. This is possible in theory with a hardware debugger, but never in practice: you have to configure the hardware debugger, connect to the target, etc., the debugger prompts you with its endless and needless questions at each step, and it all takes more than half a minute. Anyway, I've done other nice things, but this is what I did last night, so perhaps I'm slightly influenced by it ;-).

I've been doing OK so far, but overall there's just so much more to write. The next thing I'll do is finally implement scheduling and some of the easier system calls. (Well, all of them seem straightforward as long as I get the register glue logic correct.) Meanwhile I am looking for an embedded GUI implementation. I seriously wonder why there isn't a widely adopted replacement for X in a smaller scope. I've found something called libggl (or a similar name) that looks promising. The code looks good, but I'm not sure it is worth spending hours and hours on. In any case, I must get a lot of other things done before that.
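Having system calls that are "callable but not implemented yet" usually means a dispatch table whose empty slots return an error. Here is a minimal C sketch of that. The table size matches the 11 calls mentioned above. The argument struct, the return codes, and the single filled-in `sys_nop` slot are hypothetical placeholders, not the L4 calls themselves.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SYSCALLS   11
#define SYS_EBADNR    (-1)      /* hypothetical: no such system call */
#define SYS_ENOTIMPL  (-2)      /* hypothetical: slot exists, not written yet */

/* Registers as the entry assembler might hand them to C (illustrative). */
struct syscall_args {
    unsigned long nr;           /* system call number */
    unsigned long arg[3];
};

typedef long (*syscall_fn)(struct syscall_args *);

/* Placeholder for a call that has been implemented. */
static long sys_nop(struct syscall_args *a)
{
    (void)a;
    return 0;
}

/* Slots are filled in as system calls get implemented; NULL = stub. */
static syscall_fn syscall_table[NR_SYSCALLS] = { [0] = sys_nop };

long syscall_dispatch(struct syscall_args *a)
{
    if (a->nr >= NR_SYSCALLS)
        return SYS_EBADNR;
    if (syscall_table[a->nr] == NULL)
        return SYS_ENOTIMPL;
    return syscall_table[a->nr](a);
}
```

This keeps the register glue in assembler down to "save registers, call `syscall_dispatch`, restore", and each new system call then only adds a table entry.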