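The code read here is qemu v2.5.0. qemu evolves as fast as ceph: a quick look at the latest master branch shows the code has changed almost beyond recognition, although the basic ideas are largely the same.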
The qemu event framework (main loop, iothread) is built on glib's event loop mechanism:

    aio_context_new
      g_source_new
      event_notifier_init
      aio_set_event_notifier
        aio_set_fd_handler
          g_source_add_poll
      ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL)
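The event notifier registered above is what lets other code wake the event loop. As a rough illustration only, the following is a minimal sketch of the same idea using a plain POSIX self-pipe and `poll()`. The `Notifier` type and the `notifier_*` / `wait_for_notify` names are hypothetical and are not qemu or glib APIs; qemu's real `EventNotifier` lives behind `event_notifier_init` and is polled through a GSource.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for an event notifier: a non-blocking self-pipe. */
typedef struct {
    int rfd;   /* read end, watched by the event loop */
    int wfd;   /* write end, used by whoever wants to wake the loop */
} Notifier;

static int notifier_init(Notifier *n)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return -1;
    }
    /* Non-blocking so that draining the pipe never stalls the loop. */
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Wake the loop: one byte in the pipe makes rfd readable. */
static void notifier_set(Notifier *n)
{
    char b = 1;
    ssize_t r = write(n->wfd, &b, 1);
    (void)r;   /* a full pipe already guarantees a pending wake-up */
}

/* Drain all pending wake-ups; returns 1 if at least one was pending. */
static int notifier_test_and_clear(Notifier *n)
{
    char buf[64];
    int seen = 0;

    while (read(n->rfd, buf, sizeof(buf)) > 0) {
        seen = 1;
    }
    return seen;
}

/* Poll the notifier for up to timeout_ms; returns 1 if it fired. */
static int wait_for_notify(Notifier *n, int timeout_ms)
{
    struct pollfd pfd = { .fd = n->rfd, .events = POLLIN };

    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN)) {
        return notifier_test_and_clear(n);
    }
    return 0;
}
```

Calling `notifier_set` from any context makes the next `wait_for_notify` return 1; a second wait then times out because the pipe has been drained.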
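At startup, qemu reads the user-specified -drive options to create the hard disk drives, and also creates the default drives, including the cdrom (PS: the most important file in the qemu source tree is vl.c; it is called vl because qemu's earliest name was virtual linux and its command-line program was vl rather than today's qemu):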
    vl.c/main
      qemu_opts_foreach(qemu_find_opts("drive"), drive_init_func, ...)
        drive_new
          qemu_opt_get(all_opts, "cache")
          bdrv_parse_cache_flags
          blockdev_init
            extract_common_blockdev_options
            blk_new_open
              blk_new_with_bs
                blk_new
                bdrv_new_root
                  bdrv_new
              bdrv_open
                bdrv_open_inherit
                  bdrv_open_child
                    bdrv_open_inherit
                      bdrv_open_common
                        bs->drv = drv
                        drv->bdrv_file_open
                    bdrv_attach_child
                  bdrv_open_common
                    bs->drv = drv
                    bs->file = file

      default_drive(cdrom)
        drive_new
      default_drive(floppy)
      default_drive(sdcard)
Program main loop:

    vl.c/main
      main_loop
        do {
          main_loop_wait(false)
            os_host_main_loop_wait
              glib_pollfds_fill
              qemu_poll_ns
              glib_pollfds_poll
            qemu_clock_run_all_timers
        } while (!main_loop_should_exit())
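The shape of this loop (work out a timeout, poll, then run expired timers, until asked to exit) can be sketched in plain C. This is a simplified illustration under stated assumptions, not qemu's code: the `Timer` type and the `now_ms`, `next_timeout_ms`, `run_expired_timers`, `request_exit` and `main_loop` helpers are all hypothetical, fd dispatch is omitted, and the real loop takes its fd set and timeout from glib.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical one-shot timer with an absolute deadline in milliseconds. */
typedef struct {
    int64_t expire_ms;
    void (*cb)(void *opaque);
    void *opaque;
    int armed;
} Timer;

static int64_t now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll timeout = time until the nearest armed timer; -1 blocks forever. */
static int next_timeout_ms(const Timer *t, int n)
{
    int64_t now = now_ms();
    int64_t best = -1;

    for (int i = 0; i < n; i++) {
        if (!t[i].armed) {
            continue;
        }
        int64_t d = t[i].expire_ms > now ? t[i].expire_ms - now : 0;
        if (best < 0 || d < best) {
            best = d;
        }
    }
    return (int)best;
}

/* Fire every armed timer whose deadline has passed; returns the count. */
static int run_expired_timers(Timer *t, int n)
{
    int64_t now = now_ms();
    int fired = 0;

    for (int i = 0; i < n; i++) {
        if (t[i].armed && t[i].expire_ms <= now) {
            t[i].armed = 0;
            t[i].cb(t[i].opaque);
            fired++;
        }
    }
    return fired;
}

/* Example timer callback: sets the loop's exit flag. */
static void request_exit(void *opaque)
{
    *(int *)opaque = 1;
}

/* One iteration = wait for fds or the next deadline, then run timers. */
static void main_loop(struct pollfd *fds, nfds_t nfds,
                      Timer *timers, int ntimers, int *exit_flag)
{
    do {
        poll(fds, nfds, next_timeout_ms(timers, ntimers));
        /* A real loop would dispatch handlers for ready fds here. */
        run_expired_timers(timers, ntimers);
    } while (!*exit_flag);
}
```

With no fds and a single timer whose callback is `request_exit`, `main_loop` sleeps in `poll()` until the deadline and then returns.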
If an iothread option is given when qemu starts, the -object options are traversed before the -drive options, and the corresponding iothread is created:

    vl.c/main
      qemu_opts_foreach(qemu_find_opts("object"), object_create, ...)
        object_add("iothread", ...)
          object_new("iothread")
          user_creatable_complete
            iothread_complete
              iothread->ctx = aio_context_new(...)
              qemu_thread_create
From then on, all disk IO operations run in the iothread thread function iothread_run:

    iothread.c/iothread_run
      while (!iothread->stopping) {
        aio_poll(iothread->ctx, ...)
      }
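The pattern here is a dedicated thread that keeps polling its own context until it is told to stop. A minimal sketch with pthreads follows, assuming a pipe as the wake-up source. The `IOThread` struct and the `iothread_start`, `iothread_kick` and `iothread_stop` helpers are hypothetical names for illustration; only `iothread_run` and the `stopping` flag echo the trace above, and the body is not qemu's implementation.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical IO thread: owns a wake-up pipe and a stop flag. */
typedef struct {
    pthread_t thread;
    atomic_int stopping;
    atomic_int wakeups;      /* number of wake-ups handled, for inspection */
    int wake_rfd, wake_wfd;
} IOThread;

/* Thread body: block in poll() until kicked, repeat until stopping. */
static void *iothread_run(void *opaque)
{
    IOThread *t = opaque;

    while (!atomic_load(&t->stopping)) {
        struct pollfd pfd = { .fd = t->wake_rfd, .events = POLLIN };
        char b;

        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN) &&
            read(t->wake_rfd, &b, 1) == 1) {
            /* A real loop would dispatch IO handlers here. */
            atomic_fetch_add(&t->wakeups, 1);
        }
    }
    return NULL;
}

/* Returns 0 on success, like pthread_create. */
static int iothread_start(IOThread *t)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return -1;
    }
    t->wake_rfd = fds[0];
    t->wake_wfd = fds[1];
    atomic_init(&t->stopping, 0);
    atomic_init(&t->wakeups, 0);
    return pthread_create(&t->thread, NULL, iothread_run, t);
}

/* Wake the thread out of poll(). */
static void iothread_kick(IOThread *t)
{
    char b = 1;
    ssize_t r = write(t->wake_wfd, &b, 1);
    (void)r;
}

/* Set the flag first, then kick so the thread re-checks it and exits. */
static void iothread_stop(IOThread *t)
{
    atomic_store(&t->stopping, 1);
    iothread_kick(t);
    pthread_join(t->thread, NULL);
    close(t->wake_rfd);
    close(t->wake_wfd);
}
```

The order in `iothread_stop` matters: the flag has to be set before the kick, otherwise the thread could wake up, see `stopping == 0`, and go back to sleep forever. Build with `-pthread` on toolchains that need it.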
Block driver registration:

    rbd.c/bdrv_rbd_init
      bdrv_register(&bdrv_rbd)
        bdrv_setup_io_funcs

    if (!bdrv->bdrv_co_readv) {
        bdrv->bdrv_co_readv = bdrv_co_readv_em;
        bdrv->bdrv_co_writev = bdrv_co_writev_em;

        if (!bdrv->bdrv_aio_readv) {
            bdrv->bdrv_aio_readv = bdrv_aio_readv_em;
            bdrv->bdrv_aio_writev = bdrv_aio_writev_em;
        }
    }
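The fallback logic above fills in whichever IO entry points a driver does not provide. The sketch below illustrates the idea with a toy driver table. Every name in it (`Driver`, `co_writev_em`, `aio_writev_em`, `driver_setup_io_funcs`, the `fake_*` functions) is hypothetical, and one simplification is assumed: the "aio" call completes inline, whereas qemu's emulation yields a coroutine until the callback runs.

```c
#include <stddef.h>

typedef struct Driver Driver;
typedef void CompletionFn(void *opaque, int ret);

/* Toy driver table: a driver may implement any subset of these. */
struct Driver {
    const char *name;
    int (*write)(Driver *d, const char *buf, size_t len);      /* synchronous */
    void (*aio_writev)(Driver *d, const char *buf, size_t len,
                       CompletionFn *cb, void *opaque);          /* callback */
    int (*co_writev)(Driver *d, const char *buf, size_t len);   /* coroutine */
};

static void store_ret(void *opaque, int ret)
{
    *(int *)opaque = ret;
}

/* Coroutine-style entry point emulated on top of the aio entry point. */
static int co_writev_em(Driver *d, const char *buf, size_t len)
{
    int ret = -1;

    /* Assumption: completion is inline; qemu would yield here instead. */
    d->aio_writev(d, buf, len, store_ret, &ret);
    return ret;
}

/* aio entry point emulated on top of the synchronous write. */
static void aio_writev_em(Driver *d, const char *buf, size_t len,
                          CompletionFn *cb, void *opaque)
{
    cb(opaque, d->write(d, buf, len));
}

/* Same shape as the snippet above: emulate co via aio, and aio if missing. */
static void driver_setup_io_funcs(Driver *d)
{
    if (!d->co_writev) {
        d->co_writev = co_writev_em;
        if (!d->aio_writev) {
            d->aio_writev = aio_writev_em;
        }
    }
}

/* An rbd-like driver that only provides the aio entry point. */
static void fake_aio_writev(Driver *d, const char *buf, size_t len,
                            CompletionFn *cb, void *opaque)
{
    (void)d;
    (void)buf;
    cb(opaque, (int)len);   /* pretend every byte was written */
}

/* A driver that only provides a synchronous write. */
static int fake_sync_write(Driver *d, const char *buf, size_t len)
{
    (void)d;
    (void)buf;
    return (int)len;
}
```

After `driver_setup_io_funcs`, callers can always use `co_writev`, whichever of the three entry points the driver actually supplied. For an aio-only driver such as the rbd-like one, only the coroutine wrapper is added.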
IO call chain:

    blk_aio_writev
      bdrv_aio_writev
        bdrv_co_aio_rw_vector
          co = qemu_coroutine_create(bdrv_co_do_rw)
          qemu_coroutine_enter(co, acb)
          bdrv_co_maybe_schedule_bh(acb)

    bdrv_co_do_rw
      bdrv_co_do_writev
        bdrv_co_do_pwritev
          bdrv_aligned_pwritev
            drv->bdrv_co_writev

    bdrv_co_writev(bs->file->bs, ...)
      bdrv_co_do_writev
        bdrv_co_do_pwritev
          bdrv_aligned_pwritev
            drv->bdrv_co_writev
              bdrv_co_io_em
                qemu_rbd_aio_writev
The read path is the same as the write path, with write replaced by read.
rbd IO completion callback handling:

    rbd_finish_aiocb
      qemu_bh_schedule
        aio_notify
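The completion path hands work back to the event loop: the callback only marks a bottom half as scheduled and notifies the loop, and the bottom half itself runs later in the loop's context. A minimal sketch of that hand-off follows. The `Ctx`, `BH` and `Request` types and all function names are hypothetical, and a counter stands in for the real notification, which would write to the event notifier.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct Ctx Ctx;
typedef struct BH BH;

/* Hypothetical bottom half: a deferred callback owned by a context. */
struct BH {
    Ctx *ctx;
    void (*cb)(void *opaque);
    void *opaque;
    atomic_int scheduled;
    BH *next;
};

struct Ctx {
    BH *first_bh;
    atomic_int notify_count;   /* stand-in for kicking the event notifier */
};

static void ctx_init(Ctx *ctx)
{
    ctx->first_bh = NULL;
    atomic_init(&ctx->notify_count, 0);
}

static void bh_init(Ctx *ctx, BH *bh, void (*cb)(void *), void *opaque)
{
    bh->ctx = ctx;
    bh->cb = cb;
    bh->opaque = opaque;
    atomic_init(&bh->scheduled, 0);
    bh->next = ctx->first_bh;
    ctx->first_bh = bh;
}

/* Safe to call repeatedly: only the 0 -> 1 transition notifies the loop. */
static void bh_schedule(BH *bh)
{
    if (atomic_exchange(&bh->scheduled, 1) == 0) {
        atomic_fetch_add(&bh->ctx->notify_count, 1);
    }
}

/* Called by the event loop: run every scheduled BH once; returns the count. */
static int bh_poll(Ctx *ctx)
{
    int ran = 0;

    for (BH *bh = ctx->first_bh; bh; bh = bh->next) {
        if (atomic_exchange(&bh->scheduled, 0)) {
            bh->cb(bh->opaque);
            ran++;
        }
    }
    return ran;
}

/* Hypothetical in-flight request completed by an external library. */
typedef struct {
    BH bh;
    int ret;
    int completed;
} Request;

/* Runs in the event loop, where touching request state is safe. */
static void request_complete_bh(void *opaque)
{
    Request *r = opaque;
    r->completed = 1;
}

/* Completion callback: record the result and defer the rest to the loop. */
static void finish_request(Request *r, int ret)
{
    r->ret = ret;
    bh_schedule(&r->bh);
}
```

The request is not marked completed when `finish_request` returns; that happens only when the loop next calls `bh_poll`, which keeps the guest-visible completion work on the event loop thread.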
References
The Main Event Loop
https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html
Understanding QEMU devices
https://www.qemu.org/2018/02/09/understanding-qemu-devices/
Improving the QEMU Event Loop
http://events17.linuxfoundation.org/sites/events/files/slides/Improving%20the%20QEMU%20Event%20Loop%20-%203.pdf
Towards Multi-threaded Device Emulation in QEMU
https://www.linux-kvm.org/images/a/a7/02x04-MultithreadedDevices.pdf
multiple-iothreads.txt
https://github.com/qemu/qemu/blob/master/docs/devel/multiple-iothreads.txt
Live Block Device Operations in QEMU
https://archive.fosdem.org/2018/schedule/event/vai_qemu_live_dev_operations/attachments/slides/2391/export/events/attachments/vai_qemu_live_dev_operations/slides/2391/Live_Block_Device_Operations_in_QEMU_FOSDEM2018.pdf