virtio-scsi + qemu-user-target + SCSI passthrough + Ceph RBD

SCSI Disks in Virtual Machines

There are several ways to emulate a disk for a virtual machine:

  1. virtio-blk[1][2]
  2. virtio-scsi[2][3]
  3. PCI passthrough + VFIO[4][5]

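Of these, virtio-scsi is positioned as the successor to virtio-blk. PCI passthrough offers high performance but comes with some limitations; it suits specific application VMs (such as HPC) that need to access an FC SAN. The rest of this article focuses on virtio-scsi.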

virtio-scsi

There are two main ways to configure how virtio-scsi reaches its backend storage:

  1. virtio-scsi + qemu-user-target[3]
  2. virtio-scsi + LIO-vhost[3][6][7]

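LIO-vhost works by handing the VM's I/O to LIO's vHost fabric (front-end) driver. Performance is its main advantage [8], but it also has quite a few restrictions; for example, it currently supports only Linux guests [6]. The rest of this article focuses on virtio-scsi + qemu-user-target.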

virtio-scsi + qemu-user-target

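qemu-user-target is essentially a SCSI target attached to the virtio-scsi HBA through a SCSI bus. It supports a variety of storage backends [9]. Depending on the backend and the configuration, it either passes SCSI commands through or translates them into backend I/O requests, which are then sent to the storage backend.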

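In the guest, the virtio_scsi HBA driver (linux/drivers/scsi/virtio_scsi.c) places SCSI commands on a virtqueue (virtscsi_queuecommand). The I/O thread of the virtio-scsi device emulated by QEMU then dequeues each command (virtio_scsi_pop_req in qemu/hw/scsi/virtio-scsi.c) and processes it. Whether a given command is ultimately passed through depends on the backend type and on QEMU's configuration options. Clearly, if the storage backend does not understand SCSI commands, passthrough is pointless and every SG_IO request would fail. This is why libvirt only allows a few backend types to be configured in LUN access mode.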
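
To make the queuing step more concrete, here is a small sketch of how the guest driver addresses a LUN inside each request it places on the virtqueue. The byte layout (lun[0] = 1, lun[1] = target id, lun[2..3] = a 14-bit LUN with the 0x40 flat-addressing bit) follows my reading of linux/drivers/scsi/virtio_scsi.c and QEMU's virtio_scsi_get_lun. The type and helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8-byte LUN field at the start of a virtio-scsi command request header. */
typedef struct {
    uint8_t lun[8];
} VirtioScsiLunField;

/* Hypothetical helper: encode <target, lun> the way the guest driver does. */
static void encode_virtio_scsi_lun(VirtioScsiLunField *f, uint8_t target, uint16_t lun)
{
    assert(lun < 16384);                          /* only 14 bits fit */
    memset(f->lun, 0, sizeof(f->lun));
    f->lun[0] = 1;                                /* always 1 */
    f->lun[1] = target;                           /* target id */
    f->lun[2] = (uint8_t)((lun >> 8) | 0x40);     /* 0x40: flat space addressing */
    f->lun[3] = (uint8_t)(lun & 0xff);
}

/* Hypothetical helpers: the device side recovers the target and the 14-bit LUN. */
static uint8_t decode_virtio_scsi_target(const VirtioScsiLunField *f)
{
    return f->lun[1];
}

static uint16_t decode_virtio_scsi_lun(const VirtioScsiLunField *f)
{
    return (uint16_t)(((f->lun[2] << 8) | f->lun[3]) & 0x3fff);
}
```

On the QEMU side, virtio_scsi_device_find uses the decoded target/LUN pair to look up the SCSIDevice on the bus.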

Take an iSCSI backend, i.e. qemu-user-target + libiscsi [10], as an example:

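  1. If the SCSI disk is configured as an ordinary disk (<disk type='network' device='disk'>), SCSI read/write commands are converted into libiscsi calls (scsi_disk_dma_reqops, bdrv_iscsi). Non-read/write commands such as REPORT_LUNS and INQUIRY are answered by the QEMU target with the actually configured parameters or with emulated data (scsi_disk_emulate_reqops, reqops_target_command).
  2. If the SCSI disk is configured in LUN mode (<disk type='network' device='lun'>), a few SCSI commands such as REPORT_LUNS and REQUEST_SENSE are still answered by the QEMU target with configured parameters or emulated data. All other SCSI commands are passed through via libiscsi (scsi_generic_req_ops, iscsi_aio_ioctl).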
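
The two configurations can be summarized as a per-command routing decision. The sketch below is a deliberately simplified, hypothetical model of that decision:

- In disk mode, only the read/write opcodes reach libiscsi as block requests.
- In LUN mode, only the two commands named above (REPORT_LUNS, REQUEST_SENSE) stay in the QEMU target.

Real QEMU handles more cases (for example WRITE_VERIFY, UNMAP, WRITE SAME). The enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum disk_mode { MODE_DISK, MODE_LUN };   /* device='disk' vs device='lun' */
enum req_path { PATH_EMULATE, PATH_BACKEND_RW, PATH_PASSTHROUGH };

/* SCSI operation codes used below (SPC / SBC values). */
enum {
    OP_REQUEST_SENSE = 0x03, OP_READ_6 = 0x08, OP_WRITE_6 = 0x0a,
    OP_INQUIRY = 0x12, OP_READ_10 = 0x28, OP_WRITE_10 = 0x2a,
    OP_READ_16 = 0x88, OP_WRITE_16 = 0x8a, OP_REPORT_LUNS = 0xa0,
    OP_READ_12 = 0xa8, OP_WRITE_12 = 0xaa,
};

static int is_read_write(uint8_t op)
{
    switch (op) {
    case OP_READ_6: case OP_READ_10: case OP_READ_12: case OP_READ_16:
    case OP_WRITE_6: case OP_WRITE_10: case OP_WRITE_12: case OP_WRITE_16:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical router: which path a command takes for an iSCSI-backed disk. */
static enum req_path route_command(enum disk_mode mode, uint8_t op)
{
    if (mode == MODE_DISK)
        /* reads/writes become block-layer requests (libiscsi),
           everything else is answered by the QEMU target itself */
        return is_read_write(op) ? PATH_BACKEND_RW : PATH_EMULATE;

    /* LUN mode: only a few target-level commands are answered locally */
    if (op == OP_REPORT_LUNS || op == OP_REQUEST_SENSE)
        return PATH_EMULATE;
    return PATH_PASSTHROUGH;
}
```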

Ceph RBD + SCSI

Ceph does not support access over the traditional FCP / iSCSI protocols. For a VM to access Ceph block devices (RBD) over the SCSI protocol, the options are:

  1. Add an FCP / iSCSI gateway to Ceph [11], similar in principle to RGW [12]
  2. Have a SCSI target at the hypervisor layer perform SCSI protocol translation [3]

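The community once had plans to add native FCP / iSCSI support to Ceph [13][14], but that work has stalled. In principle, the approach is similar to the current standalone gateway implementations. As the earlier sections show, the second option is exactly what virtio-scsi + qemu-user-target does today.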

virtio-scsi + qemu-user-target + SCSI passthrough + Ceph RBD

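From the analysis above, the current virtio-scsi + qemu-user-target + Ceph RBD combination does not support SCSI command passthrough. Every SCSI command sent by the guest's SCSI HBA driver is handled in one of two ways:

- qemu-user-target converts it into a Ceph RBD read/write request and sends it to the Ceph backend, or
- the target layer emulates it and answers directly.

Guest SCSI commands therefore cannot be passed through. For the same reason, an RBD-backed SCSI disk cannot be set to LUN access mode in the libvirt domain XML.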
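
At the byte level, "converted into RBD read/write requests" looks like this for READ(10)/WRITE(10):

1. The target decodes the LBA and block count from the CDB.
2. It turns them into a byte offset and length on the image.
3. Only that range, not the CDB, travels to the backend, for example through a librbd read/write call.

The 512-byte logical block size and the helper name are assumptions for this sketch. QEMU's actual path goes through scsi_disk_dma_reqops and the generic block layer rather than through a function like this.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range an emulating target would hand to the storage backend. */
typedef struct {
    uint64_t offset;   /* byte offset into the image */
    uint64_t length;   /* number of bytes */
} ByteRange;

/* Hypothetical helper: decode a READ(10)/WRITE(10) CDB into a byte range.
   LBA is bytes 2..5 (big-endian), transfer length is bytes 7..8. */
static int cdb10_to_byte_range(const uint8_t cdb[10], uint32_t block_size, ByteRange *out)
{
    if (cdb[0] != 0x28 && cdb[0] != 0x2a)   /* READ(10) / WRITE(10) only */
        return -1;
    uint32_t lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
                   ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
    uint16_t blocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
    out->offset = (uint64_t)lba * block_size;
    out->length = (uint64_t)blocks * block_size;
    return 0;
}
```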

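If all you need is non-default handling (note: not passthrough) for some SCSI commands, you can modify the target code in QEMU, mainly the scsi_disk_reqops_dispatch dispatch table.

Support for VPD page code 0x83 [15] can be added by changing the implementation of qemu/hw/scsi/scsi-disk.c/scsi_disk_emulate_inquiry. The community has in fact been working in this area for some time [16][17][18], so in principle supporting it should not be difficult.

As for persistent reservations (PERSISTENT RESERVE, PR) [19][20], the QEMU target currently does nothing for them when the disk is not in LUN mode. Any PR query issued in the guest fails with "Additional sense: Invalid command operation code". In principle, PR support could be added as special handling in qemu/hw/scsi/scsi-disk.c/scsi_disk_emulate_command, with librbd providing the interfaces for manipulating the RBD image's metadata. For the PR logic itself, IET (iSCSI Enterprise Target) [21] can serve as a reference.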
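
For the VPD page 0x83 case, the payload the target has to produce is small. The sketch below builds a Device Identification page containing a single binary NAA designator that carries a 64-bit WWN. As far as I can tell, this matches the descriptor QEMU's scsi-disk emits when a wwn property is configured: bytes 0x01, 0x03, 0x00, 0x08 followed by the big-endian WWN. The function name and the single-descriptor layout are simplifications for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build INQUIRY VPD page 0x83 (Device Identification)
   with one NAA designator carrying a 64-bit WWN.
   Returns the number of bytes written, or -1 if buf is too small. */
static int build_vpd83_naa(uint8_t *buf, size_t buflen, uint8_t peripheral_type, uint64_t wwn)
{
    const size_t total = 4 + 4 + 8;   /* page header + descriptor header + NAA */
    if (buflen < total)
        return -1;

    buf[0] = peripheral_type & 0x1f;  /* peripheral qualifier 0 + device type */
    buf[1] = 0x83;                    /* page code */
    buf[2] = 0;                       /* page length (MSB) */
    buf[3] = 12;                      /* page length (LSB): bytes after header */

    buf[4] = 0x01;                    /* protocol id 0, code set 1 = binary */
    buf[5] = 0x03;                    /* PIV 0, association 0 (LU), type 3 = NAA */
    buf[6] = 0;                       /* reserved */
    buf[7] = 8;                       /* designator length */
    for (int i = 0; i < 8; i++)       /* WWN stored big-endian */
        buf[8 + i] = (uint8_t)(wwn >> (56 - 8 * i));
    return (int)total;
}
```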

Real SCSI command passthrough can only be achieved by building an FCP / iSCSI gateway.

References

[1] VFIO: A user’s perspective

http://www.linux-kvm.org/images/f/f9/2012-forum-virtio-blk-performance-improvement.pdf

[2] The next-generation storage interface for the Red Hat Enterprise Linux Kernel Virtual Machine: virtio-scsi

https://www.redhat.com/en/resources/rhel-next-gen-storage-for-rhel-kernel-vm

[3] Virtio SCSI - An alternative virtualized storage stack for KVM

http://www.linux-kvm.org/images/f/f5/2011-forum-virtio-scsi.pdf

[4] Linux virtualization and PCI passthrough

http://www.ibm.com/developerworks/library/l-pci-passthrough/

[5] VFIO - “Virtual Function I/O”

https://www.kernel.org/doc/Documentation/vfio.txt

[6] vHost

http://linux-iscsi.org/wiki/VHost

[7] QEMU Internals: vhost architecture

http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html

[8] KVM I/O performance and end-to-end reliability

http://www.linux-kvm.org/images/f/f6/01x07a-Vhost.pdf

[9] Official QEMU mirror

https://github.com/qemu/qemu/tree/master/block

[10] iscsi client library and utilities

https://github.com/sahlberg/libiscsi

[11] Ceph iSCSI Gateway

https://www.susecon.com/doc/2015/sessions/TUT16512.pdf

[12] Ceph Object Gateway

http://docs.ceph.com/docs/master/radosgw/

[13] Clustered SCSI target using RBD

http://tracker.ceph.com/projects/ceph/wiki/Clustered_SCSI_target_using_RBD

[14] Clustered SCSI target using RBD Status

http://tracker.ceph.com/projects/ceph/wiki/Clustered_SCSI_target_using_RBD_Status

[15] 7.6 Vital product data parameters

SCSI Primary Commands - 3 (SPC-3) Revision 23

[16] qemu: Use disk wwn in qemu command line

https://www.redhat.com/archives/libvir-list/2012-August/msg01857.html

[17] scsi-disk: Add support for port WWN and index descriptors in VPD page 83h

https://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg03421.html

[18] scsi: push WWN fields up to SCSIDevice

https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg01774.html

[19] 6.11 PERSISTENT RESERVE IN command

SCSI Primary Commands - 3 (SPC-3) Revision 23

[20] 6.12 PERSISTENT RESERVE OUT command

SCSI Primary Commands - 3 (SPC-3) Revision 23

[21] iSCSI Enterprise Target

https://sourceforge.net/p/iscsitarget/code/HEAD/tree/trunk/kernel/persist.c

Code Walkthrough

// linux/drivers/scsi/virtio_scsi.c
virtscsi_probe
  scsi_host_alloc
  virtscsi_init
  scsi_add_host
  scsi_scan_host
    do_scsi_scan_host
      scsi_scan_host_selected
        scsi_scan_channel
          for (id = 0; id < shost->max_id; ++id)
            __scsi_scan_target
              scsi_alloc_target
                shost->hostt->target_alloc
                  virtscsi_target_alloc
              // Scan LUN 0, if there is some response, scan further. Ideally, we
              // would not configure LUN 0 until all LUNs are scanned.
              scsi_probe_and_add_lun(lun=0)
                // <host, channel, target, lun>
                scsi_device_lookup_by_target
                  __scsi_device_lookup_by_target
                sdev = scsi_alloc_sdev
                  sdev->request_queue = scsi_alloc_queue
                    q = __scsi_alloc_queue(scsi_request_fn)
                      // prepare a request queue for use with a block device
                      blk_init_queue
                      __scsi_init_queue
                    blk_queue_prep_rq(q, scsi_prep_fn)
                    blk_queue_unprep_rq(q, scsi_unprep_fn)
                    blk_queue_softirq_done(q, scsi_softirq_done)
                    blk_queue_rq_timed_out(q, scsi_times_out)
                    blk_queue_lld_busy(q, scsi_lld_busy)
                // probe a single LUN using a SCSI INQUIRY
                scsi_probe_lun
                scsi_add_lun
                  scsi_attach_vpd
              // Scan using SCSI REPORT LUN results
              if scsi_report_lun_scan:
                // The REPORT LUN did not scan the target, do a sequential scan.
                scsi_sequential_lun_scan
                  // We have already scanned LUN 0, so start at LUN 1.
                  for (lun = 1; lun < max_dev_lun; ++lun)
                    scsi_probe_and_add_lun
              // paired with scsi_alloc_target(): determine if the target has
              // any children at all and if not, nuke it
              scsi_target_reap
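
The scan order in the trace above can be modeled in a few lines: probe LUN 0, then use REPORT LUNS, then fall back to a sequential scan. This is a hypothetical, self-contained model, not kernel code:

- FakeTarget stands in for a real target.
- probe_lun stands in for scsi_probe_and_add_lun.
- The sequential scan stops at the first missing LUN, which is how the kernel treats devices not flagged as having sparse LUNs.

A real scan also continues when LUN 0 reports only "target present"; this model ignores that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target model: which LUNs answer INQUIRY and whether
   REPORT LUNS is supported. */
typedef struct {
    bool report_luns_ok;
    bool present[16];
} FakeTarget;

/* Stand-in for scsi_probe_and_add_lun(): true if the LUN answered. */
static bool probe_lun(const FakeTarget *t, unsigned lun)
{
    return lun < 16 && t->present[lun];
}

/* Returns the number of LUNs found on the target. */
static unsigned scan_target(const FakeTarget *t, unsigned max_dev_lun)
{
    unsigned found = 0;
    if (!probe_lun(t, 0))
        return 0;                 /* no response at LUN 0: skip this target */
    found++;
    if (t->report_luns_ok) {
        /* REPORT LUNS gave a list: probe exactly the listed LUNs */
        for (unsigned lun = 1; lun < 16; lun++)
            found += probe_lun(t, lun);
        return found;
    }
    /* Sequential fallback; LUN 0 is already scanned, so start at LUN 1 */
    for (unsigned lun = 1; lun < max_dev_lun; lun++) {
        if (!probe_lun(t, lun))
            break;                /* non-sparse: stop at the first gap */
        found++;
    }
    return found;
}
```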

// The goal of the function is to prepare a request for I/O, it can be used to build a
// cdb from the request data for instance.
scsi_prep_fn
  scsi_get_cmd_from_req
    scsi_get_command
    req->special = cmd;
    cmd->request = req;
  scsi_setup_cmnd
    case REQ_TYPE_FS:
      scsi_setup_fs_cmnd
        scsi_cmd_to_driver
          // sd.c
          sd_init_command
            if (rq->cmd_flags & REQ_DISCARD)
              return sd_setup_discard_cmnd(cmd);
            else if (rq->cmd_flags & REQ_WRITE_SAME)
              return sd_setup_write_same_cmnd(cmd);
            else if (rq->cmd_flags & REQ_FLUSH)
              return sd_setup_flush_cmnd(cmd);
            else
              return sd_setup_read_write_cmnd(cmd);
    case REQ_TYPE_BLOCK_PC:
      scsi_setup_blk_pc_cmnd
        scsi_init_io

// The function to be called to process requests that have been placed on the queue.
scsi_request_fn
  cmd = req->special;
  cmd->scsi_done = scsi_done;
  scsi_dispatch_cmd(cmd)
    // The queuecommand function is used to queue up a scsi
    // command block to the LLDD.  When the driver finished
    // processing the command the done callback is invoked.
    host->hostt->queuecommand(host, cmd);
      virtscsi_queuecommand_single

virtscsi_queuecommand_single
  virtscsi_queuecommand
    virtscsi_kick_cmd
      // add a virtio_scsi_cmd to a virtqueue
      virtscsi_add_cmd
      virtqueue_kick_prepare
      virtqueue_notify

// qemu/hw/virtio/virtio-pci.c
virtio_pci_set_guest_notifiers
  for (n = 0; n < nvqs; n++)
    virtio_pci_set_guest_notifier
      virtio_queue_set_guest_notifier_fd_handler
        event_notifier_set_handler(virtio_queue_guest_notifier_read)

virtio_queue_guest_notifier_read
  if (event_notifier_test_and_clear)
    virtio_irq
      virtio_notify_vector
        k->notify
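
A VirtQueue's host notifier (guest kicks) and guest notifier (interrupts) are both EventNotifiers, which QEMU implements with eventfd on Linux. The following minimal eventfd wrapper shows the set / test-and-clear semantics that event_notifier_test_and_clear relies on. Repeated signals collapse into one counter, and a single read drains it. The code is Linux-only, and the Notifier type and its functions are illustrative names, not QEMU's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical minimal notifier backed by a non-blocking eventfd. */
typedef struct {
    int fd;
} Notifier;

static int notifier_init(Notifier *n)
{
    n->fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return n->fd < 0 ? -1 : 0;
}

/* Signal side: add 1 to the eventfd counter (a kick or an interrupt). */
static int notifier_set(Notifier *n)
{
    uint64_t one = 1;
    return write(n->fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Handler side: true if signalled since the last call; the read also
   resets the counter to 0, so several signals are consumed at once. */
static bool notifier_test_and_clear(Notifier *n)
{
    uint64_t value;
    return read(n->fd, &value, sizeof(value)) == (ssize_t)sizeof(value);
}

static void notifier_cleanup(Notifier *n)
{
    close(n->fd);
}
```
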
// qemu/hw/scsi/virtio-scsi.c
// Virtio SCSI HBA device initialization
virtio_scsi_device_realize
  virtio_scsi_common_realize(virtio_scsi_handle_ctrl,
                             virtio_scsi_handle_event,
                             virtio_scsi_handle_cmd);
    virtio_init
    s->ctrl_vq = virtio_add_queue_aio(vdev, VIRTIO_SCSI_VQ_SIZE, virtio_scsi_handle_ctrl);
    s->event_vq = virtio_add_queue_aio(vdev, VIRTIO_SCSI_VQ_SIZE, virtio_scsi_handle_event);
    for (i = 0; i < s->conf.num_queues; i++) {
        s->cmd_vqs[i] = virtio_add_queue_aio(vdev, VIRTIO_SCSI_VQ_SIZE, virtio_scsi_handle_cmd);
    }
    if (s->conf.iothread)
      virtio_scsi_set_iothread
        s->ctx = iothread_get_aio_context
  scsi_bus_new

virtio_scsi_handle_ctrl
  if (s->ctx)
    virtio_scsi_dataplane_start
  else
    virtio_scsi_handle_ctrl_vq

virtio_scsi_handle_event
  if (s->ctx)
    virtio_scsi_dataplane_start
  else
    virtio_scsi_handle_event_vq

virtio_scsi_handle_cmd
  if (s->ctx) // one iothread per virtio-scsi-pci
    // initialize vq(s), so only called once
    virtio_scsi_dataplane_start
      k->set_guest_notifiers
      virtio_scsi_vring_init(vs->ctrl_vq, virtio_scsi_data_plane_handle_ctrl)
      virtio_scsi_vring_init(vs->event_vq, virtio_scsi_data_plane_handle_event)
      for (i = 0; i < vs->conf.num_queues; i++)
        virtio_scsi_vring_init(vs->cmd_vqs[i], virtio_scsi_data_plane_handle_cmd)
      s->dataplane_started = true;
  else
    virtio_scsi_handle_cmd_vq

// handled in individual iothread for each virtio-scsi-pci device
virtio_scsi_data_plane_handle_cmd
  virtio_scsi_handle_cmd_vq
    while ((req = virtio_scsi_pop_req(s, vq))) {
        virtio_scsi_handle_cmd_req_prepare(s, req)
          virtio_scsi_parse_req
          // if the specified LUN id does not exist, this may return another LUN device
          // on the same target to represent the target, instead of the specified LUN device
          virtio_scsi_device_find
          req->sreq = scsi_req_new
            // alloc SCSIRequest or SCSIGenericReq or SCSIDiskReq or SCSIBlockReq
            // UNIT_ATTENTION, REPORT_LUNS, REQUEST_SENSE call scsi_req_parse_cdb directly,
            // only scsi_generic_class_initfn and scsi_block_class_initfn set sc->parse_cdb,
            // so scsi-cd, scsi-hd and scsi-disk call scsi_req_parse_cdb,
            // scsi-generic and scsi-block call sc->parse_cdb
            // 1) for UNIT_ATTENTION, REPORT_LUNS, REQUEST_SENSE or scsi-cd, scsi-hd, scsi-disk
            scsi_req_parse_cdb
            // 2) for scsi-generic and scsi-block
            sc->parse_cdb
              // for 1) scsi-generic
              scsi_generic_parse_cdb
                scsi_bus_parse_cdb
              // for 2) scsi-block
              scsi_block_parse_cdb
                if (scsi_block_is_passthrough)
                  // false for MMC r/w, otherwise true
                  scsi_bus_parse_cdb
                    scsi_req_parse_cdb
                    bus->info->parse_cdb
                      // always set to virtio_scsi_parse_cdb, see virtio_scsi_scsi_info definition
                      virtio_scsi_parse_cdb
                else
                  scsi_req_parse_cdb
                    switch (dev->type) {
                    case TYPE_TAPE:
                        rc = scsi_req_stream_xfer(cmd, dev, buf);
                        break;
                    case TYPE_MEDIUM_CHANGER:
                        rc = scsi_req_medium_changer_xfer(cmd, dev, buf);
                        break;
                    case TYPE_SCANNER:
                        rc = scsi_req_scanner_length(cmd, dev, buf);
                        break;
                    default:
                        rc = scsi_req_xfer(cmd, dev, buf);
                        break;
                    }
                    scsi_cmd_xfer_mode(cmd);
                    cmd->lba = scsi_cmd_lba(cmd);
            // UNIT_ATTENTION, REPORT_LUNS, REQUEST_SENSE are not bound to a specific LUN,
            // so scsi_req_alloc is called directly
            // 1) for UNIT_ATTENTION, REPORT_LUNS, REQUEST_SENSE or errors
            scsi_req_alloc(&reqops_unit_attention / &reqops_target_command / &reqops_invalid_opcode / &reqops_invalid_field)
            // 2) for other LUN specific request
            scsi_device_alloc_req
              // scsi-generic, scsi-cd, scsi-hd, scsi-disk, scsi-block all have their own
              // sc->alloc_req, see scsi_generic_class_initfn, scsi_cd_class_initfn,
              // scsi_hd_class_initfn, scsi_disk_class_initfn, scsi_block_class_initfn
              sc->alloc_req
                // 1) for scsi-generic
                scsi_req_alloc(&scsi_generic_req_ops)
                // 2) for scsi-cd, scsi-hd, scsi-disk
                ops = scsi_disk_reqops_dispatch[command];
                if (!ops)
                  ops = &scsi_disk_emulate_reqops;
                // the alloc size is determined by ops->size
                scsi_req_alloc(ops)
                // 3) for scsi-block
                if (scsi_block_is_passthrough)
                  // false for MMC r/w, otherwise true
                  scsi_req_alloc(&scsi_generic_req_ops)
                else
                  scsi_req_alloc(&scsi_block_dma_reqops)
        QTAILQ_INSERT_TAIL(&reqs, req, next);
      }
      QTAILQ_FOREACH_SAFE(req, &reqs, next, next) {
          virtio_scsi_handle_cmd_req_submit(s, req)
            sreq = req->sreq
            if (scsi_req_enqueue(sreq)) {
                // has data to transfer
                scsi_req_continue(sreq);
            }
      }

scsi_req_enqueue
  // enqueue SCSIRequest or SCSIGenericReq or SCSIDiskReq or SCSIBlockReq to device queue
  scsi_req_enqueue_internal(req)
    QTAILQ_INSERT_TAIL(&req->dev->requests, req, next);
  // prepare SCSIRequest or SCSIGenericReq or SCSIDiskReq or SCSIBlockReq
  req->ops->send_command(req, req->cmd.buf)

// need to transfer data between target and device
scsi_req_continue
  if (req->cmd.mode == SCSI_XFER_TO_DEV) {
      req->ops->write_data(req);
  } else {
      req->ops->read_data(req);
  }
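
The read_data / write_data choice depends on cmd.mode, which scsi_cmd_xfer_mode derives from the CDB. Below is a simplified, hypothetical model of both steps:

- A direction rule that only knows the WRITE opcodes. QEMU's real list of data-out commands is much longer (MODE SELECT, UNMAP, WRITE SAME, ...).
- A req_continue that dispatches through a small ops table in the spirit of SCSIReqOps.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { XFER_NONE, XFER_FROM_DEV, XFER_TO_DEV } XferMode;

typedef struct Req Req;

/* Per-request-type callbacks, in the spirit of SCSIReqOps (simplified). */
typedef struct {
    void (*read_data)(Req *req);
    void (*write_data)(Req *req);
} ReqOps;

struct Req {
    const ReqOps *ops;
    uint8_t opcode;
    uint32_t xfer;      /* bytes to transfer, decoded from the CDB */
    XferMode mode;
    int reads, writes;  /* counters used only for the demo */
};

/* Simplified rule: no data -> NONE, WRITE opcodes -> TO_DEV, else FROM_DEV. */
static XferMode xfer_mode(uint8_t opcode, uint32_t xfer)
{
    if (xfer == 0)
        return XFER_NONE;
    switch (opcode) {
    case 0x0a: case 0x2a: case 0xaa: case 0x8a:   /* WRITE(6/10/12/16) */
        return XFER_TO_DEV;
    default:
        return XFER_FROM_DEV;
    }
}

/* Mirrors scsi_req_continue: pick the callback by transfer direction. */
static void req_continue(Req *req)
{
    if (req->mode == XFER_TO_DEV)
        req->ops->write_data(req);
    else
        req->ops->read_data(req);
}

static void demo_read(Req *req)  { req->reads++; }
static void demo_write(Req *req) { req->writes++; }
static const ReqOps demo_ops = { demo_read, demo_write };
```
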
// qemu/hw/scsi/scsi-generic.c
static void scsi_generic_class_initfn(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);

    sc->realize      = scsi_generic_realize;
    sc->alloc_req    = scsi_new_request;
    sc->parse_cdb    = scsi_generic_parse_cdb;
    dc->fw_name = "disk";
    dc->desc = "pass through generic scsi device (/dev/sg*)";
    dc->reset = scsi_generic_reset;
    dc->props = scsi_generic_properties;
    dc->vmsd  = &vmstate_scsi_device;
}
static const TypeInfo scsi_generic_info = {
    .name          = "scsi-generic",
    .parent        = TYPE_SCSI_DEVICE,
    .instance_size = sizeof(SCSIDevice),
    .class_init    = scsi_generic_class_initfn,
};

// qemu/hw/scsi/scsi-disk.c
static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);

    dc->fw_name = "disk";
    dc->reset = scsi_disk_reset;
    sdc->dma_readv = scsi_dma_readv;
    sdc->dma_writev = scsi_dma_writev;
    sdc->need_fua_emulation = scsi_is_cmd_fua;
}
static const TypeInfo scsi_disk_base_info = {
    .name          = TYPE_SCSI_DISK_BASE,
    .parent        = TYPE_SCSI_DEVICE,
    .class_init    = scsi_disk_base_class_initfn,
    .instance_size = sizeof(SCSIDiskState),
    .class_size    = sizeof(SCSIDiskClass),
    .abstract      = true,
};

static void scsi_cd_class_initfn(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);

    sc->realize      = scsi_cd_realize;
    sc->alloc_req    = scsi_new_request;
    sc->unit_attention_reported = scsi_disk_unit_attention_reported;
    dc->desc = "virtual SCSI CD-ROM";
    dc->props = scsi_cd_properties;
    dc->vmsd  = &vmstate_scsi_disk_state;
}
static const TypeInfo scsi_cd_info = {
    .name          = "scsi-cd",
    .parent        = TYPE_SCSI_DISK_BASE,
    .class_init    = scsi_cd_class_initfn,
};

static void scsi_hd_class_initfn(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);

    sc->realize      = scsi_hd_realize;
    sc->alloc_req    = scsi_new_request;
    sc->unit_attention_reported = scsi_disk_unit_attention_reported;
    dc->desc = "virtual SCSI disk";
    dc->props = scsi_hd_properties;
    dc->vmsd  = &vmstate_scsi_disk_state;
}
static const TypeInfo scsi_hd_info = {
    .name          = "scsi-hd",
    .parent        = TYPE_SCSI_DISK_BASE,
    .class_init    = scsi_hd_class_initfn,
};

static void scsi_disk_class_initfn(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);

    sc->realize      = scsi_disk_realize;
    sc->alloc_req    = scsi_new_request;
    sc->unit_attention_reported = scsi_disk_unit_attention_reported;
    dc->fw_name = "disk";
    dc->desc = "virtual SCSI disk or CD-ROM (legacy)";
    dc->reset = scsi_disk_reset;
    dc->props = scsi_disk_properties;
    dc->vmsd  = &vmstate_scsi_disk_state;
}
static const TypeInfo scsi_disk_info = {
    .name          = "scsi-disk",
    .parent        = TYPE_SCSI_DISK_BASE,
    .class_init    = scsi_disk_class_initfn,
};

static void scsi_block_class_initfn(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);
    SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);

    sc->realize      = scsi_block_realize;
    sc->alloc_req    = scsi_block_new_request;
    sc->parse_cdb    = scsi_block_parse_cdb;
    sdc->dma_readv   = scsi_block_dma_readv;
    sdc->dma_writev  = scsi_block_dma_writev;
    sdc->need_fua_emulation = scsi_block_no_fua;
    dc->desc = "SCSI block device passthrough";
    dc->props = scsi_block_properties;
    dc->vmsd  = &vmstate_scsi_disk_state;
}
static const TypeInfo scsi_block_info = {
    .name          = "scsi-block",
    .parent        = TYPE_SCSI_DISK_BASE,
    .class_init    = scsi_block_class_initfn,
};

// used when loading VM state from a file
vmstate_scsi_disk_state
  vmstate_scsi_device
    vmstate_info_scsi_requests
      // read SCSI requests from file
      get_scsi_requests
        while ((sbyte = qemu_get_sbyte(f)) > 0)
          req = scsi_req_new
          scsi_req_enqueue_internal(req)
            QTAILQ_INSERT_TAIL(&req->dev->requests, req, next);

// registered by qemu_add_vm_change_state_handler when the SCSI device is created
scsi_dma_restart_cb
  scsi_dma_restart_bh
    // SCSIRequest(s) are enqueued by scsi_req_enqueue_internal, which is called by get_scsi_requests
    QTAILQ_FOREACH_SAFE(req, &s->requests, next, next)
      switch (req->cmd.mode) {
      case SCSI_XFER_FROM_DEV:
      case SCSI_XFER_TO_DEV:
          scsi_req_continue(req);
            // req->cmd.mode is set by scsi_cmd_xfer_mode, which is called by scsi_req_parse_cdb
            // req->ops is set by scsi_req_alloc
            if (req->cmd.mode == SCSI_XFER_TO_DEV) {
              req->ops->write_data(req);
            } else {
              req->ops->read_data(req);
            }
          break;
      case SCSI_XFER_NONE:
          scsi_req_dequeue(req);
          scsi_req_enqueue(req);
          break;
      }

// reqops_invalid_opcode or reqops_invalid_field is for invalid SCSI cmd
static const struct SCSIReqOps reqops_invalid_opcode = {
    .size         = sizeof(SCSIRequest),
    .send_command = scsi_invalid_command
};

static const struct SCSIReqOps reqops_invalid_field = {
    .size         = sizeof(SCSIRequest),
    .send_command = scsi_invalid_field
};
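
A command can land on reqops_invalid_opcode, or on the emulate path's default case, which is where PERSISTENT RESERVE IN/OUT end up for scsi-hd as far as I can tell. Either way, the guest receives CHECK CONDITION with ILLEGAL REQUEST sense. For reference, the fixed-format sense bytes behind the "Invalid command operation code" message look like this. The helper is hypothetical and only fills the fields that matter here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fixed-format sense data (response code 0x70) for
   ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE (ASC 0x20, ASCQ 0x00). */
static size_t build_invalid_opcode_sense(uint8_t sense[18])
{
    memset(sense, 0, 18);
    sense[0]  = 0x70;   /* current error, fixed format */
    sense[2]  = 0x05;   /* sense key: ILLEGAL REQUEST */
    sense[7]  = 10;     /* additional sense length: bytes 8..17 */
    sense[12] = 0x20;   /* ASC: INVALID COMMAND OPERATION CODE */
    sense[13] = 0x00;   /* ASCQ */
    return 18;
}
```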

// for UNIT_ATTENTION
static const struct SCSIReqOps reqops_unit_attention = {
    .size         = sizeof(SCSIRequest),
    .send_command = scsi_unit_attention
};

// for REPORT_LUNS or REQUEST_SENSE
static const struct SCSIReqOps reqops_target_command = {
    .size         = sizeof(SCSITargetReq),
    .send_command = scsi_target_send_command,
    .read_data    = scsi_target_read_data,
    .get_buf      = scsi_target_get_buf,
    .free_req     = scsi_target_free_buf,
};

// scsi_disk_emulate_reqops or scsi_disk_dma_reqops is used for scsi-cd, scsi-hd, scsi-disk;
// which one is chosen depends on the requested SCSI cmd
static const SCSIReqOps scsi_disk_emulate_reqops = {
    .size         = sizeof(SCSIDiskReq),
    .free_req     = scsi_free_request,
    .send_command = scsi_disk_emulate_command,
    .read_data    = scsi_disk_emulate_read_data,
    .write_data   = scsi_disk_emulate_write_data,
    .get_buf      = scsi_get_buf,
};

static const SCSIReqOps scsi_disk_dma_reqops = {
    .size         = sizeof(SCSIDiskReq),
    .free_req     = scsi_free_request,
    .send_command = scsi_disk_dma_command,
    .read_data    = scsi_read_data,
    .write_data   = scsi_write_data,
    .get_buf      = scsi_get_buf,
    .load_request = scsi_disk_load_request,
    .save_request = scsi_disk_save_request,
};

static const SCSIReqOps *const scsi_disk_reqops_dispatch[256] = {
    [TEST_UNIT_READY]                 = &scsi_disk_emulate_reqops,
    [INQUIRY]                         = &scsi_disk_emulate_reqops,
    [MODE_SENSE]                      = &scsi_disk_emulate_reqops,
    [MODE_SENSE_10]                   = &scsi_disk_emulate_reqops,
    [START_STOP]                      = &scsi_disk_emulate_reqops,
    [ALLOW_MEDIUM_REMOVAL]            = &scsi_disk_emulate_reqops,
    [READ_CAPACITY_10]                = &scsi_disk_emulate_reqops,
    [READ_TOC]                        = &scsi_disk_emulate_reqops,
    [READ_DVD_STRUCTURE]              = &scsi_disk_emulate_reqops,
    [READ_DISC_INFORMATION]           = &scsi_disk_emulate_reqops,
    [GET_CONFIGURATION]               = &scsi_disk_emulate_reqops,
    [GET_EVENT_STATUS_NOTIFICATION]   = &scsi_disk_emulate_reqops,
    [MECHANISM_STATUS]                = &scsi_disk_emulate_reqops,
    [SERVICE_ACTION_IN_16]            = &scsi_disk_emulate_reqops,
    [REQUEST_SENSE]                   = &scsi_disk_emulate_reqops,
    [SYNCHRONIZE_CACHE]               = &scsi_disk_emulate_reqops,
    [SEEK_10]                         = &scsi_disk_emulate_reqops,
    [MODE_SELECT]                     = &scsi_disk_emulate_reqops,
    [MODE_SELECT_10]                  = &scsi_disk_emulate_reqops,
    [UNMAP]                           = &scsi_disk_emulate_reqops,
    [WRITE_SAME_10]                   = &scsi_disk_emulate_reqops,
    [WRITE_SAME_16]                   = &scsi_disk_emulate_reqops,
    [VERIFY_10]                       = &scsi_disk_emulate_reqops,
    [VERIFY_12]                       = &scsi_disk_emulate_reqops,
    [VERIFY_16]                       = &scsi_disk_emulate_reqops,

    [READ_6]                          = &scsi_disk_dma_reqops,
    [READ_10]                         = &scsi_disk_dma_reqops,
    [READ_12]                         = &scsi_disk_dma_reqops,
    [READ_16]                         = &scsi_disk_dma_reqops,
    [WRITE_6]                         = &scsi_disk_dma_reqops,
    [WRITE_10]                        = &scsi_disk_dma_reqops,
    [WRITE_12]                        = &scsi_disk_dma_reqops,
    [WRITE_16]                        = &scsi_disk_dma_reqops,
    [WRITE_VERIFY_10]                 = &scsi_disk_dma_reqops,
    [WRITE_VERIFY_12]                 = &scsi_disk_dma_reqops,
    [WRITE_VERIFY_16]                 = &scsi_disk_dma_reqops,
};
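
This table uses a dispatch-plus-fallback pattern: a sparse 256-entry array indexed by opcode, with scsi_disk_emulate_reqops as the default in alloc_req. That pattern is why custom handling mostly means adding a table entry or a case in scsi_disk_emulate_command. Below is a stripped-down, hypothetical version of the lookup. PERSISTENT RESERVE IN (0x5E) has no entry, so it falls through to the emulation ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { const char *name; } ReqOps;   /* simplified stand-in for SCSIReqOps */

static const ReqOps emulate_ops = { "emulate" };
static const ReqOps dma_ops     = { "dma" };

/* 256-entry table indexed by opcode; unlisted entries are NULL.
   Only a few opcodes are shown. */
static const ReqOps *const dispatch[256] = {
    [0x00] = &emulate_ops,   /* TEST UNIT READY */
    [0x12] = &emulate_ops,   /* INQUIRY */
    [0x25] = &emulate_ops,   /* READ CAPACITY(10) */
    [0x28] = &dma_ops,       /* READ(10) */
    [0x2a] = &dma_ops,       /* WRITE(10) */
    [0x88] = &dma_ops,       /* READ(16) */
    [0x8a] = &dma_ops,       /* WRITE(16) */
};

/* Same fallback as the scsi-disk alloc_req path: an opcode missing
   from the table is handled by the emulation ops. */
static const ReqOps *lookup_ops(uint8_t opcode)
{
    const ReqOps *ops = dispatch[opcode];
    return ops ? ops : &emulate_ops;
}
```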

// for scsi-generic, and for scsi-block if the requested SCSI cmd can be passed through
const SCSIReqOps scsi_generic_req_ops = {
    .size         = sizeof(SCSIGenericReq),
    .free_req     = scsi_free_request,
    .send_command = scsi_send_command,
    .read_data    = scsi_read_data,
    .write_data   = scsi_write_data,
    .get_buf      = scsi_get_buf,
    .load_request = scsi_generic_load_request,
    .save_request = scsi_generic_save_request,
};

// for scsi-block if the requested SCSI cmd cannot be passed through
static const SCSIReqOps scsi_block_dma_reqops = {
    .size         = sizeof(SCSIBlockReq),
    .free_req     = scsi_free_request,
    .send_command = scsi_block_dma_command,
    .read_data    = scsi_read_data,
    .write_data   = scsi_write_data,
    .get_buf      = scsi_get_buf,
    .load_request = scsi_disk_load_request,
    .save_request = scsi_disk_save_request,
};

// qemu/vl.c
main
  qemu_init_cpu_loop
  module_call_init(MODULE_INIT_QOM)
  module_call_init(MODULE_INIT_OPTS)
  bdrv_init_with_whitelist
    bdrv_init
      module_call_init(MODULE_INIT_BLOCK)
  machine_class = select_machine
  qemu_init_main_loop
  current_machine = MACHINE(object_new(object_class_get_name(
                          OBJECT_CLASS(machine_class))));
    object_new_with_type
      // check ti->class and return immediately if not null
      type_initialize
        // type pc_machine_type_v2_7 registered in MODULE_INIT_QOM phase
        ti->class_init
          // set machine_class->init
          pc_machine_v2_7_class_init
      object_initialize_with_type
  cpu_exec_init_all
    io_mem_init
    memory_map_init
  socket_init
  configure_accelerator
    accel_init_machine
      acc->init_machine
  cpu_ticks_init
  machine_class->init(current_machine)
    // qemu/hw/i386/pc_piix.c
    pc_machine_init_v2_7
      pc_init1
        pc_cpus_init
          for (i = 0; i < max_cpus; i++)
            pc_new_cpu
              cpu = X86_CPU(object_new(typename))
                object_new_with_type
                  // check ti->class and return immediately if not null
                  type_initialize
                    parent = type_get_parent
                    if (parent)
                      // initialize the parent, and the parent's parent, up to the root type
                      type_initialize(parent)
                    ti->class_init
                      // qemu/target-i386/cpu.c
                      // set xcc->parent_realize, dc->realize, cc->cpu_exec_enter, cc->cpu_exec_exit
                      x86_cpu_common_class_init
                  object_initialize_with_type
                    type_initialize
                    object_init_with_type
                      // initialize the parent, and the parent's parent, up to the root object
                      // note: TypeImpl instances are created by type_new, which is called for
                      // each type's registration during the MODULE_INIT_QOM phase
                      if (type_has_parent(ti))
                        object_init_with_type(obj, type_get_parent(ti));
                      ti->instance_init
                    object_post_init_with_type
                      ti->instance_post_init
              object_property_set_bool(OBJECT(cpu), true, "realized", &local_err);
                object_property_set_qobject
                  object_property_set
                    // "realized" property was added by device_initfn
                    prop = object_property_find
                    // call property set handler
                    prop->set
                        // call device_set_realized set by device_initfn
                        device_set_realized
                          // set to x86_cpu_realizefn by x86_cpu_common_class_init
                          dc->realize
                            x86_cpu_realizefn
                              cpu_exec_init
                              mce_init
                              qemu_init_vcpu
                                if (kvm_enabled()) {
                                    qemu_kvm_start_vcpu(cpu);
                                } else if (tcg_enabled()) {
                                    qemu_tcg_init_vcpu(cpu);
                                } else {
                                    qemu_dummy_start_vcpu(cpu);
                                }
                              x86_cpu_apic_realize
                              cpu_reset
                              // xcc->parent_realize = dc->realize, see x86_cpu_common_class_init
                              xcc->parent_realize
        pc_memory_init
        pci_bus = i440fx_init
        pc_basic_device_init
        pc_nic_init
        pc_pci_device_init
  vm_start
    cpu_enable_ticks
    resume_all_vcpus
  main_loop
    main_loop_wait
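
The recursive type_initialize walk above is the core of QOM class initialization: parents are initialized first, and each type only once. The toy model below reproduces just that ordering, using a hypothetical Type struct and toy_type_initialize function. Real QEMU also copies the parent's class struct into the child before calling the child's class_init. That copy is how x86_cpu_common_class_init can save the inherited dc->realize as xcc->parent_realize and then override it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy type registry; much simpler than QEMU's TypeImpl. */
typedef struct Type Type;
struct Type {
    const char *name;
    Type *parent;
    void (*class_init)(Type *t);
    bool initialized;
};

static char init_log[128];   /* records the order of class_init calls */

static void log_init(Type *t)
{
    strcat(init_log, t->name);
    strcat(init_log, ";");
}

/* Initialize the parent chain first (up to the root), then this type;
   already-initialized types return immediately. */
static void toy_type_initialize(Type *t)
{
    if (t == NULL || t->initialized)
        return;
    toy_type_initialize(t->parent);
    if (t->class_init)
        t->class_init(t);
    t->initialized = true;
}
```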

Last modified: 2019-01-07