 1. Error completion / time out
    ACTION: scsi_eh_scmd_add() is invoked for scmd
        - set scmd->eh_eflags
        - add scmd to shost->eh_cmd_q
        - set SHOST_RECOVERY
        - shost->host_failed++
    LOCKING: shost->host_lock

 2. EH starts
    ACTION: move all scmds to EH's local eh_work_q.  shost->eh_cmd_q
            is cleared.
    LOCKING: shost->host_lock (not strictly necessary, just for
             consistency)

 3. scmd recovered
    ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
        - shost->host_failed--
        - clear scmd->eh_eflags
        - scsi_setup_cmd_retry()
        - move from local eh_work_q to local eh_done_q
    LOCKING: none

 4. EH completes
    ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
            layer of failure.
        - scmd is removed from eh_done_q and scmd->eh_entry is cleared
        - if retry is necessary, scmd is requeued using
          scsi_queue_insert()
        - otherwise, scsi_finish_command() is invoked for scmd
    LOCKING: queue or finish function performs appropriate locking


[2-1-3] Flow of control

 EH through fine-grained callbacks starts from scsi_unjam_host().

<<scsi_unjam_host>>

    1. Lock shost->host_lock, splice_init shost->eh_cmd_q into local
       eh_work_q and unlock host_lock.  Note that shost->eh_cmd_q is
       cleared by this action.

    2. Invoke scsi_eh_get_sense.

    <<scsi_eh_get_sense>>

        This action is taken for each error-completed
        (!SCSI_EH_CANCEL_CMD) command without valid sense data.  Most
        SCSI transports/LLDDs automatically acquire sense data on
        command failures (autosense).  Autosense is recommended for
        performance reasons and because sense information could get
        out of sync between the occurrence of CHECK CONDITION and this
        action.

        Note that if autosense is not supported, scmd->sense_buffer
        contains invalid sense data when error-completing the scmd
        with scsi_done().  scsi_decide_disposition() always returns
        FAILED in such cases, thus invoking SCSI EH.  When the scmd
        reaches here, sense data is acquired and
        scsi_decide_disposition() is called again.

        1. Invoke scsi_request_sense() which issues a REQUEST_SENSE
           command.  If it fails, no action is taken.
           Note that taking no action causes higher-severity recovery
           to be taken for the scmd.

        2. Invoke scsi_decide_disposition() on the scmd

           - SUCCESS
             scmd->retries is set to scmd->allowed, preventing
             scsi_eh_flush_done_q() from retrying the scmd, and
             scsi_eh_finish_cmd() is invoked.

           - NEEDS_RETRY
             scsi_eh_finish_cmd() is invoked.

           - otherwise
             No action.

    3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().

    <<scsi_eh_abort_cmds>>

        This action is taken for each timed out command.
        hostt->eh_abort_handler() is invoked for each scmd.  The
        handler returns SUCCESS if it has succeeded in making the LLDD
        and all related hardware forget about the scmd.

        If a timed out scmd is successfully aborted and the sdev is
        either offline or ready, scsi_eh_finish_cmd() is invoked for
        the scmd.  Otherwise, the scmd is left in eh_work_q for
        higher-severity actions.

        Note that both offline and ready status mean that the sdev is
        ready to process new scmds, where processing also implies
        immediate failing; thus, if a sdev is in one of the two
        states, no further recovery action is needed.

        Device readiness is tested using scsi_eh_tur(), which issues a
        TEST_UNIT_READY command.  Note that the scmd must have been
        aborted successfully before reusing it for TEST_UNIT_READY.

    4. If !list_empty(&eh_work_q), invoke scsi_eh_ready_devs()

    <<scsi_eh_ready_devs>>

        This function takes four increasingly severe measures to make
        failed sdevs ready for new commands.

        1. Invoke scsi_eh_stu()

        <<scsi_eh_stu>>

            For each sdev which has failed scmds with valid sense data
            for which scsi_check_sense()'s verdict is FAILED, a
            START_STOP_UNIT command is issued w/ start=1.  Note that
            as we explicitly choose error-completed scmds, it is known
            that lower layers have forgotten about the scmd and we can
            reuse it for STU.

            If STU succeeds and the sdev is either offline or ready,
            all failed scmds on the sdev are EH-finished with
            scsi_eh_finish_cmd().
            *NOTE* If hostt->eh_abort_handler() isn't implemented or
            failed, we may still have timed out scmds at this point
            and STU doesn't make lower layers forget about those
            scmds.  Yet, this function EH-finishes all scmds on the
            sdev if STU succeeds, leaving lower layers in an
            inconsistent state.  It seems that STU action should be
            taken only when a sdev has no timed out scmd.

        2. If !list_empty(&eh_work_q), invoke scsi_eh_bus_device_reset().

        <<scsi_eh_bus_device_reset>>

            This action is very similar to scsi_eh_stu() except that,
            instead of issuing STU, hostt->eh_device_reset_handler()
            is used.  Also, as we're not issuing SCSI commands and
            resetting clears all scmds on the sdev, there is no need
            to choose error-completed scmds.

        3. If !list_empty(&eh_work_q), invoke scsi_eh_bus_reset()

        <<scsi_eh_bus_reset>>

            hostt->eh_bus_reset_handler() is invoked for each channel
            with failed scmds.  If bus reset succeeds, all failed
            scmds on all ready or offline sdevs on the channel are
            EH-finished.

        4. If !list_empty(&eh_work_q), invoke scsi_eh_host_reset()

        <<scsi_eh_host_reset>>

            This is the last resort.  hostt->eh_host_reset_handler()
            is invoked.  If host reset succeeds, all failed scmds on
            all ready or offline sdevs on the host are EH-finished.

        5. If !list_empty(&eh_work_q), invoke scsi_eh_offline_sdevs()

        <<scsi_eh_offline_sdevs>>

            Take all sdevs which still have unrecovered scmds offline
            and EH-finish the scmds.

    5. Invoke scsi_eh_flush_done_q().

    <<scsi_eh_flush_done_q>>

        At this point all scmds are recovered (or given up) and put on
        eh_done_q by scsi_eh_finish_cmd().  This function flushes
        eh_done_q by either retrying the scmds or notifying the upper
        layer of their failure.


[2-2] EH through transportt->eh_strategy_handler()

 transportt->eh_strategy_handler() is invoked in place of
scsi_unjam_host() and it is responsible for the whole recovery process.
On completion, the handler should have made lower layers forget about
all failed scmds and left each failed sdev either ready for new
commands or offline.
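A strategy handler takes over the escalating recovery that scsi_unjam_host() otherwise performs.  The following is a hedged, user-space C sketch of that ladder, not kernel code: struct model_cmd, struct model_host, enum eh_model_step and the eh_model_*() functions are hypothetical names invented for illustration.  As a simplifying assumption, each command just records the cheapest step that would recover it; the real steps act per sdev, per channel or per host, and scsi_eh_get_sense() and scsi_eh_abort_cmds() only consider error-completed or timed out scmds respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of the scsi_unjam_host() ladder.
 * These are NOT the kernel's struct scsi_cmnd / struct Scsi_Host.
 */
enum eh_model_step {
	EH_GET_SENSE,   /* scsi_eh_get_sense() */
	EH_ABORT,       /* scsi_eh_abort_cmds() */
	EH_STU,         /* scsi_eh_stu() */
	EH_DEV_RESET,   /* scsi_eh_bus_device_reset() */
	EH_BUS_RESET,   /* scsi_eh_bus_reset() */
	EH_HOST_RESET,  /* scsi_eh_host_reset() */
	EH_OFFLINE,     /* scsi_eh_offline_sdevs(): gives up on the command */
	EH_NR_STEPS     /* also used as "no targeted step helps" */
};

struct model_cmd {
	enum eh_model_step fixed_by;    /* cheapest step that recovers it */
	bool done;                      /* moved to the local eh_done_q */
	enum eh_model_step finished_at; /* step that EH-finished it */
};

struct model_host {
	struct model_cmd *cmds;  /* stands in for the local eh_work_q */
	size_t nr_cmds;
	int host_failed;
};

/* Apply one step to every unfinished command; return how many remain. */
size_t eh_model_try_step(struct model_host *sh, enum eh_model_step step)
{
	size_t remaining = 0;

	for (size_t i = 0; i < sh->nr_cmds; i++) {
		struct model_cmd *c = &sh->cmds[i];

		if (c->done)
			continue;
		if (step == EH_OFFLINE || c->fixed_by <= step) {
			/* analogue of scsi_eh_finish_cmd(): host_failed-- */
			c->done = true;
			c->finished_at = step;
			sh->host_failed--;
			assert(sh->host_failed >= 0);
		} else {
			remaining++;
		}
	}
	return remaining;
}

/* Escalate only while work remains, like the !list_empty(&eh_work_q) checks. */
void eh_model_unjam(struct model_host *sh)
{
	for (int s = EH_GET_SENSE; s < EH_NR_STEPS; s++)
		if (eh_model_try_step(sh, (enum eh_model_step)s) == 0)
			break;
}

/* True once no failed command is left and host_failed is back to zero. */
bool eh_model_all_finished(const struct model_host *sh)
{
	if (sh->host_failed != 0)
		return false;
	for (size_t i = 0; i < sh->nr_cmds; i++)
		if (!sh->cmds[i].done)
			return false;
	return true;
}
```

In this model a command that no targeted step can fix still ends up EH-finished at EH_OFFLINE, which is what lets host_failed reach zero as [2-2-2] requires.  Stopping as soon as nothing is left avoids escalating to a bus or host reset when a cheaper step already recovered everything.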
The handler should also perform SCSI EH maintenance chores to maintain
the integrity of the SCSI midlayer.  IOW, of the steps described in
[2-1-2], all steps except for #1 must be implemented by
eh_strategy_handler().


[2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions

 The following conditions are true on entry to the handler.

 - Each failed scmd's eh_eflags field is set appropriately.

 - Each failed scmd is linked on shost->eh_cmd_q by scmd->eh_entry.

 - SHOST_RECOVERY is set.

 - shost->host_failed == shost->host_busy


[2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions

 The following conditions must be true on exit from the handler.

 - shost->host_failed is zero.

 - Each scmd's eh_eflags field is cleared.

 - Each scmd is in such a state that scsi_setup_cmd_retry() on the
   scmd doesn't make any difference.

 - shost->eh_cmd_q is cleared.

 - Each scmd->eh_entry is cleared.

 - Either scsi_queue_insert() or scsi_finish_command() is called on
   each scmd.  Note that the handler is free to use scmd->retries and
   ->allowed to limit the number of retries.


[2-2-3] Things to consider

 - Know that timed out scmds are still active on lower layers.  Make
   lower layers forget about them before doing anything else with
   those scmds.

 - For consistency, when accessing/modifying shost data structures,
   grab shost->host_lock.

 - On completion, each failed sdev must have forgotten about all
   active scmds.

 - On completion, each failed sdev must be ready for new commands or
   offline.


--
Tejun Heo
htejun@gmail.com
11th September 2005