	</para></listitem>

	<listitem><para>
	EH handling via ata_scsi_error() is not properly protected from
	usual command processing. On EH entry, the device is not in a
	quiescent state. Timed-out commands may succeed or fail at any
	time. pio_task and atapi_task may still be running.
	</para></listitem>

	<listitem><para>
	Too weak error recovery. Devices / controllers causing HSM
	mismatch errors and other errors quite often require a reset to
	return to a known state. Also, advanced error handling is
	necessary to support features like NCQ and hotplug.
	</para></listitem>

	<listitem><para>
	ATA errors are directly handled in the interrupt handler and
	PIO errors in pio_task. This is problematic for advanced
	error handling for the following reasons.
	</para>
	<para>
	First, advanced error handling often requires context and
	internal qc execution.
	</para>
	<para>
	Second, even a simple failure (say, CRC error) needs
	information gathering and could trigger complex error handling
	(say, resetting &amp; reconfiguring). Having multiple code
	paths to gather information, enter EH and trigger actions
	makes life painful.
	</para>
	<para>
	Third, scattered EH code makes implementing low-level drivers
	difficult. Low-level drivers override libata callbacks. If
	EH is scattered over several places, each affected callback
	should perform its part of error handling. This can be error
	prone and painful.
	</para></listitem>

	</itemizedlist>
     </sect1>
  </chapter>

  <chapter id="libataExt">
     <title>libata Library</title>
!Edrivers/ata/libata-core.c
  </chapter>

  <chapter id="libataInt">
     <title>libata Core Internals</title>
!Idrivers/ata/libata-core.c
  </chapter>

  <chapter id="libataScsiInt">
     <title>libata SCSI translation/emulation</title>
!Edrivers/ata/libata-scsi.c
!Idrivers/ata/libata-scsi.c
  </chapter>

  <chapter id="ataExceptions">
     <title>ATA errors and exceptions</title>

  <para>
  This chapter tries to identify what error/exception conditions exist
  for ATA/ATAPI devices and describe how they should be handled in an
  implementation-neutral way.
  </para>

  <para>
  The term 'error' is used to describe conditions where either an
  explicit error condition is reported by the device or a command has
  timed out.
  </para>

  <para>
  The term 'exception' is used either to describe exceptional
  conditions which are not errors (say, power or hotplug events), or
  to describe both errors and non-error exceptional conditions. Where
  an explicit distinction between error and exception is necessary,
  the term 'non-error exception' is used.
  </para>

  <sect1 id="excat">
     <title>Exception categories</title>
     <para>
     Exceptions are described primarily with respect to the legacy
     taskfile + bus master IDE interface. If a controller provides a
     better mechanism for error reporting, mapping it into the
     categories described below shouldn't be difficult.
     </para>

     <para>
     In the following sections, two recovery actions, reset and
     reconfiguring transport, are mentioned. These are described
     further in <xref linkend="exrec"/>.
     </para>

     <sect2 id="excatHSMviolation">
        <title>HSM violation</title>
        <para>
        This error is indicated when the STATUS value doesn't match the
        HSM requirements while issuing or executing any ATA/ATAPI
        command.
        </para>

	<itemizedlist>
	<title>Examples</title>

	<listitem>
	<para>
	ATA_STATUS doesn't contain !BSY &amp;&amp; DRDY &amp;&amp; !DRQ while trying
	to issue a command.
	</para>
	</listitem>

	<listitem>
	<para>
	!BSY &amp;&amp; !DRQ during PIO data transfer.
	</para>
	</listitem>

	<listitem>
	<para>
	DRQ on command completion.
	</para>
	</listitem>

	<listitem>
	<para>
	!BSY &amp;&amp; ERR after CDB transfer starts but before the
	last byte of CDB is transferred. The ATA/ATAPI standard states
	that "The device shall not terminate the PACKET command
	with an error before the last byte of the command packet has
	been written" in the error outputs description of the
	PACKET command, and the state diagram doesn't include such
	transitions.
	</para>
	</listitem>

	</itemizedlist>

	<para>
	In these cases, HSM is violated and not much information
	regarding the error can be acquired from the STATUS or ERROR
	register. IOW, this error can be anything: a driver bug, or a
	faulty device, controller and/or cable.
	</para>

	<para>
	As HSM is violated, a reset is necessary to restore a known
	state. Reconfiguring the transport for a lower speed might be
	helpful too, as transmission errors sometimes cause this kind
	of error.
	</para>
     </sect2>

     <sect2 id="excatDevErr">
        <title>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</title>

	<para>
	These are errors detected and reported by ATA/ATAPI devices
	indicating device problems. For this type of error, the STATUS
	and ERROR register values are valid and describe the error
	condition. Note that some ATA bus errors are detected by
	ATA/ATAPI devices and reported using the same mechanism as
	device errors. Those cases are described later in this
	section.
	</para>

	<para>
	For ATA commands, this type of error is indicated by !BSY
	&amp;&amp; ERR during command execution and on completion.
	</para>

	<para>For ATAPI commands,</para>

	<itemizedlist>

	<listitem>
	<para>
	!BSY &amp;&amp; ERR &amp;&amp; ABRT right after issuing PACKET
	indicates that the PACKET command is not supported and falls in
	this category.
	</para>
	</listitem>

	<listitem>
	<para>
	!BSY &amp;&amp; ERR(==CHK) &amp;&amp; !ABRT after the last
	byte of CDB is transferred indicates CHECK CONDITION and
	doesn't fall in this category.
	</para>
	</listitem>

	<listitem>
	<para>
	!BSY &amp;&amp; ERR(==CHK) &amp;&amp; ABRT after the last byte
	of CDB is transferred *probably* indicates CHECK CONDITION and
	doesn't fall in this category.
	</para>
	</listitem>

	</itemizedlist>

	<para>
	Of the errors detected as above, the following are not
	ATA/ATAPI device errors but ATA bus errors and should be
	handled according to <xref linkend="excatATAbusErr"/>.
	</para>

	<variablelist>

	<varlistentry>
	<term>CRC error during data transfer</term>
	<listitem>
	<para>
	This is indicated by the ICRC bit in the ERROR register and
	means that corruption occurred during data transfer. Up to
	ATA/ATAPI-7, the standard specifies that this bit is only
	applicable to UDMA transfers, but ATA/ATAPI-8 draft revision
	1f says that the bit may be applicable to multiword DMA and
	PIO.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry>
	<term>ABRT error during data transfer or on completion</term>
	<listitem>
	<para>
	Up to ATA/ATAPI-7, the standard specifies that ABRT could be
	set on ICRC errors and in cases where a device is not able to
	complete a command. Combined with the fact that MWDMA and PIO
	transfer errors aren't allowed to use the ICRC bit up to
	ATA/ATAPI-7, this seems to imply that the ABRT bit alone could
	indicate transfer errors.
	</para>
	<para>
	However, ATA/ATAPI-8 draft revision 1f removes the statement
	that ICRC errors can turn on ABRT. So, this is something of a
	gray area, and some heuristics are needed here.
	</para>
	</listitem>
	</varlistentry>

	</variablelist>

	<para>
	ATA/ATAPI device errors can be further categorized as follows.
	</para>

	<variablelist>

	<varlistentry>
	<term>Media errors</term>
	<listitem>
	<para>
	This is indicated by the UNC bit in the ERROR register. ATA
	devices report UNC errors only after a certain number of
	retries cannot recover the data, so there's not much else to do
	other than notify the upper layer.
	</para>
	<para>
	READ and WRITE commands report the CHS or LBA of the first
	failed sector, but the ATA/ATAPI standard specifies that the
	amount of data transferred on error completion is
	indeterminate. So we cannot assume that the sectors preceding
	the failed sector have been transferred, and thus cannot
	complete those sectors successfully as SCSI does.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry>
	<term>Media changed / media change requested error</term>
	<listitem>
	<para>
	&lt;&lt;TODO: fill here&gt;&gt;
	</para>
	</listitem>
	</varlistentry>

	<varlistentry><term>Address error</term>
	<listitem>
	<para>
	This is indicated by the IDNF bit in the ERROR register.
	Report to the upper layer.
	</para>
	</listitem>
	</varlistentry>

	<varlistentry><term>Other errors</term>
	<listitem>
	<para>
	This can be an invalid command or parameter indicated by the
	ABRT ERROR bit, or some other error condition. Note that the
	ABRT bit can indicate a lot of things, including ICRC and
	Address errors. Heuristics are needed.
	</para>
	</listitem>
	</varlistentry>

	</variablelist>

	<para>
	Depending on the command, not all STATUS/ERROR bits are
	applicable. These non-applicable bits are marked with "na" in
	the output descriptions, but up to ATA/ATAPI-7 no definition of
	"na" can be found. However, ATA/ATAPI-8 draft revision 1f
	describes "N/A" as follows.
	</para>

	<blockquote>
	<variablelist>
	<varlistentry><term>3.2.3.3a N/A</term>
	<listitem>
	<para>
	A keyword the indicates a field has no defined value in this
	standard and should not be checked by the host or device. N/A
	fields should be cleared to zero.
	</para>
	</listitem>
	</varlistentry>
	</variablelist>
	</blockquote>

	<para>
	So, it seems reasonable to assume that "na" bits are cleared
	to zero by devices and thus need no explicit masking.
	</para>

     </sect2>

     <sect2 id="excatATAPIcc">
        <title>ATAPI device CHECK CONDITION</title>

	<para>
	An ATAPI device CHECK CONDITION error is indicated by a set CHK
	bit (ERR bit) in the STATUS register after the last byte of CDB
	is transferred for a PACKET command.
	For this kind of error, sense data should be acquired to gather
	information regarding the error. The REQUEST SENSE packet
	command should be used to acquire sense data.
	</para>

	<para>
	Once sense data is acquired, this type of error can be handled
	similarly to other SCSI errors. Note that sense data may
	indicate an ATA bus error (e.g. Sense Key 04h HARDWARE ERROR
	&amp;&amp; ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such
	cases, the error should be considered an ATA bus error and
	handled according to <xref linkend="excatATAbusErr"/>.
	</para>

     </sect2>

     <sect2 id="excatNCQerr">
        <title>ATA device error (NCQ)</title>

	<para>
	An NCQ command error is indicated by a cleared BSY bit and a set
	ERR bit during the NCQ command phase (one or more NCQ commands
	outstanding). Although the STATUS and ERROR registers will
	contain valid values describing the error, READ LOG EXT is
	required to clear the error condition, determine which command
	has failed and acquire more information.
	</para>

	<para>
	READ LOG EXT Log Page 10h reports which tag has failed and the
	taskfile register values describing the error. With this
	information, the failed command can be handled as a normal ATA
	command error as in <xref linkend="excatDevErr"/>, and all
	other in-flight commands must be retried. Note that this retry
	should not be counted: it's likely that commands retried this
	way would have completed normally if it were not for the failed
	command.
	</para>

	<para>
	Note that ATA bus errors can be reported as ATA device NCQ
	errors. These should be handled as described in
	<xref linkend="excatATAbusErr"/>.
	</para>

	<para>
	If READ LOG EXT Log Page 10h fails or reports NQ, we're
	thoroughly screwed. This condition should be treated according
	to <xref linkend="excatHSMviolation"/>.
	</para>

     </sect2>

     <sect2 id="excatATAbusErr">
        <title>ATA bus error</title>

	<para>
	An ATA bus error means that data corruption occurred during
	transmission over the ATA bus (SATA or PATA). This type of
	error can be indicated by
	</para>