If we didn't store recovery info for one or more of the pages in a bad
chunk, then we'll consider it an unrecoverable error and reboot. The
old code only checked the first page in the chunk. With 8K chunks (two
4K pages) that check only caught half of the possible cases.

We could instead choose to recover just the page whose recovery info we
have. We don't do that because there is often more than one bit error
in a chunk, and it seems better to err on the side of rebooting.

BUG=chrome-os-partner:15655
TEST=suspend_stress_test
Change-Id: Ic578bcfaf19efa33f0cc118c9a2f5ca64e5063bb
Signed-off-by: Doug Anderson <dianders@chromium.org>
Reviewed-on: https://gerrit.chromium.org/gerrit/38752
Reviewed-by: Jon Kliegman <kliegs@chromium.org>
{
const u32 failed_cu = bitfix_get_cu(failed_chunk);
u32 cu;
+ size_t offset;
- BUG_ON(should_skip_fn(failed_chunk));
+ /*
+ * If any of the pages in the failed chunk were skipped then we can't
+ * recover it; just bail.
+ */
+ for (offset = 0; offset < CHUNK_SIZE; offset += PAGE_SIZE)
+ BUG_ON(should_skip_fn(failed_chunk + offset));
for (cu = 0; cu < CU_COUNT; cu++) {
phys_addr_t this_chunk = (failed_chunk & ~CU_MASK) |
(cu << CU_OFFSET);
- size_t offset;
/* Don't include the failed corruption unit in our xor */
if (cu == failed_cu)