samsung: snow: bitfix: Bail if any pages in a bad chunk were skipped
author Doug Anderson <dianders@chromium.org>
Tue, 27 Nov 2012 21:24:30 +0000 (13:24 -0800)
committer Gerrit <chrome-bot@google.com>
Wed, 28 Nov 2012 05:57:33 +0000 (21:57 -0800)
If recovery info is missing for any page in a bad chunk, we treat the
chunk as unrecoverable and reboot.  The old code only checked the
first page in the chunk.  With 8K chunks that check covered only half
of the chunk, a 50% hit rate.
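
A minimal user-space sketch of the difference between the two checks,
not the kernel code itself.  It assumes 4K pages and 8K chunks (two
pages per chunk); the SKETCH_* constants, sketch_addr_t, and all
function names below are hypothetical stand-ins for PAGE_SIZE,
CHUNK_SIZE, phys_addr_t, and should_skip_fn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for illustration only: 4K pages, 8K chunks. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_CHUNK_SIZE 8192u

typedef uint64_t sketch_addr_t;                      /* stand-in for phys_addr_t */
typedef bool (*sketch_skip_fn)(sketch_addr_t page);  /* stand-in for should_skip_fn */

/* Old check: looks only at the first page of the chunk. */
static bool first_page_skipped(sketch_addr_t chunk, sketch_skip_fn skip)
{
	return skip(chunk);
}

/* New check: the chunk is unrecoverable if any of its pages was skipped. */
static bool any_page_skipped(sketch_addr_t chunk, sketch_skip_fn skip)
{
	size_t offset;

	for (offset = 0; offset < SKETCH_CHUNK_SIZE; offset += SKETCH_PAGE_SIZE)
		if (skip(chunk + offset))
			return true;
	return false;
}

/* Hypothetical policy: pretend only the page at 0x1000 was skipped. */
static bool skip_page_at_0x1000(sketch_addr_t page)
{
	return page == 0x1000;
}

/* Usage: the old check misses a skipped second page, the new one catches it. */
static void sketch_demo(void)
{
	assert(!first_page_skipped(0x0, skip_page_at_0x1000));
	assert(any_page_skipped(0x0, skip_page_at_0x1000));
}
```

The sketch returns a bool where the patch calls BUG_ON(), so that the
two checks can be compared without halting.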

We could just choose to recover the page whose recovery info we have.
We don't do that because there is often more than one bit error in a
chunk and it seems better to err on the side of rebooting.

BUG=chrome-os-partner:15655
TEST=suspend_stress_test

Change-Id: Ic578bcfaf19efa33f0cc118c9a2f5ca64e5063bb
Signed-off-by: Doug Anderson <dianders@chromium.org>
Reviewed-on: https://gerrit.chromium.org/gerrit/38752
Reviewed-by: Jon Kliegman <kliegs@chromium.org>
arch/arm/mach-exynos/bitfix-snow.c

index 1b2422d..8b1d175 100644
@@ -402,13 +402,18 @@ static void _bitfix_recover_chunk(phys_addr_t failed_chunk,
 {
        const u32 failed_cu = bitfix_get_cu(failed_chunk);
        u32 cu;
+       size_t offset;
 
-       BUG_ON(should_skip_fn(failed_chunk));
+       /*
+        * If any of the pages in the failed chunk were skipped then we can't
+        * recover it; just bail.
+        */
+       for (offset = 0; offset < CHUNK_SIZE; offset += PAGE_SIZE)
+               BUG_ON(should_skip_fn(failed_chunk + offset));
 
        for (cu = 0; cu < CU_COUNT; cu++) {
                phys_addr_t this_chunk = (failed_chunk & ~CU_MASK) |
                        (cu << CU_OFFSET);
-               size_t offset;
 
                /* Don't include the failed corruption unit in our xor */
                if (cu == failed_cu)