samsung: snow: bitfix: Recover from bit errors across sleep/wake

Some Snow boards have an issue in read-only firmware that can cause
bit errors across a suspend/resume cycle. Given the pattern of the
corruption, we can recover from nearly all bit errors with a little
preparation time.

A full description of the problem and solution can be found in
bitfix-snow.c.

This change recovers from most cases of corruption but can still
be made more robust and optimized.

At the moment this code appears to add about 3.5 seconds to
suspend time and ~nothing to resume time. That's on top of the
~1 second for pm-check.

BUG=chrome-os-partner:15655
TEST=Run the following script:
  cd /var/log
  touch sst.txt
  echo '------------' >> sst.txt
  date >> sst.txt
  while true; do
    suspend_stress_test --noerrors_fatal --backup_rtc \
      --suspend_min 15 --suspend_max 15 | tee -a sst.txt
    dmesg | \
      grep "\(CRC error\|bitfix\|...fixed\|[rR]ecover\)" \
      >> sst.txt
    dmesg -C
  done
See messages like:
  [ 229.959279] s3c_pm_check: Restore CRC error at 86856000 (6a5fe07c vs 6acbe07c)
  [ 229.959279] bitfix_recover_chunk: Attempting recovery at 86856000
  [ 229.959279] ...fixed 0x86857140 from 0x76b47894 to 0x76b07894
  [ 229.959279] ...fixed 0x8685751c from 0x00100051 to 0x00000051
  [ 229.959279] ...fixed 0x86857d54 from 0x77773604 to 0x77f73604
  [ 229.959279] s3c_pm_check: Recovered
  [ 229.959279] s3c_pm_check: Restore CRC error at 86858000 (4e502299 vs 4e5b2299)
  [ 229.959279] bitfix_recover_chunk: Attempting recovery at 86858000
  [ 229.959279] ...fixed 0x86858358 from 0x76f8a3d8 to 0x76f9a3d8
  [ 229.959279] ...fixed 0x86858f2c from 0x00080002 to 0x00000002
  [ 229.959279] ...fixed 0x86859b64 from 0x00021a66 to 0x00001a66
  [ 229.959279] s3c_pm_check: Recovered
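To see which bits differed on each "...fixed" line, XOR the two values
it reports. A minimal sketch, assuming the message format shown above;
bitfix_flips is a hypothetical helper name and is not part of this
change:

```shell
#!/bin/sh
# Hypothetical helper (not part of this change).
# Reads dmesg-style text on stdin and, for every
# "...fixed ADDR from OLD to NEW" line, prints ADDR followed by
# OLD XOR NEW, i.e. a mask of the bits that were corrected.
bitfix_flips() {
    sed -n 's/.*\.\.\.fixed \(0x[0-9a-f]*\) from \(0x[0-9a-f]*\) to \(0x[0-9a-f]*\).*/\1 \2 \3/p' |
    while read -r addr old new; do
        printf '%s 0x%08x\n' "$addr" $(( $old ^ $new ))
    done
}

# Usage (on the device): dmesg | bitfix_flips
```

In the sample output above each corrected word differs from its
recovered value by a single bit.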
TEST=Run bitfix-test.sh from I26b3938fe5c481968dc019f188f46d211dd6a801
TEST=Time by doing:
  echo Y > /sys/module/pm_check/parameters/pm_check_print_timings
  suspend_stress_test --noerrors_fatal --backup_rtc \
    --suspend_min 15 --suspend_max 15 -c3
  dmesg | grep 'memory scan'
See:
  [ 1433.174264] s3c_pm_check: Suspend memory scan took 4434401 usecs
  [ 1433.174264] s3c_pm_check: Resume memory scan took 1122626 usecs
  [ 1445.639303] s3c_pm_check: Suspend memory scan took 4460913 usecs
  [ 1445.639303] s3c_pm_check: Resume memory scan took 1122503 usecs
  [ 1458.219387] s3c_pm_check: Suspend memory scan took 4455245 usecs
  [ 1458.219387] s3c_pm_check: Resume memory scan took 1122534 usecs
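To turn several runs of these timings into one number per phase, the
'memory scan' lines can be averaged. A rough sketch, assuming each
message sits on a single line as dmesg prints it; scan_averages is a
hypothetical helper name and is not part of this change:

```shell
#!/bin/sh
# Hypothetical helper (not part of this change).
# Reads dmesg-style text on stdin and prints the mean scan time in
# microseconds for each phase ("Suspend" / "Resume"), one per line.
scan_averages() {
    awk '/memory scan took/ {
        for (i = 4; i < NF; i++) {
            if ($i == "took") {
                kind = $(i - 3)        # word before "memory scan took"
                sum[kind] += $(i + 1)  # the usecs value
                n[kind]++
            }
        }
    }
    END {
        for (k in sum)
            printf "%s %.0f\n", k, sum[k] / n[k]
    }' | sort
}

# Usage (on the device): dmesg | scan_averages
```

For the three cycles shown above this works out to roughly 4.45 seconds
per suspend scan and 1.12 seconds per resume scan.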
TEST=Boot with old BIOS and see bitfix get enabled by doing:
  dmesg | grep bitfix_reserve
...and see:
  [ 0.000000] bitfix_reserve: Detected firmware that needs bitfix
TEST=Run with legacy BIOS (doesn't set RO string) and enable manually:
  echo 'Y' > /sys/module/bitfix_snow/parameters/bitfix_enabled
  echo 'Y' > /sys/module/pm_check/parameters/pm_check_enabled
...see bits get fixed using tests above.

Change-Id: I9c7ac2f85b8d9398c93f486f9401daee1526f571
Signed-off-by: Doug Anderson <dianders@chromium.org>
Reviewed-on: https://gerrit.chromium.org/gerrit/36905
Reviewed-by: Sam Leffler <sleffler@chromium.org>
Reviewed-by: Jon Kliegman <kliegs@chromium.org>