powerpc/eeh: Fix crash caused by null eeh_dev
author Gavin Shan <shangw@linux.vnet.ibm.com>
Mon, 16 Apr 2012 19:55:39 +0000 (19:55 +0000)
committer Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mon, 23 Apr 2012 01:04:28 +0000 (11:04 +1000)
commit 2ef822c55371b20548d4f58193c580407a5d738d
tree e2f9c3fe5f761dda21ecfa4ee3bb91b5f16ce428
parent aec49c7c0e9d2abe88a3d7bc700fca66f05fd67d

The problem was reported by Anton Blanchard: the kernel crashed
when an EEH error hit a PCI device that had no corresponding
device driver. I reproduced the problem on a Firebird-L machine
with the "errinjct" utility. First, the device driver for the
Emulex ethernet MAC was disabled in .config; then a data parity
error was forced on the Emulex ethernet MAC with "errinjct". The
kernel crashed after issuing a couple of "lspci -v" commands.

The root cause is that the PCI device, including its reference
to the corresponding eeh device, is removed from the system while
EEH does recovery. Afterwards, the PCI device is probed again and
added back into the system. So it is not safe to retrieve the eeh
device from the PCI device after the PCI device has been removed
and before it has been added again.

The patch fixes the issue by retrieving the eeh device from the OF
node instead of the PCI device after the PCI device has been removed.
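
The idea can be illustrated with a small userspace model. This is only a
sketch, not the kernel code: the structures are stripped-down stand-ins, and
model_pci_remove(), model_pci_add(), edev_from_pci() and edev_from_of_node()
are hypothetical names invented for the illustration. It assumes that the OF
node keeps its eeh device reference across recovery while the PCI device
loses its reference on removal.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct eeh_dev {
	int config_addr;
};

struct device_node {
	struct eeh_dev *edev;		/* assumed to stay valid during recovery */
};

struct pci_dev {
	struct device_node *of_node;
	struct eeh_dev *edev;		/* reference lost when the device is removed */
};

/* Model of EEH recovery removing the PCI device: the reference goes away. */
static void model_pci_remove(struct pci_dev *pdev)
{
	pdev->edev = NULL;
}

/* Model of the device being probed and added again after recovery. */
static void model_pci_add(struct pci_dev *pdev)
{
	pdev->edev = pdev->of_node->edev;
}

/* Lookup through the PCI device: yields NULL between remove and add. */
static struct eeh_dev *edev_from_pci(const struct pci_dev *pdev)
{
	return pdev ? pdev->edev : NULL;
}

/* Lookup through the OF node: still valid while the PCI device is gone. */
static struct eeh_dev *edev_from_of_node(const struct device_node *dn)
{
	return dn ? dn->edev : NULL;
}
```

In this model, a caller that dereferences the result of edev_from_pci()
between removal and re-add hits a null pointer, which mirrors the reported
crash; the lookup through the OF node does not have that window.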

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/platforms/pseries/eeh.c