Bad DMA from CEDAR card
Benjamin Herrenschmidt
benh at kernel.crashing.org
Tue Oct 23 03:45:11 PDT 2012
On Tue, 2012-10-23 at 18:54 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2012-10-23 at 18:42 +1100, Benjamin Herrenschmidt wrote:
> >
> > As you can see, it's not doing much before the failure:
>
> All right, that debug output is bad: it's missing a bunch of stuff
> due to a bad log level (the printk(KERN_DEBUG) in the atom debug
> stuff doesn't work anymore on new kernels btw)
More data: I've done a bit of AtomDis under Dave's instructions and
improved my tracing, and it looks like we run these 3 tables, in
this order:
DAC1OutputControl
DAC2OutputControl
EnableCRTCMemReq
The error happens somewhere between the end of DAC2OutputControl and
early in EnableCRTCMemReq.
It all comes from drm_helper_disable_unused_functions(): the first two, I
suspect, as a result of drm_encoder_disable(), and the last one as a
result of crtc_funcs->disable, which itself goes into dpms.
I don't know (yet) whether anything happens in between that doesn't go
via ATOM, in which case that wouldn't be traced. That's the next thing
to check (including interrupts though we shouldn't be getting any at
this stage afaik).
Now here is the trace. As before, I enable atom_debug right before that
sequence, after I've done a 1s pause and a freeze check which passes, so
I'm reasonably confident the card is still somewhat in a sane state.
I then run the tables, with a hack that adds a 200ms pause after each
op, followed by a freeze check which I moved to after the ops instead of
before, so we don't miss a freeze caused by the last op of a table.
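For illustration, the tracing approach described above (run one op, pause 200ms, then check for a freeze) can be sketched as below. This is only a minimal simulation under stated assumptions, not the actual hack: struct sim_dev and the sim_* helpers are hypothetical stand-ins for the ATOM interpreter step, the delay, and the EEH freeze check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated device: it "freezes" once op number
 * freeze_after_op has run. Stands in for the real card plus EEH. */
struct sim_dev {
    size_t ops_run;           /* number of ops executed so far */
    size_t freeze_after_op;   /* 1-based op count; 0 means never freeze */
    unsigned paused_ms;       /* total simulated pause time */
};

/* Stand-in for executing one ATOM op. */
static void sim_exec_op(struct sim_dev *d)
{
    d->ops_run++;
}

/* Stand-in for the 200ms delay; only accumulates the time. */
static void sim_pause_ms(struct sim_dev *d, unsigned ms)
{
    d->paused_ms += ms;
}

/* Stand-in for the freeze check. */
static bool sim_is_frozen(const struct sim_dev *d)
{
    return d->freeze_after_op != 0 && d->ops_run >= d->freeze_after_op;
}

/* Run n_ops ops. The check comes AFTER each op (and its pause), so a
 * freeze caused by the last op of a table is still caught.
 * Returns the 0-based index of the op after which a freeze was first
 * seen, or -1 if no freeze was seen. */
static long run_table_traced(struct sim_dev *d, size_t n_ops)
{
    for (size_t i = 0; i < n_ops; i++) {
        sim_exec_op(d);
        sim_pause_ms(d, 200);
        if (sim_is_frozen(d))
            return (long)i;
    }
    return -1;
}
```

Checking after the op rather than before is what makes the returned index point at the op that was running when the freeze appeared, including the final op of a table.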
Here are the results so far:
>> execute D7E2 (len 24, WS 0, PS 4)
SET_ATI_PORT @ 0xD7E8
port: 0 (MM)
CLEAR_REG @ 0xD7EB
dst: REG[0x19EC].[31:0] <- 0x00000000
AND_REG @ 0xD7EF
dst: REG[0x19E4].[7:0] -> 0x04
src: IMM 0xFE
dst: REG[0x19E4].[7:0] <- 0x04
OR_REG @ 0xD7F4
dst: REG[0x19E4].[7:0] -> 0x04
src: PS[0x00,0x0000].[7:0] -> 0x00
dst: REG[0x19E4].[7:0] <- 0x04
EOT @ 0xD7F9
<<
>> execute D7CA (len 24, WS 0, PS 4)
SET_ATI_PORT @ 0xD7D0
port: 0 (MM)
CLEAR_REG @ 0xD7D3
dst: REG[0x19AC].[31:0] <- 0x00000000
AND_REG @ 0xD7D7
dst: REG[0x19A4].[7:0] -> 0x04
src: IMM 0xFE
dst: REG[0x19A4].[7:0] <- 0x04
OR_REG @ 0xD7DC
dst: REG[0x19A4].[7:0] -> 0x04
src: PS[0x00,0x0000].[7:0] -> 0x00
dst: REG[0x19A4].[7:0] <- 0x04
EOT @ 0xD7E1
<<
>> execute BADE (len 25, WS 0, PS 0)
SET_ATI_PORT @ 0xBAE4
port: 0 (MM)
0001:01:00.0: EEH freeze detected, fstate=3 pcierr=9
Interestingly enough, it _does_ go into EnableCRTCMemReq, though it
doesn't seem to have time to do anything in there before the
failure is detected. However, the failure also isn't detected after
the last instruction of DAC2OutputControl, despite the delays.
I do wonder whether we end up actually doing something *else*
in between those two that isn't going through ATOM. I'll try to
dig in that direction next.
The "disassembly" of those tables matches the output:
command_table 0000d7ca #44 (DAC1OutputControl):
Size 0018
Format Rev. 01
Param Rev. 00
Content Rev. 01
Attributes: Work space size 00 longs
Parameter space size 01 longs
Table update indicator 0
0006: 370000 SET_ATI_PORT 0000 (INDIRECT_IO_MM)
0009: 5400ac19 CLEAR reg[19ac] [XXXX]
000d: 0725a419fe AND reg[19a4] [...X] <- fe
0012: 0d21a41900 OR reg[19a4] [...X] <- param[00] [...X]
0017: 5b EOT
command_table 0000d7e2 #45 (DAC2OutputControl):
Size 0018
Format Rev. 01
Param Rev. 00
Content Rev. 01
Attributes: Work space size 00 longs
Parameter space size 01 longs
Table update indicator 0
0006: 370000 SET_ATI_PORT 0000 (INDIRECT_IO_MM)
0009: 5400ec19 CLEAR reg[19ec] [XXXX]
000d: 0725e419fe AND reg[19e4] [...X] <- fe
0012: 0d21e41900 OR reg[19e4] [...X] <- param[00] [...X]
0017: 5b EOT
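To make the correspondence between the disassembly bytes and the trace concrete, here is a small sketch that decodes one instruction and replays the AND/OR pair on the low byte. The byte layout (opcode, attribute byte, 16-bit little-endian register index, optional one-byte operand) is inferred only from the listing above, and struct atom_insn, decode_insn() and low_byte_after() are hypothetical names, not the kernel's ATOM interpreter. The attribute byte is kept as-is rather than interpreted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one register instruction as it appears
 * in the listing, e.g. "0725a419fe  AND reg[19a4] [...X] <- fe". */
struct atom_insn {
    uint8_t  opcode;   /* 0x54 CLEAR, 0x07 AND, 0x0d OR in the listing */
    uint8_t  attr;     /* attribute byte, left uninterpreted here */
    uint16_t reg;      /* register index, stored little-endian */
    uint8_t  operand;  /* immediate or parameter index, 0 if absent */
};

/* Decode a 4- or 5-byte register instruction (assumes len >= 4). */
static struct atom_insn decode_insn(const uint8_t *b, size_t len)
{
    struct atom_insn in;

    in.opcode  = b[0];
    in.attr    = b[1];
    in.reg     = (uint16_t)(b[2] | (b[3] << 8));
    in.operand = (len > 4) ? b[4] : 0;
    return in;
}

/* Replay the AND-then-OR sequence on the low byte of the register. */
static uint8_t low_byte_after(uint8_t old, uint8_t and_imm, uint8_t or_param)
{
    return (uint8_t)((old & and_imm) | or_param);
}
```

With the values from the trace (old byte 0x04, AND 0xFE, OR with a parameter byte of 0x00) the low byte stays 0x04, which matches the "<- 0x04" write-backs logged for both DAC tables.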
Cheers,
Ben.