Bad DMA from CEDAR card

Benjamin Herrenschmidt benh at kernel.crashing.org
Tue Oct 23 03:45:11 PDT 2012


On Tue, 2012-10-23 at 18:54 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2012-10-23 at 18:42 +1100, Benjamin Herrenschmidt wrote:
> > 
> > As you can see, it's not doing much before the failure:
> 
> All right, that debug output is bad, it's missing a bunch of stuff,
> due to a bad log level (the printk(KERN_DEBUG) in the atom debug
> stuff doesn't work anymore on new kernels btw)

More data: I've done a bit of AtomDis under Dave's instructions and
improved my tracing, and what it looks like is we run those 3 tables in
that order:

DAC1OutputControl
DAC2OutputControl
EnableCRTCMemReq

The error happens somewhere between the end of DAC2OutputControl and
early in EnableCRTCMemReq.

It all comes from drm_helper_disable_unused_functions(), the first two I
suspect as a result of drm_encoder_disable() and the last one as a
result of crtc_funcs->disable which itself goes into dpms.

I don't know (yet) whether anything happens in between that doesn't go
via ATOM, in which case that wouldn't be traced. That's the next thing
to check (including interrupts though we shouldn't be getting any at
this stage afaik).

Now here is the trace. As before, I enable atom_debug right before that
sequence, after I've done a 1s pause and a freeze check which passes, so
I'm reasonably confident the card is still somewhat in a sane state.

I then run the tables, with a hack that adds a 200ms pause after each
op, followed by a freeze check which I moved to after the ops instead of
before, so we don't miss a freeze caused by the last op of a table.
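
The per-op check described above can be sketched roughly as follows.
This is only an illustration in Python, not the actual kernel hack:
run_table, is_frozen and pause_s are made-up names, and the ops are
plain callables standing in for ATOM opcodes.

```python
import time


def run_table(ops, is_frozen, pause_s=0.2):
    """Run ops one at a time, pausing and checking for a freeze AFTER each.

    Returns the index of the first op after which a freeze was seen,
    or None if the whole table ran without one. All names here are
    hypothetical stand-ins for the tracing hack.
    """
    for index, op in enumerate(ops):
        op()                 # execute one opcode
        time.sleep(pause_s)  # give the error time to show up
        if is_frozen():      # check after the op, not before it
            return index
    return None


# Fake device whose last op triggers the freeze.
state = {"frozen": False}
ops = [lambda: None, lambda: state.update(frozen=True)]
print(run_table(ops, lambda: state["frozen"], pause_s=0.0))
```

Checking after each op means a freeze caused by the final op of a
table is still attributed to that table rather than to the first op of
the next one.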

Here are the results so far:

>> execute D7E2 (len 24, WS 0, PS 4)
   SET_ATI_PORT @ 0xD7E8
      port: 0 (MM)
   CLEAR_REG @ 0xD7EB
      dst: REG[0x19EC].[31:0] <- 0x00000000
   AND_REG @ 0xD7EF
      dst: REG[0x19E4].[7:0] -> 0x04
      src: IMM 0xFE
      dst: REG[0x19E4].[7:0] <- 0x04
   OR_REG @ 0xD7F4
      dst: REG[0x19E4].[7:0] -> 0x04
      src: PS[0x00,0x0000].[7:0] -> 0x00
      dst: REG[0x19E4].[7:0] <- 0x04
   EOT @ 0xD7F9
<<
>> execute D7CA (len 24, WS 0, PS 4)
   SET_ATI_PORT @ 0xD7D0
      port: 0 (MM)
   CLEAR_REG @ 0xD7D3
      dst: REG[0x19AC].[31:0] <- 0x00000000
   AND_REG @ 0xD7D7
      dst: REG[0x19A4].[7:0] -> 0x04
      src: IMM 0xFE
      dst: REG[0x19A4].[7:0] <- 0x04
   OR_REG @ 0xD7DC
      dst: REG[0x19A4].[7:0] -> 0x04
      src: PS[0x00,0x0000].[7:0] -> 0x00
      dst: REG[0x19A4].[7:0] <- 0x04
   EOT @ 0xD7E1
<<
>> execute BADE (len 25, WS 0, PS 0)
   SET_ATI_PORT @ 0xBAE4
      port: 0 (MM)
0001:01:00.0: EEH freeze detected, fstate=3 pcierr=9
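
To make the arithmetic in the two completed tables explicit: with the
register's low byte reading 0x04, the AND with 0xFE and the OR with a
parameter byte of 0x00 both leave it at 0x04, so the value written back
is the same as the value read. A small Python check (the helper name
and_or_byte is made up for illustration):

```python
def and_or_byte(old, and_mask, or_value):
    """Apply an 8-bit AND followed by an 8-bit OR, as in the traced tables.

    Returns (value after AND, value after OR).
    """
    after_and = old & and_mask & 0xFF
    after_or = (after_and | or_value) & 0xFF
    return after_and, after_or


# Values taken from the trace: low byte 0x04, IMM 0xFE, PS[0] low byte 0x00.
after_and, after_or = and_or_byte(0x04, 0xFE, 0x00)
print(hex(after_and), hex(after_or))
```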

Interestingly enough, it _does_ go into EnableCRTCMemReq, though it
doesn't seem to have time to do anything in there before it
detects the failure. However, the freeze check after the last
instruction of DAC2OutputControl doesn't detect it either, despite
the delays.

I do wonder whether we end up actually doing something *else*
in between those two, that isn't going through ATOM. I'll try to
dig in that direction next.

The "disassembly" of those tables matches the output:

command_table  0000d7ca  #44  (DAC1OutputControl):

  Size         0018
  Format Rev.  01
  Param Rev.   00
  Content Rev. 01
  Attributes:  Work space size        00 longs
               Parameter space size   01 longs
               Table update indicator 0

  0006: 370000            SET_ATI_PORT  0000  (INDIRECT_IO_MM)
  0009: 5400ac19          CLEAR  reg[19ac]  [XXXX]
  000d: 0725a419fe        AND    reg[19a4]  [...X]  <-  fe
  0012: 0d21a41900        OR     reg[19a4]  [...X]  <-  param[00]  [...X]
  0017: 5b                EOT

command_table  0000d7e2  #45  (DAC2OutputControl):

  Size         0018
  Format Rev.  01
  Param Rev.   00
  Content Rev. 01
  Attributes:  Work space size        00 longs
               Parameter space size   01 longs
               Table update indicator 0

  0006: 370000            SET_ATI_PORT  0000  (INDIRECT_IO_MM)
  0009: 5400ec19          CLEAR  reg[19ec]  [XXXX]
  000d: 0725e419fe        AND    reg[19e4]  [...X]  <-  fe
  0012: 0d21e41900        OR     reg[19e4]  [...X]  <-  param[00]  [...X]
  0017: 5b                EOT
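
For cross-checking the listing against the trace, the raw bytes can be
split up by hand. The sketch below assumes a layout of one opcode byte,
one attribute byte, a 16-bit little-endian register index and an
optional one-byte operand. That layout is inferred from the listing
above (bytes "a4 19" showing up as reg[19a4]), not taken from a spec,
and decode_reg_op is a made-up helper name.

```python
def decode_reg_op(hex_bytes):
    """Split a register-destination instruction from the listing into fields.

    Assumed layout (inferred from the listing, not from a spec):
    opcode (1 byte), attribute (1 byte), register index (2 bytes,
    little-endian), optional operand (1 byte).
    """
    raw = bytes.fromhex(hex_bytes)
    if len(raw) < 4:
        raise ValueError("need opcode, attribute and a 16-bit register index")
    opcode, attr = raw[0], raw[1]
    reg = int.from_bytes(raw[2:4], byteorder="little")
    operand = raw[4] if len(raw) > 4 else None
    return opcode, attr, reg, operand


# The CLEAR, AND and OR lines of the table at d7ca, as listed above.
for line in ("5400ac19", "0725a419fe", "0d21a41900"):
    opcode, attr, reg, operand = decode_reg_op(line)
    print(hex(opcode), hex(attr), hex(reg), operand)
```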

Cheers,
Ben.



