Random occasional graphics freeze on Intel chipset

Alex Villacís Lasso a_villacis at palosanto.com
Mon Oct 14 08:16:44 PDT 2013

On 02/10/13 10:01, Chris Wilson wrote:
> On Wed, Oct 02, 2013 at 09:22:18AM -0500, Alex Villacís Lasso wrote:
>> On 02/10/13 05:19, Chris Wilson wrote:
>>> On Tue, Oct 01, 2013 at 06:15:00PM -0500, Alex Villacís Lasso wrote:
>>>> I have seen this graphics freeze under stock 3.10.x from the Fedora
>>>> 18 x86_64 distro, and also with vanilla compiled 3.11 and 3.12-rc3.
>>>> After a few hours of working, the screen stops updating. The mouse
>>>> pointer moves around and changes if moved over different parts of
>>>> the screen, but the display itself does not change anymore. If I
>>>> check /sys/kernel/debug/dri/0/i915_error_state right then (via a
>>>> remote ssh), there is no error captured. However, if I do "echo 1 >
>>>> /sys/kernel/debug/dri/0/i915_wedged", after a few moments an error
>>>> is captured, as well as messages in the kernel log, both of which
>>>> are attached. If I try to restart the gnome-shell session, I get the
>>>> KMS console, and then the start of the graphic login, but then the
>>>> graphic login itself freezes again.
>>>> Is the attached information enough to diagnose the issue?
>>> AFAICT it was a userspace hang; the GPU was rightfully idle. Only on the
>>> reset did it actually die.
>> If I do "echo 1 > /sys/kernel/debug/dri/0/i915_wedged" when the display is not frozen, I only get the following in dmesg, and the system keeps working normally:
>> [  323.441616] [drm] Manually setting wedged to 1
>> [  323.441622] [drm] capturing error event; look for more information in /sys/class/drm/card0/error
>> [  348.955655] [drm] Manually setting wedged to 0
>> Is it to be expected that a "userspace hang" will escalate into a failed reset when setting i915_wedged to 1, without anything actually being wrong on the kernel side, at least at first?
> Yes, your chipset is notorious for not being able to restart the rings.
> We've added a few attempts to workaround the issue, but I'm not
> surprised if it still occasionally fails.
>>> I'd suggest looking at the stacktraces of the usual suspects and see who
>>> is waiting upon whom, or if there is a more obvious lockup. Then begin
>>> the painful process of tracing the interoperation of those two processes
>>> to try and catch the breakdown.
>>> -Chris
>> I think Xorg is one of the "usual suspects". Should gnome-shell be one too? This is a Fedora 18 desktop with gnome-shell as installed from the DVD.
> X and gnome-shell are the two responsible for working together and
> presenting your desktop, so would definitely be the first to check for
> an error.
> -Chris
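The check suggested above (comparing the stack traces of X and gnome-shell to see who is waiting upon whom) can be scripted so that both are captured in one go from a text console or an ssh session. The following is a minimal POSIX sh sketch, not a tested tool: `dump_backtraces` is a hypothetical helper name, and it assumes gdb and the matching debuginfo packages are installed, that it runs as root, and that the X server process is named `Xorg` (as it appears to be on this Fedora 18 setup). It uses gdb's `thread apply all bt` instead of plain `bt` so that every thread is dumped, not only the one gdb stops in.

```shell
# Hypothetical helper: dump backtraces of all threads of each named process.
# Assumes: gdb + debuginfo installed, run as root, X server named "Xorg".
dump_backtraces() {
    for name in "$@"; do
        # pidof exits non-zero when no process with that name exists.
        pids=$(pidof "$name") || {
            echo "== $name: not running" >&2
            continue
        }
        for pid in $pids; do
            echo "== $name (pid $pid)"
            # -batch: run the -ex command, then detach and exit.
            # "thread apply all bt": one backtrace per thread.
            gdb -batch -ex "thread apply all bt" -p "$pid" 2>&1
        done
    done
}
```

Usage, after sourcing the function: `dump_backtraces Xorg gnome-shell > /tmp/hang-backtraces.txt`. Attaching briefly stops each process, so this is best run only once the desktop is already frozen.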
I got the hang again in kernel-3.10.13-101.fc18.x86_64. I switched to the KMS text console with Ctrl-Alt-F2, downloaded all of the debuginfo packages, and got this backtrace with "gdb -batch -ex bt -p `pidof gnome-shell`".

[New LWP 2217]
[New LWP 2182]
[New LWP 2035]
[New LWP 2034]
[New LWP 2033]
[New LWP 2032]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007febbcca7a5d in poll () at ../sysdeps/unix/syscall-template.S:81
#0  0x00007febbcca7a5d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007febbb943f42 in poll (__timeout=-1, __nfds=1, __fds=0x7fffbc0cefd0) at /usr/include/bits/poll2.h:46
#2  _xcb_conn_wait (c=c@entry=0x19d4130, cond=cond@entry=0x7fffbc0cf040, vector=vector@entry=0x0, count=count@entry=0x0) at xcb_conn.c:414
#3  0x00007febbb94543e in wait_for_reply (c=c@entry=0x19d4130, request=154086, e=e@entry=0x0) at xcb_in.c:399
#4  0x00007febbb94564b in xcb_wait_for_reply (c=c@entry=0x19d4130, request=154086, e=e@entry=0x0) at xcb_in.c:429
#5  0x00007febbbb5a435 in xcb_dri2_wait_msc_reply (c=c@entry=0x19d4130, cookie=..., e=e@entry=0x0) at dri2.c:1619
#6  0x00007febc1f55b32 in dri2WaitForMSC (pdraw=0x310b180, target_msc=0, divisor=2, remainder=<optimized out>, ust=0x7fffbc0cf158, msc=0x7fffbc0cf160, sbc=0x7fffbc0cf168) at dri2_glx.c:473
#7  0x00007febc1f3074b in __glXWaitVideoSyncSGI (divisor=2, remainder=0, count=0x7fffbc0cf1ac) at glxcmds.c:1850
#8  0x00007febbf396fcd in _cogl_winsys_wait_for_vblank () at winsys/cogl-winsys-glx.c:1143
#9  0x00007febbf397ffc in _cogl_winsys_onscreen_swap_region (onscreen=<optimized out>, user_rectangles=0x7fffbc0cf2a0, n_rectangles=1) at winsys/cogl-winsys-glx.c:1270
#10 0x00007febbf38f978 in cogl_onscreen_swap_region (onscreen=0x310af10, rectangles=rectangles@entry=0x7fffbc0cf2a0, n_rectangles=n_rectangles@entry=1) at ./cogl-onscreen.c:181
#11 0x00007febbfa18571 in clutter_stage_cogl_redraw (stage_window=0x19c0cb0) at cogl/clutter-stage-cogl.c:482
#12 0x00007febbfa8341d in clutter_stage_do_redraw (stage=0x3108a40 [ClutterStage]) at ./clutter-stage.c:1170
#13 _clutter_stage_do_update (stage=0x3108a40 [ClutterStage]) at ./clutter-stage.c:1228
#14 0x00007febbfa67d3d in master_clock_update_stages (stages=0x5d37910 = {...}, master_clock=0x2f6f4a0 [ClutterMasterClock]) at ./clutter-master-clock.c:386
#15 clutter_clock_dispatch (source=source@entry=0x30961b0, callback=<optimized out>, user_data=<optimized out>) at ./clutter-master-clock.c:520
#16 0x0000003b5d847a55 in g_main_dispatch (context=0x19a95d0) at gmain.c:2715
#17 g_main_context_dispatch (context=context@entry=0x19a95d0) at gmain.c:3219
#18 0x0000003b5d847d88 in g_main_context_iterate (context=0x19a95d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3290
#19 0x0000003b5d848182 in g_main_loop_run (loop=0x19b0020) at gmain.c:3484
#20 0x00007febc3979467 in meta_run () at core/main.c:545
#21 0x0000000000401e2c in main ()
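Read innermost-first, the frames show gnome-shell blocked in poll() under xcb_wait_for_reply(), waiting for the X server to answer the DRI2 WaitMSC request that __glXWaitVideoSyncSGI() issues from cogl's vblank wait. That suggests Xorg is the other side of the "who is waiting upon whom" question. For longer traces, a rough filter that reduces gdb `bt` output to one function name per frame can make such chains easier to compare. This is a sketch only: `bt_chain` is a hypothetical helper name, and it handles just the two frame layouts seen above (with and without a `0x... in` address prefix), not every format gdb can print.

```shell
# Hypothetical helper: read gdb "bt" output on stdin and print one function
# name per frame, innermost first. Handles frames shaped like
#   #3  0x00007feb... in wait_for_reply (c=...) at xcb_in.c:399
# and inlined frames without an address, like
#   #13 _clutter_stage_do_update (stage=...) at ./clutter-stage.c:1228
bt_chain() {
    awk '/^#[0-9]+ / {
        i = 2                               # no address: name is field 2
        if ($2 ~ /^0x/ && $3 == "in") i = 4 # "0x... in name": field 4
        print $i
    }'
}
```

Usage: ``gdb -batch -ex bt -p `pidof gnome-shell` | bt_chain``. Lines that do not start with a frame marker, such as the `[New LWP ...]` notices, are skipped.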

