[Nouveau] Optimus HDMI hotplug fails with "DRM: Dropped ACPI reprobe event due to RPM error: -22"

Craig Ringer ringerc at ringerc.id.au
Thu Feb 18 03:06:00 UTC 2021


Hi all

I'm trying to get HDMI hotplug working on my Lenovo T15g laptop with
Optimus graphics. HDMI works when plugged in at boot, but does not work
when hotplugged after boot, or when hot-unplugged then re-plugged. The
external display is not detected, its status remains 'disconnected' in
sysfs, and the display stays in what looks like DPMS-off state.

NOTE: This is a PRELIMINARY problem report and request for advice or
comment. I'm on a recent Fedora kernel but still need to try latest
mainline + nouveau. I still need to capture detailed debug logs from
nouveau, drm, kms, etc. And while writing the report I found an i915 config
issue I need to retry without. So this is mostly google-help for others
right now.

VERSIONS AND DEVICES
============

Kernel and nouveau version: 5.10.15-200.fc33.x86_64 with the bundled
nouveau driver. (I'll try latest mainline soon).

Video hardware:
  * GeForce RTX 2070 SUPER Mobile (PCI ID 10de:1e91)
  * Intel CometLake-H GT2 (PCI ID 8086:9bc4)

Laptop: Lenovo T15g. DMI identifies it as: LENOVO 20URCTO1WW/20URCTO1WW,
BIOS N30ET33W (1.16 ) 12/17/2020

I believe this is a muxless design with the external outputs under control
of the NVIDIA card, as the Intel card only has one output in
/sys/class/drm/ (card0-eDP-1) and the external display doesn't work (even
when attached at boot) if I blacklist the nouveau module.

BEHAVIOUR
============

An external HDMI display is only detected and used if it's attached before
boot. If it's hotplugged later, it isn't detected and

    DRM: Dropped ACPI reprobe event due to RPM error: -22

is printed to dmesg.

"RPM" here is runtime power management, and error -22 is -EINVAL. AFAICS
it is probably coming from rpm_resume() [1], reached via
nouveau_display_acpi_ntfy() [2] -> pm_runtime_get() ->
__pm_runtime_resume(). I haven't tracked it down further yet - I'll do some
perf probing and report back in a followup post.
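
For anyone else decoding the numbers in these messages, here is a minimal
sketch of a lookup for the error codes that show up in this report. The
function name errno_name is made up for illustration; the values are the
generic Linux errno numbers (22 = EINVAL, 110 = ETIMEDOUT, 13 = EACCES).

```shell
# Map a few Linux errno values to their symbolic names.
# Accepts either sign, so both "22" and "-22" work.
# errno_name is a hypothetical helper, not an existing tool.
errno_name() {
    n=${1#-}    # strip a leading minus sign, if any
    case "$n" in
        13)  echo EACCES ;;
        22)  echo EINVAL ;;
        110) echo ETIMEDOUT ;;
        *)   echo "unknown($n)" ;;
    esac
}

errno_name -22     # the "RPM error" in the reprobe message
errno_name -110
```

Only the three codes are covered on purpose; anything else is reported as
unknown rather than guessed.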

IIRC (need to repeat and verify) once hot-unplugged, the display won't
re-detect, even if it was connected at boot. Connecting it while the
machine is in S3 sleep doesn't help either: it still doesn't get
(re)detected on resume.

    echo 'detect' > card1-HDMI-A-1/status

has no apparent effect - no message is printed to dmesg (default log level)
and the monitor isn't detected.
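
If the failure is related to runtime PM (an assumption at this point, not a
confirmed diagnosis), the GPU's runtime PM state in sysfs may be worth a
look before and after plugging in the display. The sketch below reads the
standard power/runtime_status attribute; the rpm_status helper and its
optional second argument are hypothetical and only there so the function
can be pointed at a different directory. A value of "error" would suggest
that an earlier runtime suspend or resume failed.

```shell
# Print the runtime PM status of a PCI device.
# $1: PCI address (0000:01:00.0 is the NVIDIA GPU in this report)
# $2: directory holding the PCI device nodes; defaults to the real sysfs
#     location (hypothetical override, useful for dry runs)
rpm_status() {
    dev=$1
    root=${2:-/sys/bus/pci/devices}
    f="$root/$dev/power/runtime_status"
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    else
        printf '%s: runtime_status not readable\n' "$dev"
        return 1
    fi
}

rpm_status 0000:01:00.0 || true
```

The kernel reports one of active, suspended, suspending, resuming, error or
unsupported in that file.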

TAINTED KERNEL
============

While collecting info for this report, I noticed that I am still running
with some non-default i915 options from my old (non-hybrid-graphics)
laptop. I'll have to reboot without those to verify these i915 options
aren't the cause:

[    3.403694] Setting dangerous option enable_guc - tainting kernel
[    3.404506] Setting dangerous option enable_fbc - tainting kernel
[    3.405306] Setting dangerous option enable_dc - tainting kernel

I'll be sure to update once I disable these, but I'll post now. If nothing
else, it might help someone else.
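
To check after a reboot whether the kernel is still tainted, the taint
bitmask in /proc/sys/kernel/tainted can be tested bit by bit. This is a
minimal sketch; taint_bit_set is a made-up helper name. As far as I know,
the "Setting dangerous option" messages correspond to TAINT_USER (bit 6),
and a kernel WARNING sets TAINT_WARN (bit 9).

```shell
# Succeed if bit $2 is set in taint value $1.
# taint_bit_set is a hypothetical helper for illustration.
taint_bit_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

# Fall back to 0 if the file is not available on this system.
t=$(cat /proc/sys/kernel/tainted 2>/dev/null || echo 0)

if taint_bit_set "$t" 6; then
    echo "TAINT_USER set (for example an unsafe module parameter)"
fi
if taint_bit_set "$t" 9; then
    echo "TAINT_WARN set (a kernel WARNING has fired)"
fi
```

A value of 0 in that file means the kernel is not tainted at all.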

NOUVEAU TIMEOUTS IN DMESG
============

I also noticed some nouveau related output in the kernel logs - I think
from the first suspend, or possibly the first HDMI unplug. I'll need to
verify this later. There are also some xhci_hcd messages that may or may
not be relevant. I'll include longer excerpts at the end of the post but
the basics are:

[25877.621114] nouveau 0000:01:00.0: timeout
[25877.621289] WARNING: CPU: 14 PID: 73556 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25877.621631] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25877.621680] Call Trace:
[25877.621754]  gm200_acr_hsfw_boot+0xc3/0x160 [nouveau]
[25877.621782]  ? mutex_lock+0xe/0x30
[25877.621849]  nvkm_acr_hsf_boot+0x85/0xe0 [nouveau]
[25877.621916]  nvkm_acr_fini+0x25/0x30 [nouveau]
[25877.621984]  nvkm_subdev_fini+0x59/0xb0 [nouveau]
[25877.622100]  nvkm_device_fini+0x79/0x110 [nouveau]
[25877.622215]  nvkm_udevice_fini+0x47/0x60 [nouveau]
[25877.622277]  nvkm_object_fini+0xbc/0x150 [nouveau]
[25877.622343]  nvkm_object_fini+0x73/0x150 [nouveau]
[25877.622464]  nouveau_do_suspend+0x107/0x180 [nouveau]
[25877.622583]  nouveau_pmops_runtime_suspend+0x3b/0xb0 [nouveau]
[25877.622597]  pci_pm_runtime_suspend+0x5e/0x170
...

then

[25877.622741] nouveau 0000:01:00.0: acr: unload binary failed
[25877.946511] nouveau 0000:01:00.0: fifo: fault 00 [VIRT_READ] at 00000000000bd000 engine c0 [BAR2] client 07 [HUB/HOST_CPU] reason 0d [REGION_VIOLATION] on channel -1 [01ffedf000 unknown]
[25913.829849] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 00000000004df000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [01ffedf000 unknown]

then

[25913.930365] nouveau 0000:01:00.0: timeout
[25913.930426] WARNING: CPU: 5 PID: 2395 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25913.930511] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x96/0xa0 [nouveau]
...
[25913.930523] Call Trace:
[25913.930540]  gm200_acr_hsfw_boot+0xc3/0x160 [nouveau]
[25913.930543]  ? mutex_lock+0xe/0x30
[25913.930558]  nvkm_acr_hsf_boot+0x85/0xe0 [nouveau]
[25913.930573]  tu102_acr_init+0x15/0x30 [nouveau]
[25913.930587]  nvkm_acr_load+0x2b/0xd0 [nouveau]
[25913.930589]  ? ktime_get+0x38/0xa0
[25913.930603]  nvkm_subdev_init+0x92/0xd0 [nouveau]
[25913.930604]  ? ktime_get+0x38/0xa0
[25913.930629]  nvkm_device_init+0x10b/0x190 [nouveau]
[25913.930656]  nvkm_udevice_init+0x41/0x60 [nouveau]
[25913.930676]  nvkm_object_init+0x3e/0x100 [nouveau]
[25913.930690]  nvkm_object_init+0x6f/0x100 [nouveau]
[25913.930703]  nvkm_object_init+0x6f/0x100 [nouveau]
[25913.930729]  nouveau_do_resume+0x2b/0xc0 [nouveau]
[25913.930755]  nouveau_pmops_runtime_resume+0x7a/0x150 [nouveau]
[25913.930760]  pci_pm_runtime_resume+0xaa/0xc0
[...]
[25913.930806]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[...]
[25913.930820] nouveau 0000:01:00.0: acr: AHESASC binary failed
[25913.930821] nouveau 0000:01:00.0: acr: init failed, -110
[25913.930958] nouveau 0000:01:00.0: init failed with -110
[25913.930959] nouveau: systemd-logind[1510]:00000000:00000080: init failed with -110
[25913.930960] nouveau: DRM-master:00000000:00000000: init failed with -110
[25913.930961] nouveau: DRM-master:00000000:00000000: init failed with -110
[25913.930963] nouveau 0000:01:00.0: DRM: Client resume failed with error: -110
[25913.930963] nouveau 0000:01:00.0: DRM: resume failed with: -110

I'll do some poking around with perf, capture some ACPI state and verbose
nouveau + drm kernel logs for both attached-at-boot and detached-at-boot
cases, etc, then post a big diagnostics bundle in a bit. But I thought I'd
keep this initial report short-ish. I'll include some basic diag info below
though.

URL REFERENCES
============

URLs referenced:

[1] https://github.com/torvalds/linux/blob/521b619acdc8f1f5acdac15b84f81fd9515b2aff/drivers/base/power/runtime.c#L702

[2] https://github.com/torvalds/linux/blob/93b694d096cc10994c817730d4d50288f9ae3d66/drivers/gpu/drm/nouveau/nouveau_display.c#L530

BASIC DIAGNOSTICS
============

Basic diagnostics, when display physically connected (DVI-D -> HDMI) but
not detected by nouveau:

$ ls /sys/class/drm
card0  card0-eDP-1  card1  card1-DP-1  card1-DP-2  card1-DP-3  card1-eDP-2  card1-HDMI-A-1  renderD128  renderD129  ttm  version

$ for f in */status; do printf "%s: %s\n" "$f" "$(cat $f)"; done
card0-eDP-1/status: connected
card1-DP-1/status: disconnected
card1-DP-2/status: disconnected
card1-DP-3/status: disconnected
card1-eDP-2/status: disconnected
card1-HDMI-A-1/status: disconnected

$ dmesg | tail -n 2
[42147.075025] nouveau 0000:01:00.0: DRM: Dropped ACPI reprobe event due to RPM error: -22
[42151.153559] nouveau 0000:01:00.0: DRM: Dropped ACPI reprobe event due to RPM error: -22

# for p in /sys/module/nouveau/parameters/*; do printf "%s: %s\n" "$(basename $p)" "$(cat $p)"; done
[sudo] password for craig:
atomic: 0
config: (null)
debug: (null)
duallink: 1
fbcon_bpp: 0
hdmimhz: 0
ignorelid: 0
modeset: -1
mst: 1
noaccel: 0
nofbaccel: 0
runpm: -1
tv_disable: 0
tv_norm: (null)
vram_pushbuf: 0

$ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.10.15-200.fc33.x86_64 [SNIP root dev args]
libata.allow_tpm=on systemd.unified_cgroup_hierarchy=0 rhgb

$ sudo  lspci -vvnnqPP -d 10de:1e91
00:01.0/01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104M [GeForce RTX 2070 SUPER Mobile / Max-Q] [10de:1e91] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device [17aa:22c3]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
....
Kernel driver in use: nouveau
Kernel modules: nouveau

# dmidecode
...
Processor Information
    ...
    Version: Intel(R) Core(TM) i9-10980HK CPU @ 2.40GHz
...
BIOS Information
    Vendor: LENOVO
    Version: N30ET33W (1.16 )
    Release Date: 12/17/2020
    ...
    BIOS Revision: 1.16
    Firmware Revision: 1.12
...
Port Connector Information
    Internal Reference Designator: Not Available
    Internal Connector Type: None
    External Reference Designator: Hdmi1
    External Connector Type: Other
    Port Type: Video Port

System Information
    Manufacturer: LENOVO
    Product Name: 20URCTO1WW
    Version: ThinkPad T15g Gen 1
    [snip serial number and uuid]
    SKU Number: LENOVO_MT_20UR_BU_Think_FM_ThinkPad T15g Gen 1
    Family: ThinkPad T15g Gen 1

I'll attach a detailed lspci, bigger excerpts from dmesg, etc. in a
followup to make sure I don't upset any mail filter.


-- 
Craig Ringer