[RFC PATCH xserver] modesetting: re-set the crtc's mode when link-status goes BAD

Sun Feb 26 19:42:36 UTC 2017

On Fri, Feb 24, 2017 at 12:09:51PM -0800, Manasi Navare wrote:
> Hi Daniel,
> 
> We have ACKs on the userspace design from both Adams and Eric.
> Is this enough to merge the kernel patches?
> 
> I spoke to Eric briefly about this and he gave me a verbal r-b as well.
> He said the userspace patches cant get merged unless DRM patches are merged.
> 
> So what should be some of our next steps here?

Still needs review on the kernel patches themselves, iirc Jani still had
some opens on that. But I was out of the loop for 2 weeks :-)
-Daniel

> 
> Regards
> Manasi
> 
> 
> On Mon, Feb 13, 2017 at 01:05:17PM -0800, Eric Anholt wrote:
> > Martin Peres <martin.peres at linux.intel.com> writes:
> > 
> > > On 06/02/17 17:50, Martin Peres wrote:
> > >> On 03/02/17 10:04, Daniel Vetter wrote:
> > >>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
> > >>>> On 01/02/17 22:05, Manasi Navare wrote:
> > >>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> > >>>>>> Jani Nikula <jani.nikula at linux.intel.com> writes:
> > >>>>>>
> > >>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric at anholt.net> wrote:
> > >>>>>>>> Martin Peres <martin.peres at linux.intel.com> writes:
> > >>>>>>>>
> > >>>>>>>>> Despite all the careful planing of the kernel, a link may become
> > >>>>>>>>> insufficient to handle the currently-set mode. At this point, the
> > >>>>>>>>> kernel should mark this particular configuration as being broken
> > >>>>>>>>> and potentially prune the mode before setting the offending
> > >>>>>>>>> connector's
> > >>>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
> > >>>>>>>>> happen right after a modeset or later on.
> > >>>>>>>>>
> > >>>>>>>>> When available, we should use the link-status information to reset
> > >>>>>>>>> the wanted mode.
> > >>>>>>>>>
> > >>>>>>>>> Signed-off-by: Martin Peres <martin.peres at linux.intel.com>
> > >>>>>>>> If I understand this right, there are two failure modes being
> > >>>>>>>> handled:
> > >>>>>>>>
> > >>>>>>>> 1) A mode that won't actually work because the link isn't good
> > >>>>>>>> enough.
> > >>>>>>>>
> > >>>>>>>> 2) A mode that should work, but link parameters were too
> > >>>>>>>> optimistic and
> > >>>>>>>> if we just ask the kernel to set the mode again it'll use more
> > >>>>>>>> conservative parameters that work.
> > >>>>>>>>
> > >>>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> > >>>>>>>> going to set our old mode back.  Won't the modeset then fail to link
> > >>>>>>>> train again, and bring us back into this loop?  The only escape
> > >>>>>>>> that I
> > >>>>>>>> see would be some other userspace responding to the advertised
> > >>>>>>>> mode list
> > >>>>>>>> changing, and then asking X to modeset to something new.
> > >>>>>>>>
> > >>>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
> > >>>>>>>> point, and only re-setting if our mode still exists?
> > >>>>>>> Disclaimer: I don't know anything about the internals of the
> > >>>>>>> modesetting
> > >>>>>>> driver.
> > >>>>>>>
> > >>>>>>> Perhaps we can identify the two cases now, but I'd put this more
> > >>>>>>> generally: if the link status has gone bad, it's an indicator to
> > >>>>>>> userspace that the circumstances may have changed, and userspace
> > >>>>>>> should
> > >>>>>>> query the kernel for the list of available modes again. It should no
> > >>>>>>> longer trust information obtained prior to getting the bad link
> > >>>>>>> status,
> > >>>>>>> including the current mode.
> > >>>>>>>
> > >>>>>>> But specifically, I think you're right, and AFAICT asking for the
> > >>>>>>> list
> > >>>>>>> of modes again is the only way for the userspace to distinguish
> > >>>>>>> between
> > >>>>>>> the two cases. I don't think there's a shortcut for deciding the
> > >>>>>>> current
> > >>>>>>> mode is still valid.
> > >>>>>> To avoid the busy-loop problem, I think I'd like this patch to
> > >>>>>> re-query
> > >>>>>> the kernel to ask about the current mode list, and only try to re-set
> > >>>>>> the mode if our mode is still there.
> > >>>>>>
> > >>>>>> If the mode isn't there, then it's up to the DE to take action in
> > >>>>>> response to the notification of new modes.  If you don't have a DE to
> > >>>>>> take appropriate action, you're kind of out of luck.
> > >>>>>>
> > >>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
> > >>>>>> about ABI was having to walk all the connectors on every uevent to see
> > >>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
> > >>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
> > >>>>>> the
> > >>>>>> kernel could go ahead with the current plan.
> > >>>>> Yes I agree. The kernel sets the link status BAD as soona s link
> > >>>>> training fails
> > >>>>> but does not prune the modes until a new modelist is requested by
> > >>>>> the userspace.
> > >>>>> So this patch should re query the mode list as soon as it sees the
> > >>>>> link status
> > >>>>> BAD in order for the kernel to validate the modes again based on new
> > >>>>> link
> > >>>>> paarmeters and send a new mode list back.
> > >>>> Seems like a bad behaviour from the kernel, isn't it? It should return
> > >>>> immediatly
> > >>>> if the mode is gonna be pruned :s
> > >>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
> > >>> validate requested modes against that. The mode list is purely for
> > >>> userspace's information. Which means updating it only when userspace asks
> > >>> for it is fine.
> > >>
> > >> Hmm, ok. Fair enough!
> > >>
> > >>> I also thought some more about the loop when we're stuck on BAD, and I
> > >>> think it shouldn't happen: When the link goes BAD we update the link
> > >>> parameter values to the new limits, and the kernel will reject any mode
> > >>> that won't fit anymore. Of course you might be unlucky, and the new link
> > >>> limits are also not working, but eventually you're stuck at the "you're
> > >>> link is broken, sry" stage, where the kernel rejects everything :-)
> > >>>
> > >>> So I think the busy-loop has strictly a limited amount of time until it
> > >>> runs out of steam.
> > >>
> > >> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
> > >> on IRC.
> > >>
> > >> My current proposal is based on the idea that the kernel should reject a
> > >> mode
> > >> it knows it cannot set. This is already largely tested in real life: Try
> > >> setting 4K at 60Hz on an HDMI 1.x monitor, it should immediately fail on
> > >> prepare(). For this proposal to work, we would need to put in the ABI
> > >> that a
> > >> driver that sets the link-status to BAD should also make sure that whatever
> > >> the userspace does, no infinite loop should be created (by changing the
> > >> maximum link characteristics before setting the link-status property).
> > >>
> > >> Re-probing the list of modes and checking if the mode is still in there is
> > >> inherently racy, as a new screen may be plugged between the moment the list
> > >> is sent to the userspace and the moment when we try setting the mode. Or
> > >> the
> > >> DE may also have changed the mode in the mean time and the kernel would
> > >> have reduced the limits even more. So, the userspace cannot be expected to
> > >> always do the right thing here :s.
> > >>
> > >> From this point of view, I really do not like the idea of re-probing, since
> > >> it increases the delay before the DE can handle the change and it does not
> > >> bring any guarantee of working properly.
> > >>
> > >> Did I miss anything? Any opinions?
> > >
> > > Any comments on this, Eric? Does it sound logical to you or did I miss 
> > > something?
> > >
> > > The kernel patches are stuck until this patch gets in. So far, you look 
> > > like the only person who would be willing to review this patch (Ajax 
> > > acked the patch, but that's the extent he was willing to go).
> > 
> > I was just trying to provide review to get the kernel unstuck.  The
> > kernel should not be blocked until the patch gets lands (this obviously
> > isn't the case with ioctls, which *don't* land in userspace until kernel
> > does), you just need userspace published and generally agreed that the
> > ABI works.
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch