[Xf86-video-armsoc] Fwd: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc (571146)

Rich Su Rich.Su at arm.com
Fri Dec 20 07:33:28 PST 2013


Hi,

Please review the code change suggested below by Samsung DTV.

Thanks,
Rich

From: hoseon.kim [mailto:hoseon.kim at samsung.com]
Sent: Friday, December 20, 2013 10:51 PM
To: support-mali; 'Marco Starace'; 'Ravi Agnihotri'; 'Jeba Samuel'
Cc: '무랄리'; '비핀'; '서주원'; '이승은'; Rich Su; Norman Evanson; 김호선
Subject: RE: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc (571146)

Hi Norman,

When I checked the attached dma-buf API Guide document, I found the following note:

< DMA Buffer Sharing API Guide >
…
- In order to avoid fd leaks on exec, the FD_CLOEXEC flag must be set
  on the file descriptor.  This is not just a resource leak, but a
  potential security hole.  It could give the newly exec'd application
  access to buffers, via the leaked fd, to which it should otherwise
  not be permitted access.

  The problem with doing this via a separate fcntl() call, versus doing it
  atomically when the fd is created, is that this is inherently racy in a
  multi-threaded app[3].  The issue is made worse when it is library code
  opening/creating the file descriptor, as the application may not even be
  aware of the fd's.

  To avoid this problem, userspace must have a way to request O_CLOEXEC
  flag be set when the dma-buf fd is created.  So any API provided by
  the exporting driver to create a dmabuf fd must provide a way to let
  userspace control setting of O_CLOEXEC flag passed in to dma_buf_fd().


Based on this description, I think armsoc_bo_set_dmabuf() should pass O_CLOEXEC when it requests the dma_buf fd.
So the code needs to be changed like this:

[ Before ]
           /* Try to get dma_buf fd */
           prime_handle.handle = bo->handle;
           prime_handle.flags  = 0;
           res  = drmIoctl(bo->dev->fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime_handle);

[ After ]
           /* Try to get dma_buf fd */
           prime_handle.handle = bo->handle;
           prime_handle.flags  = O_CLOEXEC;
           res  = drmIoctl(bo->dev->fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime_handle);
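
As a minimal sketch of the difference the guide describes (this is not code from xf86-video-armsoc), the standalone C below compares an fd created with O_CLOEXEC against one that is flagged afterwards with a separate fcntl() call. The helper names and the use of an ordinary file path in place of a dma-buf are assumptions made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is closed on exec, 0 if not, -1 on error. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Atomic variant: the flag is set in the same call that creates the fd,
 * so no other thread can ever observe the fd without it. */
int open_cloexec_atomic(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Two-step variant: between open() and fcntl() another thread may
 * fork() and exec(), and the new program would inherit the fd. */
int open_cloexec_two_step(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* <-- race window: fd exists here without FD_CLOEXEC */
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Both helpers end with FD_CLOEXEC set; only the first does so without a window in which the descriptor can leak into an exec'd process.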

Could you please check with the engineer in charge of xf86-video-armsoc whether this change is correct?
Also, could you ask him whether the current usage (without O_CLOEXEC) could cause the problem I described in my previous mail (below)?

I would appreciate your feedback on this question today.

Best Regards,
Hoseon
From: hoseon.kim [mailto:hoseon.kim at samsung.com]
Sent: Thursday, December 19, 2013 7:52 PM
To: 'support-mali'; 'Marco Starace'; 'Ravi Agnihotri'; 'Jeba Samuel'
Cc: '무랄리'; '비핀'; '서주원'; '이승은'; 'Rich Su'; 'Norman Evanson'; 김호선
Subject: RE: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc (571146)

Hi Eason,

Please find below the call stack analysis from the kernel team.
If you find anything useful in it, please let me know.

Best Regards,
Hoseon

- The kernel oops came from locks_remove_posix.

[14:54:15.511]     [   11.422439] PC is at locks_remove_posix+0x18/0x30
[14:54:15.515]     [   11.427193] LR is at filp_close+0x60/0x8c
[14:54:15.519]     [   11.431244] pc : [<c018ca9c>]    lr : [<c0148a78>]    psr: 40000013
[14:54:15.525]     [   11.431244] sp : e1bbdf30  ip : e1bbdf40  fp : e1bbdf3c
[14:54:15.530]     [   11.442868] r10: 00000000  r9 : e1bbc000  r8 : e598d040
[14:54:15.536]     [   11.448153] r7 : e5a33f00  r6 : e598d000  r5 : 00000000  r4 : e5a33f00
[14:54:15.543]     [   11.454758] r3 : 00000000  r2 : e1bbdf50  r1 : e598d000  r0 : e5a33f00
[14:54:15.549]     [   11.461364] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[14:54:15.556]     [   11.468586] Control: 30c53c7d  Table: bd9fb9c0  DAC: 55555555
[14:54:15.563]     [   11.474399] Process XOrgLauncherThr (pid: 374, stack limit = 0xe1bbc238)
[14:54:15.569]     [   11.481180] Stack: (0xe1bbdf30 to 0xe1bbe000)
[14:54:15.573]     [   11.485585] df20:                                     e1bbdf5c e1bbdf40 c0148a78 c018ca90
[14:54:15.581]     [   11.493864] df40: 00000000 0000001b e598d000 e598d008 e1bbdf94 e1bbdf60 c0165f9c c0148a24
[14:54:15.590]     [   11.502143] df60: e1bbdfa4 00000000 00000001 a341a230 00000001 be3fe504 00000006 c0013324
[14:54:15.598]     [   11.510421] df80: e1bbc000 00000000 e1bbdfa4 e1bbdf98 c01489f0 c0165e98 00000000 e1bbdfa8
[14:54:15.606]     [   11.518700] dfa0: c00130a0 c01489cc a341a230 00000001 0000001b aa5624d4 aa562910 00000001
[14:54:15.615]     [   11.526979] dfc0: a341a230 00000001 be3fe504 00000006 0051a77c 00000012 0051a784 aa561b1c
[14:54:15.623]     [   11.535258] dfe0: 00000000 aa561b08 ab4b7fa4 ab4b7fb4 80000010 0000001b ffffffff ffffffff
[14:54:15.632]     [   11.543534] Backtrace:
[14:54:15.633]     [   11.546006] [<c018ca84>] (locks_remove_posix+0x0/0x30) from [<c0148a78>] (filp_close+0x60/0x8c)
[14:54:15.643]     [   11.554814] [<c0148a18>] (filp_close+0x0/0x8c) from [<c0165f9c>] (__close_fd+0x110/0x1a4)
[14:54:15.650]     [   11.563087]  r6:e598d008 r5:e598d000 r4:0000001b r3:00000000
[14:54:15.657]     [   11.568816] [<c0165e8c>] (__close_fd+0x0/0x1a4) from [<c01489f0>] (sys_close+0x30/0x58)
[14:54:15.664]     [   11.576919] [<c01489c0>] (sys_close+0x0/0x58) from [<c00130a0>] (ret_fast_syscall+0x0/0x48)
[14:54:15.673]     [   11.585372] Code: e24cb004 e52de004 e8bd4000 e590300c (e5933028)
[14:54:15.680]     [   11.591538] [SELP] while loop ... please attach T32...




The kernel oops occurs because dentry is 0 in the file structure (an anon inode file).

-001|filp_close(
    |    filp = 0xE5A33F00 -> (
    |      f_u = (fu_list = (next = 0xE5A33540, prev = 0xC014BA0C), fu_rcuhead =
    |      f_path = (mnt = 0x0, dentry = 0x0),
    |      f_op = 0xC03E389C,
    |      f_lock = (rlock = (raw_lock = (slock = 0x0, tickets = (owner = 0x0, n
    |      f_sb_list_cpu = 0x0,
    |      f_count = (counter = 0x1),
    |      f_flags = 0x0,
    |      f_mode = 0x1,


 It seems that this call comes from armsoc_bo_clear_dmabuf.

void armsoc_bo_clear_dmabuf(struct mali_bo *bo)
{
        assert(bo->refcnt > 0);
        assert(mali_bo_has_dmabuf(bo));

        close(bo->dmabuf);
        bo->dmabuf = -1;
}
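
One way to narrow this down from the user-space side is to validate the descriptor before closing it, so that a double close or a stale fd number is reported instead of passing silently. The sketch below is only an assumption about what such a debug aid could look like; debug_close_fd() is a hypothetical helper, not part of xf86-video-armsoc, and it cannot prevent an oops that originates in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical debug helper: close *fd only if it still refers to an
 * open file, log anything suspicious, and always leave *fd == -1.
 * Returns 0 on success or a negative errno value. */
int debug_close_fd(int *fd)
{
    struct stat st;
    int err = 0;

    assert(fd != NULL);

    if (*fd < 0) {
        fprintf(stderr, "debug_close_fd: fd already cleared (%d)\n", *fd);
        return -EBADF;
    }

    /* fstat() fails with EBADF if the number is no longer an open fd,
     * which points at a double close somewhere in user space. */
    if (fstat(*fd, &st) < 0) {
        err = errno;
        fprintf(stderr, "debug_close_fd: fd %d is not open (errno %d)\n",
                *fd, err);
        *fd = -1;
        return -err;
    }

    if (close(*fd) < 0) {
        err = errno;
        fprintf(stderr, "debug_close_fd: close(%d) failed (errno %d)\n",
                *fd, err);
    }
    *fd = -1;
    return -err;
}
```

In the function above, the call would replace the close()/assignment pair, for example `debug_close_fd(&bo->dmabuf);`.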



From: support-mali [mailto:support-mali at arm.com]
Sent: Thursday, December 19, 2013 12:38 PM
To: hoseon.kim; 'Marco Starace'; 'Ravi Agnihotri'; 'Jeba Samuel'
Cc: support-mali; '무랄리'; '비핀'; '서주원'; '이승은'; Rich Su; Norman Evanson
Subject: RE: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc (571146)


[From Eason Tang - ARM Technical Support]



Hi Hoseon,



This kernel oops may be caused by the kernel rather than by armsoc, which runs in user space. Armsoc may call into the kernel from user space with any values, and the kernel driver should handle them safely.

The problem described in this case appears to originate in the drm driver (or possibly kbase). Did you get a chance to check whether the “refcount” is incorrect? I would suggest reviewing the drm driver first for possible issues.

As far as I know, Samsung DTV may have more than one X server running with the Samsung application. How many X servers/views are running when this issue is reproduced? Is there a specific procedure to reproduce it?

Is there a log showing the kernel crash with the full call stack that can be shared for further checking?





Thanks,

-Eason



//---------------------------------------------------------------------------------------
From: hoseon.kim [mailto:hoseon.kim at samsung.com]
Sent: Wednesday, December 18, 2013 11:09 PM
To: Eason Tang; Norman Evanson; Rich Su; 'Marco Starace'; 'Ravi Agnihotri'; 'Jeba Samuel'
Cc: support-mali; '무랄리'; '비핀'; '서주원'; '이승은'; 김호선
Subject: RE: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc

Hi Norman, Eason,

Please see the analysis below.
In fact, this problem also occurs when dma_buf_lock is not used.

So, please review whether this problem can be caused by the original xf86-video-armsoc driver itself.
I expect your feedback today, with a debug patch to try; this is a very urgent issue.

Best Regards,
Hoseon
From: hoseon.kim [mailto:hoseon.kim at samsung.com]
Sent: Wednesday, December 18, 2013 11:03 PM
To: 'Eason Tang'; 'Norman Evanson'; 'Rich Su'; 'Marco Starace'; 'Ravi Agnihotri'; 'Jeba Samuel'
Cc: 'support-mali'; '무랄리'; '비핀'; '서주원'; '이승은'; 김호선
Subject: RE: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc

Hi Eason,

When we checked the dma_buf_lock patch from ARM, we were not sure that all the resources are protected correctly.
Could you please review your dma_buf_lock patch from this point of view?
If any of the code looks suspect, please let me know.

Please find the dma_buf_lock patch from ARM :
http://connect.arm.com/dropzone/samsung-lsi-mali-project/dma_buf_lock_wk48_v2.patch

In the dma_buf_lock_dolock() function of dma_buf_lock.c,
every kref_put() is protected by a mutex, but kref_get() is not protected. Is that not a problem?
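
To illustrate why an unprotected get can matter, here is a user-space sketch written without sight of the patch itself. If one thread can drop the last reference while another looks the object up and takes a new one, the second thread may end up holding a reference to an object that is already being released. One common way to close that window is to take a reference only while the count is still non-zero (the kernel offers kref_get_unless_zero() for this); the alternative is to hold the same mutex around both the lookup-and-get and the put. The struct and function names below are hypothetical and use C11 atomics rather than the kernel's kref.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kref-style counter. */
struct ref {
    atomic_int count;
};

void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);
}

/* Take a reference only if the object is still alive (count > 0).
 * Returns false if the last reference has already been dropped. */
bool ref_get_unless_zero(struct ref *r)
{
    int old = atomic_load(&r->count);

    while (old > 0) {
        /* On failure the CAS reloads 'old' with the current value. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference. Returns true if the caller dropped the last one
 * and is therefore responsible for releasing the object. */
bool ref_put(struct ref *r)
{
    int old = atomic_fetch_sub(&r->count, 1);

    assert(old > 0); /* a put without a matching get is a bug */
    return old == 1;
}
```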

Best Regards,
Hoseon
From: Eason Tang [mailto:Eason.Tang at arm.com]
Sent: Wednesday, December 18, 2013 7:02 PM
To: hoseon.kim; Norman Evanson; Rich Su; 'Marco Starace'; 'Ravi Agnihotri'; 'Jeba Samuel'
Cc: support-mali; '무랄리'; '비핀'; '서주원'; '이승은'
Subject: RE: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc

Hi Hoseon,

I just checked with the engineer in charge of xf86-video-armsoc. He has received this issue report through the “xf86-video-armsoc” mailing list, as in the mail I attached previously for your reference. We are checking it now.

We suggest joining the mailing list and reporting bugs or asking questions through it.

Thanks,
-Eason

From: hoseon.kim [mailto:hoseon.kim at samsung.com]
Sent: Wednesday, December 18, 2013 5:52 PM
To: Eason Tang; Norman Evanson; Rich Su; 'Marco Starace'; 'Ravi Agnihotri'; 'Jeba Samuel'
Cc: support-mali; '무랄리'; '비핀'; '서주원'; '이승은'; 김호선
Subject: RE: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc

Hi Eason,

I have checked the latest xf86-video-armsoc code, but I could not find anything meaningful.
If the ARM engineer in charge of xf86-video-armsoc could help us, we could get closer to the root of the problem.

Could you please check whether the ARM xf86-video-armsoc engineer can look into this issue?

Best Regards,
Hoseon
From: Eason Tang [mailto:Eason.Tang at arm.com]
Sent: Wednesday, December 18, 2013 5:48 PM
To: hoseon.kim at samsung.com; Norman Evanson; Rich Su; Marco Starace; Ravi Agnihotri; Jeba Samuel
Cc: support-mali; 무랄리; 비핀; 서주원; 이승은
Subject: RE: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc

Hi Hoseon,

I just forwarded the reported issue to the mailing list below. Please refer to the links below for the xf86-video-armsoc repository and mailing list.


Please be aware that we now have an armsoc git repo hosted by freedesktop:
http://cgit.freedesktop.org/xorg/driver/xf86-video-armsoc/

And we have a mailing list:
http://lists.x.org/mailman/listinfo/xf86-video-armsoc

Both the linaro and freedesktop repos are currently identical. Please move to the freedesktop location for the armsoc, and join the mailing list and report bugs or ask questions through that. You are also welcome to submit code changes on the mailing list for review, and comment on changes we make which will be posted on the mailing list.


Thanks,
-Eason

From: HOSEON KIM [mailto:hoseon.kim at samsung.com]
Sent: Wednesday, December 18, 2013 4:26 PM
To: Norman Evanson; Eason Tang; Rich Su; support-mali; Marco Starace; Ravi Agnihotri; Jeba Samuel
Cc: 무랄리; 비핀; 서주원; 이승은; 김호선
Subject: [urgent] dmabuf close error and kernel oops from xf86-video-armsoc

Hi Norman, Eason, Rich,

Recently, I have received many NULL pointer access error reports from the product team.
They always show the same error from the xf86-video-armsoc driver.
Could you please assign an xf86-video-armsoc engineer and check the following error case with the highest priority?

< armsoc_dumb.c >
void mali_bo_clear_dmabuf(struct mali_bo *bo)
{
        assert(bo->refcnt > 0);
        assert(mali_bo_has_dmabuf(bo));
        close(bo->dmabuf);
        bo->dmabuf = -1;
}

close -> ... -> filp_close -> locks_remove_posix -> filp->f_path.dentry ==> kernel oops because it is NULL

Please check whether you have seen anything like this before, and let me know which approach would be best for debugging it.
It reproduces very rarely (once or twice a day), so a quick response would be very helpful for gathering more detailed debug information.


Best Regards,
Hoseon
