When using PBOs to upload texture data, which call triggers the actual DMA operation?
maraeo at gmail.com
Tue Aug 16 11:24:47 UTC 2016
OpenGL discussions should take place on mesa-dev.
The following only applies to r600 & radeonsi:
- glUnmapBuffer usually doesn't unmap buffers from the CPU address
space. This is an optimization, because mapping is a very slow
operation and we want the next glMapBuffer call to be free.
- glUnmapBuffer can still execute a DMA transfer under certain
circumstances if it's beneficial (e.g. GL_MAP_INVALIDATE_RANGE_BIT),
but that's only an implementation detail.
- In most cases, glUnmapBuffer doesn't execute any DMA operation.
- All memory writes should be treated as being immediately visible to
the GPU after glUnmapBuffer. Writes to persistent+coherent mappings
are guaranteed to be visible even before glUnmapBuffer.
- All buffers are treated equally. PBOs aren't treated in a special way here.
- glTexSubImage2D with a PBO source is basically a blit operation,
because it does a copy between a buffer and a texture.
- All GPU operations are asynchronous and don't cause any stalls on
the CPU side. You can call glUnmapBuffer, then glTexSubImage2D, then
draw with the texture without any other operations in between. The
only case when the driver will stall is the next glMapBuffer call.
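
Since the next glMapBuffer call is the only place where a stall can
happen, it helps to see how far apart the upload and the re-map of the
same buffer are in a four-stage scheme like the one quoted below. The
following is only a bookkeeping sketch in plain C, not driver code and
not something taken from Mesa. It makes no GL calls. The names
RING_SIZE, enum stage, slot_for and frames_between_upload_and_map are
made up for illustration, and it assumes one PBO/texture pair per stage
and exactly one stage per buffer per frame.

```c
#include <assert.h>

/* Assumption: four PBO/texture pairs, one per pipeline stage. */
#define RING_SIZE 4

/* Per-frame stages, in the order the quoted message lists them. */
enum stage {
    STAGE_UNMAP  = 0, /* glUnmapBuffer on buffer n               */
    STAGE_UPLOAD = 1, /* glTexSubImage2D from buffer n-1         */
    STAGE_DRAW   = 2, /* draw with texture n-2                   */
    STAGE_MAP    = 3  /* glMapBuffer on buffer n-3 (may stall)   */
};

/* Ring slot handled by `stage` in frame `frame`: stage k works on the
 * buffer that entered the pipeline k frames earlier. */
static unsigned slot_for(unsigned frame, enum stage stage)
{
    /* The pipeline is only full from frame 3 onwards. */
    assert(frame >= (unsigned)STAGE_MAP);
    return (frame - (unsigned)stage) % RING_SIZE;
}

/* Number of frames between issuing the upload from a slot and mapping
 * that same slot again. */
static unsigned frames_between_upload_and_map(void)
{
    return (unsigned)STAGE_MAP - (unsigned)STAGE_UPLOAD;
}
```

With this layout, a buffer is mapped two frames after its
glTexSubImage2D was issued, so the copy has usually had time to finish
before the map. Whether two frames is enough depends on the GPU load,
which this sketch does not model.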
On Fri, Aug 5, 2016 at 7:16 PM, Clemens Eisserer <linuxhippy at gmail.com> wrote:
> I am trying to better understand /optimize texture upload on r600 and
> GCN based GPUs.
> Currently I use PBOs to upload data generated by a worker thread to
> textures, using the following steps:
> 1. Unmap buffer n (from worker)
> 2. glTexSubImage2D n-1 to texture n-1
> 3. bind texture n-2 & draw & glutSwapBuffers
> 4. map buffer n-3 again and pass it to worker thread
> For each buffer only one step is executed per frame to avoid GPU stalls.
> However, after I had a look at radeon_gem_objects I am not sure this
> approach makes a lot of sense.
> All PBOs are located in system memory (GTT), so as far as I understand
> it, unmapping a PBO is actually a no-op and doesn't trigger any
> However, where is the actual DMA transfer triggered - by
> glTexSubImage2D? And at which point does the driver check for DMA
> completion - at glutSwapBuffers?
> Furthermore, is it possible to perform async upload and rendering in
> parallel in case there are no data-dependencies?
> Some insights would be really great to better optimize the code.
> Thank you in advance & best regards, Clemens
> xorg-driver-ati mailing list
> xorg-driver-ati at lists.x.org