[Mesa-dev] [PATCH 2/5] i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
Chris Wilson
chris at chris-wilson.co.uk
Tue Apr 10 16:03:52 UTC 2018
Quoting Scott D Phillips (2018-04-10 16:33:18)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > Quoting Chris Wilson (2018-04-05 20:54:54)
> > > Quoting Scott D Phillips (2018-04-03 21:05:42)
>
> [...]
>
> > > Ok, was hoping to see how you choose to use the streaming load, but I
> > > guess that's the next patch.
> > >
> > > Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> >
> > Oh, one point Eric Anholt mentioned on another thread about movntdqa is
> > that stale data inside the internal buffer is not automatically
> > invalidated. We may need to emit explicit mfence before the copies if we
> > are in doubt. A single mfence per tiled-copy is probably not enough to
> > worry about optimising away.
>
> Looking around, I found this errata about movntdqa not honoring the
> ordering guarantees of locked instructions (VLP31 in the pdf):
>
> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-n3520-j2850-celeron-n2920-n2820-n2815-n2806-j1850-j1750-spec-update.pdf
>
> So I added this code near the top of tiled_to_linear():
>
> if (mem_copy == (mem_copy_fn)_mesa_streaming_load_memcpy) {
> /* Various Atom processors have errata where the movntdqa instruction
> * (which is used in streaming_load_memcpy) may incorrectly be reordered
> * before locked instructions. To work around that, we put an lfence
> * here to manually wait for preceding loads to be completed.
> */
> __builtin_ia32_lfence();
> }
>
> It seems that an mfence won't suffice where the errata mentions you need
> the lfence, by my hazy understanding. Do I have that right, or should
> this be an mfence?
An lfence is a weaker version of mfence. We are not using locked
instructions for serialising access within the data, or at least not
from the perspective of serialising it with the GPU. Certainly it's not
been an issue for the kernel. *touch wood*
Note you can use _mm_*fence() to keep using a consistent instruction set.
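A rough sketch of what the _mm_*fence() form of that check might look like. This is not the actual Mesa code: copy_with_fence() and streaming_load_memcpy_stub() are made-up names standing in for tiled_to_linear() and _mesa_streaming_load_memcpy, and the stub just calls memcpy so the snippet builds on its own. It assumes an x86 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_lfence(), _mm_mfence() */
#include <stddef.h>
#include <string.h>

typedef void *(*mem_copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for the streaming-load copy; a real one would use
 * movntdqa loads, this one is a plain memcpy so the sketch is self-contained.
 */
static void *streaming_load_memcpy_stub(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Hypothetical wrapper: fence once before the copy, and only when the
 * streaming-load path has been selected.
 */
static void copy_with_fence(void *dst, const void *src, size_t n,
                            mem_copy_fn mem_copy)
{
    assert(mem_copy != NULL);

    if (mem_copy == streaming_load_memcpy_stub) {
        /* Same instruction as __builtin_ia32_lfence(), spelled with the
         * portable intrinsic. Swap in _mm_mfence() if a full fence turns
         * out to be what is wanted.
         */
        _mm_lfence();
    }

    mem_copy(dst, src, n);
}
```

The fence is issued once per copy rather than per row, so its cost should be small next to the copy itself.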
-Chris