[Mesa-dev] [PATCH 2/5] i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
Scott D Phillips
scott.d.phillips at intel.com
Tue Apr 10 15:33:18 UTC 2018
Chris Wilson <chris at chris-wilson.co.uk> writes:
> Quoting Chris Wilson (2018-04-05 20:54:54)
> > Quoting Scott D Phillips (2018-04-03 21:05:42)
[...]
> > Ok, was hoping to see how you choose to use the streaming load, but I
> > guess that's the next patch.
> >
> > Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
>
> Oh, one point Eric Anholt mentioned on another thread about movntdqa is
> that stale data inside the internal buffer is not automatically
> invalidated. We may need to emit explicit mfence before the copies if we
> are in doubt. A single mfence per tiled-copy is probably not enough to
> worry about optimising away.
Looking around, I found this erratum about movntdqa not honoring the
ordering guarantees of locked instructions (VLP31 in the pdf):
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-n3520-j2850-celeron-n2920-n2820-n2815-n2806-j1850-j1750-spec-update.pdf
So I added this code near the top of tiled_to_linear():
   if (mem_copy == (mem_copy_fn)_mesa_streaming_load_memcpy) {
      /* Various Atom processors have errata where the movntdqa instruction
       * (which is used in streaming_load_memcpy) may incorrectly be reordered
       * before locked instructions. To work around that, we put an lfence
       * here to manually wait for preceding loads to be completed.
       */
      __builtin_ia32_lfence();
   }
By my hazy understanding, an mfence won't suffice where the erratum says
you need an lfence. Do I have that right, or should this be an mfence?
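For anyone who wants to poke at this outside the driver, here is a
minimal standalone sketch of the same pattern: one lfence, then a run of
movntdqa-based row copies. It is not the Mesa code. The names
streaming_load_memcpy_sketch and copy_rows_sketch are made up for
illustration, the copy loop is a simplified stand-in for
_mesa_streaming_load_memcpy, and it assumes GCC or Clang on x86-64.
_mm_lfence() is used in place of __builtin_ia32_lfence() and should emit
the same instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <immintrin.h>

typedef void *(*mem_copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for _mesa_streaming_load_memcpy: uses movntdqa
 * (_mm_stream_load_si128) for 16-byte chunks when the source is 16-byte
 * aligned, and plain memcpy for whatever is left.
 */
__attribute__((target("sse4.1")))
static void *
streaming_load_memcpy_sketch(void *dst, const void *src, size_t n)
{
   char *d = dst;
   const char *s = src;

   if (((uintptr_t)s & 15) == 0) {
      for (; n >= 16; s += 16, d += 16, n -= 16) {
         __m128i v = _mm_stream_load_si128((__m128i *)s);
         _mm_storeu_si128((__m128i *)d, v);
      }
   }
   memcpy(d, s, n);
   return dst;
}

/* Hypothetical row-copy loop: a single lfence up front when the streaming
 * copy was selected, then one mem_copy call per row.
 */
static void
copy_rows_sketch(char *dst, const char *src, size_t row_bytes, size_t rows,
                 size_t dst_pitch, size_t src_pitch, mem_copy_fn mem_copy)
{
   assert(dst != NULL && src != NULL);

   if (mem_copy == streaming_load_memcpy_sketch) {
      /* Wait for preceding loads to complete before any movntdqa issues. */
      _mm_lfence();
   }

   for (size_t y = 0; y < rows; y++)
      mem_copy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```

A caller would pick streaming_load_memcpy_sketch only after checking for
SSE4.1 at runtime (for example with __builtin_cpu_supports("sse4.1")) and
pass memcpy otherwise, in which case no fence is emitted. The sketch does
not settle the lfence versus mfence question; it only shows where the
fence sits relative to the copies.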