[Mesa-dev] [PATCH v2] ac: Use DPP for build_ddxy where possible.
Nicolai Hähnle
nhaehnle at gmail.com
Wed May 23 16:37:17 UTC 2018
On 23.05.2018 15:30, Bas Nieuwenhuizen wrote:
> WQM is pretty reliable now on LLVM 7, so let us just use
> DPP + WQM.
>
> This gives approximately a 1.5% performance increase on the
> vrcompositor built-in benchmark.
>
> v2: Use ac_build_quad_swizzle.
> ---
> src/amd/common/ac_llvm_build.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> index 36c1d62637b..0c0228fe9c7 100644
> --- a/src/amd/common/ac_llvm_build.c
> +++ b/src/amd/common/ac_llvm_build.c
> @@ -1170,7 +1170,21 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
> LLVMValueRef tl, trbl, args[2];
> LLVMValueRef result;
>
> - if (ctx->chip_class >= VI) {
> + if (ctx->chip_class >= VI && HAVE_LLVM >= 0x0700) {

Do you really need the chip_class check here? ac_build_quad_swizzle
should just use ds_swizzle on the older chips, right?

So all the code below can be removed once we drop support for LLVM < 7
(which will of course be quite some time in the future, but hey!)

Apart from that,

Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>

> + unsigned tl_lanes[4], trbl_lanes[4];
> +
> + for (unsigned i = 0; i < 4; ++i) {
> + tl_lanes[i] = i & mask;
> + trbl_lanes[i] = (i & mask) + idx;
> + }
> +
> + tl = ac_build_quad_swizzle(ctx, val,
> + tl_lanes[0], tl_lanes[1],
> + tl_lanes[2], tl_lanes[3]);
> + trbl = ac_build_quad_swizzle(ctx, val,
> + trbl_lanes[0], trbl_lanes[1],
> + trbl_lanes[2], trbl_lanes[3]);
> + } else if (ctx->chip_class >= VI) {
> LLVMValueRef thread_id, tl_tid, trbl_tid;
> thread_id = ac_get_thread_id(ctx);
>
>
--
Learn how the world really is,
but never forget how it ought to be.