[Mesa-dev] [PATCH v2] ac: Use DPP for build_ddxy where possible.

Wed May 23 16:37:17 UTC 2018

On 23.05.2018 15:30, Bas Nieuwenhuizen wrote:
> WQM is pretty reliable now on LLVM 7, so let us just use
> DPP + WQM.
> 
> This gives approximately a 1.5% performance increase on the
> vrcompositor built-in benchmark.
> 
> v2: Use ac_build_quad_swizzle.
> ---
>   src/amd/common/ac_llvm_build.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
> index 36c1d62637b..0c0228fe9c7 100644
> --- a/src/amd/common/ac_llvm_build.c
> +++ b/src/amd/common/ac_llvm_build.c
> @@ -1170,7 +1170,21 @@ ac_build_ddxy(struct ac_llvm_context *ctx,
>   	LLVMValueRef tl, trbl, args[2];
>   	LLVMValueRef result;
>   
> -	if (ctx->chip_class >= VI) {
> +	if (ctx->chip_class >= VI && HAVE_LLVM >= 0x0700) {

Do you really need the chip_class check here? ac_build_quad_swizzle 
should just use ds_swizzle on the older chips, right?

So all the code below can be removed once we drop support for LLVM < 7 
(which will of course be quite some time in the future, but hey!)
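To see why the chip_class check looks redundant: as far as I know, the DPP quad_perm control and ds_swizzle's quad-permute mode are driven by the same 8-bit pattern (2 bits per lane, lane 0 in the low bits), with ds_swizzle additionally setting offset bit 15. Here is a CPU-side sketch in C under that assumption. The helper names are hypothetical, not Mesa functions.

```c
#include <assert.h>

/* Hypothetical helper: pack four 2-bit lane selectors into an 8-bit
 * quad-permute pattern, lane 0 in bits [1:0] up to lane 3 in bits [7:6].
 * Assumption: this matches the layout of the DPP quad_perm control. */
static unsigned quad_perm_pattern(unsigned l0, unsigned l1,
                                  unsigned l2, unsigned l3)
{
	assert(l0 < 4 && l1 < 4 && l2 < 4 && l3 < 4);
	return l0 | (l1 << 2) | (l2 << 4) | (l3 << 6);
}

/* Hypothetical helper: the same pattern as a ds_swizzle offset, assuming
 * bit 15 selects quad-permute mode and bits [7:0] carry the pattern. */
static unsigned ds_swizzle_quad_offset(unsigned pattern)
{
	return (1u << 15) | (pattern & 0xffu);
}

/* CPU emulation of a quad permute over n lanes (n a multiple of 4):
 * lane i reads from the lane of its own quad selected for position i % 4. */
static void apply_quad_perm(const float *src, float *dst, unsigned n,
                            unsigned pattern)
{
	assert(n % 4 == 0);
	for (unsigned i = 0; i < n; ++i) {
		unsigned sel = (pattern >> (2 * (i & 3u))) & 3u;

		dst[i] = src[(i & ~3u) + sel];
	}
}
```

For example, quad_perm_pattern(1, 1, 3, 3) is 0xf5 (0x80f5 as a ds_swizzle offset), and applying it to lanes {0, 1, ..., 7} yields {1, 1, 3, 3, 5, 5, 7, 7}, whichever instruction ends up carrying it.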

Apart from that,

Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>

> +		unsigned tl_lanes[4], trbl_lanes[4];
> +
> +		for (unsigned i = 0; i < 4; ++i) {
> +			tl_lanes[i] = i & mask;
> +			trbl_lanes[i] = (i & mask) + idx;
> +		}
> +
> +		tl = ac_build_quad_swizzle(ctx, val,
> +		                           tl_lanes[0], tl_lanes[1],
> +		                           tl_lanes[2], tl_lanes[3]);
> +		trbl = ac_build_quad_swizzle(ctx, val,
> +		                             trbl_lanes[0], trbl_lanes[1],
> +		                             trbl_lanes[2], trbl_lanes[3]);
> +	} else if (ctx->chip_class >= VI) {
>   		LLVMValueRef thread_id, tl_tid, trbl_tid;
>   		thread_id = ac_get_thread_id(ctx);
>   
> 
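For reference, the lane arrays built by the new loop can be checked with a standalone C sketch that emulates one 2x2 quad on the CPU. It assumes lanes 0..3 of a quad are top-left, top-right, bottom-left and bottom-right, that the derivative is the trbl value minus the tl value (as the names suggest), and that callers pass masks like the hypothetical QUAD_MASK_* values below, which are modeled on, but not taken from, the constants Mesa's callers use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask values (assumptions, not part of this patch).
 * Only the low two bits, the position inside the quad, matter here. */
#define QUAD_MASK_TOP_LEFT 0xfffffffcu /* coarse: every lane starts at lane 0 */
#define QUAD_MASK_TOP      0xfffffffdu /* fine ddy: clear the row bit */
#define QUAD_MASK_LEFT     0xfffffffeu /* fine ddx: clear the column bit */

/* Mirrors the loop in the patch: tl is the reference lane, trbl is the lane
 * idx steps away (idx = 1 for the right neighbour, 2 for the one below). */
static void ddxy_lanes(uint32_t mask, unsigned idx,
                       unsigned tl_lanes[4], unsigned trbl_lanes[4])
{
	for (unsigned i = 0; i < 4; ++i) {
		tl_lanes[i] = i & mask;
		trbl_lanes[i] = (i & mask) + idx;
	}
}

/* Emulated derivative for one quad: value at trbl minus value at tl. */
static void ddxy_quad(const float quad[4], uint32_t mask, unsigned idx,
                      float out[4])
{
	unsigned tl[4], trbl[4];

	ddxy_lanes(mask, idx, tl, trbl);
	for (unsigned i = 0; i < 4; ++i) {
		assert(trbl[i] < 4);
		out[i] = quad[trbl[i]] - quad[tl[i]];
	}
}
```

With a quad holding {1, 2, 10, 20}, the fine ddy case (QUAD_MASK_TOP, idx 2) gives {9, 18, 9, 18}, while the coarse ddx case (QUAD_MASK_TOP_LEFT, idx 1) gives 1 in every lane, since all four lanes read lanes 0 and 1.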

-- 
Learn how the world really is,
but never forget how it ought to be.