This example is a self-contained use of the reduce
primitive, intended for measuring and plotting performance. It builds on the
simpler functionality example. Set your
parameters in the pane and click "Start" to run a WebGPU reduce and plot its
performance data. The inputCount input specifies
how many different input lengths to run; these lengths are spaced evenly
on a logarithmic scale between the specified start and end
lengths. Otherwise, the parameters are the same as in the
functionality example. This
example explains
how to time a Gridwise primitive.
The entire JS source file is on GitHub.
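As a rough illustration of the logarithmic spacing described above, the following sketch generates a list of input lengths. The helper name `logSpacedLengths` and its parameters are hypothetical and are not taken from the example's source; the example itself may round or align lengths differently.

```javascript
// Hypothetical helper (not part of Gridwise): returns `count` input lengths
// spaced evenly in log space between `start` and `end`, inclusive.
function logSpacedLengths(start, end, count) {
  if (count < 1) return [];
  if (count === 1) return [start];
  const logStart = Math.log(start);
  const logEnd = Math.log(end);
  const lengths = [];
  for (let i = 0; i < count; i++) {
    const t = i / (count - 1); // fraction of the way from start to end
    // Interpolate linearly in log space, then round to a whole length.
    lengths.push(Math.round(Math.exp(logStart + t * (logEnd - logStart))));
  }
  return lengths;
}
```

With a start of 2^10 and an end of 2^20, three lengths would land at 2^10, 2^15, and 2^20, so each step multiplies the length by the same factor.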
To measure CPU and/or GPU timing, include a timing directive in the call
to primitive.execute. Typically we call the primitive once
without any timing information to handle warmup effects (e.g., compiling
the kernel) and then call the primitive many times with timing enabled.
We then average the total runtime of that second set of calls over the
number of trials.
/* call the primitive once to warm up */
await primitive.execute({
  inputBuffer: memsrcBuffer,
  outputBuffer: memdestBuffer,
});
/* call params.trials times */
await primitive.execute({
  inputBuffer: memsrcBuffer,
  outputBuffer: memdestBuffer,
  trials: params.trials, /* integer */
  enableGPUTiming: true,
  enableCPUTiming: true,
});
We can get timing information back from the primitive with a
getTimingResult call. The GPU time might be an array of timings if
the GPU call has multiple kernels within it. In the example below, we
simply flatten that array by summing it into a total time.
let { gpuTotalTimeNS, cpuTotalTimeNS } = await primitive.getTimingResult();
if (gpuTotalTimeNS instanceof Array) {
  // gpuTotalTimeNS might be a list, in which case just sum it up
  gpuTotalTimeNS = gpuTotalTimeNS.reduce((sum, t) => sum + t, 0);
}
averageGpuTotalTimeNS = gpuTotalTimeNS / params.trials;
averageCpuTotalTimeNS = cpuTotalTimeNS / params.trials;
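The summing and averaging above can be folded into one small helper that works for both the CPU and GPU values. This is only a sketch: `averageTimeNS` and `bandwidthGBps` are hypothetical names, not Gridwise APIs, and reporting bandwidth is an assumption about what one might plot rather than a statement of what this example plots.

```javascript
// Hypothetical helpers (not Gridwise APIs).

// Collapse a timing value that may be a single number or an array of
// per-kernel times, then average it over the number of trials.
function averageTimeNS(totalTimeNS, trials) {
  const total = Array.isArray(totalTimeNS)
    ? totalTimeNS.reduce((sum, t) => sum + t, 0)
    : totalTimeNS;
  return total / trials;
}

// Convert bytes processed and a time in nanoseconds to GB/s.
// 1 byte per nanosecond equals 1e9 bytes per second, i.e. 1 GB/s.
function bandwidthGBps(bytes, timeNS) {
  return bytes / timeNS;
}
```

For instance, `averageTimeNS(gpuTotalTimeNS, params.trials)` would replace the `instanceof Array` check and the division in the snippet above.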
The reduce primitive computes a single output value from an
input array using a binary operation (such as add, max, or min). Because
reduce leaves its input unchanged after each execution, it is simpler to
time than sort (which overwrites its input): repeated trials can reuse the
same input buffer.
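To make the semantics concrete, here is a minimal CPU-side sketch of a reduce with a pluggable binary operation. `referenceReduce` is a hypothetical helper, not part of Gridwise; it only illustrates what the primitive computes. A GPU result may differ from it for large inputs because of integer wraparound or floating-point summation order.

```javascript
// Hypothetical CPU reference (not part of Gridwise): fold `input` into one
// value with `binop`, starting from that operation's identity element.
function referenceReduce(input, binop, identity) {
  let acc = identity;
  for (const x of input) {
    acc = binop(acc, x);
  }
  return acc;
}

const input = new Uint32Array([3, 1, 4, 1, 5, 9, 2, 6]);
const sum = referenceReduce(input, (a, b) => a + b, 0);
const max = referenceReduce(input, (a, b) => Math.max(a, b), -Infinity);
const min = referenceReduce(input, (a, b) => Math.min(a, b), Infinity);
// The input array is only read, never written, so it can be reused as-is.
```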