Demo Programming Guide
This document describes the current interface of the zkCuda demo. The related code can be found in zkcuda. A complete example is available in zkcuda_1.rs.
Kernel Function Definition
An example of a kernel function is as follows:
#[kernel]
fn add_2_macro<C: Config>(api: &mut API<C>, a: &[InputVariable; 2], b: &mut OutputVariable) {
*b = api.add(a[0], a[1]);
}
This function is similar to the sub-circuit function in the Go frontend.
It is roughly equivalent to the following Cuda function:
__global__ void add_2_kernel(int* input, int* output, int n) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < n) {
int input_idx = idx * 2;
output[idx] = input[input_idx] + input[input_idx + 1];
}
}
The macro #[kernel]
rewrites your function's parameter section and generates a function for compiling the kernel. The definition above will compile to the following two functions:
fn add_2_macro<C: Config>(api: &mut API<C>, a: &Vec<InputVariable>, b: &mut OutputVariable) {
// implementation
}
fn compile_add_2_macro<C: Config>() {
// implementation
}
Compilation
Before using the kernel, it needs to be compiled. The example kernel above can be compiled as follows:
let kernel_add_2: Kernel<M31Config> = compile_add_2_macro().unwrap();
The macro #[kernel]
has done almost all the work for you; you just need to call the function with the compile_
prefix.
Context
The context automatically maintains the existing proof and commits the input variables. It provides the following functions:
impl<C: Config, P: ProvingSystem<C>> Default for Context<C, P> {
fn default() -> Self {
// Implementation
}
}
impl<C: Config, P: ProvingSystem<C>> Context<C, P> {
pub fn copy_to_device(&mut self, host_memory: &[C::CircuitField]) -> DeviceMemoryHandle {
// Implementation
}
pub fn copy_to_host(&self, device_memory_handle: DeviceMemoryHandle) -> Vec<C::CircuitField> {
// Implementation
}
pub fn call_kernel(
&mut self,
kernel: &Kernel<C>,
ios: &mut [Option<DeviceMemoryHandle>],
parallel_count: usize,
is_broadcast: &[bool],
) {
// Implementation
}
pub fn to_proof(self) -> CombinedProof<C, P> {
// Implementation
}
}
The call_kernel
function here is relatively long. In addition to the kernel itself, it requires a few other parameters. parallel_count
specifies how many zk threads will run in parallel. is_broadcast
determines how each parameter will be distributed. If a parameter's is_broadcast
is true
, each zk thread will receive the same input; otherwise, the input provided by the user will be divided into parallel_count
parts, with each zk thread receiving one part.
For example, suppose a kernel requires input lengths of 2, 4, 4
, and parallel_count = 8
, is_broadcast = [false, true, false]
. In this case, the user needs to provide three inputs with lengths of 16, 4, 32
, respectively.
Kernel API (ExpanderCompilerCollection)
The compiler APIs that can be used inside a kernel are the same as those used in regular circuits. You can learn more from Rust APIs.
Complete Example
Here's an example of how to use this CUDA-like circuit frontend:
See zkcuda_1.rs.
This example also introduces a method that does not rely on #[kernel]
for definition, which is more cumbersome. In fact, #[kernel]
is just syntactic sugar for this method.