XFB And Me

Getting Deeper

After getting the extensions and features enabled so things would run (more on this in a future post), I dove into the deep end of ntv because I was getting some crazy assert() errors.

Critical terminology for this post:

NIR - the compiler IR used in mesa
VTN - shorthand for spirv_to_nir, which is the compiler that translates SPIR-V back to NIR
SPIR-V - the IR used by Vulkan and OpenCL

What Even Is This

That was a question I asked myself repeatedly. The tl;dr is that to correctly translate OpenGL -> Vulkan, Zink needs to, in addition to mimicking the operations and behaviors of GL in Vulkan, do this amazing dance where it takes GLSL, compiles it to NIR, translates it to SPIR-V like the underlying Vulkan driver expects, compiles it back into (hopefully identical-ish) NIR, and then finally sends it to the underlying driver’s GPU compiler.

Also Vulkan uses a different coordinate space with an inverted (compared to GL) Y axis and different Z range, so that’s something to keep in mind.

Thus began a crash course in learning NIR and SPIR-V somewhat simultaneously.

After Considerable Struggle, It Happened

Here’s the most significant blocks:

/* for streamout create new outputs, as streamout can be done on individual components,
   from complete outputs, so we just can't use the created packed outputs */
static void
emit_so_info(struct ntv_context *ctx, unsigned max_output_location,
             const struct pipe_stream_output_info *so_info)
{
   for (unsigned i = 0; i < so_info->num_outputs; i++) {
      struct pipe_stream_output so_output = so_info->output[i];

      SpvId vec_type = ctx->so_output_types[so_output.register_index];
      SpvId pointer_type = spirv_builder_type_pointer(&ctx->builder,
                                                      SpvStorageClassOutput,
                                                      vec_type);
      SpvId var_id = spirv_builder_emit_var(&ctx->builder, pointer_type,
                                            SpvStorageClassOutput);
      char name[10];

      snprintf(name, 10, "xfb%d", i);
      spirv_builder_emit_name(&ctx->builder, var_id, name);
      spirv_builder_emit_offset(&ctx->builder, var_id, (so_output.dst_offset * 4));
      spirv_builder_emit_xfb_buffer(&ctx->builder, var_id, so_output.output_buffer);
      spirv_builder_emit_xfb_stride(&ctx->builder, var_id, so_info->stride[so_output.output_buffer] * 4);
      uint32_t location = so_output.register_index;
      if (location == 0)
         location = max_output_location;
      spirv_builder_emit_location(&ctx->builder, var_id, location);
      if (so_output.start_component)
         spirv_builder_emit_component(&ctx->builder, var_id, so_output.start_component);

      uint32_t *key = ralloc_size(NULL, sizeof(uint32_t));
      *key = (uint32_t)so_output.register_index << 2 | so_output.start_component;
      _mesa_hash_table_insert(ctx->so_outputs, key, (void *)(intptr_t)var_id);

      assert(ctx->num_entry_ifaces < ARRAY_SIZE(ctx->entry_ifaces));
      ctx->entry_ifaces[ctx->num_entry_ifaces++] = var_id;
   }
}

What we’ve got here is the emission of the xfb decorations based on the passed stream output info. If you’re unfamiliar, mesa internally names everything gallium-related with the pipe_ namespace, and each struct pipe_stream_output so_output represents an output variable from the shader for which we’re providing transform feedback.

To do this, the code loops over the shader’s stream outputs, creating output variables of type pointer-to-vec and then tagging on the xfb decorations. In SPIR-V, a decoration provides information about additional attributes that a variable may possess, e.g., its name, like gl_Position. Here, the xfb decorations are used to apply xfb semantics to the created variables using the provided stream output info, which gives information about where in memory each stream output exists.

At the end, the created variable is added into a hash table for use when emitting the instructions to store the xfb data below, then the new variable is added as an entry point, which makes it accessible to instructions.

static void
emit_so_outputs(struct ntv_context *ctx,
               const struct pipe_stream_output_info *so_info)
{
  for (unsigned i = 0; i < so_info->num_outputs; i++) {
     uint32_t components[NIR_MAX_VEC_COMPONENTS];
     struct pipe_stream_output so_output = so_info->output[i];
     uint32_t so_key = (uint32_t) so_output.register_index << 2 | so_output.start_component;
     struct hash_entry *he = _mesa_hash_table_search(ctx->so_outputs, &so_key);
     assert(he);
     SpvId so_output_var_id = (SpvId)(intptr_t)he->data;

     SpvId type = ctx->so_output_types[so_output.register_index];
     SpvId output = ctx->outputs[so_output.register_index];
     SpvId src = spirv_builder_emit_load(&ctx->builder, type, output);

     SpvId result;
     if (so_output.num_components > 1) {
        for (unsigned c = 0; c < so_output.num_components; c++) {
           components[c] = so_output.start_component + c;
        }

        result = spirv_builder_emit_vector_shuffle(&ctx->builder, type,
                                                         src, src,
                                                         components, so_output.num_components);
        result = emit_unop(ctx, SpvOpBitcast, type, result);
     } else
        result = src;

     spirv_builder_emit_store(&ctx->builder, so_output_var_id, result);
  }
}

Now that the xfb decorations have been emitted, the rest of ntv runs until everything is nearly done, at which point there’s a need to store the variable being output into the newly-created output stream variable.

This is done by getting the variable id stored previously in the hash table and matching it up to the struct nir_variable::data.driver_location member based on the register index, as the previously-emitted output variables have all been stored to an easily accessible array for this purpose. With the shader output variable id and the corresponding xfb output id in hand, this code performs a load instruction on the shader output variable, performs a vector shuffle to reorder the components based on the start component if it’s a multi-component type, then stores the loaded value into the new xfb variable.

CapabilityTransformFeedback and ExecutionModeXfb also get set on the SPIR-V shader to activate the xfb feature, and then the whole thing “Just Works™” as far as the shader is concerned.

Tune In Next Time For…

Going over details of the API-related changes required for xfb while I continue to stall for more meaty content to write about.

Written on June 3, 2020