【Archive】GAMES202

百分比近似软阴影

若要判断当前着色点是否处于阴影之中，我们需要知到该点与光源之间的遮挡关系。一种常见的做法是 Shadow Mapping，假设从光源看向场景，就像做深度测试一样，保存一张从光源处获得的 DepthBuffer，然后比较着色点与光源之间的距离与 DepthBuffer 中对应点的深度，前者更大则代表场景中有另一物体比当前着色点更接近光源，即该着色点被遮挡。

Two Pass Shadow Map 硬阴影

float useShadowMap(sampler2D shadowMap, vec4 coord) {
    vec4 closestDepthVec = texture2D(shadowMap, coord.xy);
    float closestDepth = unpack(closestDepthVec);

    return coord.z < closestDepth + 0.011 ? 1.0 : 0.0;
}

void main(void) {
    // 转换至 NDC 空间
    vec3 shadowCoord = vPositionFromLight.xyz / vPositionFromLight.w;
    shadowCoord = shadowCoord * 0.5 + 0.5;

    // 省略布林冯光照模型的实现
    vec3 phongColor = blinnPhong();
    // 这里将可见度作为一个单独的项拆分出来
    float visibility = 1.0;

    visibility = useShadowMap(uShadowMap, vec4(shadowCoord, 1.0));
    //visibility = PCF(uShadowMap, vec4(shadowCoord, 1.0));
    //visibility = PCSS(uShadowMap, vec4(shadowCoord, 1.0));
    gl_FragColor = vec4(phongColor * visibility, 1.0);
}

coord.z < closestDepth ? 1.0 : 0.0 的问题

这个问题源自于 Shadow Map 对场景走样/失真的理解。光源对场景的离散化认为 Shadow Map 内每一个纹素对应的表面属于同一个深度，此时对于侧对光源的表面，单个纹素内无法表示表面深度剧烈的变化，Shadow Map 便会将平面理解为一个…搓衣板。
光源与地板几乎位于同一平面上
解决方法：为判断条件添加一个容忍度。
return coord.z < (closestDepth + 0.01) ? 1.0 : 0.0;

新的问题：当这个容忍度较大时阴影的根部会与模型发生分离，这个问题是目前难以解决的。

Percentage Closer Filter

不再仅仅查询 DepthBuffer 中对应点的深度，而是计算一定范围内深度大于着色点深度的比例，认为深度小于着色点深度的物体会遮挡着色点。

float PCF(sampler2D shadowMap, vec4 coord) {
    poissonDiskSamples(coord.xy);

    int unShadowCount = 0;
    for(int i = 0; i < NUM_SAMPLES; i ++) {
        vec2 sampleCoord = poissonDisk[i] * FILTER_DIAMETER + coord.xy;
        vec4 closestDepthVec = texture2D(shadowMap, sampleCoord); 
        float closestDepth = unpack(closestDepthVec);
        if(coord.z < closestDepth + 0.011) {
            unShadowCount += 1;
        }
    }

    return float(unShadowCount) / float(NUM_SAMPLES);
}

Percentage Closer Soft Shadow

PCF 的问题：整体的阴影都会获得一个模糊，而现实情况是阴影的根部往往是最“扎实”的，远端的阴影才会渐渐变得模糊。
解决方法：已知阴影模糊的程度取决于 Filter 的大小，如果能让 Filter 的大小随阴影着色点与遮挡物的距离增加而增加，便能实现较为真实的软阴影。
PCSS 分为三步：

avgblocker depth
calculate penumbra size
filtering (PCF)

avgblocker depth

在一定范围内查找能够遮挡着色点的物体的深度平均值。

float findBlocker(sampler2D shadowMap, vec2 uv, float zReceiver) {
    poissonDiskSamples(uv);

    int shadowCount = 0;
    float blockDepthSum = 0.0;
    for(int i = 0; i < NUM_SAMPLES; i++) {
        vec2 sampleCoord = poissonDisk[i] * FILTER_DIAMETER + uv;
        vec4 closestDepthVec = texture2D(shadowMap, sampleCoord);
        float closestDepth = unpack(closestDepthVec);
        if(zReceiver > closestDepth + 0.011) {
            blockDepthSum += closestDepth;
            shadowCount += 1;
        }
    }

    return blockDepthSum / float(shadowCount);
}

calculate penumbra size

根据第一步得到的遮挡物深度计算阴影的模糊程度。
相似三角形原理，wPenumbra 便代表了阴影的模糊程度

$W_{Penumbra}=(d_{Receiver}-d_{Blocker})*W_{Light}/d_{Blocker}$

float PCSS(sampler2D shadowMap, vec4 coord) {
    float zReceiver = coord.z;

    // STEP 1: avgblocker depth
    float zBlocker = findBlocker(shadowMap, coord.xy, zReceiver);
    if(zBlocker <= EPS) return 1.0;
    if(zBlocker >= 1.0) return 0.0;

    // STEP 2: penumbra size
    float wPenumbra = W_LIGHT * (zReceiver - zBlocker) / zBlocker;

    // STEP 3: filtering
    int unShadowCount = 0;
    for(int i = 0; i < NUM_SAMPLES; i ++) {
        vec2 sampleCoord = poissonDisk[i] * FILTER_DIAMETER * wPenumbra + coord.xy;
        vec4 closestDepthVec = texture2D(shadowMap, sampleCoord); 
        float closestDepth = unpack(closestDepthVec);
        if(zReceiver < closestDepth + 0.011) {
            unShadowCount += 1;
        }
    }

    return float(unShadowCount) / float(NUM_SAMPLES);
}

结果

预计算辐射传递

如何计算环境光照可以近似为一个 ManyLight 的问题，在这种情况下，如果想要得知某一着色点获得的光照，需要对该点的上半球进行采样，这会是一个非常大的性能开销。使用 PRT 的做法可以使用预计算的数据，在实时的条件下实现环境光照。未实现：球谐函数的快速旋转。

前置知识

Product Integral

$\int _\Omega f(x)g(x)dx$ 两函数乘积的积分可以视为滤波，结果的频率由其中更低频的函数决定。

基函数

$f(x)=\sum c_i\cdot B_i(x)$ 对于任一函数，我们可以用一系列基函数的线性组合来近似出原函数。

球谐函数

定义： $\frac{1}{\sin \theta}\frac{\partial}{\partial \theta}(\sin \theta \frac{\partial Y}{\partial \theta})+\frac{1}{\sin ^2\theta}\frac{\partial ^2Y}{\partial \varphi ^2}+l(l+1)Y=0$
球谐函数是一系列定义在球面上的基函数，这里不深入理解函数的定义，而是重点关注该函数的几个性质：

$c_i=\int _\Omega f(\omega)B_i(\omega){\rm d}x$
正交性，任意两分量相乘结果为零。
旋转不变性，旋转的结果可由同阶基函数的线性组合得到。
只用三阶的球谐函数便可以很好地恢复出渲染方程中的 BRDF 项。
所以对于 Diffuse 材质，同样用三阶的球谐函数表示 Lighting 项，误差不会超过 %3。

Precomputed Radiance Transfer

先写出实时渲染中渲染方程的形式：

$L(o)=\int _\Omega L(i)V(i)\rho (i, o)max(0, n\cdot i)di$

积分中 L(i) 当作 Lighting 项，其余为 LightTransport 项。假设场景中的物体不变（即每个点的 LightTransport 就像该点的一个性质一样不变），光源可以旋转。

Diffuse 的情况

BRDF 项为一个常值，可以直接提到积分外。

$L(o)=\rho \int _\Omega L(i)V(i)max(0, n\cdot i)di$

由 $L(i)\approx \sum l_iB_i(i)$ 得

$L(o)\approx \rho \sum l_i\int _\Omega B_i(i)V(i)max(0, n\cdot i)di$ （不考虑交换积分与求和顺序的条件）

由 $c_i=\int _\Omega f(\omega)B_i(\omega){\rm d}x$ 得

$L(o)\approx \rho \sum l_iT_i$

Precompute 部分

$c_i=\int _\Omega f(\omega)B_i(\omega){\rm d}x$

使用黎曼积分的方式计算。

$c_i=\sum _if(\omega)B_i(\omega)\Delta \omega$

std::vector<Eigen::Array3f> PrecomputeCubemapSH(const std::vector<std::unique_ptr<float[]>> &images, const int &width, const int &height, const int &channel)

for (int i = 0; i < 6; i++) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
			// 遍历天空盒每个面的每个像素

            // 像素对应的方向
            Eigen::Vector3f dir = cubemapDirs[i * width * height + y * width + x];
            int index = (y * width + x) * channel;
            // 像素对应的光照
            Eigen::Array3f Le(
				images[i][index],
				images[i][index + 1],
				images[i][index + 2]);
            // 像素在单位球上对应的面积
            float w = CalcArea(x, y, width, height);	
            for (int l = 0; l <= SHOrder; l++) {
                for (int m = -l; m <= l; m++) {
                    int k = sh::GetIndex(l, m);
					// 投影于基函数上
                    double basisFunc =
						sh::EvalSH(l, m, dir.cast<double>().normalized());	
                    SHCoeffiecents[k] += Le * basisFunc * w;
                }
            }
        }
    }
}

void preprocess(const Scene *scene)

for (int i = 0; i < mesh->getVertexCount(); i++) 
	// 获取当前 mesh 的坐标和法线{
	const Point3f &v = mesh->getVertexPositions().col(i);
	const Normal3f &n = mesh->getVertexNormals().col(i);
	auto shFunc = [&](double phi, double theta) -> double {
		Eigen::Array3d d = sh::ToVector(phi, theta);
		const auto wi = Vector3f(d.x(), d.y(), d.z());
		double H = wi.dot(n);
		if (m_Type == Type::Unshadowed) {
			if (H > 0) {
				return H;
			}
			return 0;
		}
		else {
			Ray3f ray(v, wi.normalized());
			// 没有hit到其他地方的射线就对最终光照信息有贡献
			if (H > 0 && !scene->rayIntersect(ray)) {
				return H;
			}
			return 0;
		}
	};
	// 投影
	auto shCoeff = sh::ProjectFunction(SHOrder, shFunc, m_SampleCount);
	for (int j = 0; j < shCoeff->size(); j++) {
		m_TransportSHCoeffs.col(i).coeffRef(j) = (*shCoeff)[j];
	}
}

结果保存在 light.txt 与 transport.txt 中。

Real Time 部分

材质

class DiffuseMaterial extends Material {

	constructor(vertexShader, fragmentShader) {
		let precomputeL_mat = getMat3ValueFromRGB(precomputeL[guiParams.envmapId]);

		super({
			'aPrecomputeLR': { type: 'matrix3fv', value: precomputeL_mat[0]},
			'aPrecomputeLG': { type: 'matrix3fv', value: precomputeL_mat[1]},
			'aPrecomputeLB': { type: 'matrix3fv', value: precomputeL_mat[2]}
		}, ['aPrecomputeLT'], vertexShader, fragmentShader, null);
		this.mapid = guiParams.envmapId;
	}
}

async function buildDiffuseMaterial(vertexPath, fragmentPath) {
	let vertexShader = await getShaderString(vertexPath);
	let fragmentShader = await getShaderString(fragmentPath);

	return new DiffuseMaterial(vertexShader, fragmentShader);
}

VertexShader

uniform mat3 aPrecomputeLR;
uniform mat3 aPrecomputeLG;
uniform mat3 aPrecomputeLB;
attribute mat3 aPrecomputeLT;

attribute vec3 aVertexPosition;
attribute vec3 aNormalPosition;
attribute vec2 aTextureCoord;

uniform mat4 uModelMatrix;
uniform mat4 uViewMatrix;
uniform mat4 uProjectionMatrix;

varying highp vec2 vTextureCoord;
varying highp vec3 vFragPos;
varying highp vec3 vNormal;
varying highp vec3 vColor;

const float kd = 2.0;
const float pi = 3.14159;

void main(void) {
  vFragPos = (uModelMatrix * vec4(aVertexPosition, 1.0)).xyz;
  vNormal = (uModelMatrix * vec4(aNormalPosition, 0.0)).xyz;

  vec3 iden = vec3(1.0, 1.0, 1.0);
  vColor = vec3(
      dot(matrixCompMult(aPrecomputeLR, aPrecomputeLT) * iden, iden),
      dot(matrixCompMult(aPrecomputeLG, aPrecomputeLT) * iden, iden),
      dot(matrixCompMult(aPrecomputeLB, aPrecomputeLT) * iden, iden)
  ) * kd / pi;

  gl_Position =
  	uProjectionMatrix * uViewMatrix * uModelMatrix * vec4(aVertexPosition, 1.0);
  vTextureCoord = aTextureCoord;
}

FragmentShader

#ifdef GL_ES
precision mediump float;
#endif

uniform sampler2D uSampler;

varying highp vec2 vTextureCoord;
varying highp vec3 vFragPos;
varying highp vec3 vNormal;
varying highp vec3 vColor;

void main(void) {
  vec3 color = texture2D(uSampler, vTextureCoord).rgb;
  color = pow(color, vec3(2.2));

  gl_FragColor = vec4(vColor, 1.0);
}

结果

屏幕空间反射

离线渲染的条件下，光线追踪可以处理光线在空间内的多次弹射，带来高质量的间接光照结果，那么如何在实时的条件下得到令人满意的全局光照效果呢。其中的一种做法是在做 RayTracing 时使用 DepthBuffer 判断射线与平面的相交问题，认为能被 DepthBuffer 看见得世界坐标是“位于物体外部”的坐标，看不见则在物体内部，那么一条步进的 Ray 从可见转为不可见时的位置便是某一表面所在的坐标，这样相交的计算就转变为二维空间下的步进与查表。

Shader

Diffuse

EvalDiffuse(wi, wo, uv) 的返回值为 BSDF 的值。参数 wi 和 wo 为世界坐标系中的值，分别代表入射方向和出射方向，这两个方向的起点都是着色点，uv 为着色点在屏幕空间中的坐标。

vec3 EvalDiffuse(vec3 wi, vec3 wo, vec2 uv) {
  vec3 Ld = GetGBufferDiffuse(uv);
  vec3 n = GetGBufferNormalWorld(uv);
  float nDotL = max(0.0, dot(n, wi));

  return Ld * nDotL * INV_PI;
}

DirectionalLight

EvalDirectionalLight(uv) 的返回值为，着色点位于 uv 处得到的光源的辐射度，并且需要考虑遮挡关系，可以使用 GetGBufferuShadow(uv) 函数得到。

vec3 EvalDirectionalLight(vec2 uv) {
    vec3 woeldPos = GetGBufferPosWorld(uv);
    vec3 wi = normalize(uLightDir);
    vec3 wo = normalize(uCameraPos - woeldPos);
    vec3 bsdf = EvalDiffuse(wi, wo, uv);

    return uLightRadiance * bsdf * GetGBufferuShadow(uv);
}

RayMarch

RayMarch(vec3 ori, vec3 dir, vec3 hitPos) 的返回值为是否相交，相交时将参数 hitPos 设置为交点。参数 ori 和 dir 为世界坐标系，分别代表光线的起点和方向。

#define EPS 1e-3
bool OutOfScreen(vec3 pos) {
    vec2 uv = GetScreenCoordinate(pos);
    return uv.x < 0.0 || uv.y < 0.0 || uv.x > 1.0 || uv.y > 1.0;
}
bool Visible(vec3 pos) {
    return GetDepth(pos) < GetGBufferDepth(GetScreenCoordinate(pos));
}

bool RayMarch(vec3 ori, vec3 dir, out vec3 hitPos) {
    bool isHit = false, intersected = false;
    float stepLenth = 1.0;
    vec3 crtPos = ori;
    for (int i = 0; i < 1280; i++) {
        if(OutOfScreen(crtPos)) {
            break;
        }
        if(Visible(crtPos + dir * stepLenth)) {
            // 无交点则步长不变继续步进
            crtPos += dir * stepLenth;
        }
        else {
            intersected = true;
        }
        if(stepLenth < EPS) {
            // 当精度高到一定程度时结束循环
            isHit = true;
            hitPos = crtPos;
            break;
        }
        if(intersected) {
            // 一旦发现前方为交点则不断提高精度
            stepLenth *= 0.5;
        }
    }

    return isHit;
}

IndirectLight

使用蒙特卡洛的方式对半球进行积分，直接光照着色点 q 对间接光照作色点 p 的贡献为

$L_{indir}=\frac{EvalDiffuse(q)}{pdf}*EvalDiffuse(p)*EvalDirectionalLight(p)$

SampleHemisphereCos(seed, pdf) 会返回一个局部坐标系的随机位置。LocalBasis(n, b1, b2) 通过传入的世界坐标系的法线 n，建立局部坐标系，返回两个切线向量。

#define SAMPLE_NUM 2
vec3 EvalIndirectLight(float seed, vec3 pos) {
    float pdf;
    vec3 Lid = vec3(0.0), hitPos;
    vec2 screenPos = GetScreenCoordinate(pos);

    // 局部转换至世界坐标
    vec3 normal = GetGBufferNormalWorld(screenPos), b1, b2;
    LocalBasis(normal, b1, b2);
    mat3 TBN = mat3(b1, b2, normal);

    for(int i = 0; i < SAMPLE_NUM; i++) {
        vec3 sampleDir = normalize(TBN * SampleHemisphereCos(seed, pdf));
        // SampleHemisphereUniform(seed, pdf);
        if(RayMarch(pos, sampleDir, hitPos)) {
            vec3 wi = normalize(uLightDir);
            vec3 wo = normalize(uCameraPos - pos);
            vec2 screenHitPos = GetScreenCoordinate(hitPos);

            vec3 L = EvalDiffuse(sampleDir, wo, screenPos) / pdf *
        EvalDiffuse(wi, -sampleDir, screenHitPos) * EvalDirectionalLight(screenHitPos);
            
            Lid += L;
        }
    }

    return Lid / float(SAMPLE_NUM);
}

完整的 FragmentSHader

#ifdef GL_ES
precision highp float;
#endif

uniform vec3 uLightDir;
uniform vec3 uCameraPos;
uniform vec3 uLightRadiance;
uniform sampler2D uGDiffuse;
uniform sampler2D uGDepth;
uniform sampler2D uGNormalWorld;
uniform sampler2D uGShadow;
uniform sampler2D uGPosWorld;

varying mat4 vWorldToScreen;
varying highp vec4 vPosWorld;

#define M_PI 3.1415926535897932384626433832795
#define TWO_PI 6.283185307
#define INV_PI 0.31830988618
#define INV_TWO_PI 0.15915494309

float Rand1(inout float p) {
  p = fract(p * .1031);
  p *= p + 33.33;
  p *= p + p;
  return fract(p);
}

vec2 Rand2(inout float p) {
  return vec2(Rand1(p), Rand1(p));
}

float InitRand(vec2 uv) {
  vec3 p3 = fract(vec3(uv.xyx) * .1031);
  p3 += dot(p3, p3.yzx + 33.33);
  return fract((p3.x + p3.y) * p3.z);
}

vec3 SampleHemisphereUniform(inout float s, out float pdf) {
  vec2 uv = Rand2(s);
  float z = uv.x;
  float phi = uv.y * TWO_PI;
  float sinTheta = sqrt(1.0 - z*z);
  vec3 dir = vec3(sinTheta * cos(phi), sinTheta * sin(phi), z);
  pdf = INV_TWO_PI;
  return dir;
}

vec3 SampleHemisphereCos(inout float s, out float pdf) {
  vec2 uv = Rand2(s);
  float z = sqrt(1.0 - uv.x);
  float phi = uv.y * TWO_PI;
  float sinTheta = sqrt(uv.x);
  vec3 dir = vec3(sinTheta * cos(phi), sinTheta * sin(phi), z);
  pdf = z * INV_PI;
  return dir;
}

void LocalBasis(vec3 n, out vec3 b1, out vec3 b2) {
  float sign_ = sign(n.z);
  if (n.z == 0.0) {
    sign_ = 1.0;
  }
  float a = -1.0 / (sign_ + n.z);
  float b = n.x * n.y * a;
  b1 = vec3(1.0 + sign_ * n.x * n.x * a, sign_ * b, -sign_ * n.x);
  b2 = vec3(b, sign_ + n.y * n.y * a, -n.y);
}

vec4 Project(vec4 a) {
  return a / a.w;
}

float GetDepth(vec3 posWorld) {
  float depth = (vWorldToScreen * vec4(posWorld, 1.0)).w;
  return depth;
}

/*
 * Transform point from world space to screen space([0, 1] x [0, 1])
 *
 */
vec2 GetScreenCoordinate(vec3 posWorld) {
  vec2 uv = Project(vWorldToScreen * vec4(posWorld, 1.0)).xy * 0.5 + 0.5;
  return uv;
}

float GetGBufferDepth(vec2 uv) {
  float depth = texture2D(uGDepth, uv).x;
  if (depth < 1e-2) {
    depth = 1000.0;
  }
  return depth;
}

vec3 GetGBufferNormalWorld(vec2 uv) {
  vec3 normal = texture2D(uGNormalWorld, uv).xyz;
  return normal;
}

vec3 GetGBufferPosWorld(vec2 uv) {
  vec3 posWorld = texture2D(uGPosWorld, uv).xyz;
  return posWorld;
}

float GetGBufferuShadow(vec2 uv) {
  float visibility = texture2D(uGShadow, uv).x;
  return visibility;
}

vec3 GetGBufferDiffuse(vec2 uv) {
  vec3 diffuse = texture2D(uGDiffuse, uv).xyz;
  diffuse = pow(diffuse, vec3(2.2));
  return diffuse;
}

/*
 * Evaluate diffuse bsdf value.
 *
 * wi, wo are all in world space.
 * uv is in screen space, [0, 1] x [0, 1].
 *
 */
vec3 EvalDiffuse(vec3 wi, vec3 wo, vec2 uv) {
  vec3 Ld = GetGBufferDiffuse(uv);
  vec3 n = GetGBufferNormalWorld(uv);
  float nDotL = max(0.0, dot(n, wi));
  return Ld * nDotL * INV_PI;
}

/*
 * Evaluate directional light with shadow map
 * uv is in screen space, [0, 1] x [0, 1].
 *
 */
vec3 EvalDirectionalLight(vec2 uv) {
  vec3 woeldPos = GetGBufferPosWorld(uv);
  vec3 wi = normalize(uLightDir);
  vec3 wo = normalize(uCameraPos - woeldPos);
  vec3 bsdf = EvalDiffuse(wi, wo, uv);
  return uLightRadiance * bsdf * GetGBufferuShadow(uv);
}

#define EPS 1e-3
bool OutOfScreen(vec3 pos) {
  vec2 uv = GetScreenCoordinate(pos);
  return uv.x < 0.0 || uv.y < 0.0 || uv.x > 1.0 || uv.y > 1.0;
}
bool Visible(vec3 pos) {
  return GetDepth(pos) < GetGBufferDepth(GetScreenCoordinate(pos));
}
bool RayMarch(vec3 ori, vec3 dir, out vec3 hitPos) {
  bool isHit = false, intersected = false;
  float stepLenth = 0.1;
  vec3 crtPos = ori;
  for (int i = 0; i < 1280; i++) {
    if(OutOfScreen(crtPos)) {
      break;
    }
    if(Visible(crtPos + dir * stepLenth)) {
      // 无交点则步长不变继续步进
      crtPos += dir * stepLenth;
    }
    else {
      intersected = true;
    }
    if(stepLenth < EPS) {
      // 当精度高到一定程度时结束循环
      isHit = true;
      hitPos = crtPos;
      break;
    }
    if(intersected) {
      // 一旦发现前方为交点则不断提高精度
      stepLenth *= 0.5;
    }
  }
  return isHit;
}

#define SAMPLE_NUM 2
vec3 EvalIndirectLight(float seed, vec3 pos) {
  float pdf;
  vec3 Lid = vec3(0.0), hitPos;
  vec2 screenPos = GetScreenCoordinate(pos);

  vec3 normal = GetGBufferNormalWorld(screenPos), b1, b2;
  LocalBasis(normal, b1, b2);
  mat3 TBN = mat3(b1, b2, normal);

  for(int i = 0; i < SAMPLE_NUM; i++) {
    // SampleHemisphere
    vec3 sampleDir = normalize(TBN * SampleHemisphereCos(seed, pdf));
    if(RayMarch(pos, sampleDir, hitPos)) {
      vec3 wi = normalize(uLightDir);
      vec3 wo = vec3(1.0);
      vec2 screenHitPos = GetScreenCoordinate(hitPos);

      vec3 L = EvalDiffuse(sampleDir, wo, screenPos) / pdf *
        EvalDiffuse(wi, wo, screenHitPos) * EvalDirectionalLight(screenHitPos);
        
      Lid += L;
    }
  }
  return Lid / float(SAMPLE_NUM) * 100.0;
}

void main() {
  float s = InitRand(gl_FragCoord.xy);

  vec3 L = vec3(0.0);
  // L += GetGBufferDiffuse(GetScreenCoordinate(vPosWorld.xyz));
  L += EvalDirectionalLight(GetScreenCoordinate(vPosWorld.xyz));

  // vec3 test_dir = vec3(0.0);
  // vec3 wo = normalize(uCameraPos - vPosWorld.xyz);
  // vec3 normal = GetGBufferNormalWorld(GetScreenCoordinate(vPosWorld.xyz));
  // test_dir = reflect(-wo,normal);
  // vec3 test_hit;
  // if(RayMarch(vPosWorld.xyz, test_dir, test_hit)) {
  //   L = GetGBufferDiffuse(GetScreenCoordinate(test_hit));
  // }

  L += EvalIndirectLight(s, vPosWorld.xyz);

  vec3 color = pow(clamp(L, vec3(0.0), vec3(1.0)), vec3(1.0 / 2.2));
  gl_FragColor = vec4(vec3(color.rgb), 1.0);
}

结果

Kulla-Conty 微表面模型能量补偿

微表面模型的 BRDF(Microfacet BRDF) 存在一个根本问题，就是其中的 G 项认为被微表面遮挡的反射光会直接消失，从而忽略了微表面间的多次弹射，这就导致了能量的损失。并且当材质的粗糙度越高时，能量的损失会越严重。Kulla-Conty 近似模型通过引入一个微表面 BRDF 的补偿项，来补偿光线的多次弹射，使得材质的渲染结果可以近似保持能量守恒。

BRDF

$f_r(i,o)=\frac{F(i,h)G(i,o,h)D(h)}{4(n,i)(n,o)}$

F 项，选用 Schlick 近似：

$F=R_0+(1-R_0)(1-\cos \theta)^5$

D 项，选用 GGX 分布：

$D_{GGX}(h)=\frac{\alpha ^2}{\pi ((n\cdot h)^2(\alpha ^2-1)+1)^2}$

G 项，选用 Smith 模型：

$G_{Smith}(i,o,h)=G_{Schlick}(l,h)G_{Schlick}(v,h)$
$G_{Schlick}(v,n)+\frac{n\cdot v}{n\cdot v(1-k)+k}$
$k=\frac{(roughness+1)^2}{8}$

Precompute 部分

理论

Kulla-Conty 模型假设 Li 为 1，并试图积分出在给定 BRDF 下一着色点辐射出的所有能量 E，那么在考虑光线只发生一次弹射的情况下，所损失的、需要补充的能量自然就是 (1 - E) 了。
接下来我们希望能得到一个 BRDF，并且该公式在半球上的积分值为 (1 - E)，在实时计算时将该 BRDF 补充至原 BRDF 上便能得到正确的结果。

GGX_E

$E(\mu _o)=\int _0^{2\pi}\int _0 ^1f_r(\mu _o,\mu _i,\phi)\mu _id\mu _id\phi$

注： $\mu =\sin \theta$ ，该公式本质上仍然是渲染方程的一种变形，将该积分换元便可得到一个在半球上对 $\omega$ 积分的公式。

预计算实现

#include <iostream>
#include <vector>
#include <algorithm>
#include <cmath>
#include <sstream>
#include <fstream>
#include <random>
#include "vec.h"

#define STB_IMAGE_WRITE_IMPLEMENTATION

#include "stb_image_write.h"

const int resolution = 128;

typedef struct samplePoints {
    std::vector<Vec3f> directions;
    std::vector<float> PDFs;
}samplePoints;

// 返回半球上 sample_count 个采样点
samplePoints squareToCosineHemisphere(int sample_count){
    samplePoints samlpeList;
    const int sample_side = static_cast<int>(floor(sqrt(sample_count)));

    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<> rng(0.f, 1.f);
    for (int t = 0; t < sample_side; t++) {
        for (int p = 0; p < sample_side; p++) {
            double samplex = (t + rng(gen)) / sample_side;
            double sampley = (p + rng(gen)) / sample_side;
            
            double theta = 0.5f * acos(1 - 2 * samplex);
            double phi =  2 * PI * sampley;
            Vec3f wi = Vec3f(sin(theta) * cos(phi), sin(theta) * sin(phi), cos(theta));
            float pdf = wi.z / PI;
            
            samlpeList.directions.push_back(wi);
            samlpeList.PDFs.push_back(pdf);
        }
    }
    return samlpeList;
}

float FresnelSchlick(float HdotV, float R0) {
    return R0 + (1.f - R0) * pow(1.f - HdotV, 5.f);
}

float DistributionGGX(float NdotH, float roughness) {
    float a = roughness*roughness;
    float a2 = a*a;
    float NdotH2 = NdotH*NdotH;

    float nom   = a2;
    float denom = (NdotH2 * (a2 - 1.f) + 1.f);
    denom = PI * denom * denom;

    return nom / denom;
}

float GeometrySchlickGGX(float cosTerm, float roughness) {
    float a = roughness;
    float k = (a * a) / 2.f;

    float nom = cosTerm;
    float denom = cosTerm * (1.f - k) + k;

    return nom / denom;
}

float GeometrySmith(float roughness, float NdotV, float NdotL) {
    float ggx2 = GeometrySchlickGGX(NdotV, roughness);
    float ggx1 = GeometrySchlickGGX(NdotL, roughness);

    return ggx1 * ggx2;
}

Vec3f IntegrateBRDF(float roughness, float NdotV) {
    float A = 0.f, B = 0.f, C = 0.f;
    const int sample_count = 1024;

    Vec3f N = Vec3f(0.f, 0.f, 1.f);
    Vec3f V = Vec3f(std::sqrt(1.f - NdotV * NdotV), 0.f, NdotV);
    
    samplePoints sampleList = squareToCosineHemisphere(sample_count);
    for (int i = 0; i < sample_count; i++) {
        Vec3f L = normalize(sampleList.directions[i]);
        Vec3f H = normalize(V + L);

        float HdotV = std::max(dot(H, V), 0.f);
        float NdotL = std::max(dot(N, L), 0.f);
        float NdotH = std::max(dot(N, H), 0.f);

        float R0 = 1.f;
        float F = FresnelSchlick(HdotV, R0);
        float G = GeometrySmith(roughness, NdotV, NdotL);
        float D = DistributionGGX(NdotH, roughness);

        float numerator = F * G * D;
        float denominator = 4.f * NdotV * NdotL;
        float micro = numerator / std::max(denominator, 1e-10f);

        float pdf = sampleList.PDFs[i];
        A += micro * NdotL / pdf;
    }
    C = B = A;

    return {A / sample_count, B / sample_count, C / sample_count};
}

uint8_t data[resolution * resolution * 3];
int main() {
    float step = 1.f / resolution;
    for (int i = 0; i < resolution; i++) {
        for (int j = 0; j < resolution; j++) {
            float roughness = step * (static_cast<float>(i) + 0.5f);
            float NdotV = step * (static_cast<float>(j) + 0.5f);

            Vec3f irr = IntegrateBRDF(roughness, NdotV);

            data[(i * resolution + j) * 3 + 0] = uint8_t(irr.x * 255.0);
            data[(i * resolution + j) * 3 + 1] = uint8_t(irr.y * 255.0);
            data[(i * resolution + j) * 3 + 2] = uint8_t(irr.z * 255.0);
        }
    }
    stbi_flip_vertically_on_write(true);
    stbi_write_png("GGX_E_MC_LUT.png", resolution, resolution, 3, data, resolution * 3);
    
    std::cout << "Finished precomputed!" << std::endl;
    return 0;
}

结果

GGX_E_LUT 其中纵轴为粗糙度，横轴为观察角度

随机采样的问题：粗糙度从下向上增加，可以看到粗糙度较低时结果有很多噪声，这是因为低粗糙度的微表面材质接近镜面反射材质，该点接收到的光线只会被反射向一个很小的立体角。而均匀随机采样的蒙特卡洛积分很难处理这种高频信息，因此积分值的方差就会很大。
解决方法：重要性采样。

GGX_E_IS

对于给定出射方向 o 的情况，目的是重要性采样生成入射方向 i，那么有两个核心问题需要解决：如何采样和对应的概率 pdf 是什么。

采样

我们首先根据选用的 NDF 模型，重要性采样微表面法向 m（也就是 i, o 的半程向量 h），随后利用反射关系来计算入射方向 i。

$i=2(m\cdot o)m-o$

同时对于任意 NDF 下，采样 m 对应的概率密度 $pdf_m(m)$ ，有

$pdf_m(m)=D(m)(m\cdot n)$

GGX NDF 对应的采样点应该为

$\theta =\arctan (\frac{\alpha \sqrt{\xi_ 1}}{\sqrt{1-\xi _1}})$
$\phi =2\pi \xi _2$

其中 $\xi _1$ 与 $\xi _2$ 由 Hammersley 采样得到。

pdf

因为最后生成的采样方向是入射方向 i, 所以最后结果的权重应该是：

$weight(i)=\frac{f_r(i,o,h)(i,n)}{pdf_i(i)}$

将之前采样微表面法线的概率密度 $pdf_m(m)$ 转换成采样入射的概率密度 $pdf_i(i)$ ，两者之间的转换只需要乘上一个 Jacobian 项：

$pdf_i(i)=pdf_m(m)|\frac{\partial \omega _m}{\partial \omega _i}|$

对于反射有：

$|\frac{\partial \omega _m}{\partial \omega _i}|=\frac{1}{4(i\cdot m)}$

所以：

$weight(i)=\frac{(o\cdot m)G(i,o,h)}{(o\cdot n)(m\cdot n)}$

预计算实现

// 省略一些重复的代码

float FresnelSchlick(float HdotV, float R0);

float GeometrySchlickGGX(float cosTerm, float roughness);

float GeometrySmith(float roughness, float NoV, float NoL);

// 球面上的均匀分布
Vec2f Hammersley(uint32_t i, uint32_t N) { // 0-1
    uint32_t bits = (i << 16u) | (i >> 16u);
    bits = ((bits & 0x55555555u) << 1u) | ((bits & 0xAAAAAAAAu) >> 1u);
    bits = ((bits & 0x33333333u) << 2u) | ((bits & 0xCCCCCCCCu) >> 2u);
    bits = ((bits & 0x0F0F0F0Fu) << 4u) | ((bits & 0xF0F0F0F0u) >> 4u);
    bits = ((bits & 0x00FF00FFu) << 8u) | ((bits & 0xFF00FF00u) >> 8u);
    float rdi = float(bits) * 2.3283064365386963e-10;

    return {float(i) / float(N), rdi};
}

// 重要性采样法线方向，即 H
Vec3f ImportanceSampleGGX(Vec2f Xi, Vec3f N, float roughness) {
    float a = roughness * roughness;

    // 定义球面坐标系
    float sinTheta = a * sqrt(Xi.x);
    float cosTheta = sqrt(1.f - Xi.x);
    float phi = 2 * PI * Xi.y;

    // 球面坐标系转换为笛卡尔坐标系，单位球上半径为 1
    float carX = sinTheta * cos(phi);
    float carY = sinTheta * sin(phi);
    float carZ = cosTheta;

    Vec3f result = Vec3f(carX, carY, carZ);
    return normalize(result);
}

Vec3f IntegrateBRDF(float roughness, float NoV) {
    float A = 0.f, B = 0.f, C = 0.f;
    const int sample_count = 1024;

    Vec3f N = Vec3f(0.f, 0.f, 1.f);
    Vec3f V = Vec3f(std::sqrt(1.f - NoV * NoV), 0.f, NoV);

    for (int i = 0; i < sample_count; i++) {
        Vec2f Xi = Hammersley(i, sample_count);
        Vec3f H = ImportanceSampleGGX(Xi, N, roughness);
        Vec3f L = normalize(H * 2.f * dot(H, V) - V);

        float NoL = std::max(L.z, 0.f);
        float NoH = std::max(H.z, 0.f);
        float VoH = std::max(dot(V, H), 0.f);
        
        float R0 = 1.f;
        float F = FresnelSchlick(VoH, R0);
        float G = GeometrySmith(roughness, NoV, NoL);

        A += F * G * VoH / (NoV * NoH);
    }
    C = B = A;

    return {A / sample_count, B / sample_count, C / sample_count};
}

uint8_t data[resolution * resolution * 3];
int main() {
    float step = 1.f / resolution;
    for (int i = 0; i < resolution; i++) {
        for (int j = 0; j < resolution; j++) {
            float roughness = step * (static_cast<float>(i) + 0.5f);
            float NdotV = step * (static_cast<float>(j) + 0.5f);

            Vec3f irr = IntegrateBRDF(roughness, NdotV);
            //Vec3f irr = Vec3f(1.f) - IntegrateBRDF(roughness, NdotV);

            data[(i * resolution + j) * 3 + 0] = uint8_t(irr.x * 255.0);
            data[(i * resolution + j) * 3 + 1] = uint8_t(irr.y * 255.0);
            data[(i * resolution + j) * 3 + 2] = uint8_t(irr.z * 255.0);
        }
    }
    stbi_flip_vertically_on_write(true);
    stbi_write_png("GGX_E_LUT_IS.png", resolution, resolution, 3, data, resolution * 3);
    
    std::cout << "Finished precomputed!" << std::endl;
    return 0;
}

结果

既然已经得到了一着色点在各种条件下的出射能量 E（左图），那么需要补偿的能量便是 (1 - E)（右图）。

GGX_Eavg

上面说到我们希望能得到一个 BRDF，并且该公式在半球上的积分值为 (1 - E)，即：

$\int f_{ms}(\mu _o,\mu _i)=\int \frac{(1-E(\mu _o))(1-E(\mu _i))}{\pi (1-E_{avg})}=1-E(\mu _o)$

其中：

$E_{avg}(\mu _o)=2\int _o^1E(\mu _i)\mu _id\mu _i$

推导：

$E_{ms}(\mu _o)=\int _0^{2\pi}\int _0^1f_{ms}(\mu _o,\mu _1,\phi)$
$=2\pi \int _0^1\frac{(1-E(\mu _o))(1-E(\mu _i))}{\pi (1-E_{avg})}\mu _id\mu _i$
$=2\frac{1-E(\mu _o)}{1-E_{avg}}\int _0^1(1-E(\mu _i))\mu _id\mu _i$
$=\frac{1-E(\mu _o)}{1-E_{avg}}(1-E_{avg})$
$=(1-E_{avg})$

也就是说只要再给 $E_{avg}$ 打一张表，就可以实时地查询预计算的信息得到 $f_{ms}$ 了。

结果

使用重要性采样（左图），使用均匀采样（右图）

这是一张一维的表，纵轴为粗糙度从下向上增加，而同一行上的存储值是相同的。

Real Time 部分

颜色

当材质拥有颜色时，颜色的性质会自然地、正确地带来能量的损失。那么在实现无色的 $f_{ms}$ 的基础上，将一考虑颜色带来的能量损失的颜色项 $f_{add}$ （三通道）与 $f_{ms}$ 相乘便能得到最终需要补充的 BRDF。
先引入一个概念：平均菲涅尔项 $F_{avg}$ （三通道）
Favg 的实现
那么一次反射后出射的能量为： $F_{avg}E_{avg}$
两次反射后出射的能量为： $F_{avg}(1-E_{avg})F_{avg}E_{avg}$
K 次反射后出射的能量为： $F_{avg}^k(1-E_{avg})^kF_{avg}E_{avg}$
上述能量全部叠加即可得到颜色项：

$f_{add}=\frac{F_{avg}E_{avg}}{1-F_{avg}(1-E_{avg})}$

最终的 BRDF

之前我们得到了补偿的 BRDF 项：

$f_{ms}(\mu _o,\mu _i)=\frac{(1-E(\mu _o))(1-E(\mu _i))}{\pi (1-E_{avg})}$

最终的 BRDF 便是：

$f_r=f_{micro}+f_{add}f_{ms}$

KullaContyFragmentShader

#ifdef GL_ES
precision mediump float;
#endif

uniform vec3 uLightPos;
uniform vec3 uCameraPos;
uniform vec3 uLightRadiance;
uniform vec3 uLightDir;

uniform sampler2D uAlbedoMap;
uniform float uMetallic;
uniform float uRoughness;
uniform sampler2D uBRDFLut;
uniform sampler2D uEavgLut;
uniform samplerCube uCubeTexture;

varying highp vec2 vTextureCoord;
varying highp vec3 vFragPos;
varying highp vec3 vNormal;

const float PI = 3.14159265359;

float DistributionGGX(vec3 N, vec3 H, float roughness) {
    float a = roughness * roughness;
    float a2 = a * a;
    float NdotH = max(dot(N, H), 0.0);
    float NdotH2 = NdotH * NdotH;

    float nom = a2;
    float denom = (NdotH2 * (a2 - 1.0) + 1.0);
    denom = PI * denom * denom;

    return nom / denom;
}

float GeometrySchlickGGX(float cosTerm, float roughness) {
    float a = roughness;
    float k = (a * a) / 2.0;

    float nom = cosTerm;
    float denom = cosTerm * (1.0 - k) + k;

    return nom / denom;
}

float GeometrySmith(vec3 N, vec3 V, vec3 L, float roughness) {
    float NdotL = max(dot(N,L), 0.0);
    float NdotV = max(dot(N,V), 0.0);

    float ggx2 = GeometrySchlickGGX(NdotV, roughness);
    float ggx1 = GeometrySchlickGGX(NdotL, roughness);

    return ggx1 * ggx2;
}

vec3 fresnelSchlick(vec3 F0, vec3 V, vec3 H) {
    float HdotV = max(dot(H, V), 0.0);

    return F0 + (1.0 - F0) * pow(1.0 - HdotV, 5.0);
}


/* https://blog.selfshadow.com/publications/s2017-shading-course/imageworks/
   s2017_pbs_imageworks_slides_v2.pdf */
vec3 AverageFresnel(vec3 r, vec3 g) {
    return vec3(0.087237) + 0.0230685*g - 0.0864902*g*g + 0.0774594*g*g*g
           + 0.782654*r - 0.136432*r*r + 0.278708*r*r*r
           + 0.19744*g*r + 0.0360605*g*g*r - 0.2586*g*r*r;
}

vec3 MultiScatterBRDF(float NdotL, float NdotV) {
  vec3 albedo = pow(texture2D(uAlbedoMap, vTextureCoord).rgb, vec3(2.2));

  vec3 E_o = texture2D(uBRDFLut, vec2(NdotL, uRoughness)).xyz;
  vec3 E_i = texture2D(uBRDFLut, vec2(NdotV, uRoughness)).xyz;

  vec3 E_avg = texture2D(uEavgLut, vec2(0, uRoughness)).xyz;
  // copper
  vec3 edgetint = vec3(0.827, 0.792, 0.678);
  vec3 F_avg = AverageFresnel(albedo, edgetint);
  
  vec3 fms = (vec3(1.0) - E_o) * (vec3(1.0) - E_i) / (PI * (vec3(1.0) - E_avg));
  vec3 F_add = F_avg * E_avg / (vec3(1.0) - F_avg * (vec3(1.0) - E_avg));

  return F_add*fms;
}

void main(void) {
  vec3 albedo = pow(texture2D(uAlbedoMap, vTextureCoord).rgb, vec3(2.2));

  vec3 N = normalize(vNormal);
  vec3 V = normalize(uCameraPos - vFragPos);
  float NdotV = max(dot(N, V), 0.0);

  vec3 F0 = vec3(0.04); 
  F0 = mix(F0, albedo, uMetallic);

  vec3 Lo = vec3(0.0);

  // calculate per-light radiance
  vec3 L = normalize(uLightDir);
  vec3 H = normalize(V + L);
  float distance = length(uLightPos - vFragPos);
  float attenuation = 1.0 / (distance * distance);
  vec3 radiance = uLightRadiance;

  float NDF = DistributionGGX(N, H, uRoughness);   
  float G   = GeometrySmith(N, V, L, uRoughness);
  vec3 F = fresnelSchlick(F0, V, H);
      
  vec3 numerator    = NDF * G * F; 
  float denominator = 4.0 * max(dot(N, V), 0.0) * max(dot(N, L), 0.0);
  vec3 Fmicro = numerator / max(denominator, 0.001); 
  
  float NdotL = max(dot(N, L), 0.0);        

  vec3 Fms = MultiScatterBRDF(NdotL, NdotV);
  vec3 BRDF = Fmicro + Fms;
  
  Lo += BRDF * radiance * NdotL;
  vec3 color = Lo;
  
  color = color / (color + vec3(1.0));
  color = pow(color, vec3(1.0/2.2)); 
  
  gl_FragColor = vec4(color, 1.0);
}

结果

粗糙度从右至左依次增加，上排是正确实现了 Kulla-Conty 近似的结果，下排是微表面模型不使用 Kulla-Conty 近似时造成的能量损失。

实时光线追踪

如果认为，以路径追踪的方式，计算一次一着色点的直接光照和一次弹射的间接光照为 1 sample per pixel (1 SPP)。那么对于一帧，即使是 RTX 系列的 GPU 也只能做到 1 SPP，此时的噪声会非常夸张。但是路径追踪依旧为我们提供了非常宝贵的间接光照信息，那么接下来工作的重心就在如何降噪上了。降噪分为两个部分，单帧降噪与像素在时间上的复用。

单帧降噪

双边滤波

与其他滤波方式类似，双边滤波也使用周围像素的加权平均值代替中心像素值。重要的是双边滤波的权重不仅考虑了像素的欧氏距离（比如高斯滤波只考虑距离对权值的影响），还考虑了滤波范围域中的辐射差异。
假设 (i, j) 为中心像素 (k, l) 为周围一像素，如果我们希望两像素之间的差异（距离，颜色）越大贡献越小：

$w(i,j,k,l)=exp(-\frac{(i-k)^2+(j-l)^2}{2\sigma _d^2}-\frac{|I(i,j)-I(k,l)|^2}{2\sigma _r^2})$

联合双边滤波

既然现在是在图形学中进行滤波，那么我们可以很轻易地从渲染管线中扒拉出更多有参考价值的信息，比如说任意像素的深度信息、世界坐标、法线方向等，甚至为不同的物体进行编号，只要将这些信息存在一张 Buffer 中即可。
联合双边滤波便是将这些属性的差异也考虑进权重的计算内，滤波核以任何一种会向 0 衰减的函数描述即可。
假设 i 为中心像素 j 为周围一像素，这次用到的滤波核：

$J(i,j)=exp(-\frac{|i-j|^2}{2\sigma _p^2}-\frac{|\widetilde{C}[i]-\widetilde{C}[j]|^2}{2\sigma _c^2}-\frac{D_{normal}(i,j)^2}{2\sigma _n^2}-\frac{D_{plane}(i,j)^2}{2\sigma _d^2})$

其中：

$\widetilde{C}$ ：有噪声的输入图像。
$D_{normal}(i,j)=arccos(Normal[i]\cdot Normal[j])$ ：
某一夹角左右两平面上的像素不应当有太多贡献。
$D_{plane}(i,j)=Normal[i]\cdot \frac{Position[i]-Position[i]}{|Position[j]-Position[i]}$ ：
法线相似但是有高低差的两平面上的像素不应当有太多贡献。

a-trous wavelet transform

假如要遍历完 NxN 个像素会是一个较大的性能开销，这里使用 a-trous wavelet transform 的方式加速采样滤波核内的像素。每一趟固定采样 5x5 个像素，第 i 趟时采样点之间的距离为 $2^{i-1}$ 。
时间复杂度从 $o(n^2)$ 降低为 $o(log _2n)$ 。

实现

float DistanceDifPow(int i_x, int i_y, int j_x, int j_y) {
    return float(pow((i_x - j_x), 2) + pow((i_y - j_y), 2));
}

float CloorDifPow(Float3 color_i, Float3 color_j) {
    Float3 dif = color_i - color_j;
    return dif.x * dif.x + dif.y * dif.y + dif.z * dif.z;
}

float DnormalPow(Float3 normal_i, Float3 normal_j) {
    float cosTerm = std::clamp(Dot(normal_i, normal_j), 0.f, 1.f);
    float acos = std::acos(cosTerm);
    return pow(acos, 2);
}

float DplanePow(Float3 normal_i, Float3 position_i, Float3 position_j) {
    Float3 nor = Normalize(position_j - position_i);
    return std::clamp((float)pow(Dot(normal_i, nor), 2), 1e-5f, 1.f);
}

// 联合双边滤波核
float JointBilateralFilteringKernel(float distanceDifPow, float colorDifPow,
                                    float normalDifPow, float positionDifPow,
                                    float sigma_p, float sigma_c,
                                    float sigma_n, float sigma_d) {

    return exp(-(distanceDifPow / (2 * pow(sigma_p, 2))) -
            (colorDifPow / (2 * pow(sigma_c, 2))) -
            (normalDifPow / (2 * pow(sigma_n, 2))) -
            (positionDifPow / (2 * pow(sigma_d, 2))));
}

Buffer2D<Float3> Denoiser::Filter(const FrameInfo &frameInfo) {
    int height = frameInfo.m_beauty.m_height;
    int width = frameInfo.m_beauty.m_width;
    int kernelRadius = 16;

    Buffer2D<Float3> filteredImage = CreateBuffer2D<Float3>(width, height);
    Buffer2D<Float3> imageBuffer = CreateBuffer2D<Float3>(width, height);
    Buffer2D<Float3> normalBuffer = CreateBuffer2D<Float3>(width, height);
    Buffer2D<Float3> positionBuffer = CreateBuffer2D<Float3>(width, height);
#pragma omp parallel for
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            filteredImage(x, y) = frameInfo.m_beauty(x, y);
            imageBuffer(x, y) = frameInfo.m_beauty(x, y);
            normalBuffer(x, y) = frameInfo.m_normal(x, y);
            positionBuffer(x, y) = frameInfo.m_position(x, y);

            // 权重
            float weight = 0.f;
            // a-trous wavelet 的步长
            int step = 1;

            bool useA_trou = true;
            if (useA_trou) {
                // 第一个循环确定 filter 大小，第二第三个循环的内容固定地循环 25 次。
                for (int num = 0; 2 * step <= kernelRadius; num++) {
                    step = pow(2, num);
                    for (int j_y = y - 2 * step; j_y <= y + 2 * step; j_y += step) {
                        for (int j_x = x - 2 * step; j_x <= x + 2 * step; j_x += step) {

                            if (j_y < 0 || j_y >= height || j_x < 0 || j_x >= width) {
                                // 超出图像范围时返回
                                continue;
                            }

                            imageBuffer(j_x, j_y) = frameInfo.m_beauty(j_x, j_y);
                            normalBuffer(j_x, j_y) = frameInfo.m_normal(j_x, j_y);
                            positionBuffer(j_x, j_y) = frameInfo.m_position(j_x, j_y);

                            float distanceTerm = DistanceDifPow(x, y, j_x, j_y);
                            float colorTerm =
                                CloorDifPow(imageBuffer(x, y), imageBuffer(j_x, j_y));
                            float normalTerm =
                                DnormalPow(normalBuffer(x, y), normalBuffer(j_x, j_y));
                            float planeTerm =
                                DplanePow(normalBuffer(x, y),
                                    positionBuffer(x, y), positionBuffer(j_x, j_y));

                            if (width >= 1280) {
                                // pink-room
                                m_sigmaColor = 20;
                            }

                            float exp = JointBilateralFilteringKernel(
                                distanceTerm, colorTerm, normalTerm, planeTerm,
                                m_sigmaCoord, m_sigmaColor, m_sigmaNormal, m_sigmaPlane);

                            filteredImage(x, y) += imageBuffer(j_x, j_y) * exp;
                            weight += exp;
                        }
                    }
                }
            }
            else {// useA_trou == false
                for (int j_y = y - kernelRadius; j_y <= y + kernelRadius; ++j_y) {
                    for (int j_x = x - kernelRadius; j_x <= x + kernelRadius; ++j_x) {
                        if (j_y < 0 || j_y >= height || j_x < 0 || j_x >= width) {
                            continue;
                        }
                        imageBuffer(j_x, j_y) = frameInfo.m_beauty(j_x, j_y);
                        normalBuffer(j_x, j_y) = frameInfo.m_normal(j_x, j_y);
                        positionBuffer(j_x, j_y) = frameInfo.m_position(j_x, j_y);
                        float distanceTerm = DistanceDifPow(x, y, j_x, j_y);
                        float colorTerm =
                            CloorDifPow(imageBuffer(x, y), imageBuffer(j_x, j_y));
                        float normalTerm =
                            DnormalPow(normalBuffer(x, y), normalBuffer(j_x, j_y));
                        float planeTerm =
                            DplanePow(normalBuffer(x, y),
                                positionBuffer(x, y), positionBuffer(j_x, j_y));
                        if (width >= 1280) { // for pink-room
                            m_sigmaColor = 30;
                        }
                        float exp = JointBilateralFilteringKernel(
                            distanceTerm, colorTerm, normalTerm, planeTerm,
                            m_sigmaCoord, m_sigmaColor, m_sigmaNormal, m_sigmaPlane);
                        filteredImage(x, y) += imageBuffer(j_x, j_y) * exp;
                        weight += exp;
                    }
                }
            }
            if (weight > 0.f) {
                filteredImage(x, y) /= weight;
            }
            else {
                filteredImage(x, y) = 0.f;
            }
        }
    }

    return filteredImage;
}

结果

滤波之后亮度提高的原因：事实上第二张图的亮度并没有提高，第一张图的辐射期望也是正确的。第一张图的亮度看起来偏暗是由于路径追踪的方差过大，以至于有大量超过 RGB 色域显示范围的 Irradiance 被直接 Clamp 至 255，产生了能量损失。在第二张图中进行滤波便是将无法显示的数值分配给了周围的像素，使得这部分数值可以正常地被显示出来。

时间上的累积

先将当前帧做单帧滤波，然后把一像素点在上一帧所在位置的像素值按一定权值混合入当前帧。如果说每一帧都能做 1 SPP，那么便几乎相当于一秒内将 30 SPP 的采样结果累积了下来。

投影到上一帧

第一个问题，如何找到一像素点在上一帧所对应的位置，准确来说是：一像素点对应世界坐标变换回上一帧的世界坐标，然后变换至上一帧的屏幕空间的坐标。
公式如下：

$Screen_{i-1}=P_{i-1}V_{i-1}M_{i-1}M_i^{-1}World_i$

其中：

下角标的 i 代表第 i 帧
M 代表物体坐标系到世界坐标系的矩阵
V 代表世界坐标系到摄像机坐标系的矩阵
P 代表摄像机坐标系到屏幕坐标系的矩阵

得到上一帧的屏幕空间坐标后需要检查：

上一帧是否在屏幕内。（该点第一次出现于屏幕内，没有时间上的信息）
上一帧和当前帧的物体的标号。（该点上一帧被另一物体遮挡，不应当使用时间上的信息）

若坐标非法便不使用上一帧的数据，100% 使用当前帧的数据。投影坐标的合法性保存在 m_valid 中供下一步参考。

实现

void Denoiser::Reprojection(const FrameInfo& frameInfo) {
    int height = m_accColor.m_height;
    int width = m_accColor.m_width;
    Matrix4x4 preWorldToScreen =
        m_preFrameInfo.m_matrix[m_preFrameInfo.m_matrix.size() - 1];

#pragma omp parallel for
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {

            int id = (int)frameInfo.m_id(x, y);
            if (-1 != id) {
                Matrix4x4 invModel = Inverse(frameInfo.m_matrix[id]);
                Matrix4x4 preModel = m_preFrameInfo.m_matrix[id];
                Matrix4x4 PVMM_1 = preWorldToScreen * preModel * invModel;
                Float3 projPos = PVMM_1(frameInfo.m_position(x, y), Float3::Point);

                //Float3 invPos = invModel(frameInfo.m_position(x, y), Float3::Point);
                //Float3 prePos = preModel(invPos, Float3::Point);
                //Float3 projPos = preWorldToScreen(prePos, Float3::Point);

                int pre_x = (int)projPos.x;
                int pre_y = (int)projPos.y;
                if (pre_x < 0 || pre_x >= width || pre_y < 0 || pre_y >= height) {
                    m_valid(x, y) = false;
                    continue;
                }

                // 在 pre_x 或 pre_y 超出屏幕时 m_id(pre_x, pre_y) 属于违法操作。
                int pre_id = (int)m_preFrameInfo.m_id(pre_x, pre_y);
                if (pre_id != id) {
                    m_valid(x, y) = false;
                    continue;
                }

                m_misc(x, y) = m_accColor(pre_x, pre_y);
                m_valid(x, y) = true;
            }
            //m_valid(x, y) = false;
            //m_misc(x, y) = Float3(0.f);
        }
    }
    std::swap(m_misc, m_accColor);
}

混合入当前帧

将已降噪的当前帧 $\overline{C}_i$ 与上一帧的结果 $\overline{C}_{i-1}$ 进行融合，公式如下：

$\overline{C}_i=\alpha \overline{C}_i+(1-\alpha)Clamp(\overline{C}_{i-1})$

如果上一帧的坐标非法， $\alpha$ 便为 1。

Clamp

计算 $\overline{C}_i$ 在 5x5 领域内的均值 $\mu$ 与方差 $\sigma$ ，然后将上一帧的颜色值 Clapm 进 $(\mu -k\sigma,\mu +k\sigma)$ 内。

实现

void Denoiser::TemporalAccumulation(const Buffer2D<Float3> &curFilteredColor) {
    int height = m_accColor.m_height;
    int width = m_accColor.m_width;
    int range = 2;

#pragma omp parallel for
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* TemporalAccumulation 在 Reprojection 之后被调用，
            所以这里的 m_accColor 是上一帧的结果。*/
            Float3 preColor = m_accColor(x, y);

            Float3 mu(0.f);
            for (int i = x - range; i <= x + range; i++) {
                for (int j = y - range; j <= y + range; j++) {
                    mu += curFilteredColor(i, j);
                }
            }
            mu /= (float)pow(2.f * range + 1.f, 2);

            Float3 sigma(0.f);
            for (int i = x - range; i <= x + range; i++) {
                for (int j = y - range; j <= y + range; j++) {
                    sigma += Sqr(curFilteredColor(i, j) - mu);
                }
            }
            sigma /= (float)pow(2.f * range + 1.f, 2);

            if (width >= 1280) {
                // pink-room
                m_colorBoxK = 0.1f;
            }

            // 将上一帧对应的像素的值拉近当前帧像素
            preColor = Clamp(preColor,
                mu - sigma * m_colorBoxK, mu + sigma * m_colorBoxK);

            float alpha = 1.0f;
            if (m_valid(x, y)) {
                alpha = m_alpha;
            }
            m_misc(x, y) = Lerp(preColor, curFilteredColor(x, y), alpha);
            //m_misc(x, y) = Lerp(preColor, curFilteredColor(x, y), 1.0f);
        }
    }
    std::swap(m_misc, m_accColor);
}

结果

#RayTracing #GlobalIllumination #GLSL #JavaScript #Shadow

光栅化软渲 Simple Renderer 上一篇

辐射度量学基础下一篇