Human-centric Computing and Information Sciences volume 12, Article number: 26 (2022)
https://doi.org/10.22967/HCIS.2022.12.026
Shadows are critical for realistic and visually appealing images. In particular, for interactive applications such as video games, the shadows of objects significantly increase the realism of the virtual environment. However, for scenes in which a large number of characters are animated, it is not easy to generate shadows in real time because the shadow of each character must be rendered individually. This study introduces a simple yet efficient technique for generating shadows for large crowds. The technique considers the space occupied by each character in the final image. If the space is very small, the algorithm applies a simple and fast yet physically correct projection shadow matrix to generate the shadow. If the space is relatively large, which means that the character is close to the camera, the algorithm applies the widely known shadow mapping technique, which is slower than the projection shadow matrix but more physically accurate. The rendering of large crowds is achieved using GPU instancing. Because the algorithm adaptively chooses the shadow generation technique according to each character's visual saliency, the overall performance improves. Through several experiments, we demonstrate that our algorithm can generate shadows for more than 20,000 characters, each with a fully skinned mesh, at a real-time rate while maintaining visually plausible shadow quality, approximately 54% faster than the traditional shadow mapping method.
Human-Computer Interaction and Multimodal, Computer Animation, Crowd Simulation, Shadow Generation, Shadow Mapping
Shadows are important for creating synthetic scenes because they provide visual cues that clarify the geometric relationship between light sources and objects [1]. For applications such as video games, shadows provide a more realistic atmosphere in the virtual environment because they establish spatial relationships between the objects in a scene, which increases the immersive feeling of the game for users. In addition, shadows can serve as an important measure of the environmental quality of public spaces, such as buildings, in real life [2].
Several techniques have been proposed to generate shadows of virtual objects; excellent surveys can be found in previous studies [1, 3–5]. Traditional global illumination algorithms such as ray tracing solve the shadow generation problem implicitly [6, 7]. In particular, a previous study [7] proposed a baking technique that precomputes global illumination data, including shadows, to reduce the computational overhead. By contrast, the rasterization pipeline requires a special technique to generate shadows because it is difficult to determine whether a particular pixel belongs to a shadow. Many such algorithms are based on the so-called shadow map [8], which contains the closest distance from the light source to the surfaces of the objects in the scene. Given this map, a simple distance comparison is performed after all pixels are transformed into the light-source space. However, the quality of shadows generated by a shadow map depends heavily on its resolution because the map is only an image with a finite number of texels; therefore, several nagging aliasing artifacts remain. Many additional works and variations have been proposed to reduce aliasing [9–11]. In this study, we address a target scene in which many three-dimensional (3D) characters are animated and rendered. For example, a big battle scene in which two camps are fighting each other, or a sports scene in which crowds are entering or exiting a stadium, requires numerous 3D characters [12, 13]. By numerous, we mean more than 10,000 characters, each with a fully skinned mesh that animates over time. Conventional shadow mapping techniques are not adequate for this scene because a considerable amount of computational overhead is required to animate and render each character. In particular, when each character is fully skinned, animating and rendering take time because all vertices of the mesh must be updated every frame.
Among the many variations of the conventional shadow mapping technique, cascaded shadow mapping (CSM) was proposed by NVIDIA [14]. In this technique, rather than using a single shadow map, several shadow maps are used depending on the distance from the camera. In our approach, we modified the CSM technique so that it is optimized for crowd scenes. The logic behind selecting CSM over other shadow generation techniques is as follows: for a crowd scene, the camera is often located relatively far from the crowd so that it can capture all of the characters. This type of camera setup makes some characters appear extremely small on the screen. For these small characters, highly detailed shadows are not required because their shadows occupy only a few pixels in the final image. This observation is in line with the original CSM method, which uses multiple shadow maps to overcome the aliasing problem caused by the fixed resolution of a single shadow map. When characters are close to the camera, a highly detailed shadow map with a small field of view is used; when they are far from the camera, a coarser shadow map with a wide field of view is used. However, because multiple shadow maps must be created, the overall performance is not efficient. Our basic idea is to combine the existing CSM technique with a simple projected shadow matrix technique according to each character's importance. Although the projected shadow matrix can quickly generate a character's shadow, it only creates a hard shadow; this drawback is mitigated by filtering. Such shadows are sufficient for characters located far from the camera. In our interpretation, the importance value is the extent to which the shadow occupies space in the final rendered image. To approximate the shadow size of each character, we place a bounding sphere on each character and project the sphere onto the image plane. The area of the projected ellipse is then calculated and adjusted according to the angle of the light source. All computations and rendering are performed on the GPU for optimization.
Through several experiments, we verified that the proposed shadow generation algorithm is at least 50% faster than the conventional method, and this improvement grows as the number of characters increases. For a maximum of 20,000 characters, we can render all crowds and their shadows in real time. The main contributions of the present study are as follows:
New rendering scheme: we introduced a new rendering scheme for a large number of 3D characters based on their importance, which is the space that each character takes in the image plane.
Hybrid shadow generation technique for large crowds: we used two methods and selected one of them automatically for each character. In this process, the parameters for the CSM method were set automatically according to the scene.
Shadow generation has been a widely researched topic in the computer graphics community for several decades [4]. Not only do shadows improve the visual quality and presence of virtual objects, but they also serve as an important measure for real-life problems such as the right to light. For example, some researchers use the virtual shadows of 3D buildings built with computer graphics to estimate and visualize how much light enters a specific region of a building [2].
We can classify shadows into two types: hard and soft. A hard shadow has a sharp border and consists of fully shadowed regions without any soft edges [3]. In contrast, a soft shadow has smooth edges across the shadow boundary. Soft shadows are preferred because they are generally closer to real shadows in real life [5, 7]. To generate soft shadows from a light source, the shadow mapping technique has traditionally been used [4, 8, 15]. This algorithm operates in two stages: the entire scene is first rendered from the light position and then rendered again from the viewpoint of the camera. If a pixel is observed in the second rendering pass but not in the first, it is located in shadow. A simple comparison between the value in the shadow map and the depth of the pixel determines whether a particular pixel is inside the shadow. Although shadow mapping is powerful and convenient, it has an intrinsic limitation in that the result of the first stage must be stored in a texture with finite resolution, which causes serious aliasing problems if the resolution is not sufficiently large [8]. Increasing the resolution of the shadow map substantially to overcome aliasing, however, leads to significant memory use. Some researchers have instead proposed compressing the shadow map to reduce memory use [16], but this requires extra computation time to compress the maps.
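To make the two-stage test concrete, the following C++ sketch shows the per-pixel depth comparison described above; the structure, names, and bias value are illustrative assumptions rather than part of any cited implementation.

```cpp
// Minimal sketch of the shadow-map depth test (illustrative names and bias value).
#include <cstddef>
#include <vector>

struct ShadowMap {
    std::size_t width = 0, height = 0;
    std::vector<float> depth;  // closest depth seen from the light, filled in the first pass

    float sample(std::size_t x, std::size_t y) const { return depth[y * width + x]; }
};

// Second pass: a pixel is in shadow if, expressed in light space, it lies farther
// from the light than the depth stored at the corresponding shadow-map texel.
bool inShadow(const ShadowMap& map, std::size_t x, std::size_t y,
              float pixelDepthInLightSpace, float bias = 0.005f) {
    return pixelDepthInLightSpace - bias > map.sample(x, y);
}
```

Because the map has finite resolution, many screen pixels can map to the same shadow-map texel, which is the source of the aliasing discussed above.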
Filtering- or image-based methods have been used to overcome this limitation in practice. The filtering method known as percentage closer filtering (PCF) [9] performs a series of depth comparisons per pixel and averages the results. Filtering-based methods can generate smooth shadows; however, they are not efficient because multiple comparisons must be performed for each pixel, which incurs computational overhead. Macedo et al. [17] proposed a vectorization technique that applies a set of PCF and smoothing operations to convert the jagged shadow boundary into a smooth edge, but it still requires additional computation time per pixel, which makes real-time performance difficult to achieve, especially when a large number of 3D objects must be rendered.
Some of the popular image-based methods include variance soft shadow mapping (VSSM) and CSM. The VSSM method is based on statistical moments of the shadow map and applies the Chebyshev inequality to approximate the fraction value that determines whether a pixel is in shadow [10]. Going one step further than VSSM, the moment shadow mapping method has been shown to create high-quality shadows more efficiently using higher-order statistical moments of the shadow map [18]. These variants can generate shadows with minimal computational overhead but still share the same limitation for a large number of moving objects.
As another approach, CSM is useful for wide scenes, such as large terrains and forests [14]. Because a crowd scene generally also covers a wide area, we chose CSM as our basic algorithm but modified it to support a large crowd. This technique subdivides the user's view frustum into several partitions and obtains a different shadow map for each partition. There are several methods for splitting the view frustum [19, 20]. In the CSM approach, the sampling density decreases for each successive partition because the same number of shadow map samples covers a larger area [14, 20]. Among the multiple shadow maps, the algorithm automatically chooses one for each pixel based on its distance from the camera. Please refer to [14] and [19] for more details. In recent years, inspired by the emergence of machine learning, many approaches have been proposed for solving computer graphics problems with deep learning. Liu et al. [21] proposed a shadow-specific generative adversarial network (GAN) model for a single light source. A similar technique was also proposed in [22]. Although these techniques are able to synthesize artificial shadows, they require a large amount of training data, which is not easy to obtain for practical purposes.
One important problem with all previous work is that it does not consider a large number of moving objects and the corresponding camera setup. In crowd simulation, a large number of 3D characters move simultaneously, and each character has a fully skinned mesh, which requires additional computational power. Therefore, shadow generation must be light enough that it does not affect the overall performance while maintaining good visual quality. We address this problem by seamlessly combining two methods, CSM and the shadow matrix method. One important aspect to consider while creating shadows of crowds is that, under a typical camera setup, each character may be considerably small in the final rendered image [12]. Therefore, a computationally expensive method is not required to create a shadow for such a small character. We divide the entire crowd in the final image into two classes and then apply a different shadow generation method to each class. The classification is based on the area each shadow occupies in the image plane. In some sense, this approach is similar to the foveated rendering technique in virtual reality [23, 24]. Foveated rendering uses an eye tracker to reduce the rendering workload [23]: the eye tracker keeps tracking the user's gaze, and its position provides a clue to render multiple layers progressively from high to low resolution. Such an attention model provides an important clue for crowd rendering [25]. In our approach, although we use multiple rendered layers for different sizes of characters to compose the final image, each layer has exactly the same resolution.
A bounding sphere is placed on each character; a point $p$ on the sphere satisfies
$|p-o|^2=r^2$ (1)
where $o$ is the center of the sphere and $r$ is its radius. Radius $r$ must be sufficiently large to include the four limbs of the character. In a pinhole camera model, in which the image plane is located between the camera and the scene and the focal length is 1, a ray cast from the camera through the image-plane point $(x, y, 1)$ reaches a point $p$ on the sphere:
$p=td, \quad d=(x,y,1)/\sqrt{x^2+y^2+1}$ (2)
Substituting $p$ into the sphere equation gives
$|td-o|^2-r^2=0$ (3)
After arranging Equation (3) with respect to $t$, the quadratic equation $At^2+Bt+C=0$ is obtained, with the following coefficients, where $(\cdot)$ denotes the dot product:
$\begin{cases} A=1 \cr B=-2(o\cdot d) \cr C=|o|^2-r^2 \end{cases}$ (4)
This quadratic has real solutions only when its discriminant is non-negative. Expanding the discriminant $B^2-4AC$ and rearranging, we obtain the following inequality:
$(o\cdot d)^2-(|o|^2-r^2)\ge 0$ (5)
Substituting $d$ from Equation (2) and eliminating the positive factors in Equation (5), we obtain
$(o_x x+o_y y+o_z)^2-(x^2+y^2+1)(|o|^2-r^2)\ge 0$ (6)
Equation (6) is the general implicit form of an ellipse. Organizing all terms of the equation, we obtain Equation (7):
$ax^2+by^2+cxy+dx+ey+f\ge 0,$
$\begin{cases} a=r^2-o_y^2-o_z^2 \cr b=r^2-o_x^2-o_z^2 \cr c=2o_x o_y \cr d=2o_x o_z \cr e=2o_y o_z \cr f=r^2-o_x^2-o_y^2 \end{cases}$ (7)
The lengths of the major and minor semi-axes of this ellipse are
$max=\sqrt{\frac{2\left(\frac{ae^2-cde+bd^2}{4ab-c^2}-f\right)}{a+b-\sqrt{(a-b)^2+c^2}}}, \quad min=\sqrt{\frac{2\left(\frac{ae^2-cde+bd^2}{4ab-c^2}-f\right)}{a+b+\sqrt{(a-b)^2+c^2}}}$ (8)
If we substitute the coefficients $a$ through $f$ of Equation (7) into Equation (8) and simplify, we obtain the final minor and major semi-axes:
$Min=\frac{r}{\sqrt{o_z^2-r^2}}, \quad Max=\frac{r\sqrt{|o|^2-r^2}}{o_z^2-r^2}$ (9)
Once we know $Min$ and $Max$, the projected area $A$ can be simply calculated as
$A=\pi\, Min \cdot Max$ (10)
Because our ultimate goal is to approximate the shadow size of the character in the image plane, rather than the character size, we adjust the area of the projected bounding sphere according to the current light direction relative to the plane. If we assume that the plane is flat, the modified area can be calculated as
$A=A\cdot\tan(l\cdot n)$ (11)
where $l$ is the unit light direction vector and $n$ is the normal vector of the plane.
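As an illustration, the following C++ sketch evaluates Equations (9)–(11) for a single bounding sphere; the names are ours rather than the paper's, the sphere center $o$ is assumed to be expressed in camera space with the image plane at $z=1$, and the light-direction adjustment follows Equation (11) literally.

```cpp
// Sketch of the per-character screen-space shadow-area estimate (Equations (9)-(11)).
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Area of the ellipse obtained by projecting the bounding sphere onto the image plane.
float projectedSphereArea(const Vec3& o, float r) {
    const float o2 = dot(o, o);          // |o|^2
    const float d  = o.z * o.z - r * r;  // o_z^2 - r^2
    if (d <= 0.0f) return 0.0f;          // degenerate case (sphere too close to or behind the camera)
    const float minAxis = r / std::sqrt(d);               // Min, Equation (9)
    const float maxAxis = r * std::sqrt(o2 - r * r) / d;  // Max, Equation (9)
    return 3.14159265f * minAxis * maxAxis;               // A, Equation (10)
}

// Adjust the projected area by the light direction l and the plane normal n (Equation (11)).
float approximateShadowArea(const Vec3& o, float r, const Vec3& l, const Vec3& n) {
    return projectedSphereArea(o, r) * std::tan(dot(l, n));
}
```

This per-character computation corresponds to the work that the classification compute shader described in the implementation section performs for each thread.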
To split the view frustum into cascades, the split positions $C_i$ are computed by blending a logarithmic and a uniform split scheme:
$C_i=\lambda C_i^{log}+(1-\lambda)C_i^{uni}, \quad 0<\lambda<1$ (12)
$\begin{cases}C_i^{log}=n(f/n)^{i/m} \cr C_i^{uni}=n+(f-n)\,i/m\end{cases}$ (13)
where $m$ is the number of splits, and the far distance $f$ and near distance $n$ are obtained from the positions $p_i$:
$f=\max\limits_{i\in k}|p_i|, \quad n=\min\limits_{i\in k}|p_i|$ (14)
The extents of the $i$-th slice at its near and far split planes, for a field of view $\theta$, are
$x_n=C_i\tan(\theta/2), \quad x_f=C_{i+1}\tan(\theta/2)$ (15)
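A minimal C++ sketch of the split computation follows, assuming the near and far distances have already been obtained as in Equation (14); the function names and parameters are illustrative, not taken from the paper's code.

```cpp
// Sketch of the cascade split positions (Equations (12)-(13)) and slice extents (Equation (15)).
#include <cmath>
#include <vector>

std::vector<float> cascadeSplits(float nearDist, float farDist, int numSplits, float lambda) {
    std::vector<float> splits(numSplits + 1);
    for (int i = 0; i <= numSplits; ++i) {
        const float t        = static_cast<float>(i) / numSplits;
        const float logSplit = nearDist * std::pow(farDist / nearDist, t);  // C_i^log
        const float uniSplit = nearDist + (farDist - nearDist) * t;         // C_i^uni
        splits[i] = lambda * logSplit + (1.0f - lambda) * uniSplit;         // C_i, Equation (12)
    }
    return splits;
}

// Extent of slice i at its near and far split planes for a field of view fov (radians).
void sliceExtents(const std::vector<float>& splits, int i, float fov,
                  float& xNear, float& xFar) {
    xNear = splits[i]     * std::tan(fov * 0.5f);  // x_n, Equation (15)
    xFar  = splits[i + 1] * std::tan(fov * 0.5f);  // x_f, Equation (15)
}
```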
The light-space basis is then constructed as
$z=\bar l, \quad x=u\times z, \quad y=z\times x$ (16)
where $u$ is the up vector and $\bar l$ is the normalized light vector.
Each of the eight corner points $x_i$ of the slice frustum is transformed into light space:
$\hat x_i=V^l x_i, \quad 1\le i\le 8$ (17)
Once we find $\hat x_i$, a simple minimum or maximum comparison over the eight points yields the new eight corner points of the frustum ($x_1'$–$x_8'$). For each view frustum in the cascade, we rendered the crowds in $S_i$ to store their depth values. These depth maps were then passed to the fragment processor to determine whether a pixel belongs to a shadow:
$c=c_a+(1-w_{shadow})c_{ds}$ (18)
where $c_a$ is the ambient color and $c_{ds}$ is the sum of the diffuse and specular colors. The rendering result was also saved into another texture, $T_3$, for the final blending step.
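The light-space bound of a slice can be sketched in C++ as follows, assuming $V^l$ denotes the rotation into the light basis of Equation (16); the helper names are ours, the basis vectors are normalized for convenience, and any translation component of the light view transform is omitted.

```cpp
// Sketch: build the light basis (Equation (16)), transform the eight slice corners
// (Equation (17)), and take their min/max to bound the cascade's shadow frustum.
#include <algorithm>
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  normalize(const Vec3& v) {
    const float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

void lightSpaceBounds(const std::array<Vec3, 8>& corners, const Vec3& light, const Vec3& up,
                      Vec3& outMin, Vec3& outMax) {
    const Vec3 z = normalize(light);         // z = normalized light vector
    const Vec3 x = normalize(cross(up, z));  // x = u x z (normalized here)
    const Vec3 y = cross(z, x);              // y = z x x
    outMin = {  1e30f,  1e30f,  1e30f };
    outMax = { -1e30f, -1e30f, -1e30f };
    for (const Vec3& c : corners) {
        const Vec3 p = { dot(c, x), dot(c, y), dot(c, z) };  // Equation (17), rotation only
        outMin = { std::min(outMin.x, p.x), std::min(outMin.y, p.y), std::min(outMin.z, p.z) };
        outMax = { std::max(outMax.x, p.x), std::max(outMax.y, p.y), std::max(outMax.z, p.z) };
    }
}
```

The resulting minimum and maximum corners correspond to the new corner points $x_1'$–$x_8'$ mentioned above.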
We built a software system to verify the proposed algorithm. The hardware setup included a PC with an Intel i7 CPU, 32 GB of main memory, and an NVIDIA GeForce RTX 2080 graphics card. We used GLSL (OpenGL Shading Language) version 4.6 for the GPU shaders and C++ for the CPU code. For character animation, we used two running animations with different speeds, stored in Autodesk's FBX format. The classification step was implemented using a compute shader. The compute shader invokes as many threads as the number of characters, and each thread computes the area of the projected ellipse of a character separately. Finally, a designated thread calculates the number of characters in $L_i$ and $S_i$. A shader storage buffer object structure was used to store the model matrices of each character [28]. This array structure can be accessed on the GPU and copied to the CPU side as well.
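For reference, the following C++ sketch mirrors the per-character classification that each compute-shader thread performs; the structure layout, the area threshold, and the group names are illustrative assumptions, and the exact correspondence to the sets $L_i$ and $S_i$ and to the parameter $t$ discussed later is not reproduced here.

```cpp
// CPU-side sketch of the per-character classification done by the compute shader.
#include <cstddef>
#include <vector>

struct CharacterInfo {
    float area;          // projected shadow area from Equation (11)
    bool  useShadowMap;  // true: render with CSM, false: render with the projection shadow matrix
};

// One loop iteration corresponds to one GPU thread in the actual compute shader.
void classifyCrowd(std::vector<CharacterInfo>& crowd, float areaThreshold,
                   std::size_t& numShadowMapped, std::size_t& numProjected) {
    numShadowMapped = 0;
    numProjected    = 0;
    for (CharacterInfo& c : crowd) {
        c.useShadowMap = (c.area >= areaThreshold);
        if (c.useShadowMap) ++numShadowMapped;  // close/large characters: CSM
        else                ++numProjected;     // distant/small characters: projection matrix
    }
}
```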
The first experiment was performed to check the visual quality of the proposed method against the pure CSM-only method. Fig. 5 depicts two cases with 1,000 characters. For a better understanding, please refer to the accompanying video. As shown in the figure, when the characters are located far away from the camera, their space on the image plane is quite small. Therefore, even though we apply the simple projection matrix technique, the synthesized shadows remain visually plausible and are almost perceptually indistinguishable from the CSM shadows.
The second experiment compared the performance of the proposed method with that of the CSM-only method as the number of characters increased. For this experiment, we increased the number of characters from 1,000 to 20,000. The camera was placed at a fixed position so that it could see both nearby and distant crowds simultaneously. The light was rotated in a circle above the plane to test dynamic shadows. Fig. 6 depicts a graph comparing the proposed method with the CSM-only and moment shadow mapping (MSM) [18] methods. The shadow map resolution was the same for all methods (1,024×1,024). We found that the proposed method was 54.8% faster on average than the CSM-only method, and the performance gain increased as the number of characters increased. Fig. 6 also lists the numerical data for the second experiment. All numerical measurements are average values over 10 repetitions.
In this study, we propose an efficient shadow generation algorithm for large crowds. The basic idea of the proposed method is to divide the whole crowd into two groups and apply a different method to each group based on the space it occupies in the final rendered image. Because we did not use any complicated method for barely visible distant characters, the overall performance increased without degrading visual quality, even though we used the simple projective matrix method. Nevertheless, the proposed method has a few limitations. First, the simple projective matrix method does not consider inter-character shadows. That is, when two characters are sufficiently close to each other, their shadows may shade each other. Because our shadow textures are generated without considering other nearby characters, it is not possible to simulate this effect. However, because these characters are positioned very far from the camera, we believe that inter-character shadows are not visually important. Second, the proposed technique assumes that the plane on which the crowds are moving is flat. This limitation stems from the same cause as the first: although the shadow mapping technique can handle non-flat ground, the projective shadow matrix technique cannot easily deform the shadow for non-flat ground. In the future, we intend to develop a method for non-flat ground. In doing so, however, we must consider the trade-off between the performance improvement and the computational cost, especially for very small characters in the background.
The overall performance depends on the classification parameter. Parameter $t$ controls the ratio of the number of characters between the two groups; as its value increases, the overall performance also increases. In future work, this threshold could be set automatically through scene analysis. Another possible extension is to apply the proposed technique to a multicore CPU setup. Because all experiments were performed using OpenGL, multicore CPU rendering was not exploited. Current 3D graphics APIs, such as Vulkan, support multicore CPUs. Therefore, we would like to test our algorithm with such an API and determine the extent to which the performance can be improved.
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2018R1D1A1B07048414 and 2021R1A2C1012316).
The author declares that he has no competing interests.
Name : Mankyu Sung
Affiliation : Keimyung University
Biography : Mankyu Sung received his B.S in Computer Sciences from ChungNam National University, Daejeon, Rep. of Korea, in 1993 and his M.S and Ph.D in Computer Sciences from University of Wisconsin-Madison, WI, USA, in 2005. From January 1995 to July 2012, he worked for Digital Contents Division of ETRI, Daejeon, Rep. of Korea. He has been an assistant professor with Dept. of Game & Mobile, Keimyung University, Daegu, Rep. of Korea since Mar. 2012. His current research interests include Computer Graphics, Computer Animation, Computer Games and Human-Computer Interaction. He is a member of the ACM.
Mankyu Sung*, Efficient Shadow Generation of Massive Crowds Based on the Area of Screen Space, Article number: 12:26 (2022)