Duong Hoang   —   Code

I hold a PhD in computer science from the University of Utah. My areas of expertise include data compression, scientific visualization and analysis, and computer graphics. In general, though, I just like solving interesting problems in all areas of computer science and (applied) mathematics.

Contact me via: email, GitHub, Twitter, or LinkedIn. Here are my resume and Google Scholar profile.

2019
Linear-Lifting Wavelet Extrapolation
Since wavelet transforms are computed for powers of two, their practical application often requires extending the signal to a suitable length. While several approaches exist for this extension (e.g., zero padding, linear extrapolation, and symmetric extension), each is associated with different boundary artifacts, such as a discontinuous or nonsmooth signal that leads to large wavelet coefficients, which may or may not be acceptable for a given application. We present a new, linear-lifting approach to extend the input function at the boundary that avoids such artifacts and helps limit the number of unnecessary cells near the boundary. Specifically, our approach interleaves linear extrapolation with lifting steps to produce a function that is smooth across all dimensions. Unlike mirror-based extensions, linear extrapolation renders the lifting steps noninvertible, because to maintain the size of the data, the extrapolated coefficient is usually not kept in memory after the forward lifting step. To support perfect reconstruction, we choose to pay the extra storage cost and keep this coefficient in memory. In practice, however, the inverse lifting steps are never performed, and thus the extrapolated function is never explicitly computed or stored.
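To make the idea concrete, below is a minimal sketch of one forward lifting pass with a linearly extrapolated sample at the right boundary. The choice of the CDF 5/3 (LeGall) wavelet, the even signal length, and the left-boundary handling are illustrative assumptions rather than the paper's exact scheme; the point is only that a locally linear signal yields a zero detail coefficient at the extrapolated boundary.

  #include <cassert>
  #include <vector>

  // One forward CDF 5/3 lifting pass on a signal of even length n.
  // At the right boundary the missing even sample x[n] is linearly
  // extrapolated (x[n] = 2*x[n-1] - x[n-2]), so a locally linear signal
  // produces a zero detail coefficient there instead of a large one.
  void ForwardLift53(std::vector<double>& x) {
    const int n = (int)x.size();
    assert(n >= 4 && n % 2 == 0);
    const int nOdd = n / 2;
    std::vector<double> d(nOdd), s(nOdd);
    // Predict: detail = odd sample minus the average of its even neighbors.
    for (int i = 0; i < nOdd; ++i) {
      double left = x[2 * i];
      double right = (2 * i + 2 < n) ? x[2 * i + 2]
                                     : 2 * x[n - 1] - x[n - 2]; // linear extrapolation
      d[i] = x[2 * i + 1] - 0.5 * (left + right);
    }
    // Update: smooth the even samples with the neighboring details.
    for (int i = 0; i < nOdd; ++i) {
      double dl = (i > 0) ? d[i - 1] : d[0]; // simple left-boundary handling
      s[i] = x[2 * i] + 0.25 * (dl + d[i]);
    }
    // Store coarse coefficients first, then details.
    for (int i = 0; i < nOdd; ++i) { x[i] = s[i]; x[nOdd + i] = d[i]; }
  }
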
2018
Arithmetic Coding
This is an arithmetic coder implemented in the D language. Arithmetic coding is a form of entropy encoding for lossless data compression. It was my first "serious" program written in D; I like its expressiveness and saner metaprogramming (compared to C++). The language is similar enough to C++ that you would have virtually no problem understanding it if you are already familiar with C++.
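As a rough illustration of the technique (in C++ rather than D, and using floating-point interval narrowing instead of the integer arithmetic and renormalization a practical coder needs), the core of arithmetic coding is just shrinking an interval according to each symbol's cumulative probability:

  #include <cassert>
  #include <cstdio>
  #include <vector>

  // Toy arithmetic coder: encodes a symbol string over a fixed 3-symbol model
  // by narrowing the interval [low, high); any number inside the final interval
  // identifies the message. A practical coder uses integers and renormalization.
  struct Model { double cum[4] = {0.0, 0.6, 0.9, 1.0}; }; // P = {0.6, 0.3, 0.1}

  double Encode(const std::vector<int>& symbols, const Model& m) {
    double low = 0.0, high = 1.0;
    for (int s : symbols) {
      double range = high - low;
      high = low + range * m.cum[s + 1];
      low  = low + range * m.cum[s];
    }
    return 0.5 * (low + high); // any value in [low, high) works
  }

  std::vector<int> Decode(double code, int count, const Model& m) {
    std::vector<int> out;
    for (int i = 0; i < count; ++i) {
      int s = 0;
      while (!(code >= m.cum[s] && code < m.cum[s + 1])) ++s;
      out.push_back(s);
      code = (code - m.cum[s]) / (m.cum[s + 1] - m.cum[s]); // undo the narrowing
    }
    return out;
  }

  int main() {
    Model m;
    std::vector<int> msg = {0, 0, 1, 2, 0, 1};
    double code = Encode(msg, m);
    assert(Decode(code, (int)msg.size(), m) == msg);
    std::printf("code = %.10f\n", code);
  }
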
2017
SPECK Wavelet Encoder
This is my C++ implementation of the well-known set-partitioning embedded block coder (SPECK) for compressing wavelet coefficients. SPECK was originally designed to compress images; my implementation deals instead with high-precision scientific data stored as 3D arrays of floating-point values. I wanted to benchmark the performance of the SPECK algorithm but could not find an implementation online that had all the features I needed, so I implemented it myself. The implementation follows the description of the algorithm in the original paper by Pearlman et al. very closely. The relevant code is in the file speck.h, but the zip file also contains other code related to the wavelet transform.
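The heart of SPECK is a significance test against the current bit plane and a recursive quadrant split of significant sets. The sketch below shows only that part, in 2D and with the output bits simply printed; it omits the S/I set distinction, the sign and refinement passes, and the entropy coding of the actual algorithm.

  #include <cstdint>
  #include <cstdio>

  // Quantized coefficient magnitudes in a 2D block (row-major).
  struct Block { const uint32_t* data; int stride; };

  static bool Significant(const Block& b, int x, int y, int w, int h, int bitplane) {
    for (int j = y; j < y + h; ++j)
      for (int i = x; i < x + w; ++i)
        if (b.data[j * b.stride + i] >> bitplane) return true;
    return false;
  }

  // Emit one significance bit per tested set; split significant sets into
  // quadrants and recurse until single coefficients are reached.
  static void CodeSet(const Block& b, int x, int y, int w, int h, int bitplane) {
    bool sig = Significant(b, x, y, w, h, bitplane);
    std::printf("%d", sig ? 1 : 0); // stand-in for the entropy-coded output bit
    if (!sig || (w == 1 && h == 1)) return;
    int w0 = (w + 1) / 2, h0 = (h + 1) / 2;
    CodeSet(b, x, y, w0, h0, bitplane);
    if (w > w0) CodeSet(b, x + w0, y, w - w0, h0, bitplane);
    if (h > h0) CodeSet(b, x, y + h0, w0, h - h0, bitplane);
    if (w > w0 && h > h0) CodeSet(b, x + w0, y + h0, w - w0, h - h0, bitplane);
  }

  int main() {
    const uint32_t coeffs[16] = {9, 1, 0, 0,  2, 0, 0, 0,  0, 0, 5, 0,  0, 0, 0, 1};
    Block b{coeffs, 4};
    for (int n = 3; n >= 0; --n) { CodeSet(b, 0, 0, 4, 4, n); std::printf("  (bit plane %d)\n", n); }
  }
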
2017
hana: Fast and Lightweight Reader and Writer for the IDX Data Format
hana is a library to read and write the IDX binary format. IDX is a data format that supports fast, progressive multiresolution queries on large 3D/4D scientific data stored as regular grids. hana can read only a given region of interest of an IDX dataset, at a given resolution level, and store the result as a regular grid in memory in row-major order. Given a raw input array, hana can also convert the array into the IDX format and write the converted data to disk, with optional ZIP compression. Compared to OpenVISUS, the official I/O code for IDX, hana is much more lightweight and requires no dependencies other than the C++ STL. It also runs quite a bit faster and supports a more flexible progressive API for multiresolution queries. OpenVISUS should be used, however, if more features are desired, such as the ability to stream data from a remote server or additional compression methods such as JPEG. The original reference for IDX is "Global Static Indexing for Real-Time Exploration of Very Large Regular Grids" by Pascucci and Frank.
2016
bi-Laplacian Interpolants
Given a 2D field sampled on a regular grid, we construct a multiresolution hierarchy as follows. We first group the grid samples into blocks, then subsample by throwing away every other block along an axis to create the next coarser resolution level. This subsampling is applied recursively to subsequent resolution levels, alternating between the axes. The reason the downsampling is done block-wise is so that we can still perform effective compression within each block thanks to data coherency. (In a regular subsampling process, the data points at coarse resolution levels are further and further apart, thus less coherent, which can severely hurt compression.) During reconstruction, however, we need to solve a "hole-filling" problem, since a number of blocks are missing and need to be filled in by interpolation. I have implemented an interpolation method that solves bi-Laplacian (or biharmonic) constraints. Basically, if x is a known sample, we want the reconstructed function to interpolate x; if x is an unknown sample, we set its Laplacian or bi-Laplacian to zero.

In this example, I used a 384 x 384-pixel slice from a 3D simulation dataset. The result is produced by solving the bi-Laplacian equation on the subsampled data. I used a biconjugate gradient solver with an incomplete LU preconditioner (without which the iteration did not converge). The initial guess keeps the known samples and sets the unknown ones to 0. The number of iterations depends on the stopping tolerance, but is typically between 10 and 20. We can see that the top and right boundaries are not resolved well, because we have no known samples at these boundaries. In general, I noticed that it becomes much harder for the iteration to converge every time we go down one level in resolution; I had to tweak the preconditioner to make it more aggressive. It may be interesting to see whether a better initial guess (with bilinear interpolation, for example) would help with convergence.
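A minimal sketch of that solve using Eigen (BiCGSTAB, a variant of the biconjugate gradient method, with an IncompleteLUT preconditioner); the 5-point stencil, the boundary handling, and the solver settings are simplified relative to my actual code.

  #include <Eigen/Sparse>
  #include <vector>

  // Hole filling on an N x N grid: known samples are interpolated exactly,
  // unknown samples satisfy a discrete bi-Laplacian constraint (L^2 f = 0).
  Eigen::VectorXd FillHoles(int N, const std::vector<double>& values,
                            const std::vector<bool>& known) {
    auto idx = [N](int i, int j) { return j * N + i; };

    // 5-point Laplacian, clamped at the grid boundary.
    std::vector<Eigen::Triplet<double>> lt;
    for (int j = 0; j < N; ++j)
      for (int i = 0; i < N; ++i) {
        int r = idx(i, j), neighbors = 0;
        auto add = [&](int ii, int jj) {
          if (0 <= ii && ii < N && 0 <= jj && jj < N) { lt.emplace_back(r, idx(ii, jj), 1.0); ++neighbors; }
        };
        add(i - 1, j); add(i + 1, j); add(i, j - 1); add(i, j + 1);
        lt.emplace_back(r, r, -double(neighbors));
      }
    Eigen::SparseMatrix<double> L(N * N, N * N);
    L.setFromTriplets(lt.begin(), lt.end());
    Eigen::SparseMatrix<double, Eigen::RowMajor> B = L * L; // discrete bi-Laplacian

    // Row r of the system: identity (known sample) or bi-Laplacian row (unknown).
    std::vector<Eigen::Triplet<double>> at;
    Eigen::VectorXd rhs = Eigen::VectorXd::Zero(N * N), guess = Eigen::VectorXd::Zero(N * N);
    for (int r = 0; r < N * N; ++r) {
      if (known[r]) {
        at.emplace_back(r, r, 1.0);
        rhs[r] = guess[r] = values[r]; // initial guess: known values, zeros elsewhere
      } else {
        for (Eigen::SparseMatrix<double, Eigen::RowMajor>::InnerIterator it(B, r); it; ++it)
          at.emplace_back(r, it.col(), it.value());
      }
    }
    Eigen::SparseMatrix<double> A(N * N, N * N);
    A.setFromTriplets(at.begin(), at.end());

    Eigen::BiCGSTAB<Eigen::SparseMatrix<double>, Eigen::IncompleteLUT<double>> solver;
    solver.setTolerance(1e-7);
    solver.compute(A);
    return solver.solveWithGuess(rhs, guess);
  }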

2014
Global Illumination with Photon Mapping
I used photon mapping with final gathering to render this scene. There are a couple of difficulties with the rendering. The area light outside the window is difficult to sample, so I stored the first bounce in the photon map and did not sample the light directly for the Monte Carlo bounce (the light is still sampled directly for the first primary-ray bounce, though). The corners and creases of the scene are hard for photons to reach, and those areas have the most noise in the final rendering. I used the Cozy Living Room model from TurboSquid.com. The scene comes with texture maps and materials for all objects. I rearranged the scene a bit, added a teapot, a stack of manga (Love Hina), an open book, a poster for my favorite anime Spirited Away, and a red U block that I modeled in Blender. I had to spend quite a bit of time fixing the scene (mostly inverting normals) using a combination of Blender and MeshLab.

There are two light sources, both coming from outside through the window. One is a strong sun light; the other is a weaker light that represents illumination from the outside environment. I set the angle and intensity of these light sources to give the scene the look of a morning around 9am. The window blinds block much of the incoming light, so a large part of the scene is lit indirectly, giving it a soft, smooth feel. I used a Blinn-Phong shading model with a Fresnel reflection term for all objects. I separated the direct and indirect lighting so that I didn't have to use 32 (anti-aliasing) x 512 (Monte Carlo) samples for each pixel. I also experimented with pre-filtering the photon map but did not get very far with it. Pre-filtering didn't seem to help in my case (I used a hash grid instead of a kd-tree to store the photon map, so a nearest-neighbor query is not much cheaper than an intersecting-sphere query, and is awkward to do).

2014
Path Tracing on the CPU
This is my multithreaded CPU path tracing code that supports area lights, soft shadows, glossy reflection, refraction, caustics, texture filtering, anti-aliasing with ray differentials, sub-pixel adaptive sampling, quasi-Monte Carlo sampling using the Halton sequence, and reconstruction filtering. The code may be useful to you if you are new to computer graphics and want to learn the basics of ray/path tracing.
2014
Motion Graph
In this project, we implement a basic motion graph and test it on two different datasets. Given a set of motion data in the BVH file format, our program constructs a directed graph where each node is a motion frame and each edge connects either adjacent frames in the original motions or frames in which the character has similar poses. We use the similarity metric and interpolation scheme described in [KGP02]. This similarity metric does not use joint angles, so we avoid assigning weights to the joints. More importantly, it implicitly takes into account differences not only in positions but also in velocities and accelerations. We interpolate over a window of frames to ensure smooth transitions. By walking randomly through the graph, we can generate random, arbitrarily long motions composed of different short motion sequences in the database. Our character never gets stuck because the graph is pruned to get rid of dead ends. Due to time constraints, we were not able to summarize the graph by clustering, address the foot-skating problem caused by interpolation, or implement graph search algorithms to constrain the synthesized motions.
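A bare-bones sketch of the graph construction and random walk; the distance function here is a plain Euclidean stand-in for the [KGP02] point-cloud metric, and the transition blending and dead-end pruning are only indicated by comments.

  #include <cmath>
  #include <cstdlib>
  #include <vector>

  // One frame's pose, flattened as a point cloud of marker positions
  // (a stand-in for the windowed point clouds used by the [KGP02] metric).
  struct Pose { std::vector<double> points; };

  // Simplified frame distance: Euclidean distance between point clouds.
  // [KGP02] additionally aligns the clouds with an optimal 2D transform.
  double FrameDistance(const Pose& a, const Pose& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.points.size(); ++i) {
      double t = a.points[i] - b.points[i];
      d += t * t;
    }
    return std::sqrt(d);
  }

  // Node = frame; edges connect consecutive frames and pairs of similar frames.
  std::vector<std::vector<int>> BuildMotionGraph(const std::vector<Pose>& frames,
                                                 double threshold) {
    int n = (int)frames.size();
    std::vector<std::vector<int>> adj(n);
    for (int a = 0; a + 1 < n; ++a) adj[a].push_back(a + 1); // continuity edges
    for (int a = 0; a < n; ++a)
      for (int b = 0; b < n; ++b)
        if (b != a && b != a + 1 && FrameDistance(frames[a], frames[b]) < threshold)
          adj[a].push_back(b); // transition edges (blending over a window is omitted)
    // Dead-end pruning (keeping only the largest strongly connected component)
    // would go here.
    return adj;
  }

  // Random walk through the graph: yields an arbitrarily long frame sequence.
  std::vector<int> RandomWalk(const std::vector<std::vector<int>>& adj,
                              int start, int length) {
    std::vector<int> path{start};
    while ((int)path.size() < length && !adj[path.back()].empty()) {
      const std::vector<int>& next = adj[path.back()];
      path.push_back(next[std::rand() % next.size()]);
    }
    return path;
  }
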
2014
Inverse Kinematics
using Damped Least Squares
In this project, I implement an inverse kinematics (IK) solver using the damped least squares (DLS) method. Compared to other popular methods for solving IK problems, such as the Jacobian transpose or pseudo-inverse, DLS is more stable near singularities, which means the animation is less jerky when the arms are fully stretched. The solver supports any combination of translational joints (1-dof) and rotational joints (1/2/3-dof). It can track multiple targets at the same time while maintaining realtime framerates (60 FPS) for moderately complex bodies. Simple per-dof joint limits are supported. The solver can also maintain a constraint on the horizontal position of the system's center of mass. The code is written in C++, using some C++11 features. To compile (on Windows), just use the provided Visual Studio 2010 solution file. All external libraries are bundled in.
Reference: Samuel R. Buss and Jin-Su Kim, Selectively Damped Least Squares for Inverse Kinematics, Journal of Graphics Tools, 10:37–49, 2004.
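For reference, a single DLS step amounts to the following (a sketch using Eigen; the selectively damped variant in the reference adjusts the damping per target, which is not shown here):

  #include <Eigen/Dense>

  // One damped-least-squares step: dtheta = J^T (J J^T + lambda^2 I)^{-1} e.
  // The damping term keeps the solve well conditioned near singular poses.
  Eigen::VectorXd DlsStep(const Eigen::MatrixXd& J,   // m x n end-effector Jacobian
                          const Eigen::VectorXd& e,   // m-dimensional error to the targets
                          double lambda)              // damping constant
  {
    const int m = (int)J.rows();
    Eigen::MatrixXd A = J * J.transpose()
                      + lambda * lambda * Eigen::MatrixXd::Identity(m, m);
    return J.transpose() * A.ldlt().solve(e); // A is symmetric positive definite
  }
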
2013
ImageMapper: Image-Layout Matching
for Failure Analysis of Chip Fabrication
Part of conventional CAD layout navigation in failure analysis involves identifying defect regions and overlaying physical images from microscopes onto a CAD database that links the design layout to its logical circuit schematic, along with netlist navigation. Manual alignment by feature points is inaccurate, slow, and error-prone. With design layouts becoming increasingly complex, this difficulty poses challenges to the productivity of failure analysis labs' workflows. ImageMapper solves this problem by enabling fully automatic search and alignment of inspection images to IC design layouts. ImageMapper has been deployed in Avalon, a CAD navigation software system for failure analysis from Synopsys, one of the world's leading EDA companies. The key features of ImageMapper are:
  • Automatically matches and aligns die images to the design layout
  • Aligns images of unknown scale, translation, rotation, and skew
  • Handles a wide range of image magnification
  • Works with images that show multiple layers of the chip
  • Supports SEM, FIB, and Optical Microscope images
  • Detects repeated patterns
  • Is robust to fabrication defects and image noise

I was initially the sole developer of this software, and later led a small team to continue its development during my employment at Core Resolution, a Singapore-based software startup. ImageMapper solves the unusual problem of matching an image to a design layout (which is not an image but rather a set of polygons). I designed and implemented all the algorithms employed for this task. Since this is commercial software, there is no public code or documentation of the algorithms that I can share.

2012
Assorted Interactive Visualization Codes
in Processing
These are small programs I wrote when I was learning the Processing language (it is basically Java). They range from simple shape drawing to somewhat complex animations using various pixel effects. Most of the programs are interactive. I still use Processing from time to time to make interactive technical demonstrations of research ideas.
2012
Smoothed Particle Hydrodynamics Demos
I implement a Smoothed Particle Hydrodynamics (SPH) method to simulate water. My implementation closely follows the method described in [MCG03] (a sketch of the core density computation is given after the performance notes below). The code also includes performance optimizations such as Z-indexing and sorting [IABT11], SIMD vectorization, and multicore parallelization. I test the implementation with six scenes simulating different situations involving water, and show that the system is efficient and scalable.
  • Dam break. A column of water initially occupying half of the tank collapses, creating waves that travel back and forth from one side of the tank to the other.
  • Water drop. A volume of water is initially suspended in the air above a pool of water. It then falls, creating waves.
  • Sink. A volume of water falls to the level below through a hole in the center.
  • Wave. A tank initially contains some water. One of the walls moves back and forth periodically, pushing large waves toward the other side.
  • Ball. A ball drops onto a volume of water and then moves back and forth periodically, pushing the water around.
  • A column of water is poured into a U-shaped container. The water flows from one side to the other until equilibrium is reached.
In the video, every scene uses 6400 particles inside a volume of roughly 0.008 m^3, although not all scenes use the same parameter values. For performance evaluation, I run the dam break scene on an Intel Core 2 Duo 2.0 GHz system with 2 GB of DDR2 RAM and a GeForce 8600M GT graphics card. With 6400 particles the frame rate is 22.15 fps; with 24000 particles, the demo runs at 4.85 fps. These results show that the performance scales roughly linearly with the number of particles.
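The density and pressure step of [MCG03] boils down to the following (a sketch with a naive O(n^2) neighbor loop in place of the Z-index-sorted search):

  #include <cmath>
  #include <vector>

  struct Vec3 { double x, y, z; };
  struct Particle { Vec3 pos; double mass, density, pressure; };

  // Density and pressure step of [MCG03] using the poly6 kernel and an
  // ideal-gas equation of state. Naive O(n^2) neighbor search here; the real
  // implementation uses Z-index sorting [IABT11], SIMD, and multiple threads.
  void ComputeDensityPressure(std::vector<Particle>& particles, double h,
                              double stiffness, double restDensity) {
    const double kPi = 3.14159265358979323846;
    const double h2 = h * h;
    const double poly6 = 315.0 / (64.0 * kPi * std::pow(h, 9)); // kernel constant
    for (Particle& pi : particles) {
      pi.density = 0.0;
      for (const Particle& pj : particles) {
        double dx = pi.pos.x - pj.pos.x;
        double dy = pi.pos.y - pj.pos.y;
        double dz = pi.pos.z - pj.pos.z;
        double r2 = dx * dx + dy * dy + dz * dz;
        if (r2 < h2) {
          double t = h2 - r2;
          pi.density += pj.mass * poly6 * t * t * t; // W_poly6(r, h)
        }
      }
      pi.pressure = stiffness * (pi.density - restDensity);
    }
  }
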
2012
Assorted MATLAB Code for
Computational Photography Papers
Colorization I implement the colorization algorithm described in the paper by Anat Levin. The code is a refactored version of the authors' code: I remove seemingly redundant code, make variable names more readable, add comments describing each step, and implement another weighting function. The code follows the paper closely, except for some tunable parameters that are important but never discussed in the paper. Reference: Colorization Using Optimization, Anat Levin, Dani Lischinski, Yair Weiss, ACM Transactions on Graphics, Volume 23 Issue 3, 2004, Pages 689 - 694

Flash/No flash I implement the Flash/No-Flash technique, following the paper very closely. The only difference is in the calculation of the shadow mask: I use a simple heuristic that pixels lying in shadows caused by the flash have intensities below some threshold. Reference: Digital Photography with Flash and No-Flash Image Pairs, Georg Petschnigg, Richard Szeliski, Maneesh Agrawala, Michael Cohen, Hugues Hoppe, Kentaro Toyama, ACM Transactions on Graphics, Volume 23 Issue 3, 2004, Pages 664 - 672

Seam Carving I implement the dynamic programming algorithm described in the seam carving paper. The implementation follows the paper exactly, except that I do not implement the part where users can edit the energy field. Image enlarging seems to be glossed over in the paper; my algorithm enlarges an image by selecting k seams out of the nk smallest seams (n = 1, 2, 3, etc.). When tracing back the k seams, if any two of them "merge" at some stage, I push them further away from each other, to prevent the stretching effect that occurs when the k seams are very close to one another. Reference: Seam Carving for Content-Aware Image Resizing, Shai Avidan, Ariel Shamir, ACM Transactions on Graphics, Volume 26 Issue 3, 2007, Article No. 10
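A sketch of the dynamic-programming pass for a single vertical seam (the per-pixel energy is assumed to be computed elsewhere):

  #include <algorithm>
  #include <vector>

  // Cumulative-energy table for vertical seams:
  //   M(i, j) = e(i, j) + min(M(i-1, j-1), M(i-1, j), M(i-1, j+1)).
  // Returns the column index of the minimum-energy seam, one per row.
  std::vector<int> FindVerticalSeam(const std::vector<std::vector<double>>& energy) {
    int rows = (int)energy.size(), cols = (int)energy[0].size();
    std::vector<std::vector<double>> M = energy;
    for (int i = 1; i < rows; ++i)
      for (int j = 0; j < cols; ++j) {
        double best = M[i - 1][j];
        if (j > 0)        best = std::min(best, M[i - 1][j - 1]);
        if (j + 1 < cols) best = std::min(best, M[i - 1][j + 1]);
        M[i][j] += best;
      }
    // Trace back from the smallest entry in the last row.
    std::vector<int> seam(rows);
    seam[rows - 1] = (int)(std::min_element(M[rows - 1].begin(), M[rows - 1].end())
                           - M[rows - 1].begin());
    for (int i = rows - 2; i >= 0; --i) {
      int j = seam[i + 1], best = j;
      if (j > 0        && M[i][j - 1] < M[i][best]) best = j - 1;
      if (j + 1 < cols && M[i][j + 1] < M[i][best]) best = j + 1;
      seam[i] = best;
    }
    return seam;
  }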

Focal Stacking My focal stacking algorithm works by deblurring a weighted sum of the input image stack. For each pixel in each input image, a sharpness value is computed by summing the squared gradients in the x and y directions. My algorithm then sums all input images using the sharpness values as weights. The result is as if the image had been taken by a camera with the translating detector described in Hajime Nagahara's paper, and it can then be deblurred with a PSF of a similar shape to the one used in that paper (see the Results section). The mask image is computed using the same weights, but with each original pixel value replaced by the corresponding image's position in the stack. Reference: Flexible Depth of Field Photography, Hajime Nagahara, Sujit Kuthirummal, Changyin Zhou, Shree K. Nayar, Proceedings of the 10th European Conference on Computer Vision (ECCV '08): Part IV, Pages 60 - 73
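A sketch of the sharpness-weighted fusion described above (grayscale images as flat arrays; the gradient here is a simple per-pixel forward difference, and the deblurring step is not shown):

  #include <vector>

  // Sharpness-weighted fusion of a focal stack (grayscale images stored as
  // flat row-major arrays). Sharpness is the squared forward-difference
  // gradient per pixel; the subsequent deblurring with a PSF is not shown.
  std::vector<double> FuseFocalStack(const std::vector<std::vector<double>>& stack,
                                     int width, int height) {
    std::vector<double> sum(width * height, 0.0), wsum(width * height, 0.0);
    for (const std::vector<double>& img : stack) {
      for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
          int i = y * width + x;
          double gx = (x + 1 < width)  ? img[i + 1] - img[i]     : 0.0;
          double gy = (y + 1 < height) ? img[i + width] - img[i] : 0.0;
          double w = gx * gx + gy * gy + 1e-9; // small floor so flat regions average
          sum[i]  += w * img[i];
          wsum[i] += w;
        }
    }
    std::vector<double> fused(width * height);
    for (int i = 0; i < width * height; ++i) fused[i] = sum[i] / wsum[i];
    return fused;
  }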

Tone Mapping I implement four different tone mapping methods. The first is a linear map. The second is Erik Reinhard's log-average method, based on equations (1) and (2) in his paper. The third is Reinhard's global operator (equation (3)). The fourth is Reinhard's dodge-and-burn method (equations (5), (6), (7), and (8)). All methods except the first follow the paper exactly. Reference: Photographic Tone Reproduction for Digital Images, Erik Reinhard, Michael Stark, Peter Shirley, James Ferwerda, ACM Transactions on Graphics, Volume 21 Issue 3, 2002, Pages 267 - 276
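As an illustration of the third method, the global operator reduces to a few lines (a sketch operating on a luminance channel; the key value a is typically 0.18):

  #include <cmath>
  #include <vector>

  // Reinhard's global tone-mapping operator: scale luminance by the key value
  // over the log-average luminance, then compress with L / (1 + L).
  std::vector<double> ToneMapReinhardGlobal(const std::vector<double>& luminance,
                                            double key = 0.18, double delta = 1e-6) {
    double logSum = 0.0;
    for (double Lw : luminance) logSum += std::log(delta + Lw);
    double logAvg = std::exp(logSum / (double)luminance.size()); // eq. (1)
    std::vector<double> out(luminance.size());
    for (size_t i = 0; i < luminance.size(); ++i) {
      double L = key / logAvg * luminance[i]; // eq. (2)
      out[i] = L / (1.0 + L);                 // eq. (3)
    }
    return out;
  }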

2010
General Purpose Computation on GPU
The ZIP file contains some GPGPU code:
  • GLSL vertex and fragment shaders to perform procedural bump mapping and reflection mapping
  • GLSL fragment shader to compute the inclusive all-prefix-sums (inclusive scan) of an input array on the GPU
  • CUDA kernels to perform discrete convolution
  • CUDA kernels to compute a sorted version of an input integer array with all the duplicate elements removed, using stream compaction and the scan-and-scatter approach