2023 PhD thesis
Compact and Progressive Encodings for Task-Adaptive Queries of Scientific Data
As computational power outpaces bandwidth, handling multi-petabyte datasets, which are easier to produce than
to store or access, becomes a challenge. Traditional solutions focus on data reduction through lossy
compression or downsampling, but many overlook considerations like the scientific tasks to be performed on the
data or the computational resources required. Additionally, there is no common framework for gauging these
techniques' efficacy and costs. This dissertation introduces such a framework and presents novel data reduction
techniques that account for these considerations more holistically and offer compelling trade-offs.
A novel hierarchical data model that unifies spatial resolution and numerical precision is developed. Most
current reduction techniques focus on either resolution or precision, but this new model allows for
approximations anywhere in the 2D space of resolution and precision, allowing refinement along either axis
without necessarily refining the other.
A novel progressive streaming framework is proposed to study trade-offs between reducing resolution and
precision during common analysis tasks. Using the wavelet transform, both resolution- and precision-based
reduction techniques can be modeled as different orderings of wavelet coefficient bits. This enables the
first-ever demonstration that reducing both resolution and precision is significantly more effective than
reducing either alone. It further allows the optimization of bit streams for specific analysis tasks, offering
insights about how different tasks fetch data.
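To make the bit-ordering idea concrete, the minimal Python sketch below (not taken from the thesis; the generator
names and the flat per-level coefficient layout are illustrative assumptions) emits the same wavelet coefficient
bits in two different orders: a resolution-first stream that sends every bit of the coarse levels before touching
finer levels, and a precision-first stream that sends one bit plane across all levels at a time.

```python
import numpy as np

def by_resolution(levels, num_bitplanes):
    """Stream every bit plane of coarse levels before touching any finer
    level (resolution-first ordering).  Bit plane 0 = most significant."""
    for lv, coeffs in enumerate(levels):
        for b in range(num_bitplanes):
            for i in range(len(coeffs)):
                yield (lv, i, b)

def by_precision(levels, num_bitplanes):
    """Stream one bit plane at a time across all levels: every coefficient
    receives its most significant bits before any coefficient is refined
    further (precision-first ordering)."""
    for b in range(num_bitplanes):
        for lv, coeffs in enumerate(levels):
            for i in range(len(coeffs)):
                yield (lv, i, b)

# hypothetical 1D example: three wavelet levels of a small signal
levels = [np.zeros(2), np.zeros(2), np.zeros(4)]
print(list(by_resolution(levels, 3))[:6])   # all bits of level 0 come first
print(list(by_precision(levels, 3))[:6])    # bit plane 0 of every level comes first
```

Orderings that interleave these two extremes, or that reorder bits per task, correspond to the other points of the
trade-off space studied in the dissertation.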
Lastly, the new data model is adapted to encode unstructured particle data, a common form of data
representation in scientific simulation or imaging. Borrowing techniques from the structured case, it’s shown
that the odd-even decomposition can significantly reduce reconstruction error while maintaining a constant
memory footprint. This results in novel particle hierarchies and optimal traversal heuristics that outperform
traditional approaches in progressive decoding quality-cost trade-offs.
2023 journal paper
Progressive Tree-Based Compression of Large-Scale Particle Data
Scientific simulations and observations using particles have been creating large datasets that require
effective and efficient data reduction to store, transfer, and analyze. However, current approaches either
compress small data well but are inefficient for large data, or handle large data but with
insufficient compression. Toward effective and scalable compression/decompression of particle positions, we
introduce new kinds of particle hierarchies and corresponding traversal orders that quickly reduce
reconstruction error while being fast and low in memory footprint. Our solution to compression of large-scale
particle data is a flexible block-based hierarchy that supports progressive, random-access, and error-driven
decoding, where error estimation heuristics can be supplied by the user. For low-level node encoding, we
introduce new schemes that effectively compress both uniform and densely structured particle distributions.
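As a rough illustration of error-driven decoding with a user-supplied heuristic, the Python sketch below
(hypothetical; the `Node` layout, per-node `cost`, and `estimate_error` callback are placeholders, not the paper's
actual data structures) greedily refines whichever node the heuristic currently ranks as most beneficial until a
decoding budget is exhausted.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class Node:                       # hypothetical node of the block-based hierarchy
    payload: bytes
    cost: int                     # bytes needed to decode this node
    children: list = field(default_factory=list)

def error_driven_decode(root, estimate_error, budget):
    """Greedily refine whichever node the user-supplied heuristic
    `estimate_error(node)` currently ranks highest, until the byte budget
    is exhausted."""
    heap = [(-estimate_error(root), 0, root)]
    decoded, spent, tie = [], 0, 1
    while heap and spent < budget:
        _, _, node = heapq.heappop(heap)
        decoded.append(node.payload)          # stand-in for actual decoding
        spent += node.cost
        for child in node.children:
            heapq.heappush(heap, (-estimate_error(child), tie, child))
            tie += 1
    return decoded
```

Different user-supplied heuristics yield different traversal orders over the same hierarchy, which is the
flexibility the abstract refers to.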
2022 conference paper
High-Quality Progressive Alignment of Large 3D Microscopy Data
Large-scale three-dimensional (3D) microscopy acquisitions frequently create terabytes of image data at high
resolution and magnification. Imaging large specimens at high magnifications requires acquiring 3D
overlapping image stacks as tiles arranged on a two-dimensional (2D) grid that must subsequently be aligned
and fused into a single 3D volume. Due to their sheer size, aligning many overlapping gigabyte-sized 3D tiles
in parallel and at full resolution is memory intensive and often I/O bound. Current techniques trade accuracy
for scalability, perform alignment on subsampled images, and require additional postprocess algorithms to
refine the alignment quality, usually with high computational requirements. One common solution to the memory
problem is to subdivide the overlap region into smaller chunks (sub-blocks) and align the sub-block pairs in
parallel, choosing the pair with the most reliable alignment to determine the global transformation. Yet
aligning all sub-block pairs at full resolution remains computationally expensive. The key to a fast,
high-quality, low-memory solution is to identify a single sub-block, or a small set of sub-blocks, that gives
good alignment at full resolution without touching all the overlapping data. In this paper, we present a
new iterative approach that leverages coarse resolution alignments to progressively refine and align only the
promising candidates at finer resolutions, thereby aligning only a small user-defined number of sub-blocks at
full resolution to determine the lowest error transformation between pairwise overlapping tiles. Our
progressive approach is 2.6x faster than the state of the art, requires less than 450MB of peak RAM (per
parallel thread), and offers a higher quality alignment without the need for additional postprocessing
refinement steps to correct for alignment errors.
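The sketch below illustrates the general coarse-to-fine candidate-pruning idea in Python; it is a simplified
stand-in, assuming a hypothetical `align_at(pair, level)` routine (e.g., phase correlation on downsampled tiles)
that returns an estimated shift and alignment error, and it does not reproduce the paper's actual scoring or
scheduling.

```python
def progressive_align(subblock_pairs, align_at, num_levels, keep=8, keep_full=2):
    """Score all sub-block pairs at the coarsest level, repeatedly keep only
    the most promising candidates one level finer, and align only `keep_full`
    survivors at full resolution (level 0)."""
    candidates = list(subblock_pairs)
    for level in range(num_levels - 1, 0, -1):          # coarsest level first
        candidates = sorted(candidates,
                            key=lambda p: align_at(p, level)[1])[:keep]
    full = [(align_at(p, 0), p) for p in candidates[:keep_full]]
    (shift, error), best_pair = min(full, key=lambda t: t[0][1])
    return best_pair, shift, error
```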
2022 journal paper
AMM: Adaptive Multilinear Meshes
★ Best Paper Award at PacificVis 2022
Adaptive representations are increasingly indispensable for reducing the in-memory and on-disk footprints of
large-scale data.
Usual solutions are designed broadly along two themes: reducing data precision, e.g., through compression, or
adapting data resolution, e.g., using spatial hierarchies.
Recent research suggests that combining the two approaches, i.e., adapting both resolution and precision
simultaneously, can offer significant gains over using them individually.
However, there currently exist no practical solutions to creating and evaluating such representations at
scale.
In this work, we present a new resolution-precision-adaptive representation to support hybrid data reduction
schemes and offer an interface to existing tools and algorithms.
Through novelties in spatial hierarchy, our representation, Adaptive Multilinear Meshes (AMM), provides
considerable reduction in the mesh size.
AMM creates a piecewise multilinear representation of uniformly sampled scalar data and can selectively relax
or enforce constraints on conformity, continuity, and coverage, delivering a flexible adaptive representation.
AMM also supports representing the function using mixed-precision values to further the achievable gains in
data reduction.
We describe a practical approach to creating AMM incrementally using arbitrary orderings of data and
demonstrate AMM on six types of resolution and precision data streams.
By interfacing with state-of-the-art rendering tools through VTK, we demonstrate the practical and
computational advantages of our representation for visualization techniques.
With an open-source release of our tool to create AMM, we make such evaluation of data reduction accessible to
the community, which we hope will foster new opportunities and future data reduction schemes.
2021 conference paper
High-Quality and Low-Memory-Footprint Progressive Decoding of
Large-Scale Particle Data
IEEE Symposium on Large Data Analysis and Visualization (LDAV)
★ Best Paper Honorable Mention
Particle representations are often used in large-scale simulations and observations, frequently
creating datasets containing several million particles or more.
Due to their sheer size, such datasets are difficult to store, transfer, and analyze efficiently.
Data compression is a promising solution; however, effective approaches to compress particle data
are lacking and no community-standard and accepted techniques exist.
Current techniques are designed either to compress small data very well, at the cost of high
computational resources when applied to large data, or to work with large data without a focus
on compression, resulting in low reconstruction quality per bit stored.
In this paper, we present innovations targeting tree-based particle compression approaches that
improve the tradeoff between high quality and low memory-footprint for compression and decompression
of large particle datasets.
Inspired by the lazy wavelet transform, we introduce a new way of partitioning space, which allows a
low-cost depth-first traversal of a particle hierarchy to cover the space broadly.
We also devise novel data-adaptive traversal orders that significantly reduce reconstruction error
compared to traditional data-agnostic orders such as breadth-first and depth-first traversals.
The new partitioning and traversal schemes are used to build novel particle hierarchies that can be
traversed with asymptotically constant memory footprint while incurring low reconstruction error.
Our solution to encoding and decoding of large particle data is a flexible block-based hierarchy
that supports progressive, random-access, and error-driven decoding, where error heuristics can be
supplied by the user.
Through extensive experimentation, we demonstrate the efficacy and the flexibility of the proposed
techniques when combined as well as when used independently with existing approaches on a wide range
of scientific particle datasets.
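The toy Python sketch below is only meant to convey the flavor of an odd-even, lazy-wavelet-style split (it is
not the paper's actual partitioning scheme): because each child keeps every other particle along the sort axis,
both children still span the whole domain, so even a cheap depth-first descent yields a spatially broad sample of
the data.

```python
import numpy as np

def build_odd_even(points, depth=0, max_depth=6):
    """Recursively split a point set into 'even' and 'odd' halves along a
    rotating sort axis; unlike a spatial bisection, both halves keep
    covering the whole domain."""
    if len(points) <= 1 or depth == max_depth:
        return {"points": points, "children": []}       # leaf keeps its points
    axis = depth % points.shape[1]
    order = np.argsort(points[:, axis], kind="stable")
    even, odd = points[order[0::2]], points[order[1::2]]
    return {"points": points[:1],                       # crude per-node representative
            "children": [build_odd_even(even, depth + 1, max_depth),
                         build_odd_even(odd, depth + 1, max_depth)]}

tree = build_odd_even(np.random.rand(1024, 3))          # hypothetical particle set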
2021 conference paper
Load-Balancing Parallel I/O of
Compressed Hierarchical Layouts
IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)
Scientific phenomena are being simulated at ever-increasing resolution and fidelity thanks to
advances in modern supercomputers.
These simulations produce a deluge of data, putting unprecedented demand on the end-to-end
data-movement pipeline that consists of parallel writes for checkpoint and analysis dumps and
parallel localized reads for exploratory analysis and visualization tasks.
Parallel I/O libraries are often optimized for uniformly distributed large-sized accesses, whereas
reads for analysis and visualization benefit from data layouts that enable random-access and
multiresolution queries.
While multiresolution layouts enable interactive exploration of massive datasets, efficiently
writing such layouts in parallel is challenging, and straightforward methods for creating a
multiresolution hierarchy can lead to inefficient memory and disk access.
In this paper, we propose a compressed, hierarchical layout that facilitates efficient parallel
writes, while being efficient at serving random access, multiresolution read queries for post-hoc
analysis and visualization.
Writing data to such a layout efficiently in parallel is challenging, however, due to potential
load-balancing issues at both the data-transformation and disk-I/O steps.
Data is often not readily distributed in a way that facilitates efficient transformations necessary
for creating a multiresolution hierarchy.
Further, when compression or data reduction is applied, the compressed data chunks may end up with
different sizes, confounding efficient parallel I/O.
To overcome both these issues, we present a novel two-phase load-balancing strategy to optimize both
memory and disk access patterns unique to writing non-uniform multiresolution data.
We implement these strategies in a parallel I/O library and evaluate the efficacy of our approach by
using real-world simulation data and a novel approach to microbenchmarking on the Theta
Supercomputer of Argonne National Laboratory.
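The disk-I/O phase must balance compressed chunks whose sizes are only known after compression; the Python sketch
below shows one standard way to think about that step, a longest-processing-time greedy assignment. It is an
illustrative assumption, not the paper's actual two-phase strategy, and the data-transformation phase is omitted
entirely.

```python
import heapq

def assign_chunks(chunk_sizes, num_writers):
    """Greedy longest-processing-time assignment: hand the next-largest
    compressed chunk to the currently least-loaded writer rank."""
    loads = [(0, rank, []) for rank in range(num_writers)]
    heapq.heapify(loads)
    for cid, size in sorted(enumerate(chunk_sizes), key=lambda t: -t[1]):
        load, rank, chunks = heapq.heappop(loads)
        chunks.append(cid)
        heapq.heappush(loads, (load + size, rank, chunks))
    return {rank: chunks for _, rank, chunks in loads}

print(assign_chunks([9, 7, 6, 5, 4, 3, 2], num_writers=3))
```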
2021 journal paper, presented at IEEE VIS 2020
Efficient and Flexible Hierarchical
Data Layouts for a Unified Encoding of
Scalar Field Precision and Resolution
To address the problem of ever-growing scientific data sizes making data movement a major hindrance
to analysis, we introduce a novel encoding for scalar fields: a unified tree of resolution and
precision, specifically constructed so that valid cuts correspond to sensible approximations of the
original field in the precision-resolution space.
Furthermore, we introduce a highly flexible encoding of such trees that forms a parameterized family
of data hierarchies.
We discuss how different parameter choices lead to different trade-offs in practice, and show how
specific choices result in known data representation schemes such as ZFP, IDX, and JPEG2000.
Finally, we provide system-level details and empirical evidence on how such hierarchies facilitate
common approximate queries with minimal data movement and time, using real-world data sets ranging
from a few gigabytes to nearly a terabyte in size.
Experiments suggest that our new strategy of combining reductions in resolution and precision is
competitive with state-of-the-art compression techniques with respect to data quality, while being
significantly more flexible and orders of magnitude faster, and requiring significantly fewer resources.
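As a minimal illustration of the two refinement axes (and only that; the actual tree construction, valid-cut
rules, and encoding parameters in the paper are far richer), the Python sketch below models the state of one
block as a (resolution level, bit planes received) pair with two independent refinement moves; any monotone
sequence of such moves traces a path through the precision-resolution space in which the valid cuts live.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockState:        # hypothetical state of one block of the field
    level: int           # spatial resolution level (0 = coarsest)
    bits: int            # number of bit planes received so far

def refine_resolution(s):
    """Move one step finer in resolution without adding precision."""
    return BlockState(s.level + 1, s.bits)

def refine_precision(s):
    """Receive one more bit plane at the current resolution."""
    return BlockState(s.level, s.bits + 1)

s = BlockState(level=0, bits=1)
print(refine_precision(refine_resolution(s)))   # BlockState(level=1, bits=2)
```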
2019 journal paper, presented at IEEE VIS 2018
A Study of the Trade-off Between
Reducing Precision and Reducing Resolution
for Data Analysis and Visualization
There currently exist two dominant strategies to reduce data sizes in analysis and visualization: reducing the
precision of the data, e.g., through quantization, or reducing its resolution, e.g., by subsampling.
Both have advantages and disadvantages and both face fundamental limits at which the reduced information
ceases to be useful.
The paper explores the additional gains that could be achieved by combining both strategies.
In particular, we present a common framework that allows us to study the trade-off in reducing precision
and/or resolution in a principled manner.
We represent data reduction schemes as progressive streams of bits and study how various bit orderings such as
by resolution, by precision, etc., impact the resulting approximation error across a variety of data sets as
well as analysis tasks.
Furthermore, we compute streams that are optimized for different tasks to serve as lower bounds on the
achievable error.
Scientific data management systems can use the results presented in this paper as guidance on how to store and
stream data to make efficient use of the limited storage and bandwidth in practice.
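A brute-force way to obtain such a task-optimized ordering, sketched below in Python purely for intuition (the
`apply_bit`, `task_error`, and `state` objects are hypothetical placeholders, and the paper's actual optimization
is not reproduced here), is to greedily transmit, at every step, whichever remaining bit most reduces the
task-specific error.

```python
def greedy_task_stream(bits, apply_bit, task_error, state):
    """Oracle ordering: at each step transmit whichever remaining bit most
    reduces the task-specific error.  Quadratic in the number of bits, so
    useful only as a lower-bound reference, not as a practical codec."""
    remaining, order = set(bits), []
    while remaining:
        best = min(remaining, key=lambda b: task_error(apply_bit(state, b)))
        state = apply_bit(state, best)
        remaining.remove(best)
        order.append(best)
    return order
```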
2017 conference poster
Accelerating In-situ Feature Extraction
of Large-Scale Combustion Simulation
with Subsampling
The increasing gap between available compute power and I/O capabilities is driving more users to adopt in-situ
processing capabilities.
However, making the transition from post-processing to in-situ mode is challenging, as parallel analysis
algorithms do not always scale well with the actual simulation code.
This is because these algorithms either involve global reduction operations with significant
communication overheads or must perform an overwhelmingly large amount of computation.
In this work we propose a software stack designed specifically to expedite parallel analysis algorithms,
making it possible for them to run in in-situ mode.
Our software stack comprises data-reduction techniques, namely subsampling and compression, used to reduce the
computational load of the analysis algorithms.
For a better approximation of the data, wavelets can also be used instead of subsampling. We demonstrate the
efficacy of our pipeline using two analysis algorithms: parallel merge tree computation and isosurface rendering.
We also study the trade-offs between the I/O gains and the corresponding error induced by the data reduction
techniques.
We present results with the KARFS simulation framework run on the Tesla cluster of the Scientific Computing and
Imaging Institute at the University of Utah.
2017 conference paper
Reducing Network Congestion and Synchronization Overhead During
Aggregation of Hierarchical Data
Hierarchical data representations have been shown to be effective tools for coping with large-scale scientific
data.
Writing hierarchical data on supercomputers, however, is challenging as it often involves all-to-one
communication during aggregation of low-resolution data which tends to span the entire network domain,
resulting in several bottlenecks.
We introduce the concept of indexing templates, which succinctly describe data organization and can be used to
alter movement of data in beneficial ways.
We present two techniques, domain partitioning and localized aggregation, that leverage indexing templates to
alleviate congestion and synchronization overheads during data aggregation.
We report experimental results that show significant I/O speedup using our proposed schemes on two of today’s
fastest supercomputers, Mira and Shaheen II, using the Uintah and S3D simulation frameworks.
2016 conference poster
Visualizing Publication Data
IEEE Visualization Conference (VIS)
Best Project 2015 - University of Utah’s CS-5630 (Visualization)
PubVis is an interactive tool for visualizing connections among papers published over multiple years in a
paper collection.
It provides users with a multi-faceted way to explore research topics of interest, spot their trends over
time, and find high-impact papers and influential authors on the topics.
Navigation follows citation and shared-keyword relationships among papers, as well as collaboration
relationships among authors.
PubVis’s strength lies in its light abstraction model and ease of interaction, while being able to present a
large amount of information to users at once.
2015 technical report
Interactive Analysis of Large Combustion
Simulation Datasets on Commodity Hardware
Technical report
Recent advances in high-performance computing have enabled large-scale scientific simulations that produce
massive amounts of data, often on the order of terabytes or even petabytes.
Analysis of such data on desktop workstations is difficult and often limited to only small subsets of the data
or coarse resolutions, yet can still be extremely time-consuming.
Alternatively, one must resort to big computing clusters, which are not easily available, are difficult to
program, and often lack software flexible enough to provide custom, scalable analyses.
Here we present a use case from turbulent combustion where scientists are interested in studying, for example,
the heat release rate of turbulent flames in a simulation totaling more than 5.5 TB of data.
This requires a non-trivial processing pipeline of extracting iso-surfaces, tracing lines normal to the
surface, and integrating the heat release for all 660 time steps to accumulate histograms and time curves.
Using existing tools, this analysis, even at reduced resolution, takes days and is thus rarely performed.
Instead, we present a novel application based on light-weight streaming processing and fast multi-resolution
disk accesses that enables this analysis to be done interactively using commodity hardware.
In particular, our system allows scientists to immediately explore all parameters involved in the process, i.e.,
iso-values, integration lengths, etc., providing unprecedented analysis capabilities.
Our application has already led to a number of interesting insights.
For example, our results suggest that using an adaptive iso-value and integration length for each time step
would lead to more accurate results.
The new approach replaces a traditional batch-processing workflow and provides a new paradigm for scientific
discovery through interactive analysis of large simulation data.
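For intuition only, the Python sketch below assembles one time step of such a pipeline from off-the-shelf pieces
(scikit-image's marching cubes, nearest-neighbour sampling, and trapezoidal integration); the function name,
parameters, and sampling scheme are assumptions for illustration and do not reflect the streaming,
multi-resolution implementation described above.

```python
import numpy as np
from skimage.measure import marching_cubes    # assumed available

def heat_release_profile(volume, heat, iso_value, length, steps=16):
    """One time step of the pipeline: extract an isosurface, march a short
    distance along each vertex normal, and integrate the heat-release field
    along those lines (nearest-neighbour sampling for brevity)."""
    verts, faces, normals, _ = marching_cubes(volume, level=iso_value)
    ts = np.linspace(0.0, length, steps)
    totals = []
    for v, n in zip(verts, normals):
        samples = v[None, :] + ts[:, None] * n[None, :]
        idx = np.clip(np.round(samples).astype(int), 0, np.array(heat.shape) - 1)
        totals.append(np.trapz(heat[idx[:, 0], idx[:, 1], idx[:, 2]], ts))
    return np.asarray(totals)     # fed into histograms and time curves per time step
```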
2013 conference paper
Control of Rotational Dynamics
for Ground Behaviors
I improved the paper's rendering system by adding per-pixel lighting and ambient occlusion, which greatly
improves realism by highlighting ground-contact shadows for the various rolling motions in this paper.
I also ran experiments and helped render some of the results for the paper.
This paper proposes a physics-based framework to generate rolling behaviors with significant rotational
components.
The proposed technique is a general approach for guiding coordinated action that can be layered over existing
control architectures through the purposeful regulation of specific whole-body features.
Namely, we apply control for rotation through the specification and execution of specific desired 'rotation
indices' for whole-body orientation, angular velocity and angular momentum control.
We account for the stylistic components of behaviors through reference posture control.
The novelty of the described work includes control over behaviors with considerable rotational components as
well as a number of characteristics useful for general control, such as flexible posture tracking and contact
control planning.
2012 journal paper
Detail-Preserving Controllable Deformation from Sparse Examples
This is the journal extension of a conference paper.
I wrote all the code required to run new experiments, corrected flawed data, and generated the new results in
this paper.
I also implemented a new inverse kinematics method to reduce the number of input markers for training bone
deformations, making the training faster.
Recent advances in laser scanning technology have made it possible to faithfully scan a real object with tiny
geometric details, such as pores and wrinkles.
However, a faithful digital model should not only capture static details of the real counterpart but also be
able to reproduce the deformed versions of such details.
In this paper, we develop a data-driven model with two components: the first accommodates smooth
large-scale deformations and the second captures high-resolution details.
Large-scale deformations are based on a nonlinear mapping between sparse control points and bone
transformations.
A global mapping, however, would fail to synthesize realistic geometries from sparse examples for highly
deformable models with a large range of motion.
The key is to train a collection of mappings defined over regions locally in both the geometry and the pose
space.
Deformable fine-scale details are generated from a second nonlinear mapping between the control points and
per-vertex displacements.
We apply our modeling scheme to scanned human hand models, scanned face models, face models reconstructed from
multiview video sequences, and manually constructed dinosaur models.
Experiments show that our deformation models, learned from extremely sparse training data, are effective and
robust in synthesizing highly deformable models with rich fine features, for keyframe animation as well as
performance-driven animation.
We also compare our results with those obtained by alternative techniques.
2012 technical report
FingerWalk: Controlling Human Gait Animation using Finger Walk
on a Multi-touch Surface
Technical report
This paper proposes analyzing finger touches on a conventional tablet computer to support locomotion control
of virtual characters.
Although human finger movements and leg movements are not fully equivalent kinematically and dynamically, we
can successfully extract intended gait parameters from finger touches to drive virtual avatars to locomote in
different styles.
User studies with four participants show that their finger-walk patterns for different gaits, such as variants of
walking (normal, fast, tip-toe) and running, do not show significant variations in temporal profiles.
This suggests that the finger walk gestures on multi-touch tablets may have the potential for more universal
adoption as a convenient means for end users to control gait nuances of their avatar as they navigate in
virtual worlds and interact with other avatars.
2012 journal paper, extended version of the MSSAO CGI'11 paper below
Efficient Screen-Space Approach to
High-Quality Multiscale Ambient Occlusion
We present a new screen-space ambient occlusion (SSAO) algorithm that improves on the state-of-the-art SSAO
methods in both performance and quality.
Our method computes ambient occlusion (AO) values at multiple image resolutions and combines them to obtain
the final, high-resolution AO value for each image pixel.
It produces high-quality AO that includes both high-frequency shadows due to nearby, occluding geometry and
low-frequency shadows due to distant geometry.
Our approach only needs to use very small sampling kernels at every resolution, thereby achieving high
performance without resorting to random sampling.
As a consequence, our results do not suffer from the noise and excessive blur that are common in other SSAO
methods.
Our method therefore also avoids the expensive final blur pass commonly used in other SSAO methods.
The use of multiple resolutions also helps reduce errors that are caused by SSAO’s inherent lack of visibility
checking.
Temporal incoherence caused by using coarse resolutions is solved with an optional temporal filtering pass.
Our method produces results that are closer to ray-traced solutions than those of any existing SSAO method,
while running at frame rates similar to or higher than the fastest ones.
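The Python/NumPy sketch below conveys only the overall structure (a real implementation runs in screen-space
shaders): it evaluates a small-kernel occlusion estimate at several downsampled copies of a depth buffer and
folds the coarse results into the finer ones. The stand-in occlusion term, nearest-neighbour upsampling, and
max-combination are assumptions for illustration; they are not the paper's AO estimator, bilateral upsampling,
or combination rule.

```python
import numpy as np

def multires_ao(depth, num_levels=4, kernel=2):
    """Estimate AO with a tiny kernel at several resolutions of a depth
    buffer, then upsample and merge, coarsest level first."""
    def occlusion(d):                              # crude local depth-contrast term
        pad = np.pad(d, kernel, mode="edge")
        out = np.zeros_like(d)
        h, w = d.shape
        for dy in range(-kernel, kernel + 1):
            for dx in range(-kernel, kernel + 1):
                nb = pad[kernel + dy:kernel + dy + h, kernel + dx:kernel + dx + w]
                out += np.maximum(d - nb, 0.0)     # closer neighbours occlude
        return out / (2 * kernel + 1) ** 2

    ao = None
    for lv in reversed(range(num_levels)):         # low resolution -> high resolution
        ao_lv = occlusion(depth[::2 ** lv, ::2 ** lv])
        if ao is not None:
            up = np.kron(ao, np.ones((2, 2)))      # nearest-neighbour upsample
            ao_lv = np.maximum(ao_lv, up[:ao_lv.shape[0], :ao_lv.shape[1]])
        ao = ao_lv
    return ao

ao = multires_ao(np.random.rand(128, 128))         # hypothetical depth buffer
```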
2011 conference paper
Multi-Resolution Screen-Space
Ambient Occlusion
We present a new screen-space ambient occlusion (SSAO) algorithm that improves on the state-of-the-art SSAO
methods in both speed and image quality.
Our method computes ambient occlusion (AO) for multiple image resolutions and combines them to obtain the
final high-resolution AO.
It produces high-quality AO that includes details from small local occluders to low-frequency occlusions from
large faraway occluders.
This approach allows us to use very small sampling kernels at every resolution and thereby achieve high
performance without resorting to random sampling; as a result, our results do not suffer from the noise and
excessive blur common to other SSAO methods.
We use bilateral upsampling to interpolate lower-resolution occlusion values to prevent blockiness and
occlusion leaks.
Compared to existing SSAO methods, our method produces results closer to ray-traced solutions, while running
at frame rates comparable to or higher than those of the fastest SSAO method.
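To illustrate the idea of bilateral upsampling in this context, the Python sketch below weights the usual
bilinear interpolation of four coarse AO samples by depth and normal similarity so that occlusion is not
interpolated across depth or orientation discontinuities. The particular weight functions, parameter values,
and the lack of boundary handling are assumptions for illustration, not the paper's exact filter.

```python
import numpy as np

def bilateral_upsample_pixel(coarse_ao, coarse_depth, coarse_normal,
                             px, py, depth, normal,
                             sigma_d=0.1, power_n=8.0):
    """Interpolate one high-resolution pixel's AO from its four coarse
    neighbours, scaling the bilinear weights by depth and normal similarity
    so occlusion does not leak across geometric discontinuities.
    Boundary handling is omitted for brevity."""
    x0, y0 = int(px), int(py)
    fx, fy = px - x0, py - y0
    bilinear = [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]
    taps = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]
    num = den = 0.0
    for w_b, (x, y) in zip(bilinear, taps):
        w_d = np.exp(-((coarse_depth[y, x] - depth) / sigma_d) ** 2)
        w_n = max(float(np.dot(coarse_normal[y, x], normal)), 0.0) ** power_n
        w = w_b * w_d * w_n
        num += w * coarse_ao[y, x]
        den += w
    return num / max(den, 1e-6)
```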
2010 conference poster
Real-Time CSG Rendering
Using Fragment Sort
We present a simple ray casting method to render a CSG product term.
The primitives are input as polygon meshes.
Our method uses the programmability of modern graphics hardware to capture, at each pixel, the depth values of
all the primitives' surfaces.
By sorting these surface fragments in front-to-back order, we are able to find the CSG result at each pixel.
Our method does not maintain additional fragment information, and deferred shading is used for the final
rendering of the CSG result.
Our method is general in that no special handling is needed for intersection and subtraction operations,
and non-convex primitives are treated the same as convex primitives.
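The per-pixel logic can be illustrated on the CPU with the Python sketch below (the GPU version in the poster
captures and sorts fragments in shaders and shades the result deferred; the data layout and function here are
illustrative assumptions): walking the depth-sorted fragments while tracking inside/outside state per primitive
yields the first depth at which the product term, with subtracted primitives complemented, is satisfied.

```python
def csg_product_pixel(fragments, primitives, subtracted):
    """Resolve one pixel of a CSG product term A ∩ B ∩ ¬C ... from its
    depth-sorted fragments. `fragments` is a list of (depth, prim_id, entering)
    tuples; a primitive in `subtracted` contributes complemented."""
    inside = {p: p in subtracted for p in primitives}   # complemented prims start "inside"
    for depth, prim, entering in sorted(fragments):
        # crossing a surface toggles whether we are inside that primitive
        inside[prim] = entering if prim not in subtracted else not entering
        if all(inside.values()):
            return depth, prim        # first depth where the product is satisfied
    return None                       # pixel not covered by the CSG result

# hypothetical pixel: sphere A spans depths 2..8, box B spans 4..10, product A ∩ B
frags = [(2.0, "A", True), (8.0, "A", False), (4.0, "B", True), (10.0, "B", False)]
print(csg_product_pixel(frags, {"A", "B"}, subtracted=set()))   # (4.0, 'B')
```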