Hardware-Assisted High-Efficiency Ray Casting of Unstructured Time-Varying Flows Using Temporal Coherence

Qianli Ma, Liang Zeng, Huaxun Xu, Wenke Wang, Sikun Li

Abstract: Advances in computational power are enabling high-precision numerical simulations of unsteady flows using unstructured grids. The dynamic ray casting technique, with the aid of texture hardware, can achieve high-accuracy volume rendering of unstructured time-varying data from these simulations. However, the existing approach does not pay enough attention to temporal coherence, which depresses the rendering rate. In addition, the texture structure used to store the mesh data wastes GPU memory, which limits the mesh scale of the data that can be rendered. This paper presents a high-efficiency dynamic ray casting algorithm for rendering unstructured time-varying fields using temporal coherence. Meanwhile, the pressure on GPU memory is effectively reduced by a well-designed texture structure. The analysis and experiments demonstrate that our approach incurs a much lower cost in both time and space than the existing method and allows rendering time-varying data on a larger mesh scale in real time.

Index Terms: Temporal coherence, unstructured grids, time-varying flows, GPU, ray casting

I. INTRODUCTION

In the field of CFD, unstructured grids are widely applied to solve 3D flows for high-precision numerical simulation. Advances in computational power enable the simulation of unsteady flows that produce time-varying data with hundreds of time steps. Visualization of these unstructured time-varying data offers scientists powerful insight into the characteristics of unsteady flows and the reliability of simulation results. Volume rendering, which is regarded as the leading and preferred method for visualizing 3D scalar fields, has many applications in flow visualization [1,2,7,9,11,16]. However, it is a challenge to render unstructured time-varying volume data in real time because: (1) volume rendering of even static unstructured-grid data is expensive due to the large mesh scale and the complicated topology, and (2) dynamic (time-varying) volume data with a large number of time steps (see Table 2) increase the difficulty of performing real-time rendering.
Manuscript received December 8, 2010; revised January 8, 2011. This work is supported by the National Basic Research Program (No. 2009CB723803) and the National Science Foundation Program (No. 60873120) of China. All the authors are with the College of Computer Science and Technology, National University of Defense Technology, China (e-mail: {maqianliemail, liangzeng, huaxunxu, wenkewang, sikunli}@gmail.com). Corresponding author: phone: +8613974936415; address: Team 7, College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; e-mail: maqianliemail@gmail.com.

The availability of texture hardware support for volume rendering enables real-time visualization of static unstructured-grid data. GPU-based ray casting (HRC) [1] and Hardware-Assisted Visibility Sorting (HAVS) [2] are two of the fastest volume rendering techniques using texture hardware for static unstructured-grid data. Recently, Bernardon et al. [3] proposed an approach that coupled a compression scheme [4] with these two techniques to render dynamic unstructured-grid volume data (we call them the dynamic HRC and the dynamic HAVS). They then improved the dynamic HAVS with the aid of multiple processors [5]. However, these approaches do not pay enough attention to temporal coherence, which plays an important role in visualizing time-varying data [7,8,9,10,11], and this depresses performance.

HAVS can render data on a larger mesh scale (main memory scale) than HRC (GPU memory scale), while HRC can produce an image with higher accuracy [2,3], which is especially important for scientists analyzing high-precision numerical solutions. However, both the static and the dynamic HRC algorithms use a cell-based texture structure to store the whole mesh data. Each cell texture includes all of its vertex data, although a cell vertex is usually shared by a group of point-neighboring cells. This cell-based layout results in inefficient storage, since many redundant vertex data are stored in GPU memory.
Moreover, the number of cells is much larger than the number of vertices for most 3D unstructured-grid data from CFD simulations [1,2,3,5,6], so the cell-based texture structure increases the pressure on GPU memory even further.

This paper presents a novel dynamic ray casting algorithm that performs high-efficiency rendering of unstructured time-varying data using temporal coherence with the aid of texture hardware. In addition, the pressure on GPU memory is effectively reduced by a well-designed texture structure. The analysis and experiments demonstrate that our approach incurs a much lower cost in both time and GPU memory than the existing method and achieves real-time performance even for time-varying data on a large mesh scale. To summarize, the major contributions of this paper are:

(1) We provide a method to qualitatively analyze temporal coherence of both the cell and the vertex data on unstructured grids. The cell and the vertex temporal tables are then built from the analysis result to achieve a lower time cost during ray traversal.

(2) Taking the characteristics of CFD unstructured grids into account, we design a novel texture structure that separates the vertex data from the cell data to reduce the pressure on GPU memory, allowing the storage of a larger-scale data set than the dynamic HRC.

(3) We propose to use 16 steps as the basic unit for data compression, which yields a more compact codebook than the dynamic HRC, so that the codebook can be loaded faster to avoid rendering stalls while switching codebooks. Moreover, since there are two codebooks (corresponding to 32 consecutive steps of data) in GPU memory at any moment, they require 32-bit temporal tables, which can be nicely laid out inside the textures, leading to compact and efficient storage (detailed in Sec. V).

II. RELATED WORK

Research so far in time-varying volume data visualization has primarily utilized temporal coherence for fast rendering of data on structured grids [3,7,16]. To improve rendering performance, Shen [8] qualitatively analyzed temporal coherence of each voxel on structured grids and devised a temporal hierarchical index tree for fast isosurface extraction in time-varying fields. However, the tree does not maintain the spatial locality of the voxels and cannot be readily adopted for volume rendering. Shen [9] and Ellsworth [10] proposed a time-space partition (TSP) tree for better use of temporal and spatial coherence to achieve volume rendering of time-varying scalar fields on structured grids. They quantitatively analyze temporal coherence of the subvolumes on each spatial level and use only the mean values of the subvolumes that satisfy the temporal and spatial error tolerance to perform rendering. As a result, the amount of data that must be loaded into main memory is reduced, which enables the algorithm to render large-scale time-varying data in real time. Ma [11] organized structured time-varying volume data with a group of octrees and used temporal coherence to prune the branches of each octree. The required storage space is thus reduced, making it possible to render time-varying data. Bernardon [3] compressed unstructured time-varying volume data into several codebooks with the vector quantization (VQ) approach [4]. Temporal coherence is used to speed up the generation of the codebooks. Because the compression is done in a preprocessing stage, temporal coherence is not employed to save the time and space cost of the rendering algorithm itself.
In addition, an important difference between the static and the dynamic HRC algorithms is the representation of the cell gradient used for reconstruction during sampling. To reduce the usage of GPU memory, the dynamic HRC stores a gradient matrix [12] to compute the gradient of the scalar field in a cell (the cell gradient) on line, instead of storing the pre-computed cell gradient.

III. TEMPORAL COHERENCE OF UNSTRUCTURED TIME-VARYING FLOWS

Sampling is the major part of the ray casting algorithm. During sampling, HRC reconstructs the field at a sample with the cell gradient and a vertex data value [1,3,6,16]. Thus temporal coherence of the cell and the vertex data can be used to reduce the cost of sampling for high-efficiency ray casting. To utilize temporal coherence, a method is presented to qualitatively analyze the temporal coherence of the cell and the vertex data on unstructured grids. In the preprocessing stage, the cell and the vertex temporal tables are built with the aid of the analysis result. These temporal tables are then used to reduce the time cost of sampling during ray traversal.

A. The span space

The variation of the cell extreme values over time can help to analyze temporal coherence of a cell [8]. The cell extreme values, that is, the maximum and the minimum among all the vertex data values of the cell, can be characterized by the span space [13]. Tetrahedral meshes are the most common form of unstructured grids, and other types of unstructured-grid cells can be effectively divided into tetrahedra. Therefore we only consider tetrahedral meshes in the following discussion.

For a tetrahedral cell $i$, let $S_{i,0}^{t}$, $S_{i,1}^{t}$, $S_{i,2}^{t}$ and $S_{i,3}^{t}$ be its four vertex data values at the $t$-th time step. Then the maximum value (denoted by $S_{i,\max}^{t}$) and the minimum value (denoted by $S_{i,\min}^{t}$) of cell $i$ at the $t$-th step are obtained by

$$S_{i,\max}^{t} = \mathrm{Max}(S_{i,0}^{t}, S_{i,1}^{t}, S_{i,2}^{t}, S_{i,3}^{t}) \quad \text{and} \quad S_{i,\min}^{t} = \mathrm{Min}(S_{i,0}^{t}, S_{i,1}^{t}, S_{i,2}^{t}, S_{i,3}^{t})$$

respectively. In the span space, each cell is represented by a point whose x coordinate is its minimum value and whose y coordinate is its maximum value. For a time-varying field, a cell has multiple corresponding points in the span space, and each point represents the two extreme values of the cell at one time step.
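The mapping from a cell to its span-space points can be sketched in a few lines. This is only an illustration of the definition above: the function name `span_space_points` and the sample values are assumptions, not part of the paper.

```python
from typing import List, Sequence, Tuple


def span_space_points(
    cell_values: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Map one tetrahedral cell to its span-space points.

    cell_values[t] holds the four vertex data values of the cell at time
    step t; the result holds one (minimum, maximum) point per time step.
    """
    return [(min(values), max(values)) for values in cell_values]


# Illustrative values for three time steps of a single cell (made up).
cell_values = [
    (1.0, 2.0, 0.5, 1.5),
    (1.1, 2.1, 0.6, 1.4),
    (3.0, 2.5, 2.0, 4.0),
]
points = span_space_points(cell_values)
```

The first two points lie close together in the span space, while the third is far away, which is the kind of spread the temporal-coherence analysis measures.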
Fig. 1 shows an example of the span space of cell $i$ in the time interval [0,15].

Fig. 1 The span space of cell $i$ in the time interval [0,15]

B. Cell temporal coherence and cell temporal table

Given a time interval $[t_i, t_j]$ ($i, j \in \{0, 1, \ldots, n-1\}$ and $i < j$), a cell's temporal coherence is determined by the spread of the cell's $j - i + 1$ corresponding points in the span space. The narrower the spread is, the lower the temporal variation and the stronger the temporal coherence of the cell. To quantify the spread, the lattice subdivision scheme [14] is applied to the span space. The scheme subdivides the span space into $N \times N$ non-uniformly spaced rectangular elements. The subdivision should ensure that the points are evenly distributed among the elements. Fig. 1 is an example of a lattice subdivision with 8 × 8 lattice elements. With the aid of the lattice subdivision, we can quantify the spread with $K \times K$ lattice elements ($K \in \{1, 2, \ldots, N\}$). A cell has strong temporal coherence in the time interval $[t_i, t_j]$ if its corresponding points in this interval are located within a spread of 2 × 2 lattice elements.

Using this strong temporal coherence condition, we can build the cell's temporal (CT) table over the whole time interval of a time-varying field. For a time-varying field with $n$ time steps, each cell has an $n$-bit CT table with binary entries whose values are decided by the following principle. First, we find a series of consecutive subintervals (denoted by $[0, n_0 - 1]$, $[n_0, n_1 - 1]$, ..., $[n_{m-1}, n_m - 1]$, $[n_m, n - 1]$) that divide the time interval $[0, n-1]$ into several parts. The division should make each subinterval include as many points as possible, as long as they satisfy the strong temporal coherence condition. This guarantees that the cell has strong temporal coherence within each subinterval and weak temporal coherence between two consecutive subintervals. Then the cell's CT table can be created as shown in Fig. 2. Here, if the $t$-th bit is filled with 0, the cell has strong temporal coherence between the $(t-1)$-th and the $t$-th time steps. Otherwise, the cell has weak temporal coherence between the two steps.

Fig. 2 A CT table in the time interval $[0, n-1]$

C. Vertex temporal coherence and vertex temporal table

The vertex temporal (VT) tables can be created from the CT tables. As mentioned above, temporal coherence of a cell is characterized by the variation of the cell extreme values, which are the maximum and the minimum of the cell's vertex data values. This means that if the cell has strong temporal coherence in the given time interval $[t_i, t_j]$, each of its vertices also has strong temporal coherence. In most cases, this leads to the conclusion that a vertex has the same temporal table as the cell it belongs to. However, a vertex is usually shared by several cells. Consequently, temporal coherence may differ in strength among these point-neighboring cells. In fact, there is usually strong spatial coherence among neighboring cells because of the generation scheme for 3D unstructured grids [15,18], which results in similar temporal tables among the point-neighboring cells. However, when there are discontinuity phenomena (e.g., shock waves) in flows, the state of the fluid as described by the density, pressure and other primitive variables can change radically across the discontinuity boundary. This also means that spatial coherence will be locally broken when a discontinuity arises during the development of an unsteady flow, which results in weak spatial coherence among the point-neighboring cells near the discontinuity boundary.
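The CT-table construction of Sec. III.B can be sketched as a greedy left-to-right scan. The sketch below makes several assumptions that are not stated in the paper: the lattice is described by one sorted list of inner breakpoints used for both axes (`breaks`), the bit for the very first step is set to 1 because there is no previous step to reuse, and the names `lattice_index` and `build_ct_table` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def lattice_index(value: float, breaks: Sequence[float]) -> int:
    """Index of the lattice element containing value.

    breaks are the sorted inner boundaries of the (non-uniform) lattice.
    """
    return bisect_right(breaks, value)


def build_ct_table(
    points: Sequence[Tuple[float, float]],
    breaks: Sequence[float],
    k: int = 2,
) -> List[int]:
    """Greedy CT table: bit t is 0 if step t still fits in the current
    subinterval, i.e. all its span-space points stay within k x k elements."""
    bits: List[int] = []
    lo_c = hi_c = lo_r = hi_r = 0
    for t, (s_min, s_max) in enumerate(points):
        c = lattice_index(s_min, breaks)  # column from the minimum value
        r = lattice_index(s_max, breaks)  # row from the maximum value
        if t > 0:
            n_lo_c, n_hi_c = min(lo_c, c), max(hi_c, c)
            n_lo_r, n_hi_r = min(lo_r, r), max(hi_r, r)
            if n_hi_c - n_lo_c < k and n_hi_r - n_lo_r < k:
                lo_c, hi_c, lo_r, hi_r = n_lo_c, n_hi_c, n_lo_r, n_hi_r
                bits.append(0)  # strong coherence with the previous step
                continue
        # First step, or the spread would exceed k x k: new subinterval.
        lo_c = hi_c = c
        lo_r = hi_r = r
        bits.append(1)
    return bits


# Illustrative 4 x 4 lattice and five time steps (values are made up).
breaks = [1.0, 2.0, 3.0]
points = [(0.5, 1.5), (0.6, 1.6), (1.2, 2.4), (2.5, 3.5), (2.6, 3.6)]
ct_bits = build_ct_table(points, breaks)
```

Extending each subinterval for as long as the 2 × 2 condition holds is one way to make every subinterval "include as many points as possible"; other divisions that satisfy the same condition are conceivable.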
To solve this conflict, we stipulate that when two or more point-neighboring cells have different temporal coherence at the $t$-th time step (corresponding to the $t$-th bit of a CT table), the shared vertex has weak temporal coherence, so the $t$-th bit of its VT table is 1. Suppose cell 1 and cell 2 are point-neighbors sharing vertex $v$. Given their CT tables {1000 0110 0011 1100} and {1000 0110 0010 0000}, the VT table of vertex $v$ is {1000 0110 0011 1100}.

IV. TEMPORAL COHERENCE BASED DYNAMIC HRC ALGORITHM

We devise a high-efficiency dynamic ray casting algorithm for rendering unstructured time-varying data using temporal coherence. On each viewing ray, the algorithm samples once per cell during ray traversal (the sample is a ray-cell intersection) and transfers the reconstruction result (the field value at the sample) into a color (RGBA), which is accumulated into the relevant pixel to form the image. Here, we focus on using temporal coherence to reduce the time cost of reconstruction, which is the kernel part of sampling. The overview of our algorithm is illustrated in Fig. 3. Given the current time step $t$ and cell $i$, it basically carries out the following steps:

Step 1: Compute the location of a new sample.

Step 2: Decompress the vertex data value(s) and compute the cell gradient.

Step 2.1: Evaluate the necessity of gradient computation using the CT table. If necessary, jump to Step 2.3.

Step 2.2: Evaluate the necessity of data decompression for the reference vertex of cell $i$ using the relevant VT table. If necessary, decompress the reference vertex; then jump to Step 3.

Step 2.3: Evaluate the necessity of data decompression for all the vertices of cell $i$ using the VT tables and decompress the vertex data value(s). Then compute the cell gradient with the gradient matrix [12] and the decompressed vertex data.

Step 3: Reconstruct the field at the sample. Do the reconstruction using the computed cell gradient (or the cell gradient at the previous time step) and the decompressed vertex data value (or the vertex data value at the previous time step) by the linear gradient reconstruction method.

Step 4: Do color transfer and accumulation.

Fig. 3 Algorithm overview (sampling for one cell)

A. Linear gradient reconstruction method

Fig. 4 Principle of the linear gradient reconstruction

The location of the sample (ray-cell intersection) can be obtained by using ray-polyhedron intersection [17]. Then the field at the sample is reconstructed by the linear gradient reconstruction method [6] (illustrated in Fig. 4), which is employed by both the static and the dynamic HRC algorithms.
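The vertex-table rule and Steps 2.1 to 3 can be sketched together as follows. This is a CPU-side illustration under stated assumptions, not the shader code of the paper: the reconstruction uses the usual first-order form, field value at the sample = reference value + gradient · (sample location − reference location); tables are plain integers read from the left as in the {1000 0110 ...} notation; `decompress` and `gradient` are hypothetical callbacks standing in for codebook lookup and the gradient-matrix product; all function names are made up.

```python
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def vt_from_ct(ct_tables: Sequence[int]) -> int:
    """VT table of a shared vertex: bit t is 1 if any incident cell has
    CT bit t set, which is a bitwise OR over the cells' CT tables."""
    vt = 0
    for ct in ct_tables:
        vt |= ct
    return vt


def bit(table: int, t: int, width: int = 16) -> int:
    """t-th bit counted from the left (step 0 is the leftmost bit)."""
    return (table >> (width - 1 - t)) & 1


def reconstruct(q0: float, grad: Vec3, r_s: Vec3, r0: Vec3) -> float:
    """Linear gradient reconstruction around the reference vertex."""
    return q0 + sum(g * (a - b) for g, a, b in zip(grad, r_s, r0))


def sample_cell(
    t: int,
    ct: int,
    vts: Sequence[int],
    prev_vals: Sequence[float],
    prev_grad: Vec3,
    decompress: Callable[[int, int], float],
    gradient: Callable[[Sequence[float]], Vec3],
    r_s: Vec3,
    r0: Vec3,
) -> Tuple[float, List[float], Vec3]:
    """Steps 2.1-3 for one cell; vertex 0 is the reference vertex.

    Returns the field value plus the vertex values and gradient that the
    caller should cache for the next time step.
    """
    def value(k: int) -> float:
        # Decompress only when the VT bit signals weak coherence.
        return decompress(k, t) if bit(vts[k], t) else prev_vals[k]

    vals = list(prev_vals)
    if bit(ct, t):                 # Step 2.1: gradient must be recomputed
        vals = [value(k) for k in range(4)]   # Step 2.3
        grad = gradient(vals)
    else:                          # Step 2.2: reference vertex only
        vals[0] = value(0)
        grad = prev_grad
    return reconstruct(vals[0], grad, r_s, r0), vals, grad  # Step 3


# The example tables from Sec. III.C.
ct1 = 0b1000_0110_0011_1100
ct2 = 0b1000_0110_0010_0000
vt = vt_from_ct([ct1, ct2])
```

The 16-bit width matches the example tables only; the implementation described later in the paper uses 32-bit tables.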

Suppose the intersection $S$ is the sample of the current cell $i$. Given the sample location $\mathbf{r}_S$, the field (denoted by $Q_S$) at the sample can be reconstructed by the following linear gradient reconstruction equation:

$$Q_S = Q_0 + \nabla Q_i \cdot (\mathbf{r}_S - \mathbf{r}_0), \qquad (1)$$

where the vector $\nabla Q_i$ is the cell gradient with the three components $Q_{x,i}$, $Q_{y,i}$ and $Q_{z,i}$, and $\mathbf{r}_0$ and $Q_0$ are respectively the location and the data value of a cell vertex (called the reference vertex).

B. Temporal coherence based time-varying data reconstruction

Since the static HRC algorithm already uses texture memory to store the data, adding the time-varying data consumes even more GPU memory. To reduce the memory consumption, the dynamic HRC algorithm uses the compressed $Q_0$ and the on-line computed $\nabla Q_i$ instead of the original $Q_0$ and the pre-computed $\nabla Q_i$ to perform reconstruction. This does help to reduce the pressure on GPU memory. However, the on-line gradient computation and data decompression make reconstruction cost more time, which depresses the rendering rate. To maximize the rate, we use the CT and VT tables to accelerate gradient computation and data decompression during reconstruction.

With the CT table, we can evaluate the necessity of gradient computation for the reconstruction of a new sample. This helps to reduce the number of both gradient computations and data decompressions. Given the time step $t$ and cell $i$, if the $t$-th bit of cell $i$'s CT table is 0, the gradient of cell $i$ at the $t$-th step (denoted by $\nabla Q_i^{t}$) is approximately equal to the one at the $(t-1)$-th step (denoted by $\nabla Q_i^{t-1}$). Thus the gradient can be reused to perform reconstruction at the $t$-th step instead of on-line gradient computation.

With the VT table, we can evaluate the necessity of data decompression for the relevant vertices. This also helps to accelerate the gradient computation, which requires the decompressed vertex data. Similarly, if the $t$-th bit of vertex $v_k$'s VT table is 0, the data value of vertex $v_k$ at the $t$-th step (denoted by $Q_k^{t}$) is approximately equal to the one at the $(t-1)$-th step (denoted by $Q_k^{t-1}$). Thus the data can be reused to perform reconstruction or gradient computation at the $t$-th step instead of on-line data decompression.

V. HARDWARE-ASSISTED IMPLEMENTATION

The mesh scale of the data that HRC can render is limited by the capacity of GPU memory.
This also means that special care must be taken when choosing how to lay out the data inside textures. Moreover, a remarkable difference between structured-grid and unstructured-grid data is that the number of cells is much larger than the number of vertices for most unstructured-grid data from CFD simulations [1,2,3,5,6,15]. Keeping this in mind, we design a novel data structure so that the time-varying fields can be nicely laid out and fit in the textures to save GPU memory space, allowing the storage of a data set with a larger mesh scale. Our data structure separates the vertex data from the cell data, unlike both the static and the dynamic HRC algorithms, which merge the vertex data with the cell data inside the textures. This is very important for reducing the pressure on GPU memory.

In addition, time-varying data with a large number of steps make data loading (from the hard disk to GPU memory) the bottleneck of the volume rendering pipeline. We employ the same VQ approach [4] as the dynamic HRC does to compress the unstructured time-varying fields. An important difference is the scheme of data loading. We propose 16 steps as the basic unit for data loading (different from the dynamic HRC, which uses 64 steps). Because two consecutive groups (32 steps) are resident in GPU memory at any moment, the temporal tables need to be only 32 bits in length, which leads to a compact and efficient texture structure (detailed in Sec. V.B).

A. Data compression and management

In the preprocessing stage, the VQ approach is employed to do the compression. It divides the time-varying data into several groups, each of which includes the data of $m$ consecutive time steps (where $m$ is taken to be a square number for simplicity). Then the data in each group are compressed into a codebook (packed into 2D textures). During rendering, the codebook is loaded into GPU memory and accessed by its two indices for data decompression. To avoid rendering stalls while loading the codebook, the first two codebooks (corresponding to the first two groups of data) and the 32-bit temporal tables are loaded into GPU memory at the beginning of rendering. After the data of the last time step of the first codebook have been accessed, the texture references are swapped to the second codebook, which is already in GPU memory.
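The two-codebook scheme can be sketched as a small two-slot cache. This is a sketch under assumptions: `load_group` is a hypothetical callback standing in for the disk-to-GPU upload, the freed slot is assumed to be refilled with the following group right after the swap, time steps are assumed to be visited in order, and the end of the data set is not handled.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class CodebookSlots(Generic[T]):
    """Keeps two codebooks resident: the active one and the next one."""

    def __init__(self, load_group: Callable[[int], T], steps_per_group: int) -> None:
        self.load_group = load_group
        self.m = steps_per_group
        self.active_group = 0
        self.active = load_group(0)    # first codebook
        self.standby = load_group(1)   # second codebook, loaded up front

    def codebook_for(self, t: int) -> T:
        group = t // self.m
        if group == self.active_group + 1:
            # Swap references: the next codebook is already resident.
            self.active = self.standby
            self.active_group = group
            # Refill the freed slot (asynchronous in a real renderer).
            self.standby = self.load_group(group + 1)
        elif group != self.active_group:
            raise ValueError("time steps must be visited in order")
        return self.active


# Usage with a fake loader and 4 steps per group (illustrative values).
loads: List[int] = []


def fake_load(group: int) -> str:
    loads.append(group)
    return f"codebook-{group}"


slots = CodebookSlots(fake_load, steps_per_group=4)
used = [slots.codebook_for(t) for t in range(9)]
```

Because the swap only exchanges references, no upload sits on the critical path when a group boundary is crossed, provided the refill finishes before the next boundary.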
The renderng process connues, whle he nex codebook and emporal ables are loaded n place of he frs ones, so ha he exure daa of he nex me sep can be prepared before s requred. The dynamc HRC algorhm uses 64 seps ( m 64 ) per group as a basc un for daa loadng. Each codebook uses 72KB 256 64 4B+256 8 4B [3]. Insead, we propose 16 seps per group ( m 16 ) whose codebook uses 20KB 256 16 4B+256 4 4B. Fg. 5 shows he layou of he codebook exure. Ths mporan change brngs hree man advanages. Frs, reduces usage of GPU memory snce he me of renderng 16-sep me-varyng daa s enough o perform loadng of he nex group daa. Second, here are always 32-sep daa be n GPU memory a a momen (here are wo codebooks correspondng o wo consecuve groups n GPU memory a a momen) ha need a 32-b emporal ables and hus leads o a compac and effcen exure srucure (dealed n Sec. V.B). Thrd, compared o he 64-sep daa per group, he 16-sep daa can be compressed no a smarer codebook leadng o faser daa loadng whch can help o avod renderng salls whle swchng codebooks. Fg. 5 Layou of he codebook exure for 16-sep daa B. GPU exure srucure As menoned above, he exure srucures of boh he sac and he dynamc HRC algorhms merge he cell daa wh he

verex daa and use he cell as a basc un o sore he felds[1,3]. They sore he locaons and he feld values (or he codebook ndces) of he cell s verces and he cell graden ogeher n each cell exure. However, hs exure srucure s exravagan for HRC for a large amoun of verex daa redundanly sored n GPU memory. To reduce he memory consumpon, a exure srucure s desgned o separae he verex daa from he cell daa as shown n able 1. The cell and he verex exures respecvely nclude he CT and he VT ables wh a lengh of 32 bs (see he green par). The CT able s used o evaluae he necessy of graden compuaon durng reconsrucon. If no necessary, he graden (cell s graden a he ( 1) h sep) can be reused o perform reconsrucon a he h sep. So he cell graden (12B) should be sored n he cell exure and be updaed wh he lapse of me (see he red par n Table 1.(a)). I s combned wh he 32-b CT able (4B), ncely fng n a exure vecor (16B), whch leads o a compac and effcen exure srucure. Besdes hs, he graden marx[12] n he dynamc HRC s employed for on-lne compuaon of he cell graden. Therefore, he marx should be sored n he cell exure (64B). In addon, he exure coordnaes of he relevan verces (for buldng he relaonshp beween a cell and s verces) and he face-neghborng cells (for ray raversal) should also be sored n he cell exure. Smlarly, we use he VT able o evaluae he necessy of daa decompresson. So he verex daa value a he prevous me sep (denoed by ) should be sored n he verex N exure and be updaed wh he lapse of me (see he red par n Table 1.(b)). Besdes hs, he verex exure should sore he locaon of he verex (12B) o compue he sample locaon [17] and he cell graden. We combne wh N (4B) o form a exure vecor. To decompress he verex daa value, we use 12B o sore he codebook ndces whch are combned wh he 32-b VT able jus o form a exure vecor. Table 1 GPU exure srucure used n our algorhm (a) Cell exure (for one cell) (b) Verex exure (for one verex) he erahedral mesh. Then he sorage of he mesh daa s gven by c 144B v 32B. 
For 16-sep daa per group, he codebook akes up he sorage of 20KB (menoned n Sec. V.A). A a momen, here are wo codebooks correspondng o wo consecuve groups (32 seps) n GPU memory. As a resul, he space cos of our approach s gven by c 144B v 32B 40KB. The dynamc HRC[3], whch combnes he cell daa wh he verex daa and use he cell as a basc un o sore he mesh daa, coss 192B sorage per erahedron. Snce uses 64-sep daa per group, a a momen, he codebooks of wo groups uses 144KB 2 72KB. So he space cos of he dynamc HRC s gven by c 192B 144KB. As menoned above, for mos 3D unsrucured-grd daa from CFD smulaons, he number of he cells s much larger han ha of he verces. As a resul, our approach acheves a lower cos of GPU memory han he dynamc HRC, whch allows he sorage of dynamc daa on a larger mesh scale. Moreover, from he expermenal resuls (Sec. VI), s easy o fnd ha he renderng rae can be consderably mproved by usng emporal coherence of me-varyng flows. Table 2 Comparsons of he renderng raes beween our approach and he Dynamc HRC VI. EXPERIMENTS Our algorhm s mplemened on Red Had Enerprse Lnux 5 wh an nvidia GeForce GTS 250 graphcs card (1024MB) and a 2.67GHz Inel Core 7 920 processor (2048MB RAM). To es he valdy of our approach, we render he followng daa from CFD smulaons by our algorhm and he dynamc HRC. Table 2 shows he comparsons of he performances beween hese wo algorhms. The expermenal resuls demonsrae ha our approach gans a much hgher renderng rae and allows renderng me-varyng daa on a larger mesh scale han he dynamc HRC. A. Forward sep shocks The flow of forward sep shocks s a classc unseady flow n he wnd unnel expermens. The ulrasonc flow comes from lef and form an arched shock before he sep (see Fg. 6). The shock s refleced back from op o boom for s grea srengh. Afer hree mes of reflecon, he fnal sae of he unseady flow forms as shown n Fg. 6. C. Analyss of he space cos Wh our exure srucure, he daa sored per cell use 144B = 9 16 B, and he daa sored per verex use 32B = 2 16 B. 
Suppose here are c cells and v verces n Fg. 6 The fnal sae of he forward sep shocks The me-varyng daa from he smulaon of he forward sep shocks are rendered by our approach. The user s allowed

to slow down or pause the dynamic rendering for further analysis of the fields. Fig. 7 shows the rendering results (pressure fields) of some important steps when the flow pauses.

B. Pitching NACA 0012 airfoil

Fig. 8 displays the rendering results of the time-varying density fields from the simulation of the unsteady transonic flow past a pitching NACA 0012 airfoil. This is a benchmark case that includes hundreds of time steps, some of which are shown here.

C. Supersonic aircraft

Fig. 9 shows the rendering results (u velocity fields) of the flow fields around a supersonic aircraft. The flow passes around the aircraft and develops into complicated swirling vortices at the tail. The time-varying data, on a large mesh of 892K cells and 207K vertices, cannot be rendered with the dynamic HRC because of the memory limitations of storing the mesh on the GPU.

(a) Step 45 (b) Step 90 (c) Step 135 (d) Step 170 (e) Step 210 (f) Step 240
Fig. 7 Rendering results of different time steps using our approach (forward step shocks)

(a) Step 120 (b) Step 150 (c) Step 180 (d) Step 210 (e) Step 240 (f) Step 270 (g) Step 300 (h) Step 330
Fig. 8 Rendering results of different time steps using our approach (pitching NACA 0012 airfoil)

VII. CONCLUSION

Volume rendering of dynamic unstructured-grid fields is a challenging problem in flow visualization. To maximize the rendering rate, the temporal coherence of the time-varying data should be effectively utilized. However, research so far has primarily utilized temporal coherence to render time-varying data on structured grids. In this paper, we devise a scheme for using temporal coherence to achieve high-efficiency volume rendering of dynamic unstructured-grid data. We choose to perform rendering within the framework of the ray casting technique because of its high accuracy, which is especially important for flow visualization. Unfortunately, the mesh scale of the data that a GPU-based ray casting algorithm can render is limited by the capacity of texture memory. To make full use of GPU memory, a texture structure is designed to separate the vertex data from the cell data, which allows rendering time-varying data on a larger mesh scale. The experiments demonstrate that our approach achieves much higher performance in both time and space, and allows rendering time-varying data on a larger mesh scale than the existing method.
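The memory argument for separating vertex data from cell data can be illustrated with a rough estimate. The sketch below is not the texture layout used in this paper; it assumes tetrahedral cells, one 32-bit scalar per sample, 4-byte vertex indices, and an illustrative count of 100 time steps, and the function names are hypothetical. It compares storing the scalars duplicated into every cell against storing them once per vertex with a static index table shared by all time steps, using the mesh size of the supersonic aircraft case.

```python
# Hypothetical back-of-the-envelope estimate. The two layouts below are
# assumptions made for illustration, not the texture formats of the paper.

BYTES_PER_SCALAR = 4   # assumption: one 32-bit float per scalar sample
BYTES_PER_INDEX = 4    # assumption: one 32-bit integer per vertex index
VERTS_PER_CELL = 4     # assumption: tetrahedral cells


def per_cell_bytes(num_cells: int, num_steps: int) -> int:
    """Scalars duplicated into every cell: four values per cell per step."""
    return num_cells * VERTS_PER_CELL * BYTES_PER_SCALAR * num_steps


def per_vertex_bytes(num_cells: int, num_vertices: int, num_steps: int) -> int:
    """Scalars stored once per vertex per step, plus a static table of
    four vertex indices per cell that is shared by all time steps."""
    index_table = num_cells * VERTS_PER_CELL * BYTES_PER_INDEX
    scalars = num_vertices * BYTES_PER_SCALAR * num_steps
    return index_table + scalars


# Mesh size from the supersonic aircraft case; steps=100 is illustrative only.
cells, vertices, steps = 892_000, 207_000, 100
dup = per_cell_bytes(cells, steps)
sep = per_vertex_bytes(cells, vertices, steps)

print(f"per-cell layout:   {dup / 2**20:.1f} MiB")
print(f"per-vertex layout: {sep / 2**20:.1f} MiB")
print(f"ratio:             {dup / sep:.1f}x")
```

Under these assumptions the per-vertex layout grows with the number of vertices rather than with four times the number of cells, so its advantage widens as more time steps are kept resident.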

(a) Step 30 (b) Step 90 (c) Step 120 (d) Step 180 (e) Step 230 (f) Step 290
Fig. 9 Rendering results of different time steps using our approach (supersonic aircraft)

REFERENCES

[1] F.F. Bernardon, C.A. Pagot, J.L.D. Comba, C.T. Silva. GPU-based tiled ray casting using depth peeling. Journal of Graphics Tools, 2006, 11(4): 1-16.
[2] S.P. Callahan, M. Ikits, J.L.D. Comba, C.T. Silva. Hardware-assisted visibility sorting for unstructured volume rendering. IEEE Transactions on Visualization and Computer Graphics, 2005, 11(3): 285-295.
[3] F.F. Bernardon, S.P. Callahan, J.L.D. Comba, C.T. Silva. Volume rendering of time-varying scalar fields on unstructured meshes. Technical Report UUSCI-2005-006, SCI Institute, 2005.
[4] J. Schneider, R. Westermann. Compression domain volume rendering. In Proceedings of IEEE Visualization 2003: 293-300.
[5] F.F. Bernardon, S.P. Callahan, C.T. Silva. An adaptive framework for visualizing unstructured grids with time-varying scalar fields. Parallel Computing, 2007, 33(6): 391-405.
[6] M. Weiler, M. Kraus, M. Merz, T. Ertl. Hardware-based ray casting for tetrahedral meshes. In Proceedings of IEEE Visualization 2003: 333-340.
[7] K.-L. Ma. Visualizing time-varying volume data. Computing in Science and Engineering, 2003, 5(2): 34-42.
[8] H.-W. Shen. Isosurface extraction in time-varying fields using a temporal hierarchical index tree. In Proceedings of IEEE Visualization 1998: 159-166.
[9] H.-W. Shen, L.-J. Chiang, K.-L. Ma. A fast volume rendering algorithm for time-varying field using a time-space partitioning (TSP) tree. In Proceedings of IEEE Visualization 1999: 371-377.
[10] D. Ellsworth, L.-J. Chiang, H.-W. Shen. Accelerating time-varying hardware volume rendering using TSP trees and color-based error metrics. In Proceedings of Volume Visualization Symposium 2000: 119-128.
[11] K.-L. Ma, H.-W. Shen. Compression and accelerated rendering of time-varying volume data. International Computer Symposium Workshop on Computer Graphics and Virtual Reality, 2000: 82-89.
[12] C. Lurig, R. Grosso, T. Ertl. Implicit adaptive volume ray casting. In Proceedings of the International Conference on Computer Graphics and Visualization 1997: 114-120.
[13] Y. Livnat, H.-W. Shen, C.R. Johnson. A near optimal isosurface extraction algorithm using the span space. IEEE Transactions on Visualization and Computer Graphics, 1996, 2(1): 73-84.
[14] H.-W. Shen, C.D. Hansen, Y. Livnat, C.R. Johnson. Isosurfacing in span space with utmost efficiency (ISSUE). In Proceedings of IEEE Visualization 1996: 287-294.
[15] Dimitri J. Mavriplis. Unstructured-mesh discretizations and solvers for computational aerodynamics. AIAA Journal, 2008, 46(6): 1281-1298.
[16] C.T. Silva, J.L.D. Comba, S.P. Callahan, F.F. Bernardon. A survey of GPU-based volume rendering of unstructured grids. Brazilian Journal of Theoretic and Applied Computing, 2005, 12(2): 9-29.
[17] P.J. Schneider, D.H. Eberly. Geometric Tools for Computer Graphics. Morgan Kaufmann, 2003: 9-16.
[18] Joe F. Thompson, B.K. Soni, N.P. Weatherill. Handbook of Grid Generation. CRC Press, 1999: 693-701.