

Proceedings of the 38th National Conference on Fluid Mechanics and Fluid Power (FMFP 2011), December 15-17, 2011, MANIT, Bhopal. Paper No. CFD-17

PERFORMANCE STUDY OF A PARALLELIZED LEVEL-SET METHOD BASED 3D TRANSIENT SOLVER ON VARIOUS TWO-PHASE FLOW PROBLEMS
Vishesh Aggarwal
Department of Mechanical Engineering, Indian Institute of Technology Bombay, Mumbai 400 076
Email: vishesh.a@iitb.ac.in

Vinesh H. Gada
Department of Mechanical Engineering, Indian Institute of Technology Bombay, Mumbai 400 076
Email: vinesh_gada@iitb.ac.in

Atul Sharma*
Department of Mechanical Engineering, Indian Institute of Technology Bombay, Mumbai 400 076
Email: atulsharma@iitb.ac.in

* Address all correspondence to this author

ABSTRACT
A level-set method based two-phase flow solver is parallelized using a unidirectional domain decomposition approach. It employs a finite volume formulation for discretizing the conservation equations and a finite difference formulation for discretizing the level-set advection equation, over a staggered grid in Cartesian/cylindrical co-ordinates. The domain is mapped onto a distributed memory parallel architecture using domain decomposition, with overlapping boundary cells which exchange data using MPI. The parallel code is validated against a strategic set of test cases (ranging from laminar pipe flow to film boiling), which are also used to quantify the parallel performance of the code across a range of problems. The parallel code is run on a 64-bit Xeon cluster for up to 16 processors. Numerical predictions from the parallelized code bear an excellent agreement with those from the serial code, with parallel efficiencies ranging up to 99%.

Keywords: Level-Set Method, Two-Phase Flow, MPI, Parallel Speedup, Domain Decomposition

INTRODUCTION
The need to keep computational time within practical time-frames (particularly for multiphase flows), coupled with easy access to parallel computing hardware, has given impetus to the parallel implementation of CFD solvers. This work is motivated towards parallelizing an existing serial two-phase flow solver for a distributed memory parallel architecture. From a literature survey, it is found that the majority of parallel solvers are implemented and tested on single-phase flow problems. Fewer studies have delved into applying these techniques to simulate multiphase flows, as shown in Table 1. In all of these studies, the parallel speedup has been addressed by varying grid sizes on a particular problem. However, this may not be sufficient to demonstrate the complete capability of a parallelization method. The parallel speedup, particularly in multiphase problems, has a bearing not only on the phenomenon under consideration but also on the physical properties of the interacting fluids. For example, a higher density ratio of the two fluids results in a stiffer coefficient matrix of the pressure Poisson equation. This increases the overall computation time to convergence, which in turn may affect the parallel performance, either adversely or favorably. Moreover, few studies have explicitly evaluated the order of communication and idle times spent by each processor and their effect on parallel speedup. The present study employs a novel technique for the level-set method, as discussed by Gada and Sharma, 2011. The parallelization is implemented using a unidirectional



domain decomposition, which incurs minimal modification of the corresponding serial code. Pseudo boundary cells are created on each partitioned sub-domain, which exchange data across processors using MPI (Message Passing Interface). We evaluate the scalability of the proposed method over various two-phase flow problems, each being tested on 1, 2, 4, 8 and 16 processors. A preliminary single-phase flow problem is also tested, which forms the basis for comparing performance across the different two-phase problems. Each test case is chosen to employ a different combination of solvers and/or fluid properties. The range of test cases serves two motives. First, it aids in investigating the effect of problem stiffness on scalability. Second, it helps to trace the limitation on parallel performance for a particular problem to a bottleneck in the scheme of solvers, which includes the Navier-Stokes (velocity prediction and pressure projection), level-set and energy equation solvers. This can further be related to the percentage of inter-processor communication time for each of the solvers. Thus, besides evaluating scalability, such a study illuminates the potential areas for improvement.
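To make the data-exchange pattern concrete, a minimal sketch of a unidirectional (slab) decomposition with pseudo boundary (ghost) layers is given below; the two-cell ghost depth, the field layout and the function name are our assumptions, not details taken from the paper:

```cpp
#include <mpi.h>
#include <vector>

// Exchange ghost planes between axial neighbours in a 1D (slab) decomposition.
// `field` holds nx*ny*(nzLocal + 2*NG) values; plane k starts at k*nx*ny.
const int NG = 2; // assumed ghost-layer depth per side

void exchangeGhostPlanes(std::vector<double>& field, int nx, int ny,
                         int nzLocal, MPI_Comm comm) {
    int rank, np;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &np);
    const int lo = (rank > 0)      ? rank - 1 : MPI_PROC_NULL;
    const int hi = (rank < np - 1) ? rank + 1 : MPI_PROC_NULL;
    const int planeSize = nx * ny;

    // Send the first NG interior planes down; receive into the upper ghosts
    MPI_Sendrecv(&field[NG * planeSize],             NG * planeSize, MPI_DOUBLE, lo, 0,
                 &field[(NG + nzLocal) * planeSize], NG * planeSize, MPI_DOUBLE, hi, 0,
                 comm, MPI_STATUS_IGNORE);
    // Send the last NG interior planes up; receive into the lower ghosts
    MPI_Sendrecv(&field[nzLocal * planeSize],        NG * planeSize, MPI_DOUBLE, hi, 1,
                 &field[0],                          NG * planeSize, MPI_DOUBLE, lo, 1,
                 comm, MPI_STATUS_IGNORE);
}
```

Because MPI_PROC_NULL is used for the first and last sub-domains, the same call works on every rank without special-casing the physical boundaries.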

Table 1. Summary of literature review on distributed-memory MPI-based parallel two-phase flow solvers

| Authors | (Np)max | Problems tested for parallel speedup | Time criteria used in evaluating parallel speedup | 2D/3D | Numerical method^a |
|---|---|---|---|---|---|
| George and Warren, 2002 | 24 | Dendritic growth | Total run times | 3D | Phase-field |
| Sussman, 2005 | 16 | Wobbly bubble | Average run time per time step | 3D | CLSVOF |
| Wang et al., 2006 | 64 | Dendritic growth | Run time for 500 time steps | 2D | LS |
| Fortmeier and Bucker, 2010 | 256 | Bubble rise in quiescent fluid | Run time for a single time step | 3D | LS |
| Hajihashemi and El-Shenawee, 2010 | 400 | Reconstruction of star, ellipse, cylinder shapes | Total run times | 2D | LS |
| Agbaglah et al., 2011 | 512 | Lid-driven cavity | Run time for 100 time steps | 3D | VOF |
| Fortmeier and Bucker, 2011 | 128 | Re-initialization of cube slices, sphere | Total run times | 3D | LS |
| Zuzio and Estivalezes, 2011 | 256 | Damped surface wave oscillation | Average time per iteration | 2D | LS |

^a LS - level set; VOF - volume-of-fluid; CLSVOF - combined LS and VOF

PHYSICAL DESCRIPTION OF TEST PROBLEMS AND CODE VALIDATION
The test problems considered in this study are listed in Table 2. The grid sizes are selected such that the ratio of cells involved in MPI communication to the total number of cells per sub-domain is nearly the same across the different test cases (see the sketch following Table 2). This normalizes the effect of communication overheads on parallel performance across the set of problems, which would otherwise have induced a bias in the comparison. Within each test case, critical numerical parameters (such as the grid size, time step, final time and user-specified error tolerances) are kept identical for the serial and parallel codes.

Single-phase Flow in a Pipe
This problem is executed considering two sub-cases, 1A: hydrodynamically developing isothermal flow, and 1B: hydrodynamically and thermally developing flow in a pipe maintained at a constant wall temperature.


Table 2. Test cases used for validating the 3D transient parallel solver and comparing parallel performance

| Case | Description | Solvers invoked^b | Grid size | % grid using MPI^c |
|---|---|---|---|---|
| 1A | Single-phase isothermal flow in a pipe | NS | 30×92×306 | 20.9 |
| 1B | Single-phase non-isothermal flow in a pipe | NS+EE | 30×92×306 | 20.9 |
| 2 | Two-phase stratified flow in a pipe | NS+LS | 30×92×306 | 20.9 |
| 3A | Rise of n-butanol bubble in quiescent water | NS+LS (with ST) | 154×42×354 | 18.2 |
| 3B | Rise of an air bubble in a quiescent liquid | NS+LS (with ST) | 68×42×354 | 18.2 |
| 4 | Jet formation in quiescent water | NS+LS (with ST) | 112×32×354 | 18.2 |
| 5 | Film boiling over a flat surface | NS+LS+EE+PC (with ST) | 66×66×290 | 22.2 |

^b NS - Navier-Stokes solver; LS - level-set advection and re-initialization; EE - energy equation solver; PC - phase-change related modules; ST - surface tension source term
^c Evaluated for Np = 16
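As a cross-check on the last column of Table 2, a minimal sketch is given below. It assumes the decomposition is only along the third (axial) grid direction with two overlapping ghost layers on each side of every interior sub-domain; the two-layer depth is our assumption, not a figure stated in the paper:

```cpp
#include <cstdio>

// Fraction of cells per sub-domain that participate in MPI exchange,
// assuming a unidirectional split along the third (axial) direction
// and `ghost` overlapping layers per side of each sub-domain.
double mpiCellPercent(int nAxial, int np, int ghost = 2) {
    double planesPerSubdomain = static_cast<double>(nAxial) / np;
    return 100.0 * (2.0 * ghost) / planesPerSubdomain;
}

int main() {
    // Axial cell counts of Table 2, evaluated for Np = 16
    std::printf("pipe cases  (306): %.1f %%\n", mpiCellPercent(306, 16)); // ~20.9
    std::printf("bubble/jet  (354): %.1f %%\n", mpiCellPercent(354, 16)); // ~18.1
    std::printf("film boiling(290): %.1f %%\n", mpiCellPercent(290, 16)); // ~22.1
    return 0;
}
```

Under these assumptions the computed percentages reproduce the tabulated values to within rounding, which supports the stated normalization of communication overheads across the test cases.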

Uniform flow and temperature conditions are applied at the inlet, with Re = 50 and Pr = 0.7. A pipe length-to-diameter ratio L/D = 5 is taken, which allows the flow to reach a fully developed condition near the exit. The domain is discretized using a cylindrical grid of 30×92×306. In the fully developed region, the friction factor is obtained as 1.286 and 1.279, while the Nusselt number is 3.677 and 3.678, for Np = 1 and 16, respectively. These are in excellent agreement with the analytical values of f = 1.28 and Nu = 3.657.

Two-phase Stratified Flow in a Pipe
Instead of the single-phase flow considered in case 1, a two-phase stratified flow in a pipe is simulated here. A uniform velocity condition, with an ideally flat interface, is assumed to exist at the inlet. Further, a hold-up ratio (the ratio of the flow area occupied by the lighter fluid to the total flow area) equal to 0.5 is taken at the inlet. The two fluids are assumed immiscible, with a density ratio (ρ2/ρ1) of 1 and a viscosity ratio (μ2/μ1) of 5.326. Similar to case 1, we take L/D = 5 and Re1 = 50. The analytical solution and the numerically predicted iso-contours of w-velocity at the pipe exit are compared in Fig. 1. The numerical value of wmax (obtained along the vertical line of symmetry) agrees within 2% of its analytical value. Further, the variation of wmax is within 0.5% across 1 to 16 processors.

Figure 1. Comparison of analytical and numerical w-velocity iso-contours in fully developed two-phase stratified flow (case 2, Np = 16)

Bubble Rise in a Quiescent Liquid Column
Here, we consider two sub-cases of bubble rise in a stagnant liquid, each with a different fluid combination. A higher density ratio demands a smaller time step and also increases the stiffness of the pressure Poisson equation. It is conjectured that this may affect the proportion of communication overheads in a parallel run. Therefore, such a comparison can narrow down the target areas for improvement in parallel performance.


Case 3A represents the rise of an n-butanol (fluid 2) droplet in water (fluid 1), while case 3B deals with the rise of an air (fluid 2) bubble in a quiescent liquid (fluid 1). The fluid properties are taken as ρ1 = 986.51 kg/m³, ρ2 = 845.44 kg/m³, μ1 = 1.39×10⁻³ Pa·s, μ2 = 3.28×10⁻³ Pa·s and σ = 1.63×10⁻³ N/m for case 3A; and ρ1 = 875.5 kg/m³, ρ2 = 1 kg/m³, μ1 = 0.118 Pa·s, μ2 = 1×10⁻³ Pa·s and σ = 32.2×10⁻³ N/m for case 3B. Initially, the bubble is assumed to be perfectly spherical and at rest inside a cylindrical domain. A bubble diameter (Db) of 0.002 m and 0.0122 m, with length scales of Db and 0.5Db, are taken for cases 3A and 3B, respectively. The velocity scales are taken as 0.058 m/s and 0.215 m/s for the respective cases. The non-dimensional domain size for both cases is taken as L = 15 and D = 6. A free-slip boundary condition is applied on the side and bottom walls, while an outflow condition is applied at the top wall of the domain. Figure 2 shows an excellent agreement between the present and published results for the instantaneous bubble shapes. For case 3A, the terminal velocity is obtained as u/uc = 0.991, which is within 0.8% of that reported by Bertakis et al., 2010. For case 3B, the terminal velocity reaches a steady value of u/uc = 0.933. The results published by Sussman and Smereka, 1997 are with a far-field boundary condition on the side walls, which is reported to give a terminal velocity higher by about 9% compared to the free-slip boundary condition used here.
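The contrast in stiffness between the two sub-cases can be quantified from these properties; a minimal sketch (the helper names are ours) evaluating the density ratios that drive the pressure Poisson stiffness argument above:

```cpp
#include <cstdio>

// Heavy-to-light density ratios for the two bubble-rise sub-cases. A large
// jump in density across the interface produces widely varying coefficients
// in the pressure Poisson equation, i.e., a stiffer linear system.
int main() {
    const double rho1_3A = 986.51, rho2_3A = 845.44; // water / n-butanol
    const double rho1_3B = 875.5,  rho2_3B = 1.0;    // liquid / air

    std::printf("case 3A: rho1/rho2 = %.3f\n", rho1_3A / rho2_3A); // ~1.17
    std::printf("case 3B: rho1/rho2 = %.1f\n", rho1_3B / rho2_3B); // ~875.5
    return 0;
}
```

The nearly three-orders-of-magnitude difference between the two ratios is what makes cases 3A and 3B a useful pair for isolating the effect of problem stiffness on parallel performance.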

Figure 2. Evolution of rising bubble shapes (case 3, Np = 16): (a) rise of n-butanol bubble in water; (b) rise of air bubble in liquid (dotted pattern represents the results of the present study, superimposed on those reported by Sussman and Smereka, 1997)

Jet Formation in Quiescent Water
Unlike the previous test case, jet formation ensures a more uniform interface presence throughout the domain, thereby distributing the computational burden more uniformly across the partitioned sub-domains. This is conjectured to affect the parallel performance favorably. Here, we simulate the breakup of a paraffin-kerosene (fluid 2) jet injected vertically upwards into stagnant water (fluid 1), similar to system 3-2 of Kitamura et al., 1982. The fluid properties are taken as ρ1 = 998 kg/m³, μ1 = 1.03×10⁻³ Pa·s, ρ2 = 848 kg/m³, μ2 = 1.88×10⁻² Pa·s and σ = 40.4×10⁻³ N/m. The nozzle injection diameter (Db) is taken as 1.22×10⁻³ m. Taking Lc = Db and uc = 0.35 m/s (the average jet injection velocity), we get Re1 = 414, We = 3.7 and Fr = 3.2. Further, the non-dimensional domain size is taken as L = 40 and D = 13. The injection velocity mimics a fully developed velocity profile. No-slip boundary conditions are applied on the side and bottom walls. Figure 3 shows the predicted flow pattern. The diameter of the droplets, averaged over those lying between 20D and 35D, is 3.32, whereas the jet breakup length is 5.3. These results compare within 4% and 21% of the published results (Kitamura et al., 1982), respectively.
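As a quick consistency check on these governing parameters, a small sketch using the definitions from the nomenclature (Re1 = ρ1·uc·Lc/μ1, We = ρ1·uc²·Lc/σ, Fr = uc/(gLc)^(1/2)):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Case 4 (jet) dimensional inputs, as listed above
    const double rho1  = 998.0;    // water density [kg/m^3]
    const double mu1   = 1.03e-3;  // water viscosity [Pa.s]
    const double sigma = 40.4e-3;  // interfacial tension [N/m]
    const double Lc    = 1.22e-3;  // nozzle diameter [m]
    const double uc    = 0.35;     // mean injection velocity [m/s]
    const double g     = 9.81;     // gravitational acceleration [m/s^2]

    std::printf("Re1 = %.0f\n", rho1 * uc * Lc / mu1);        // ~414
    std::printf("We  = %.1f\n", rho1 * uc * uc * Lc / sigma); // ~3.7
    std::printf("Fr  = %.1f\n", uc / std::sqrt(g * Lc));      // ~3.2
    return 0;
}
```

Recovering the quoted Re1, We and Fr from the dimensional data confirms the nozzle diameter of 1.22 mm.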

38th National Conference on Fluid Mechanics and Fluid Power (FMFP 2011)

Figure 3. Jet pattern evolution (case 4, Np = 16)

Film Boiling over a Flat Surface
A film boiling problem invokes all the solvers employed in the present code and therefore aids in a complete evaluation of the parallel performance. The problem consists of a liquid pool with a thin vapour film present over the bottom surface. A constant wall superheat is imposed on the bottom surface. The minimum domain size required to capture the phenomenon is dictated by the most dangerous Taylor wavelength for a three-dimensional simulation, λd3 (Esmaeeli and Tryggvason, 2004), evaluated using Eq. (1).

λd3 = 2√2 π [3σ / g(ρ1 − ρ2)]^(1/2)    (1)

A domain size of 0.5λd3 × 0.5λd3 × λd3 with a grid size of 66×66×130 is selected to benchmark the numerical results, whereas a domain of 0.5λd3 × 0.5λd3 × 2.25λd3 with a grid size of 66×66×290 is employed to compare the parallel performance. While the former is sufficient to capture the phenomena, the latter maintains the ratio of MPI cells to interior cells similar to the other cases. The computational domain considered here is a quarter of the complete domain (λd3 × λd3 × λd3) shown in Fig. 4. Thus, this quarter domain captures a quarter of each of the bubbles released in the node and anti-node modes at the pair of diagonally opposite corners. The characteristic length scale is taken equal to the capillary length, Lc = [σ/g(ρ1 − ρ2)]^(1/2), and the characteristic velocity scale as uc = (gLc)^(1/2). The property ratios (density, viscosity, thermal conductivity and specific heat, fluid 2 to fluid 1) are taken as 0.603, 0.693, 0.987 and 1.615, respectively. The governing parameters are obtained as Re1 = 18.81, Pr1 = 2.79, Fr = 1, We = 1.06 and Ja = 0.57. Similar to the initialization method adopted by Esmaeeli and Tryggvason, 2004, the vapour film is initially perturbed using Eq. (2).

δ(x, y) = (λd3/5) [1 + (1/2){cos(2πx/λd3) + cos(2πy/λd3)}]    (2)

Figure 4 shows the interface shapes at τ = 50 and 100. The temporal variation of the surface-averaged Nu on the superheated surface is shown in Fig. 5. The time-averaged value of Nu from τ = 110 to 300 is 1.459, which deviates by 13.6% and 16.5% from the average Nu calculated using the correlations of Berenson, 1961 and Son and Dhir, 1998, respectively. Similar deviations in the numerically predicted values of Nu have been reported in the literature.

Figure 4. Interface evolution in film boiling, with bubble formation at node and anti-node locations (case 5, Np = 16)

Figure 5. Temporal variation of surface-averaged Nusselt number (case 5, Np = 16)
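For concreteness, a minimal sketch of this initialization step is given below, using Eq. (1) and the perturbation form of Eq. (2) as reconstructed above; the mean-film coefficient λd3/5, the function names and the dimensional inputs are our reading and our assumptions, not figures from the paper:

```cpp
#include <cmath>
#include <cstdio>

const double PI = 3.14159265358979323846;

// Most dangerous 3D Taylor wavelength, Eq. (1)
double lambdaD3(double sigma, double g, double rho1, double rho2) {
    return 2.0 * std::sqrt(2.0) * PI *
           std::sqrt(3.0 * sigma / (g * (rho1 - rho2)));
}

// Initial vapour-film thickness, Eq. (2): a mean film perturbed by a sum of
// cosines, giving node/anti-node bubble release at diagonally opposite
// corners of the lambda_d3 x lambda_d3 base.
double filmThickness(double x, double y, double lam) {
    return (lam / 5.0) * (1.0 + 0.5 * (std::cos(2.0 * PI * x / lam) +
                                       std::cos(2.0 * PI * y / lam)));
}

int main() {
    // Illustrative (not the paper's) dimensional inputs
    const double sigma = 0.07, g = 9.81, rho1 = 1000.0, rho2 = 600.0;
    const double lam = lambdaD3(sigma, g, rho1, rho2);
    std::printf("lambda_d3 = %.4f m, film at origin = %.4f m\n",
                lam, filmThickness(0.0, 0.0, lam));
    return 0;
}
```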

PARALLEL PERFORMANCE
Each test case is run on a 20-node cluster, with each node having 2 GB of memory and 8 dual-core Intel Xeon 2.4 GHz processors. The code is compiled using the C++ library of MPICH2.


The parallel speedup (Sn) and the parallel efficiency (En) are defined by Eqs. (3) and (4), respectively. Ideally, the parallel speedup should equal the number of parallel processors.

Sn = (computation time with Np = 1) / (computation time with Np = n)    (3)

En = (Sn / n) × 100%    (4)
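A minimal sketch of how such timings are typically gathered in an MPI code (MPI_Wtime is standard MPI; taking the slowest rank as the parallel run time and passing the serial time on the command line are our choices):

```cpp
#include <mpi.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    double t0 = MPI_Wtime();
    // ... time-stepping loop of the flow solver would run here ...
    double local = MPI_Wtime() - t0;

    // The slowest rank defines the parallel run time (includes idle time)
    double tN = 0.0;
    MPI_Reduce(&local, &tN, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        // Serial (Np = 1) run time, measured separately and passed in
        double t1 = (argc > 1) ? std::atof(argv[1]) : tN;
        double Sn = t1 / tN;          // Eq. (3)
        double En = 100.0 * Sn / np;  // Eq. (4)
        std::printf("Np = %d: Sn = %.2f, En = %.1f %%\n", np, Sn, En);
    }
    MPI_Finalize();
    return 0;
}
```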

Figure 6(a) and (b) show the parallel performance obtained in cases 1A, 1B, 2 and 5. The efficiency for case 1A is similar to that reported by Johnson and Cross, 1991, lying within a range of 70-78% for up to 16 processors. Overall, case 1B, which is essentially case 1A with the energy equation solver included, is found to give a higher speedup than case 1A. On the other hand, the addition of the level-set module in case 2 is found to give similar levels of speedup as the corresponding case 1A. Thus, invoking the energy solver is found to improve the parallel performance, while adding the level-set solver does not produce a significant effect. For test case 5, the parallel efficiency falls off with increasing processors at a very rapid rate, unlike any other case reported. This can be attributed to two effects: first, the increased size of the problem in terms of memory requirement; second, the use of the phase change solver. The phase change solver increases the burden of communication overhead with increasing processors.

Figure 6(c) and (d) show the parallel performance for test cases 3 and 4. Between cases 3A and 3B, case 3B is found to give a higher overall performance. This improvement can be attributed to the higher density ratio tackled in case 3B. For a problem setup same as case 3A, Fortmeier and Bucker, 2010 have reported parallel efficiencies of about 85% and 77% in the range of 8 to 16 processors using their Scat strategy (one MPI process per processor of a node) of MPI allocation. The method adopted here yields corresponding efficiencies of 72% and 78% without imposing any restrictions on the placement of MPI processes. Case 4 gives a higher speedup for 2 and 8 processors, and an abruptly low value of speedup for 4 and 16 processors, compared to the bubble rise test case 3A (even though the density ratios of the two phases are nearly the same in cases 3A and 4). This abrupt nature of the performance for test case 4 is due to an increased number of iterations taken by Np = 4 and 16 compared to those taken by Np = 1, 2 and 8. Therefore, the denominator of Eq. (3) has peaks for Np = 4 and 16, which results in very low values of S4 and S16. While the parallel efficiency shows a generally reducing trend with an increasing number of processors, cases 1A, 2 and 3A show a marginal improving trend at E16. This improvement can be attributed to better memory performance and cache availability for Np = 16. However, for cases 1B and 5, such an advantage gets offset by the increased data storage required by the heat transfer module variables, thereby giving a substantially reducing trend at E16 compared to the other cases.

Figure 6. Parallel performance: (a) speedup and (b) efficiency for cases 1, 2, 5; (c) speedup and (d) efficiency for cases 3, 4. Note that (a), (c) are plotted on log-log axes, whereas (b), (d) are plotted on semi-log axes

Effect of the Dominant Solvers on En
Figure 7 shows the relative computational time taken by the different solvers. For each case, the time is evaluated as the average over the runs from Np = 2 to Np = 16. Among cases 1A, 1B and 2, test case 1B has a higher contribution from the pressure Poisson equation (PPE) solver and also gives a better performance. Similarly, case 3B has a higher contribution from the PPE solver and a better performance compared to case 3A; this is due to a stiffer PPE in the former. Comparing cases 3A and 4, although case 4 has a lower contribution from the PPE solver, it gives a better performance for Np = 2 and 8 due to the reduced idle times associated with MPI, which results from a better computational load distribution in case 4.
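A minimal sketch of how such a per-solver breakdown can be instrumented (the accumulator and module names are ours; MPI_Wtime is standard MPI):

```cpp
#include <mpi.h>
#include <cstdio>

// Accumulated wall time per solver module over a run (names are illustrative)
struct SolverTimers {
    double ns  = 0.0; // Navier-Stokes velocity predictor
    double ppe = 0.0; // pressure Poisson equation (PPE) solver
    double ls  = 0.0; // level-set advection and re-initialization
    double ee  = 0.0; // energy equation solver
};

// Wrap a solver call between two MPI_Wtime() readings
template <typename Step>
void timed(double& accumulator, Step&& solverStep) {
    double t0 = MPI_Wtime();
    solverStep();
    accumulator += MPI_Wtime() - t0;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    SolverTimers timers;
    // Dummy stand-ins for the actual solver modules of the code
    timed(timers.ns,  [] { /* predict velocity field */ });
    timed(timers.ppe, [] { /* solve pressure Poisson equation */ });
    timed(timers.ls,  [] { /* advect and re-initialize level-set */ });

    double total = timers.ns + timers.ppe + timers.ls + timers.ee;
    if (total > 0.0)
        std::printf("PPE share of run time = %.1f %%\n",
                    100.0 * timers.ppe / total);
    MPI_Finalize();
    return 0;
}
```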

Figure 7. Relative computational effort per solver for all the test problems

CONCLUDING REMARKS
A level-set method based two-phase flow solver is parallelized using a domain decomposition procedure, based on the data-parallel model, which incurs minimal modification of the corresponding serial code.

The parallel processes are coupled via data exchange/update on the boundary cells of each sub-domain using MPI. The parallelized code has been validated on various test problems against analytical/published results. The parallel performance is evaluated on 2 to 16 processors. For a fixed number of processors, the performance shows a considerable variation across the set of test cases. This suggests that a single problem with varying grid sizes may not be sufficient to fully exhibit the parallel scalability of an algorithm, especially for two-phase flow problems. Factors such as the property ratios of the fluids, the relative distribution of the light and heavy fluids over the domain, the uniformity of the fluid action being handled by the various processors, and the nature of the iterative solvers being invoked contribute to the variation observed in the parallel scale-up. In the present scheme of solvers, the PPE solver is found to give the best improvement in scaling, while the Gauss-Seidel velocity predictor offers scope for improvement. On the other hand, the Gauss-Seidel energy equation solver has a favorable effect on the parallel performance. The phase change solver is conjectured to increase the communication time at a faster rate with an increasing number of processors compared to the other problems. Further, a problem with uniform fluid action over the domain is found to have a better load balance.


NOMENCLATURE
D    Diameter [m]
En   Parallel efficiency with n processors
f    Friction factor
Fr   Froude number [uc/(gLc)^(1/2)]
g    Acceleration due to gravity [m/s²]
Ja   Jakob number [cp2(Tw − Tsat)/h12]
k    Thermal conductivity [W/m·K]
L    Length [m]
Np   Number of processors in parallel
Nu   Nusselt number [hLc/k]
Prf  Prandtl number [μf cpf/kf]
Ref  Reynolds number [ρf uc Lc/μf]
Sn   Parallel speedup with n processors
t    Time [s]
T    Temperature [K]
u    Velocity [m/s]
w    Axial velocity [m/s]
We   Weber number [ρ1 uc² Lc/σ]

Greek
γ    Ratio of specific heats [cp2/cp1]
ε    Ratio of thermal conductivities [k2/k1]
η    Viscosity ratio [μ2/μ1]
λd3  Critical wavelength for 3D boiling [m]
μ    Viscosity [Pa·s]
ρ    Density [kg/m³]
σ    Surface tension [N/m]
τ    Non-dimensional time
χ    Density ratio [ρ2/ρ1]

Subscripts
b    Bubble/injection nozzle
c    Characteristic scale
f    Phase (f = 1 for heavier fluid; f = 2 for lighter fluid)
w    Wall

REFERENCES
Agbaglah G, Delaux S, Fuster D, Hoepffner J, Josserand C, Popinet S, Ray P, Scardovelli R, Zaleski S, 2011. Parallel simulation of multiphase flows using octree adaptivity and the volume-of-fluid method, Comptes Rendus Mécanique 339, 194-207.
Berenson PJ, 1961. Film boiling heat transfer from a horizontal surface, J Heat Transfer 83, 351-362.
Bertakis E, Gross S, Grande J, Fortmeier O, Reusken A, Pfennig A, 2010. Validated simulation of droplet sedimentation with finite-element and level-set methods, Chemical Engineering Science 65, 2037-2051.
Esmaeeli A, Tryggvason G, 2004. Computations of film boiling. Part I: numerical method, Int J Heat and Mass Transfer 47, 5451-5461.
Fortmeier O, Bucker HM, 2010. A parallel strategy for a level set simulation of droplets moving in a liquid medium, Proceedings of VECPAR, 200-209.
Fortmeier O, Bucker HM, 2011. Parallel re-initialization of level set functions on distributed unstructured tetrahedral grids, J Computational Physics 230, 4437-4453.
Gada VH, Sharma A, 2011. On a novel dual-grid level-set method for two-phase flow simulation, Numerical Heat Transfer Part B: Fundamentals 59, 26-57.
George WL, Warren JA, 2002. A parallel 3D dendritic growth simulator using the phase-field method, J Computational Physics 177, 264-283.
Hajihashemi MR, El-Shenawee M, 2010. High performance computing for the level-set reconstruction algorithm, J Parallel and Distributed Computing 70, 671-679.
Johnson SP, Cross M, 1991. Mapping structured grid three-dimensional CFD codes onto parallel architectures, Applied Mathematical Modelling 15, 394-405.
Kitamura Y, Mishima H, Takahashi T, 1982. Stability of jets in liquid-liquid systems, The Canadian J Chemical Engineering 60, 723-731.
Son G, Dhir VK, 1998. Numerical simulation of film boiling near critical pressures with a level-set method, J Heat Transfer 120, 183-192.
Sussman M, 2005. A parallelized, adaptive algorithm for multiphase flows in general geometries, Computers and Structures 83, 435-444.
Sussman M, Smereka P, 1997. Axisymmetric free boundary problems, Journal of Fluid Mechanics 341, 269-294.
Wang K, Chang A, Kale LV, Dantzig JA, 2006. Parallelization of a level set method for simulating dendritic growth, J Parallel and Distributed Computing 66, 1379-1386.
Zuzio D, Estivalezes JL, 2011. An efficient block parallel AMR method for two phase interfacial flow simulations, Computers and Fluids 44, 339-357.
