Horn Antenna Using MPI
Using MPI enables XFdtd both to speed up calculations and to handle meshes much larger than can fit within the 2 GByte limit of a 32-bit computer. The horn geometry is shown again in Figure 1. The first set of calculations uses the same mesh parameters as the other examples using this horn. Output includes the far-zone patterns, one plane of transient field slices, and one plane of complex electric fields. The mesh parameters are summarized in Figure 2. The mesh requires about 250 MBytes of RAM for the calculation.
The calculation is first run on a dual-processor desktop computer running Windows XP. The computer has two 2.8 GHz Xeon processors and 2.0 GBytes of total RAM, and is configured for hyperthreading. Results for the desktop are:
The calculation is then run on a Linux cluster with a head node and 15 compute nodes. Each compute node has two 3.06 GHz Xeon processors and 1 GByte of RAM. The results for the horn geometry are summarized in the table below:
For this relatively small mesh the scalability is reduced as the number of nodes increases. When a small mesh is spread over more nodes, the ratio of cells per node to cells on internode boundaries is reduced, so internode communication becomes a larger fraction of the run time. To demonstrate this, a larger mesh is used. This mesh is obtained for the same horn geometry by meshing again with smaller cells. The cell size is reduced from 1.61 mm to 0.45 mm, with the problem summary shown in Figure 3. This larger mesh requires about 3.7 GBytes of RAM for the calculation, which is much too large for a 32-bit desktop computer. It is also slightly too large to run on 4 nodes of the test cluster, each of which has 1 GByte of RAM, but runs were made with 8, 12, and 15 nodes and the results are in the following table:
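The boundary-to-interior argument above can be illustrated with a rough sketch. It assumes the mesh is cut into equal slabs along one axis, which may not be how XFdtd actually partitions a mesh, and the mesh dimensions are hypothetical values chosen for illustration, not the ones in Figure 2 or Figure 3.

```python
def slab_ratio(nx, ny, nz, nodes):
    """Ratio of cells owned by one node to cells on its internode faces.

    Assumes a 1-D slab decomposition along x (an assumption of this sketch).
    The worst-case slab is used: it touches two neighbours when there are
    three or more nodes, and one neighbour when there are exactly two.
    """
    if nodes < 2:
        raise ValueError("at least two nodes are needed for internode boundaries")
    cells_per_node = nx * ny * nz / nodes
    faces = min(2, nodes - 1)          # number of internode faces on the slab
    boundary_cells = faces * ny * nz   # one layer of cells per face
    return cells_per_node / boundary_cells


if __name__ == "__main__":
    # Hypothetical cubic meshes: a coarse one and a finer one.
    for label, n in (("coarse", 120), ("fine", 480)):
        for nodes in (2, 4, 8, 15):
            ratio = slab_ratio(n, n, n, nodes)
            print(f"{label:6s} mesh, {nodes:2d} nodes: {ratio:.1f} cells per boundary cell")
```

In this simplified model the ratio falls in proportion to the number of nodes and rises in proportion to the number of cells along the split axis, which is consistent with the behavior described: a finer mesh keeps more interior work per boundary cell at the same node count.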
The scalability for the two meshes is shown in Figure 4. The smaller mesh scales well from 1 to 2 nodes and reasonably well to 4 nodes, but it does not scale well beyond 4 nodes. As expected, the larger mesh scales much better at higher node counts. For a relatively small mesh, then, two to four nodes of the cluster give results much more quickly than a typical dual-processor computer, and the cluster can be used efficiently by running several such calculations at the same time. For larger meshes, including those too large for a 32-bit computer, the MPI version of XFdtd can obtain results by using more nodes, with good scalability as the number of nodes assigned to the calculation is increased.
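One common way to quantify scalability from measured run times is relative speedup and parallel efficiency. The sketch below is a minimal illustration under that assumption; it is not necessarily the metric plotted in Figure 4. The function names are hypothetical, and the times in the example are placeholders, not measured results. A reference node count other than 1 is supported because the larger mesh could only be run on 8 or more nodes.

```python
def relative_speedup(t_ref, t_n):
    """Speedup of a run taking t_n seconds relative to a reference run of t_ref seconds."""
    if t_ref <= 0 or t_n <= 0:
        raise ValueError("run times must be positive")
    return t_ref / t_n


def parallel_efficiency(t_ref, t_n, nodes, ref_nodes=1):
    """Measured speedup divided by the ideal speedup nodes / ref_nodes.

    A value of 1.0 means perfect scaling relative to the reference run.
    """
    if nodes < ref_nodes or ref_nodes < 1:
        raise ValueError("need nodes >= ref_nodes >= 1")
    ideal = nodes / ref_nodes
    return relative_speedup(t_ref, t_n) / ideal


if __name__ == "__main__":
    # Placeholder run times in seconds (NOT measured values), keyed by node count.
    times = {8: 1200.0, 12: 850.0, 15: 720.0}
    ref_nodes = 8
    for nodes, t in times.items():
        s = relative_speedup(times[ref_nodes], t)
        e = parallel_efficiency(times[ref_nodes], t, nodes, ref_nodes)
        print(f"{nodes:2d} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```

Substituting the measured run times from the result tables for the placeholder dictionary gives the efficiency at each node count, which makes it easy to see where adding nodes stops paying off.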