XStream 3.0 Hardware FDTD
XStream® Hardware FDTD uses GPU computing technology to provide dramatic speed increases in XFdtd® calculations. By using the ability of the GPU (Graphics Processing Unit) in modern computer graphics cards to stream floating point calculations, XFdtd achieves extremely fast calculation speeds via the XStream Hardware FDTD option. XStream Hardware FDTD is now based on the NVIDIA FX 5600 GPU with 1.5 GBytes of accelerated memory.
There are three versions of XStream 3.0 available from Remcom: the basic XStream Single GPU, the XStream MicroCluster, and the XStream MiniCluster. XStream provides the calculation speed of a computer cluster at a fraction of the cost. Details of each version of XStream are below.
XStream V 3.0
Single GPU with 1.5 GB of GPU
memory for up to 50 Million FDTD cells fully-accelerated*
System Requirements for computer workstation to host the XStream V 3.0 single GPU:
- Available PCI Express x16 slot for Graphics Card (x16 physical, x16 electrical recommended / x8 electrical required)
- Windows XP Professional 64, or Red Hat Enterprise 64 bit Linux v4
- Sufficient available power from host computer workstation
For the XStream V 3.0 single GPU, Remcom will supply the modified graphics card and drivers, and the user is responsible for their installation.
XStream V 3.0 MicroCluster
Two XStream V 3.0 GPUs in a Quadro Plex 1000 chassis with its own power supply provide 3 GB of GPU memory for up to 100 Million FDTD cells fully-accelerated*
The XStream V 3.0 MicroCluster and the XStream V 3.0 MiniCluster are shipped to the customer as a complete computer workstation with XStream hardware including:
- Two Opteron Dual Core Processors
- 8 – 16 GBytes of computer RAM
- One or Two Quadro Plex 1000 each containing two modified NVIDIA GPUs
- Windows XP Professional 64 or Red Hat Enterprise 64 bit Linux v4

XStream V 3.0 MiniCluster
Four XStream V 3.0 GPUs in two Quadro Plex 1000 chassis with their own power supplies provide 6 GB of GPU memory for up to 200 Million FDTD cells fully-accelerated*
Some XFdtd features are not currently available on the XStream FDTD cards.
How fast is Version 3.0 XStream Hardware FDTD?
Results depend on the size of the FDTD mesh, but for calculations that fit within the memory constraints of the GPUs (3 GBytes for a Micro-Cluster and 6 GBytes for a Mini-Cluster), calculation times are on the order of, or faster than, those of a 32 node computer cluster and much faster than running the calculation on even a Dual Processor Dual Core computer workstation. This is illustrated here using a horn antenna geometry meshed with two different cell sizes: one with about 64 million mesh cells requiring about 2.7 GBytes for calculation, and one with about 146 million mesh cells requiring about 5.7 GBytes for calculation.
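The memory figures above can be checked with a short calculation. The following is a minimal Python sketch, not part of XFdtd or XStream: the function names are hypothetical, the capacities and mesh sizes are the ones quoted in this document, and 1 GByte is assumed to mean 10^9 bytes.

```python
# Illustrative only: GPU memory per XStream configuration, as quoted in the text.
GPU_MEMORY_GB = {
    "Single GPU": 1.5,
    "MicroCluster": 3.0,
    "MiniCluster": 6.0,
}


def smallest_config(required_gb):
    """Return the smallest configuration whose GPU memory covers required_gb, or None."""
    for name, capacity in sorted(GPU_MEMORY_GB.items(), key=lambda kv: kv[1]):
        if required_gb <= capacity:
            return name
    return None


def bytes_per_cell(required_gb, cells):
    """Approximate memory per mesh cell (assumes 1 GByte = 1e9 bytes)."""
    return required_gb * 1e9 / cells


# Horn antenna meshes from the text: (memory in GBytes, number of mesh cells).
horn_meshes = {"64M cells": (2.7, 64e6), "146M cells": (5.7, 146e6)}
for label, (gb, cells) in horn_meshes.items():
    print(label, smallest_config(gb), round(bytes_per_cell(gb, cells), 1))
```

Under these assumptions the smaller mesh fits a Micro-Cluster and the larger one needs a Mini-Cluster, at roughly 42 and 39 bytes per cell respectively. Actual memory use depends on the features enabled in a given calculation.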
XFdtd calculations for these two meshes were made on a computer workstation with two AMD Opteron 2216 Dual Core processors. A baseline calculation was made with a single processor using one core, and another with two processors using both cores in each processor. The smaller mesh was also run on an XStream Micro-Cluster, and both meshes were run on an XStream Mini-Cluster. The relative calculation speeds for the different hardware choices for each mesh are shown in the following graph, normalized to the single processor, single core calculation time.
The increase in computation speed with the Mini-Cluster is nearly 60 times for the larger mesh and almost 35 times for the smaller mesh. For the smaller mesh, the Micro-Cluster provides a speed increase of 29 times. Even relative to full utilization of the Dual Processor Dual Core workstation hardware, the Mini-Cluster provides a speedup of over 16 times for the larger mesh, so a calculation requiring 8 hours on the Dual Processor Dual Core workstation is completed in less than 30 minutes on a Mini-Cluster. The potential savings in engineering time alone justify purchase of the appropriate Micro- or Mini-Cluster.
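The arithmetic behind the 8 hour example can be sketched as follows. This is an illustrative Python snippet with hypothetical function names. It uses the rounded speedup figures quoted above ("nearly 60" and "over 16"), so the results are approximate.

```python
def accelerated_minutes(baseline_hours, speedup):
    """Run time in minutes after dividing a baseline run time by a speedup factor."""
    return baseline_hours * 60.0 / speedup


def implied_speedup(speedup_vs_single_core, speedup_vs_workstation):
    """Speedup of the full workstation over one core, implied by the two quoted ratios."""
    return speedup_vs_single_core / speedup_vs_workstation


# An 8 hour run on the full workstation, with the Mini-Cluster about 16 times faster.
print(accelerated_minutes(8, 16))

# Larger mesh: Mini-Cluster is about 60x one core and about 16x the full workstation.
print(implied_speedup(60, 16))
```

With a speedup of exactly 16, the 8 hour run takes 30 minutes. Because the quoted speedup is over 16, the actual time is somewhat less. The second figure suggests the four workstation cores together run a little under 4 times faster than a single core for the larger mesh.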