XStream 3.0 Hardware FDTD

XStream® Hardware FDTD uses the latest GPU computing technology to dramatically speed up XFdtd® calculations. The GPU (Graphics Processing Unit) on a modern graphics card can stream floating-point calculations, and the XStream Hardware FDTD option uses this capability to run XFdtd at extremely high speed. XStream Hardware FDTD is now based on the NVIDIA Quadro FX 5600 GPU with 1.5 GB of GPU memory.

Remcom offers three versions of XStream 3.0: the basic XStream single GPU, the XStream MicroCluster, and the XStream MiniCluster. XStream provides the calculation speed of a computer cluster at a fraction of the cost. Details of each version are below.


  

Single Card

XStream V 3.0

Single GPU with 1.5 GB of GPU memory for up to 50 million fully accelerated FDTD cells*

System requirements for the workstation hosting the XStream V 3.0 single GPU:

  • Available PCI Express x16 slot for the graphics card (x16 physical; x16 electrical recommended, x8 electrical required)
  • Windows XP Professional x64 Edition or Red Hat Enterprise Linux 4 (64-bit)
  • Sufficient available power from the host workstation
  • For the XStream V 3.0 single GPU, Remcom supplies the modified graphics card and drivers; the user is responsible for installing them.

MicroCluster

XStream V 3.0 MicroCluster

Two XStream V 3.0 GPUs in a Quadro Plex 1000 chassis with its own power supply provide 3 GB of GPU memory for up to 100 million fully accelerated FDTD cells*

The XStream V 3.0 MicroCluster and the XStream V 3.0 MiniCluster are shipped to the customer as a complete computer workstation with XStream hardware, including:

  • Two AMD Opteron dual-core processors
  • 8–16 GB of system RAM
  • One Quadro Plex 1000 (MicroCluster) or two (MiniCluster), each containing two modified NVIDIA GPUs
  • Windows XP Professional x64 Edition or Red Hat Enterprise Linux 4 (64-bit)

MiniCluster


XStream V 3.0 MiniCluster

Four XStream V 3.0 GPUs in two Quadro Plex 1000 chassis with their own power supplies provide 6 GB of GPU memory for up to 200 million fully accelerated FDTD cells*

Some XFdtd features are not currently available for the XStream FDTD cards.

Speed Comparisons

How fast is XStream Hardware FDTD Version 3.0? Results depend on the size of the FDTD mesh. For calculations that fit within the GPU memory (3 GB for a MicroCluster and 6 GB for a MiniCluster), calculation times are comparable to or faster than those of a 32-node computer cluster. They are also much faster than running the calculation on even a dual-processor, dual-core workstation. This is illustrated here using a horn antenna geometry meshed with two different cell sizes: one mesh with about 64 million cells requiring about 2.7 GB, and another with about 146 million cells requiring about 5.7 GB.
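The capacity figures and the horn-mesh figures on this page can be cross-checked with simple arithmetic. This Python sketch divides memory by cell count and picks the smallest XStream configuration whose GPU memory holds a given mesh. It assumes that "GB" means 10^9 bytes, which the page does not state. It also treats GPU memory as the only constraint. The names `CONFIGS`, `HORN_MESHES`, `bytes_per_cell`, and `smallest_fit` are hypothetical and not part of XFdtd.

```python
from typing import Dict, Optional, Tuple

# Assumption: "GB" on this page means 10**9 bytes (not stated in the source).
GB = 10**9

# Rated configurations: (GPU memory in bytes, "up to" fully accelerated cells).
CONFIGS: Dict[str, Tuple[float, float]] = {
    "Single GPU": (1.5 * GB, 50e6),
    "MicroCluster": (3.0 * GB, 100e6),
    "MiniCluster": (6.0 * GB, 200e6),
}

# Horn antenna meshes from the speed comparison: (memory required, cells).
HORN_MESHES: Dict[str, Tuple[float, float]] = {
    "smaller mesh": (2.7 * GB, 64e6),
    "larger mesh": (5.7 * GB, 146e6),
}


def bytes_per_cell(memory_bytes: float, cells: float) -> float:
    """Average memory per FDTD cell."""
    return memory_bytes / cells


def smallest_fit(required_bytes: float) -> Optional[str]:
    """Smallest configuration whose GPU memory holds the mesh, or None if none does."""
    for name, (memory, _cells) in sorted(CONFIGS.items(), key=lambda item: item[1][0]):
        if required_bytes <= memory:
            return name
    return None


if __name__ == "__main__":
    for name, (memory, cells) in CONFIGS.items():
        print(f"{name}: {bytes_per_cell(memory, cells):.1f} bytes/cell (rated)")
    for name, (memory, cells) in HORN_MESHES.items():
        print(f"{name}: {bytes_per_cell(memory, cells):.1f} bytes/cell, "
              f"fits on: {smallest_fit(memory)}")
```

Run as a script, this reports 30 bytes per cell for all three rated capacities. The smaller and larger horn meshes come out at roughly 42 and 39 bytes per cell, so the rated "up to" cell counts are best read as upper bounds. It also confirms that the 2.7 GB mesh fits on a MicroCluster, while the 5.7 GB mesh needs a MiniCluster.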

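[Graph: relative XFdtd calculation speed for each hardware configuration and mesh, normalized to a single processor using one core]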

XFdtd calculations for these two meshes were run on a workstation with two AMD Opteron 2216 dual-core processors. A baseline calculation used a single processor with one core, and another used both processors with both cores in each. The smaller mesh was also run on an XStream MicroCluster, and both meshes were run on an XStream MiniCluster. The graph shows the relative calculation speed of each hardware configuration for each mesh, normalized to the single-processor, single-core calculation time.

The MiniCluster speeds up the calculation by a factor of nearly 60 for the larger mesh and almost 35 for the smaller mesh. For the smaller mesh, the MicroCluster provides a speedup of 29. Even compared with full use of all four cores of the dual-processor, dual-core workstation, the MiniCluster is over 16 times faster for the larger mesh. A calculation requiring 8 hours on the workstation therefore completes in under 30 minutes on a MiniCluster. The potential savings in engineering time alone justify the purchase of the appropriate MicroCluster or MiniCluster.
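The speedup arithmetic in this paragraph can be reproduced directly. A speedup S means the run time is the baseline time divided by S. Speedups measured against the same baseline chain multiplicatively, so dividing one by another gives their relative speedup. This Python sketch treats the page's approximate figures ("nearly 60", "almost 35", "over 16") as plain numbers. The constant and function names are hypothetical.

```python
# Speedups quoted on this page, each relative to one Opteron 2216 core.
# The values are approximate: the text says "nearly 60" and "almost 35".
MINICLUSTER_LARGE = 60.0    # MiniCluster, larger (146 million cell) mesh
MINICLUSTER_SMALL = 35.0    # MiniCluster, smaller (64 million cell) mesh
MICROCLUSTER_SMALL = 29.0   # MicroCluster, smaller mesh
MINI_VS_WORKSTATION = 16.0  # MiniCluster vs. all four workstation cores, larger mesh


def relative_speedup(speedup_a: float, speedup_b: float) -> float:
    """Ratio of two speedups. Speedups chain multiplicatively,
    S(A over base) = S(A over B) * S(B over base), so dividing two of them
    isolates the remaining factor."""
    return speedup_a / speedup_b


def accelerated_minutes(baseline_hours: float, speedup: float) -> float:
    """Run time in minutes for a job that takes baseline_hours without the speedup."""
    return baseline_hours * 60.0 / speedup


if __name__ == "__main__":
    # 8-hour workstation run accelerated by the MiniCluster's 16x factor.
    print(accelerated_minutes(8.0, MINI_VS_WORKSTATION))  # 30.0
    # Implied speedup of the four-core workstation over one core:
    # S(ws over core) = S(mini over core) / S(mini over ws).
    print(relative_speedup(MINICLUSTER_LARGE, MINI_VS_WORKSTATION))  # 3.75
    # MiniCluster vs. MicroCluster on the smaller mesh.
    print(f"{relative_speedup(MINICLUSTER_SMALL, MICROCLUSTER_SMALL):.2f}")  # 1.21
```

With these inputs, 8 hours at a 16× speedup gives exactly 30 minutes, and any speedup above 16 gives less. The implied speedup of the full four-core workstation over one core is about 3.75. On the smaller mesh, the MiniCluster is roughly 1.2 times faster than the MicroCluster. Because the quoted speedups are rounded, these derived values are approximations.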


Copyright 2007 Remcom Inc.