Data Crunching Server Build

So I'm building a server at work for crunching massive data sets (up to 1.5 TB at a time). Our normal data-crunching computers, which run Windows 10 with 512 GB RAM, two Titan X's, and two Intel CPUs (I forget exactly which, but they're good), aren't cutting it in terms of doing the work in a timely manner. A single tweak to a data set takes our computers about 5 minutes before we can see its effect.

So what I'm asking you all is: what equipment would go into a data-processing server that could crunch these kinds of data sets? My budget is under $200K.

Pic related

With those specs, maybe you should be looking at your algorithm efficiency instead of trying to upgrade your hardware.

I call bull. No one working with that much data would be using Titans; they'd use something built specifically for GPGPU.

I'd just like to interject for a moment. What you're referring to as Windows 10, is in fact, GNU/Windows 10, or as I've recently taken to calling it, GNU plus Windows 10. Windows 10 is not an operating system unto itself, but rather another free component of a fully functioning GNU system made useful by the GNU corelibs, shell utilities and vital system components comprising a full OS as defined by POSIX.
Many computer users run a modified version of the GNU system every day, without realizing it. Through a peculiar turn of events, the version of GNU which is widely used today is often called Windows 10, and many of its users are not aware that it is basically the GNU system, developed by the GNU Project.
There really is a Windows 10, and these people are using it, but it is just a part of the system they use. Windows 10 is the kernel: the program in the system that allocates the machine's resources to the other programs that you run. The kernel is an essential part of an operating system, but useless by itself; it can only function in the context of a complete operating system. Windows 10 is normally used in combination with the GNU operating system: the whole system is basically GNU with Windows 10 added, or GNU/Windows 10. All the so-called Windows 10 distributions are really distributions of GNU/Windows 10!

>data crunching
>windows 10
what the FUCK are you doing, user

anyway with those file sizes, SSDs are an absolute requirement if you don't want to kill your runtime with file IO. also, I don't know what your problem is, but if you can make your computations easily parallelizable, you're a good candidate for GPU computation (see the sketch below).

but to be absolutely honest - don't build this shit yourself. take the money you'd spend on this in-house equipment, and get a subscription to an actual server farm. something like aws.amazon.com/hpc/
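
to make the "easily parallelizable" point concrete, here's a toy sketch (Python/NumPy; the function and coefficients are made up for illustration). each output pixel depends only on the matching input pixel, so the work splits across any number of CPU cores, or thousands of GPU threads, with no coordination:

```python
# Hypothetical per-pixel correction, illustrating data-parallel structure.
import numpy as np

def per_pixel_correction(frame, a=1.0, b=0.05):
    # Each output value depends only on one input value, so this maps
    # cleanly onto GPU threads: one thread per pixel, no synchronization.
    return a * frame + b * frame**2

frame = np.random.rand(2048, 2048).astype(np.float32)  # one 2D X-ray frame
corrected = per_pixel_correction(frame)
```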

For the cost, Titan X's are better than using Quadro 6000's.

We ARE using SSD's, mostly. We recently purchased PCIe SSD's, and we're focused on maximizing GPU performance for computations.

We can't use server farms as the acquired data must remain within closed networks.

So my question remains: what components would I need to build a server that can do the heavy lifting of the data crunching?

If you're spending 200k you shouldn't be asking us for what parts to get.

most useless fucks here, I swear.

What specifically are you processing? If the image in your OP is any indication, you're running finite element simulations, which is most definitely *not* "data crunching." The hardware you need will vary hugely depending on wtf you're doing.

>data crunching computers
>run Windows 10
What the fuck are you doing

Windows 7 maxes out at 192 GB of RAM.
Windows 10 supports far more (2 TB on Pro), which is why our 512 GB machines run it.

Computed tomography. It is not for finite element analysis. It is most assuredly data crunching because I said it is. How else would you recreate a 3D image from thousands of 2D X-ray images?

Can someone just answer the fucking question without questioning the question, jesus fuck
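
Since you asked how else you'd turn thousands of 2D X-ray images into a 3D volume: the textbook answer is filtered back-projection. Here's a toy 2D NumPy sketch (parallel-beam geometry assumed; this is an illustration of the idea, not whatever VGStudio actually runs):

```python
# Toy 2D filtered back-projection, NumPy only. Real CT pipelines add
# cone-beam geometry, noise handling, and GPU kernels on top of this.
import numpy as np

def filtered_back_projection(sinogram, angles_deg):
    """sinogram: (n_angles, n_detectors) array, one row per projection."""
    n_angles, n_det = sinogram.shape

    # Ram-Lak (ramp) filter applied to each projection in frequency space.
    ramp = np.abs(np.fft.fftfreq(n_det))
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))

    # Back-projection: smear each filtered projection across the image grid.
    recon = np.zeros((n_det, n_det))
    mid = n_det // 2
    ys, xs = np.mgrid[:n_det, :n_det] - mid
    for proj, theta in zip(filtered, np.deg2rad(angles_deg)):
        t = xs * np.cos(theta) + ys * np.sin(theta) + mid  # detector coordinate per pixel
        recon += proj[np.clip(t.astype(int), 0, n_det - 1)]
    return recon * np.pi / (2 * n_angles)
```

Do that for every slice of a volume with billions of voxels and it's obvious why it's expensive. The good news is the back-projection is independent per voxel, which is exactly the shape of problem GPUs are built for.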

What software package are you using for model reconstruction? They usually offer guidelines for supported hardware/software.

>Did you expect me to use Windows 7!?
...how can you miss the point so hard.

They sure do; now I want something better. Hence my question about servers that can handle large amounts of data.

Apparently you missed my point, numbnuts. We can only use those two OS's.

>CT data processing / reconstruction is not "data crunching".

Whaat?

>We can only use those two OS's.
Yes, that was why I made my first post, numbnuts.

>Only choose 2 OSes
Modern CPUs can virtualize operating systems quite well if you need to.

...

So what software package is it, so that we can help you? FFS, it might not even support multicore or GPU's depending on what it is.

VGStudio Max

I said in my OP that we use 2 GPU's and 2 CPU's.

It uses OpenCL to do the reconstruction, so you'll want to buy an AMD GPGPU. A Titan X would just be an expensive office warmer in this case, since Nvidia mainly supports CUDA.

Overall, you'll want an SSD, high memory bandwidth, and a powerful OpenCL-enabled GPU.
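
To make "OpenCL enabled" concrete, here's a minimal PyOpenCL sketch (assumes the pyopencl package and at least one working OpenCL driver; the kernel is a trivial stand-in, nothing to do with VGStudio's actual kernels). The same kernel source runs on AMD or Nvidia devices:

```python
# Minimal OpenCL round trip: build a kernel, run it, read back the result.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()        # picks an available OpenCL device
queue = cl.CommandQueue(ctx)

src = """
__kernel void scale(__global const float *in, __global float *out, float k) {
    int i = get_global_id(0);
    out[i] = k * in[i];
}
"""
prog = cl.Program(ctx, src).build()

host_in = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
buf_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_in)
buf_out = cl.Buffer(ctx, mf.WRITE_ONLY, host_in.nbytes)

prog.scale(queue, host_in.shape, None, buf_in, buf_out, np.float32(2.0))
host_out = np.empty_like(host_in)
cl.enqueue_copy(queue, host_out, buf_out)  # host_out is now 2 * host_in
```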

OpenCL is supported on both AMD and Nvidia. The only thing is that CUDA is Nvidia's in-house alternative to OpenCL and as such it runs faster on Nvidia hardware.

Also, most people prefer using CUDA over OpenCL because CUDA is just easier to write.
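
For comparison, the same trivial kernel written CUDA-style via Numba (hypothetical sketch; needs an Nvidia GPU and the numba package). Notice how the platform/context/buffer boilerplate disappears, which is most of what "easier to write" means in practice:

```python
# Same vector-scale idea as the OpenCL sketch, in Numba's CUDA dialect.
import numpy as np
from numba import cuda

@cuda.jit
def scale(arr_in, arr_out, k):
    i = cuda.grid(1)              # global thread index
    if i < arr_in.size:
        arr_out[i] = k * arr_in[i]

host_in = np.arange(16, dtype=np.float32)
host_out = np.zeros_like(host_in)
scale.forall(host_in.size)(host_in, host_out, np.float32(2.0))  # auto launch config
```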

Titan X's are normally fine. That shit is listed as a recommended card in their manual (page 3).
volumegraphics.com/fileadmin/user_upload/flyer/vgstudiomax30_system_requirements_en.pdf

OP, I suggest contacting the VGStudio people and asking them directly. There may be optimizations you can do related to your data set with regard to how it's sent to the GPU. I don't know shit about VGStudio so I can't help you.

>OpenCL is supported on both AMD and Nvidia. The only thing is that CUDA is Nvidia's in-house alternative to OpenCL and as such it runs faster on Nvidia hardware.
OpenCL support on Nvidia hardware is garbage. It exists, but you'll spend twice as much on a GPU with a third the performance.

>Also, most people prefer using CUDA over OpenCL because CUDA is just easier to write.
Who gives a fuck? That's not what the software supports.

Even if you were a retard and went with Nvidia, you would want one of their Tesla GPGPU's, not some NEET gaymer shit.

>We can only use those two OS's.
What the fuck shithole company are you working at that won't let you install a *NIX variant on your custom server?

CUDA is closer to the hardware in the sense that you can assume more about the underlying architecture, which gives you a better shot at code efficiency.
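
Concrete example of assuming things about the architecture: CUDA code routinely stages data in per-block shared memory, an on-chip scratchpad you manage by hand because you know it exists. A block-sum sketch with Numba (block size 256 is an arbitrary assumption here, not a tuned value):

```python
# Per-block tree reduction staged through on-chip shared memory.
import numpy as np
from numba import cuda, float32

TPB = 256  # threads per block (assumed, not tuned)

@cuda.jit
def block_sums(arr, out):
    tile = cuda.shared.array(TPB, float32)   # fast on-chip scratchpad
    tid = cuda.threadIdx.x
    i = cuda.blockIdx.x * cuda.blockDim.x + tid
    tile[tid] = arr[i] if i < arr.size else 0.0
    cuda.syncthreads()
    stride = TPB // 2
    while stride > 0:                        # tree reduction within the block
        if tid < stride:
            tile[tid] += tile[tid + stride]
        cuda.syncthreads()
        stride //= 2
    if tid == 0:
        out[cuda.blockIdx.x] = tile[0]

n = 1 << 20
arr = np.random.rand(n).astype(np.float32)
partial = np.zeros(n // TPB, dtype=np.float32)
block_sums[n // TPB, TPB](arr, partial)      # one partial sum per block
total = partial.sum()                        # finish the reduction on the CPU
```

OpenCL has local memory too, but with CUDA you can also lean on Nvidia-specific facts like the warp size, which portable OpenCL code can't assume.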

I said earlier that on our computers we can only use those two (for security reasons). The server is different, am I wrong?

There's literally no security reason why you couldn't use a UNIX/Linux system on your desktops. That's a sign of an incompetent and/or lazy systems admin.

>a third the performance.
Citation, please. I know CUDA outperforms OpenCL on Nvidia cards, but I don't think the gap is anywhere near that large.

>NEET gaymer shit
Say what you will, but the Titan X is way cheaper than a Tesla GPGPU, and it has a decent amount of RAM for consumer-grade data crunching.

It is strange that OP's workplace chose Titan X's for their computers, but I still think it's a better idea to first contact VGStudio and ask whether there's some configuration they're overlooking, or whether this performance is typical for the current setup. They could also probably give better feedback on what hardware to upgrade to; even OpenCL can be finicky when it comes to running on lots of different hardware, so performance may vary in arbitrary and unintuitive ways.

Yes. That was what I was implying. CUDA will always outperform OpenCL on Nvidia simply by virtue of it being designed for that hardware.

We use Titan X's because we've experimented with them and found they have comparable performance for what we're doing while being significantly cheaper than the Quadro 6000's we were using. Plus we have 2 GPU's per computer across 10 computers, so we save a lot of money going with Titan X's.

why the hell is calculating a tomography so computationally expensive?
Are you using some new, experimental and horribly non-optimized algorithm?
Hire an assembly/parallel processing programmer to fix this.

>t. retard

>Nvidia
Maybe that's why you're not satisfied with the current level of performance though.

According to benchmarks, the R9 Fury performs comparably to the Titan X in OpenCL. The difference is that the R9 Fury costs $500 and the Titan X costs $2000.

OP should just get a whole bunch of AMD GPU's for cheap.

>calls me a retard
>uses terms such as "data crunching computer" and "crunching 1.5TB at a time"
>uses some stupidly unoptimized off-the-shelf software without knowing anything about it
>scales operations with said software
>asks Veeky Forums how to spend $200k on equipment
>doesn't even know the class of intel CPUs they currently use

Clearly you aren't the person responsible for making the choice, so suck my dick and go back to not knowing how to write a better and faster reconstruction algorithm.
>industrial x-ray tomography of an undamaged car
I bet you're a QC retard or worse.

>implying I'm him
>implying you can just make NP-complete problems not NP-complete with a little bit of assembly code
>hurr I know everything because I took java 101

>completeness being relevant to this conversation
kys you dumb CS freshman
The optimization class is next semester.

Just ignore him. He's retarded

If it's so easy then why don't you go out and write your own CT postprocessor that can somehow run on a regular laptop? You would make millions and save tons of lives.

Oh right, you can't because you don't know what you're doing.

You have to stitch thousands of noisy 2D X-ray projections into a 3D reconstruction of the object. No shit it's going to be challenging to process.

You should both lose your computer science licenses for not realizing this.

...