I've spent a lot of time working with EC2, and I would not really recommend it for this purpose without putting a lot of effort into planning and considering all the options.

First of all, EC2 can be more expensive than purchasing your own hardware unless you do it right. EC2 instances are billed in one of two ways: on-demand and reserved. On-demand instances are intended to be up for the short term - from a few hours to days - and their per-hour pricing reflects this. Reserved instances are cheaper to run per hour (3 cents compared to 10 cents for certain instances) since you pay a chunk of money up front. Throwing your infrastructure in the cloud is not always cost effective unless you plan it correctly. (There are companies that do this - I work for one.) Keep in mind that after a year or two of hardcore EC2 usage, you might have spent enough to have purchased your own cluster; all expenses after that point are wasted money.

The other issue is designing your infrastructure around non-persistent storage. You might need to build your own Amazon Machine Images (AMIs) to ease some of the initial configuration (application installation and cluster management software). While you can use the many gigabytes of local disk that EC2 instances come with for scratch space, you will need a combination of Simple Storage Service (S3) and Elastic Block Store (EBS) for persistent storage. Each of these services has its own limitations. S3 can store an unlimited number of files, but the maximum file size is around 5GB. An EBS volume can only be mounted by one instance at a time (for now). An EBS volume is also only available to EC2 instances in the same availability zone. You can think of availability zones as data centers in the same geographic region (although this isn't necessarily correct).

While data transfers between EC2 instances are free over their local IP addresses, they are not free when you use the public IP, even between EC2 instances, from what I've heard.
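A rough sketch of the break-even arithmetic behind the cost point above, in Python. The two hourly rates are the ones quoted in the text; the hardware price, instance count, and reserved upfront fee are hypothetical placeholders, so substitute real quotes before drawing any conclusion.

```python
def break_even_hours(hardware_cost, hourly_rate, instances, upfront_per_instance=0.0):
    """Hours of continuous use at which cloud spend equals hardware_cost.

    Cloud spend is modelled as: upfront fees + hourly_rate * instances * hours.
    """
    upfront_total = upfront_per_instance * instances
    if upfront_total >= hardware_cost:
        # The upfront fees alone already match or exceed the hardware price.
        return 0.0
    return (hardware_cost - upfront_total) / (hourly_rate * instances)


HOURS_PER_YEAR = 24 * 365

if __name__ == "__main__":
    hardware_cost = 20000.0  # hypothetical cluster price in USD
    instances = 12           # hypothetical number of instances kept running

    # (label, USD per instance-hour from the text, hypothetical upfront fee)
    plans = (
        ("on-demand", 0.10, 0.0),
        ("reserved", 0.03, 300.0),
    )
    for label, rate, upfront in plans:
        hours = break_even_hours(hardware_cost, rate, instances, upfront)
        years = hours / HOURS_PER_YEAR
        print(f"{label}: break-even after about {years:.1f} years of 24/7 use")
```

This leaves out data transfer and storage charges on the cloud side and power, cooling, and administration on the hardware side, so it is only a first approximation.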
If you're transferring gigabytes or even terabytes of data, whether input to be computed on or results of a computation, this can be an expensive and slow process. Amazon provides a service (AWS Import/Export) where you can send in storage devices and they'll copy the data over to S3. If you have a lot of devices, it can be very expensive. Amazon does provide a nice and simple calculator for this - http://awsimportexport.s3.amazonaws.com/aws-import-export-calculator.html - so that you can pick which option works best. They also have another calculator for their other services like EC2 and S3 - http://calculator.s3.amazonaws.com/calc5.html

The biggest flaw with EC2 is that while you do have guaranteed CPU and memory resources, there is no guarantee of memory bandwidth. This means that if an instance from a different AWS account shares the same physical machine as your compute job, that instance could take up all or most of the memory bandwidth, making your job run slower. Not only does your job take longer to finish, it also costs more, since you pay by the hour.

Since the infrastructure for power, space, and cooling already exists for you, it might be a better route to go with purchasing your own hardware.

The biggest issue I see with deciding how many cores to put in a system is the network architecture you choose to purchase. If you choose to go with gigabit Ethernet, it doesn't make a huge difference. If you're thinking of using high-speed interconnects like InfiniBand, the number of systems you have is crucial, since the switches and adapters can cost quite a bit of money. While a 24-port switch can be reasonably cheap (around $5000), a 48-port switch may not be ($20k-50k - http://www.provantage.com/scripts/search.dll?QUERY=Infiniband+switch ), so you would need to buy multiple smaller switches to get the right number of ports, and then add enough extra switches on top of that to get good enough bisection bandwidth.
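To make the switch-count point concrete, here is a minimal Python sketch. It assumes a two-level leaf/spine (fat-tree) layout built from identical switches, with each leaf switch giving half its ports to nodes and half to uplinks so that bisection bandwidth is not oversubscribed. That topology is an assumption for illustration, as is the figure of 8 cores per node; the result is a port-count lower bound, and a real wiring plan may need more.

```python
import math


def nodes_needed(total_cores, cores_per_node):
    """Number of machines required to reach total_cores."""
    return math.ceil(total_cores / cores_per_node)


def two_level_switch_count(nodes, ports_per_switch):
    """Return (leaf_switches, spine_switches) for a non-oversubscribed layout.

    Assumes a two-level leaf/spine design using identical switches.
    """
    if nodes <= ports_per_switch:
        return (1, 0)  # everything fits on a single switch

    down = ports_per_switch // 2     # leaf ports facing compute nodes
    up = ports_per_switch - down     # leaf ports facing spine switches
    leaves = math.ceil(nodes / down)
    if leaves > ports_per_switch:
        raise ValueError("too many nodes for two levels of this switch size")
    spines = math.ceil(leaves * up / ports_per_switch)
    return (leaves, spines)


if __name__ == "__main__":
    # 100 cores at an assumed 8 cores per node fits on one 24-port switch.
    small = nodes_needed(100, 8)
    print(small, two_level_switch_count(small, 24))

    # 48 nodes on 24-port switches needs several switches instead of one.
    leaves, spines = two_level_switch_count(48, 24)
    print(leaves, spines, "total switches:", leaves + spines)
```

Multiplying the total switch count by a per-switch price (the text quotes around $5000 for 24 ports) gives a figure to compare against a single larger switch.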
For the current Intel Xeon (non-Nehalem) processors, you shouldn't really put more than 8 cores in a system; beyond that count, there isn't enough memory bandwidth to keep them all well fed with work.

Dell and sometimes Sun offer good deals to academic groups, so you might benefit from that. Both companies also offer free trials of hardware so you can benchmark your applications on each and pick which is best. While you could get more AMD nodes with the same or comparable power for about the price of a single Intel node, keep in mind that the cost of running many less powerful systems, as opposed to a few very powerful ones, can be a financial hit in the future.

steve ulrich wrote:
> mike -
>
> building out your own compute infrastructure is so 2002. ;)
>
> i've used amazon EC2 for a very similar application where i've been
> running large simulations on their infrastructure with my own VM image
> that i use for my purposes. you can simply dial up the number of
> processors that you purchase and use. you're charged by the hour for
> the the number of CPU instances you use.
>
> instead of buying hardware yourself that you have to power up, replace
> HDDs, etc. for and manage connectivity for you let someone pay for
> that and simply use their resources on demand.
>
> On Tue, Jul 7, 2009 at 9:29 AM, Mike Miller<mbmiller+l at gmail.com> wrote:
>
>> We want to put together a few computers to make a little "farm" for doing
>> our statistical analyses. It would be good to have 50-100 cores. What is
>> the cheapest way to go? About 4GB RAM per core should be more than
>> enough. I'm thinking quad-core chips are going to be cheaper. How many
>> sockets per mobo? I guess 1-, 2- and 4-socket mobos are available. We
>> don't need SMP, but we'll take it if it is cheap (which I doubt). We'll
>> use cloned HDDs in these boxes. My first thought is "blade" but maybe
>> blades are more expensive than somewhat less convenient ways of housing
>> the mobos.
>>
>> We have people here to house it and manage it and to pay for
>> electricity(!). They also will have ideas about what we should buy.
>>
>> Any ideas?
>>
>> Which CPU gives the most flops/dollar these days?
>>
>> Mike
>>
>> _______________________________________________
>> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
>> tclug-list at mn-linux.org
>> http://mailman.mn-linux.org/mailman/listinfo/tclug-list