UCF ARCC

The UCF Advanced Research Computing Center (ARCC) maintains a high-performance computing cluster that is available for use by students and faculty.

Researchers and faculty at the University of Central Florida utilize the UCF ARCC, individually and in collaboration with other institutions, for highly calculation-intensive tasks. Research areas include molecular modeling, storm surge analysis, quantum optics, non-linear optics, turbine design, computer vision, and nanotechnology.

05/12/2022

The ARCC has completed the Summer maintenance cycle, and Stokes and Newton are back in operation. Please remember that the ARCC brings both clusters down for one week, twice per year: once after Spring semester (this downtime) and once after Fall semester.

Summary of Changes:

- Slurm (job scheduler) upgraded to version 21.08.8-2
- NVIDIA GPU Drivers on Newton upgraded to version 510.47.03
- We have retired all 16-core Aspen nodes that were still in use.
- Thanks to the Jump Start Fund of President Cartwright's 2021-2022 Strategic Investment Program, we have added 36 brand-new 48-core nodes (each with 256 GB of RAM and 100 Gbps HDR Infiniband) to Stokes!

Even after the retirement of the old nodes above, these new nodes increase the size of Stokes and now provide a total of 5,816 cores!

12/16/2021

Stokes and Newton have returned to operation! Please remember that we have two such scheduled maintenance downtimes per year, one after Fall term (the one we just completed) and one after Spring term.

Please take a moment to read over the changes:

We upgraded the Infiniband network infrastructure to HDR (most nodes now run at 100 gigabits per second — up from 56 gigabits per second!). All new switches and all new cabling were installed within the past week!

Due to hardware requirements and impending additions, we have retired some of our oldest nodes. ec297 to ec324 have gone to a well-earned retirement after serving ARCC users for 11 years!

We appreciate your patience and wish you the best with your research!

12/13/2021

All the new HDR Infiniband networking has been installed! We are now bringing the system up to do internal testing.

12/09/2021

Our Fall downtime has commenced! First up is a power upgrade to support our modern nodes better. The higher core counts demand more amps!

We will also be doing a major network infrastructure update during this downtime!

11/15/2021

UCF ARCC Maintenance Outage: Thursday, December 9 through Wednesday, December 15, 2021

Stokes and Newton will be taken down per our bi-annual routine maintenance cycle near the end of Fall term. Specifically, the clusters will be unavailable from Thursday, December 9 through Wednesday, December 15.

The primary objective during this downtime is to install a new, faster Infiniband network.

Recall that we now routinely bring the system down twice a year, once in late Fall and once in late Spring/Summer. We will notify users in advance of such downtimes, but we recommend you build these expectations into your workflow. Though we anticipate no data loss during this time, it's never a bad idea to back up your materials, so we suggest you use this opportunity to copy salient data and code off of Stokes and Newton prior to the downtime.

12/17/2020

ARCC resources are back online after our Fall maintenance window. See our new page for changes during the maintenance period.

Stokes and Newton have returned to operation early! Please remember that we have two such scheduled maintenance downtimes per year, one after Fall term (the one we just completed) and one after Spring term.

11/27/2020

UCF ARCC Maintenance Outage: Friday, December 11, 2020 - Friday, December 18, 2020:

Stokes and Newton will be taken down per our bi-annual routine maintenance cycle near the end of Fall term. Specifically, the clusters will be unavailable from Friday, December 11 through Friday, December 18.

The primary objectives during this downtime are to complete some necessary re-wiring, install some new support equipment, and update some software and data on the clusters.

Recall that we now routinely bring the system down twice a year, once in late Fall and once in late Spring. We will notify users in advance of such downtimes, but we recommend you build these expectations into your workflow. Though we anticipate no data loss during this time, it's never a bad idea to back up your materials, so we suggest you use this opportunity to copy salient data and code off of Stokes and Newton prior to the downtime.

SLURM Jobs marked as “Requested Node not available, Reserved for maintenance”?

In preparation for the maintenance windows, the ARCC staff place a reservation on compute nodes starting at the beginning of the maintenance window. This ensures that jobs are not running when the staff need to begin the work to shut down the cluster. If you submit a job requesting more time than there is between the time the job is submitted and when the reservation begins, your job will stay in the queue with the status "Requested Node not available, Reserved for maintenance". Jobs in this state will not start until after the maintenance window. If you believe your job can finish before the maintenance window, cancel the job and resubmit it with a shorter time request. As with all resource requests, providing reasonable run-time estimates to SLURM will ensure the most efficient operation of the cluster.
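To make the timing rule above concrete, here is a minimal Python sketch that checks whether a requested run time fits before the reservation and computes the longest time limit that still would. The function names, the 10-minute safety margin, and the 8:00 a.m. reservation start are illustrative assumptions, not ARCC policy; the actual start time can be looked up with `scontrol show reservation`. The sketch also assumes the job starts as soon as it is submitted, so a job that waits in the queue needs more headroom.

```python
from datetime import datetime, timedelta


def fits_before_reservation(submit_time, requested, reservation_start):
    """Return True if a job that starts at submit_time and runs for
    `requested` would end no later than the reservation start."""
    return submit_time + requested <= reservation_start


def max_time_limit(submit_time, reservation_start, margin=timedelta(minutes=10)):
    """Longest time limit that ends `margin` before the reservation,
    formatted as D-HH:MM:SS. Returns None if no time is left."""
    available = reservation_start - submit_time - margin
    if available <= timedelta(0):
        return None
    total = int(available.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical values: the 8:00 a.m. start hour is an assumption.
reservation_start = datetime(2020, 12, 11, 8, 0)
submit_time = datetime(2020, 12, 9, 14, 30)

# A 3-day request overlaps the reservation, so the job would wait.
print(fits_before_reservation(submit_time, timedelta(days=3), reservation_start))

# A time limit that would still fit before the reservation.
print(max_time_limit(submit_time, reservation_start))
```

The returned string is in the days-hours:minutes:seconds form that SLURM accepts for time limits, so it could be passed to `sbatch --time=` when resubmitting.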

08/01/2020

Stokes and Newton are now down for regularly scheduled maintenance per our notifications over the Summer. We expect that the work will be completed by Friday.

04/27/2020

Don't forget: Stokes and Newton are going down for A/C system replacement tomorrow morning (Tuesday, April 28).

04/20/2020

Stokes and Newton will be *DOWN* starting Tuesday, April 28 at 5 a.m. for approximately 4 days so that the air conditioning system for the data center can be replaced. Please plan accordingly.

01/16/2020

Stokes and Newton are operational again.

01/16/2020

Unfortunately, there was a power loss in the Partnership III building last night for about two hours. The good news is that the infrastructure functioned as it was designed: The critical servers (file system, management, etc.) remained up and functioning, and the UPS performed as it was designed. The bad news is that the time exceeded our UPS limits, so nearly all compute nodes powered off -- the jobs were lost.

We are working to bring the nodes back online now. We apologize for the inconvenience and appreciate your patience.

Address


3039 Technology Pkwy
Orlando, FL
32826