Lustre System Admin HPC Engineer

Job Definition.

Do IT Now is looking for a highly skilled High Performance Computing Engineer to join the team. The main role will be to answer user questions and solve technical problems by phone and e-mail.

Join our dynamic and innovative team as an HPC Engineer! Be part of our cutting-edge projects where you’ll collaborate seamlessly with cross-functional teams to ensure the reliability, performance, and scalability of our infrastructure and services, with a special focus on our High-Performance Computing (HPC) environments and AI-driven applications.

Main Responsibilities:

  • Designing, implementing, and maintaining high-performance computing infrastructure.
  • Optimizing network configurations for efficient data processing and storage.
  • Managing cloud computing resources and ensuring scalability and reliability.
  • Providing support for users’ applications including troubleshooting and performance tuning.
  • Collaborating with cross-functional teams to ensure seamless integration of HPC systems.

Skills and Experience.

Essential skills

The ideal candidate will have a strong background in infrastructure, networking, cloud computing, and application support, with a focus on supporting users’ applications.

We are searching for a person with solid Linux system administration skills and a deep specialization in the Lustre parallel file system, typically in High-Performance Computing (HPC) environments.

  • Linux System Administration: Proven expertise in managing large Linux server installations (specifically distributions used in HPC, such as Red Hat Enterprise Linux, SLES, etc.). Preferred requirement: Linux certification.
  • Networking: Switching, VLANs, firewalls, InfiniBand (IB) networks.
  • Parallel File Systems: Experience with parallel file systems (Lustre, BeeGFS, GPFS, Ceph, etc.).
  • Storage Hardware: Knowledge of storage hardware used in HPC environments (RAID, LVM, Fibre Channel, high-speed disks) and how to interface them with Lustre.
  • Storage Management and Maintenance: Experience with storage vendors/platforms like DDN, HPE, NetApp, Isilon, Qumulo.
  • Monitoring Systems: Experience with Zabbix, Prometheus, Grafana.
  • Scripting and Automation: Strong knowledge of Bash and scripting languages like Python or Perl to automate management tasks.

Preferred requirements.

  • Container Technologies: Docker, Singularity, Red Hat containers.
  • Remote Visualization: Administration and maintenance of remote visualization on Linux systems (VNC / VirtualGL).
  • HPC Schedulers: Experience with HPC Schedulers like PBS, SLURM, etc.
  • Database Knowledge: Basic knowledge of main relational databases (MySQL, Oracle, Postgres).
  • Deployment Tools: Experience with deployment tools such as xCAT, Bright Cluster Manager.
  • Server Platforms: Knowledge of Dell, HP, or Lenovo server platforms and engineering Linux-based solutions on these platforms.

LANGUAGE SKILLS.

  • Good command of English.
  • Ideally, the candidate should be familiar with one of the following languages: Italian, Spanish or French.

Personal Attributes.

The candidate must live in the EMEA region.

  • Teamwork attitude.
  • Leadership and ability to innovate.
  • Proactivity and good communication.
  • Focus on achieving results.

About Do IT Now.

Do IT Now is a joint venture of three European experts in the HPC services and solutions market: Do IT Systems, from Italy; HPCNow!, from Spain; and UCit, from France. The objective is to help customers get the most out of HPC by providing efficient and straightforward supercomputer usage, based on a deep understanding of the most advanced HPC technologies, high-quality customer service, and outstanding, reputable user support.

WHY WORK WITH US.

  • Technology-driven company culture.
  • 100% remote work opportunity.
  • Rapid company growth and career advancement possibilities.
  • Continuous learning and development programs.
  • Startup and multicultural environment.
  • Regular team building activities.

Join us in our mission to build and maintain highly reliable, scalable, and efficient systems that power our business. If you’re passionate about automation, problem-solving, and creating robust infrastructure, we want to hear from you!

Conditions: Full remote

Remuneration: We offer a competitive salary commensurate with the candidate's qualifications and experience and adjusted to the cost of living where the candidate is based.

Regular business travel.

Apply

Are you interested in the position? Send us your updated CV and a short motivation letter.
