Job Description
Yukerja Summary
Yukerja curated this Datacenter Hardware & Network Support Technician (Remote, from Brazil) opening at GECI Int. from Himalayas (category: Technology & IT). The position is marked as remote, so confirm the time zone and candidate location requirements in the official description. Yukerja.com is not the employer; applications are processed on the official source site.
Context
AS+ provides run support for GPU clusters operated by a cloud infrastructure partner. We are building a support team to handle day-to-day incidents on these clusters. This first role focuses on weekday coverage. The work sits low in the stack — hardware and network diagnosis — rather than high-level HPC or application support.
Responsibilities
Diagnose and triage incidents on GPU compute clusters, determining whether a fault originates on our side or the client's.
Investigate hardware failures: collect and analyze hardware logs, identify failed components, and document findings for resolution or RMA.
Diagnose GPU hardware faults (failure detection and isolation — not performance tuning or porting).
Configure and troubleshoot network connectivity, including InfiniBand fabric.
Work directly with the client as the first line of support, communicating in English.
Required skills
Solid system and network fundamentals — low-level networking and connectivity diagnosis.
Hands-on hardware troubleshooting, ideally on Dell server hardware.
Ability to diagnose GPU hardware failures (no deep GPU expertise required).
InfiniBand knowledge (important).
Fluent English (all client communication is in English).
Not required
Advanced OS administration.
Slurm or other workload-scheduler expertise.
An HPC application or GPU-porting background.
Setup
Full remote.
Weekday coverage (first hire; the team will expand to cover a wider window).
Originally posted on Himalayas