Job Description
Yukerja Summary
The Principal AI Engineer role at SMARTM2M Indonesia is curated from Glints (category: Keuangan & Perbankan, i.e. Finance & Banking). Note the work location (Sumur Bandung) before applying. Yukerja.com is not the employer; applications are handled on the official source site.
Role Summary
Lead the design, training, evaluation, and deployment of production-grade, on-premise AI systems with an emphasis on fine-tuned and multi-agent LLM solutions, safety/red-teaming, and scalable MLOps in secure or air-gapped environments. Work with open-source model families and local inference stacks to deliver reliable, secure, and cost-efficient services on-site in Bandung and/or Busan.
Key Responsibilities
- Lead end-to-end development of LLM systems: dataset curation, SFT/LoRA/QLoRA, DPO/RLHF, evaluation, and on-prem deployment.
- Design and implement multi-agent orchestration and tool-use pipelines (e.g., LangGraph/LangChain/AutoGen), including function calling, RAG, structured outputs, fallbacks, and recovery strategies.
- Build rigorous red-teaming and safety evaluation harnesses; simulate jailbreaks, prompt injection, data exfiltration, and model manipulation; implement guardrails, policies, and moderation.
- Conduct adversarial and robustness testing for NLP/CV models; assess distribution shift, perturbations, and poisoning risks; implement mitigations and hardening.
- Architect retrieval-augmented systems with vector databases; optimize chunking, embeddings, indexing, hybrid search, re-ranking, and latency for reliable grounding.
- Own performance and cost optimization: quantization (GGUF, GPTQ, AWQ), batching, KV cache management, speculative decoding, caching, and GPU utilization.
- Develop production APIs/services with FastAPI or gRPC; implement observability, tracing, canarying, and human-in-the-loop feedback; monitor quality drift and handle incidents.
- Contribute to internal AI infrastructure, tooling, and reusable components; enforce reproducibility and governance with MLflow, model registries, and artifact stores.
- Deploy and operate models on-prem (VMs/Kubernetes), including versioning, rollback, autoscaling, and secure upgrade paths for air-gapped sites.
- Collaborate with product, engineering, and domain teams to scope experiments and deliverables; produce clear design docs, threat models, and runbooks.
- Mentor junior engineers; drive best practices, code reviews, and knowledge sharing.
Requirements
- Bachelor’s or Master’s in Computer Science, Artificial Intelligence, or a related field, or equivalent experience.
- 5+ years in applied ML/AI and 10+ years in software engineering.
- Proficient in Python; hands-on with PyTorch (and/or TensorFlow).
- Demonstrated LLM fine-tuning experience: SFT, LoRA/QLoRA, DPO or RLHF; dataset preparation, synthetic data generation, and large-scale evaluation.
- Self-hosted model experience with at least one open-source family (e.g., Llama, Qwen, Mistral) and on-prem inference stacks (vLLM, TGI, TensorRT-LLM, Ollama).
- Multi-agent design and tool-use orchestration; function calling, tool/plugin integration, structured outputs, error handling, and retries.
- RAG pipelines with vector stores (pgvector, Milvus, Weaviate); embedding model selection and retrieval quality evaluation.
- MLOps expertise: Docker, Kubernetes, Git, CI/CD, experiment tracking (MLflow), model registry, data/version management.
- Production monitoring and observability: logging, tracing, metrics; quality and safety evaluation frameworks; SLOs and alerting.
- Security and safety practices: prompt-injection defenses, PII handling, RBAC, secrets management, audit logging; familiarity with regulated/on-prem environments and local data protection requirements.
- Excellent problem-solving and debugging skills; comfortable in a fast-paced, collaborative environment.
- Willing to work on-site in Bandung; fluent in English and comfortable with Bahasa Indonesia.
Nice to Have
- Adversarial ML and robustness background; secure model deployment in government or critical-infrastructure contexts.
- Familiarity with the Hugging Face ecosystem and optimized inference (quantization toolchains, tensor parallelism).
- Deep understanding of transformer internals, tokenization, and quantization strategies.
- Experience with multimodal or CV pipelines; streaming data and real-time inference.
- Knowledge graphs (e.g., Neo4j) and graph-augmented retrieval.
- Interest in cybersecurity challenges or CTFs.
- GPU systems expertise (CUDA, NCCL, MIG) and performance profiling.