Principal AI Engineer — SMARTM2M Indonesia

Job Description

Yukerja Summary

The Principal AI Engineer role at SMARTM2M Indonesia is curated from Glints (category: Finance & Banking). Note the work location (Sumur Bandung) before applying. Yukerja.com is not the employer; applications are handled on the official source site.

Role Summary

Lead the design, training, evaluation, and deployment of production-grade, on-premise AI systems with an emphasis on fine-tuned and multi-agent LLM solutions, safety/red-teaming, and scalable MLOps in secure or air-gapped environments. Work with open-source model families and local inference stacks to deliver reliable, secure, and cost-efficient services on-site in Bandung and/or Busan.

Key Responsibilities

Lead end-to-end development of LLM systems: dataset curation, SFT/LoRA/QLoRA, DPO/RLHF, evaluation, and on-prem deployment.
Design and implement multi-agent orchestration and tool-use pipelines (e.g., LangGraph/LangChain/AutoGen), including function calling, RAG, structured outputs, fallbacks, and recovery strategies.
Build rigorous red-teaming and safety evaluation harnesses; simulate jailbreaks, prompt injection, data exfiltration, and model manipulation; implement guardrails, policies, and moderation.
Conduct adversarial and robustness testing for NLP/CV models; assess distribution shift, perturbations, and poisoning risks; implement mitigations and hardening.
Architect retrieval-augmented systems with vector databases; optimize chunking, embeddings, indexing, hybrid search, re-ranking, and latency for reliable grounding.
Own performance and cost optimization: quantization (GGUF, GPTQ, AWQ), batching, KV cache management, speculative decoding, caching, and GPU utilization.
Develop production APIs/services with FastAPI or gRPC; implement observability, tracing, canary releases, and human-in-the-loop feedback; monitor quality drift and handle incidents.
Contribute to internal AI infrastructure, tooling, and reusable components; enforce reproducibility and governance with MLflow, model registries, and artifact stores.
Deploy and operate models on-prem (VMs/Kubernetes), including versioning, rollback, autoscaling, and secure upgrade paths for air-gapped sites.
Collaborate with product, engineering, and domain teams to scope experiments and deliverables; produce clear design docs, threat models, and runbooks.
Mentor junior engineers; drive best practices, code reviews, and knowledge sharing.

Requirements

Bachelor’s or Master’s in Computer Science, Artificial Intelligence, or a related field, or equivalent experience.
5+ years in applied ML/AI and 10+ years in software engineering.
Proficient in Python; hands-on with PyTorch (and/or TensorFlow).
Demonstrated LLM fine-tuning experience: SFT, LoRA/QLoRA, DPO or RLHF; dataset preparation, synthetic data generation, and large-scale evaluation.
Self-hosted model experience with at least one open-source family (e.g., Llama, Qwen, Mistral) and on-prem inference stacks (vLLM, TGI, TensorRT-LLM, Ollama).
Multi-agent design and tool-use orchestration; function calling, tool/plugin integration, structured outputs, error handling, and retries.
RAG pipelines with vector stores (pgvector, Milvus, Weaviate); embedding model selection and retrieval quality evaluation.
MLOps expertise: Docker, Kubernetes, Git, CI/CD, experiment tracking (MLflow), model registry, data/version management.
Production monitoring and observability: logging, tracing, metrics; quality and safety evaluation frameworks; SLOs and alerting.
Security and safety practices: prompt-injection defenses, PII handling, RBAC, secrets management, audit logging; familiarity with regulated/on-prem environments and local data protection requirements.
Excellent problem-solving and debugging skills; comfortable in a fast-paced, collaborative environment.
Willing to work on-site in Bandung; fluent in English and comfortable with Bahasa Indonesia.

Nice to Have

Adversarial ML and robustness background; secure model deployment in government or critical-infrastructure contexts.
Familiarity with the Hugging Face ecosystem and optimized inference (quantization toolchains, tensor parallelism).
Deep understanding of transformer internals, tokenization, and quantization strategies.
Experience with multimodal or CV pipelines; streaming data and real-time inference.
Knowledge graphs (e.g., Neo4j) and graph-augmented retrieval.
Interest in cybersecurity challenges or CTFs.
GPU systems expertise (CUDA, NCCL, MIG) and performance profiling.

Tips for Applying to Principal AI Engineer