AI Research Engineer/Scientist
Palo Alto, CA, United States
About the Company: One of our well-established start-ups in Palo Alto, CA is looking to hire an AI Research Engineer/Scientist to train, optimize, scale, and deploy a variety of generative AI models, such as large language models, voice/speech foundation models, and vision and multi-modal foundation models, using cutting-edge techniques and frameworks. In this role, you will conduct advanced research and development to push the boundaries of what is possible with generative AI and language models.
Responsibilities:
Research, architect, and deploy new generative AI methods such as autoregressive models, causal models, and diffusion models
Refine foundation model infrastructure to support the deployment of optimized AI models with a focus on C/C++, CUDA, and kernel-level programming enhancements
Implement state-of-the-art optimization techniques, including quantization, distillation, sparsity, streaming, and caching, for model performance enhancements
Design and develop novel large language models and corresponding architectures by leveraging transformers, mixture-of-experts, attention mechanisms such as FlashAttention-2 (with MQA and GQA) and Multi-head Latent Attention (MLA), and other state-of-the-art architectures
Implement large multimodal models following the latest architectures, such as early fusion (e.g., NExT-GPT, Unified-IO 2), deep fusion (e.g., Zipper, Mirasol3B), or similar
Train or fine-tune speech/audio models for representation (e.g., W2V-BERT, SONAR, AST), generation (e.g., HiFi-GAN, VQ-GAN, AudioLDM), and multilingual multitask modeling (e.g., SeamlessM4T)
Train or fine-tune vision models for representation (e.g., ViT, Q-Former, CLIP, SigLIP), generation (e.g., Stable Diffusion, Stable Cascade), and video representation (e.g., Video Swin Transformer)
Drive innovations in NLP techniques enabled by generative models, such as text generation, summarization, translation, and question answering
Integrate and tailor frameworks such as PyTorch, TensorFlow, DeepSpeed, Lightning, Habana, and FSDP to accelerate model training and inference
Advance the deployment infrastructure with MLOps frameworks such as Kubeflow, MosaicML, Anyscale, and Terraform, ensuring robust development and deployment cycles
Publish new research contributions at top-tier AI/ML conferences such as NeurIPS, ICML, and ICLR
Collaborate with engineering teams to productionize research advancements into scalable services and products
Qualifications:
Ph.D., or MS with 2+ years of research/applied research experience, in LLMs, NLP, CV, reinforcement learning, voice, and generative models
Demonstrated expertise in high-performance computing with proficiency in Python, C/C++, CUDA, and kernel-level programming for AI applications
Extensive experience in the optimization of training and inference for large-scale AI models, including practical knowledge of quantization, distillation, and LLMOps
Prior experience with large-scale distributed training and fine-tuning of foundation models such as GPT-3, Llama 2, AlphaFold, and DALL-E
Experience with language modeling evaluation, prompt tuning and engineering, instruction tuning, and/or RLHF
Research contributions in NLP, generative modeling, and LLMs, demonstrated through publications and products
Strong programming skills and proficiency in Python, TensorFlow/PyTorch, and other ML frameworks and tools
Experience in information extraction, question answering, conversational agents (chatbots), data visualization, and/or text-to-image models
Excellent communication and collaboration skills to work cross-functionally with various teams
Please connect with Jia at jia@ for more details and to set up some time to discuss the role and the client.