Lead Data Engineer
San Francisco, CA, United States
Who we are
At Twelve Labs, we are pioneering the development of cutting-edge multimodal foundation models that have the ability to comprehend videos just like humans do. Our models have redefined the standards in video-language modeling, empowering us with more intuitive and far-reaching capabilities, and fundamentally transforming the way we interact with and analyze various forms of media.
With a remarkable $77 million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.
We are a global company that values the uniqueness of each person’s journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.
About the role
As the ML Data Infrastructure Lead at Twelve Labs, you will lead the data team, managing data infrastructure and preparing high-quality video data for our training runs. Unlike text or images, video is complex to process (because of size and decoding), multimodal (visual and audio), and has a temporal aspect. Information can easily become redundant while still depending on earlier information (as in text). Because of the complexity of data processing at Twelve Labs, this role will have a significant impact on the quality of our models.
In this role, you will
Acquire and deliver massive, high-quality datasets for our large training runs.
Develop and implement best practices and data pipelines (ingest, annotate, and incorporate high-quality datasets into model training and evaluation) by working with internal and external data partners.
Improve our data infrastructure (e.g., management, versioning) by collaborating with software engineers and security engineers.
Collaborate with modeling and product teams to evaluate the impact of the data on our models and continuously improve the data quality.
Hire, provide career growth guidance, coaching, and training for engineers on your team.
Work across teams to understand and manage project priorities and product deliverables, evaluate trade-offs, and drive technical initiatives from execution to landing.
You may be a good fit if you have
5+ years of experience in managing unstructured and/or human-annotated data (e.g., collecting or assessing sample quality)
Experience owning data initiatives such as data cleaning, data validation, data augmentation, and image or video processing
Proficiency in Python
Experience with ML frameworks such as PyTorch and TensorFlow
2+ years of people management experience
Desired experience
MS or PhD in Computer Science or a related field.
Experience with creating large-scale datasets or RLHF-based dataset creation.
Interview and Onboarding Process:
Recruiter Phone Screen -> Hiring Manager Call -> Technical Interview and/or Take Home Assignment -> Culture Interview -> Reference Checks
We're also excited to share that we'll do global onboarding in Seoul for all new hires (company-sponsored travel).
Even if there are a few checkboxes that aren’t ticked through your prior experience, we still encourage you to apply! If you are a 0-to-1 achiever, a ferocious learner, and a kind and fun team player who motivates others, you will find a home at Twelve Labs.
We welcome applicants from all walks of life and are committed to equal-opportunity employment. We cherish and celebrate diversity not just because it is the right thing to do, but because it makes our company much stronger.
Benefits and Perks
An open and inclusive culture and work environment.
Work closely with a collaborative, mission-driven team on cutting-edge AI technology.
Full health, dental, and vision benefits.
Extremely flexible PTO and parental leave policy.
Office closed the week of Christmas and New Year's.
Remote-flexible, with offices in San Francisco and Seoul and a coworking stipend.
Visa support (such as H-1B and OPT transfer for US employees).