Participate in data collection and preprocessing for AI projects
Develop and test machine learning models under supervision
Analyze datasets to extract actionable insights for business problems
Present results to technical and non-technical stakeholders
Contribute to documentation and knowledge sharing within the team
Comfortable interacting with:
Large data sets (databases) where a typical database table contains a million or more individual records.
APIs – basic configuration and debugging.
The ideal candidate will be someone who:
Likes to ask questions, is technically inclined and hands-on, enjoys working in small teams, is fond of problem solving and self-study to stay current with the ever-evolving data science and AI landscape, and displays confidence, self-awareness, and the ability to work independently.
Displays a deliverable orientation (outcome focused), i.e. is naturally inclined to get things done.
Successful candidates will work either remotely or onsite at our Sandton office.
Requirements:
National Diploma or Degree(s) majoring in Data Science, Computer Science, Statistics, or related fields
Additionally, the candidate should ideally be working towards, or have already obtained, certifications in SQL, data science, AI, or AI applications using data labs.
Prior exposure to AI model prompt training (enhancement) and/or local model hosting methods, e.g. Ollama, LM Studio, MS Copilot Studio, or similar, will be advantageous.
Relevant coursework should include data quality (DQ) management principles, setup and monitoring of data pipelines, machine learning (ML) methods, MS SQL or MySQL, interpreting data modelling outcomes to select reliable data training scenarios, data mining and profiling, statistical analysis with data storytelling, AI model selection, reinforcement learning, prompting, and design.
Selection and/or configuration of AI virtual agents and patterns: select, design, operate, monitor, and support.
Tools: MS Teams, Python, Kaggle, Linux, Jupyter IDE, Git, Excel, SQL, Postman, and data handling and manipulation utilities.
Proficient in programming languages and platforms such as:
Python, Fedora Linux, Windows Server, R, MS SQL, PowerShell scripting, Power BI, and cloud data platforms (inter alia Amazon AWS, MS Azure, Google, Snowflake), with hands-on experience performing data analysis, predictive modelling, and data storytelling supported by data visualization methods.
Technical Skills
Programming Proficiency: Strong skills in Python, the industry standard for AI, along with familiarity with R or Java.
These languages are essential for building algorithms, automating data tasks, and working with AI frameworks.
Machine Learning & Deep Learning: Foundational knowledge of machine learning algorithms, deep learning models, and neural networks.
This includes an understanding of supervised and unsupervised learning, as well as practical experience with model training and evaluation.
Data Literacy & Analytics: Ability to collect, clean, preprocess, and analyze data is crucial. Familiarity with tools such as pandas, NumPy, SQL, and data visualization libraries (e.g. Matplotlib, Seaborn, Tableau) is important for communicating insights and supporting model development.
Mathematics and Statistics: A solid grasp of linear algebra, calculus, probability, and statistics is foundational for understanding and building AI models.
Familiarity with AI Tools & Frameworks: Experience with data management, data virtualisation, machine learning libraries, cloud platforms, and big data tools.
Experience: 5 to 7 years.