AI & ML interests
LLM, RL, DL, ML, AGI. Developing LLMs (preferably fully fine-tuned) for various use cases.
Recent Activity
Reacted to their post with 👍 about 5 hours ago:
Cpp-Code-Large
Dataset: https://huggingface.co/datasets/ajibawa-2023/Cpp-Code-Large
Cpp-Code-Large is a large-scale corpus of C++ source code comprising more than 5 million lines of code. The dataset is designed to support research in large language model (LLM) pretraining, code intelligence, software engineering automation, and static program analysis for the C++ ecosystem.
By providing a high-volume, language-specific corpus, Cpp-Code-Large enables systematic experimentation in C++-focused model training, domain adaptation, and downstream code understanding tasks.
Cpp-Code-Large addresses the need for a dedicated C++-only dataset at substantial scale, enabling focused research across systems programming, performance-critical applications, embedded systems, game engines, and large-scale native software projects.
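The sketch below shows one way to start experimenting with the corpus or to check its line count. It assumes the Hugging Face `datasets` library is installed. It also assumes the dataset has a `train` split and stores source text in a column named `code`. Both of those are assumptions, so check the dataset card for the actual split and column names. `count_code_lines` and `load_cpp_code_large` are hypothetical helpers written for illustration.

```python
from typing import Iterable, Mapping


def count_code_lines(records: Iterable[Mapping[str, str]], field: str = "code") -> int:
    """Sum the number of source lines across records.

    `field` is a hypothetical column name; adjust it to match the dataset schema.
    Records missing the field (or holding None) count as zero lines.
    """
    total = 0
    for rec in records:
        text = rec.get(field) or ""
        # splitlines() does not count a trailing newline as an extra empty line
        total += len(text.splitlines())
    return total


def load_cpp_code_large(streaming: bool = True):
    """Load the corpus from the Hugging Face Hub (needs `datasets` and network access).

    The `train` split is an assumption; streaming avoids downloading everything up front.
    """
    from datasets import load_dataset  # imported lazily so the helpers above work without it

    return load_dataset("ajibawa-2023/Cpp-Code-Large", split="train", streaming=streaming)
```

For example, `count_code_lines(load_cpp_code_large())` would stream the whole corpus and total its lines. On a dataset of this size, expect that to take a while.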
Organizations