Modelling the Employability of Management Graduates: Complementing Parametric Approaches with Machine Learning on Small Social Data
DOI: https://doi.org/10.24297/ijct.v25i.9795

Keywords: Multivariate statistics, Machine learning, Gradient descent, Boosting

Abstract
This study investigates how supervised and unsupervised machine learning algorithms can complement traditional statistical methods in the analysis of social survey data. Social science datasets are typically small, noisy, and heterogeneous, which makes robustness and interpretability more important than computational efficiency.
Using data from a 2024 survey on the employability of management graduates in Antananarivo, the study compares machine learning approaches with classical multivariate techniques. The objectives are to provide a statistical description of a social reality and to establish criteria for selecting algorithms suited to small-sample contexts.
The methodological framework integrates statistical tools such as Chi-square tests, analysis of variance, and multiple regression with exploratory approaches including association rules and clustering. It also incorporates supervised models such as neural networks trained via gradient descent and its variants. Beyond these models, ensemble methods based on decision trees—bagging, random forests, and gradient boosting—are evaluated to highlight their relative strengths.
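As a minimal sketch of what such a mixed workflow can look like in practice (an illustration only, not the authors' code: the file name and the column names "internship" and "employed" are hypothetical), the fragment below runs a classical Chi-square test of independence on two categorical survey variables, then cross-validates a small neural network trained by stochastic gradient descent:

# Illustration only: survey file and column names are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("graduates_survey.csv")  # hypothetical survey file

# Classical step: Chi-square test of independence between two
# categorical survey variables.
table = pd.crosstab(df["internship"], df["employed"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"Chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# Supervised step: a small neural network fitted by stochastic
# gradient descent, scored with cross-validation to respect the
# small-sample setting.
X = pd.get_dummies(df.drop(columns="employed"))
y = df["employed"]
mlp = MLPClassifier(hidden_layer_sizes=(16,), solver="sgd",
                    learning_rate_init=0.01, max_iter=2000, random_state=0)
print("MLP accuracy:", cross_val_score(mlp, X, y, cv=5).mean())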
Findings show that gradient boosting offers the most consistent predictive performance while remaining relatively simple to implement. This makes it particularly effective for analysing small and heterogeneous datasets, thereby providing practical value for applied social science research.
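To make the comparison concrete, the sketch below (an assumption-laden illustration, not the study's actual benchmark) pits bagging, a random forest, and gradient boosting against one another with repeated cross-validation; a synthetic dataset stands in for a small, heterogeneous survey sample:

from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for a small survey-sized sample (hypothetical;
# replace with the real features and labels).
X, y = make_classification(n_samples=150, n_features=10, n_informative=5,
                           random_state=0)

models = {
    "bagging": BaggingClassifier(n_estimators=200, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(n_estimators=200,
                                                    learning_rate=0.05,
                                                    random_state=0),
}

# Repeated stratified cross-validation gives a more stable estimate
# of predictive performance when n is small.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name:>18}: {scores.mean():.3f} +/- {scores.std():.3f}")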
License
Copyright (c) 2025 Ravelonahina Andrianjaka Hasina, Robinson Matio, Andriamanohisoa Hery Zo

This work is licensed under a Creative Commons Attribution 4.0 International License.
