Spectral Clustering Approximation For Large Scale Crew Disruption Data Of An Airline Company For Intelligent Crew Recovery
DOI:
https://doi.org/10.31181/jscda11202315
Keywords:
Airline Crew Disruptions, Data Science, Machine Learning, Graph Clustering, Spectral Clustering
Abstract
In the airline industry, crew costs are the second-highest cost item after fuel. For this reason, an airline needs to manage its valuable crew resources efficiently. Deviations from plans are a fact of the airline business, and fixing deviations from crew schedules that occur during operations, while minimizing crew-related delays and the associated costs, is one of the most important operational burdens of airlines. In this context, analyzing crew disruption data is vital for identifying disruption characteristics, and clustering analysis is one of the key methods for doing so. Although there are satisfactory studies in the literature and applications in the industry for small and medium-sized airlines, there is no good solution or industry practice for airlines with extensive networks and fleets. This study aims to analyze and categorize large-scale crew disruption data of a European airline. The relationships between categories of crew disruption and variables such as flight and crew types are determined, and the disruption characteristics are revealed. For this purpose, clusters hidden in the large data set are extracted by spectral clustering. Due to the large size of the input data, a new approximation approach for spectral clustering is introduced. With the help of this approach, spectral clustering techniques can be applied within the limited computational power and time frame that most real-world scenarios require. Although the data set is gathered from one airline, the characteristics derived from the data represent most of the cases an airline may face today and will serve as a basis for further estimation and analysis of crew disruption.
License
Copyright (c) 2023 CC Attribution-NonCommercial-NoDerivatives 4.0
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.