A Scalable Data Mining Framework for Computational Legal Analysis: Implementing a Python and Selenium Pipeline on Civil Case Decisions of the Indonesian Supreme Court
DOI:
https://doi.org/10.55606/jupti.v4i3.5634
Keywords:
Computational Law, Data Mining, Legal Analytics, Python, Selenium
Abstract
The digitization of judicial records produces data volumes that traditional legal research methods cannot adequately handle. This paper describes the development and evaluation of an automated data mining framework that collects judicial decisions from the Indonesian Supreme Court's public directory, with the aim of building a data pipeline for analyzing civil litigation trends. Data acquisition is a multi-stage process: a custom Python script drives a headless Selenium WebDriver to navigate complex, JavaScript-rendered pages and handle asynchronous pagination, and the BeautifulSoup library parses the HTML and extracts metadata. The extracted data are structured and stored in a CSV file so that data integrity is preserved when a run is interrupted. The system mined 21,780 civil case records from the 2024 period at an extraction rate of 12 decisions per minute with a 75% success rate. The success rate depended on the website's responsiveness, which required a 120-second read timeout and persistent retries. Descriptive analysis with the Pandas library identified unlawful acts, breach of contract, and land disputes as the most prevalent civil litigation categories. This research provides a scalable model for legal informatics and foundational data for future analyses, such as Natural Language Processing (NLP) on judicial texts.
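The abstract mentions persistent retries and CSV storage that survives interruptions but does not reproduce the script itself. The following is a minimal sketch of how those two ideas could fit together, using only the Python standard library. Everything named here is an assumption for illustration: the `fetch` callable stands in for the Selenium and BeautifulSoup step, the column names in `FIELDS` are hypothetical, and the retry count and wait time are placeholders rather than the paper's settings.

```python
import csv
import os
import time

# Hypothetical column names; the paper does not list its CSV schema.
FIELDS = ["case_id", "category", "decision_date"]


def load_done_ids(path):
    """Return the case IDs already saved, so a restarted run can skip them."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row["case_id"] for row in csv.DictReader(f)}


def fetch_with_retries(fetch, case_id, retries=3, wait_seconds=0.0):
    """Call fetch(case_id) up to `retries` times; return None if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(case_id)
        except Exception:  # broad on purpose: timeouts, missing elements, etc.
            if attempt < retries:
                time.sleep(wait_seconds)
    return None


def mine(case_ids, fetch, path, retries=3, wait_seconds=0.0):
    """Append one CSV row per fetched case; return (saved, failed) counts."""
    done = load_done_ids(path)
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    saved, failed = 0, 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for case_id in case_ids:
            if case_id in done:
                continue  # already stored by an earlier, interrupted run
            record = fetch_with_retries(fetch, case_id, retries, wait_seconds)
            if record is None:
                failed += 1
                continue
            writer.writerow(record)
            f.flush()  # hand the row to the OS before fetching the next case
            saved += 1
    return saved, failed
```

Because rows are appended and flushed one at a time, a run that stops partway leaves a valid CSV behind, and calling `mine` again with the same `path` skips the IDs already present. In a real pipeline, `fetch` would wrap the browser navigation and HTML parsing and raise an exception on a timeout.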
License
Copyright (c) 2025 Jurnal Publikasi Teknik Informatika

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





