A Scalable Data Mining Framework for Computational Legal Analysis: Implementing a Python and Selenium Pipeline on Civil Case Decisions of the Indonesian Supreme Court

Authors

  • Lazuardi Fatahilah Hamdi, Universitas Muhammadiyah Gombong
  • Aang Anwaruddin, Universitas Muhammadiyah Gombong
  • Aditya Maulana Rizqi, Universitas Muhammadiyah Gombong

DOI:

https://doi.org/10.55606/jupti.v4i3.5634

Keywords:

Computational Law, Data Mining, Legal Analytics, Python, Selenium

Abstract

The digitization of judicial records has produced volumes of data that traditional legal research methods cannot adequately handle. This paper describes the development and evaluation of an automated data mining framework that collects judicial decisions from the Indonesian Supreme Court's public directory, with the aim of building a data pipeline for analyzing civil litigation trends. Data acquisition is a multi-stage process: a custom Python script drives a headless Selenium WebDriver to navigate complex, JavaScript-rendered pages and handle asynchronous pagination, and the BeautifulSoup library parses the HTML and extracts metadata. Extracted records are structured and stored in a CSV file so that data already collected is preserved when a run is interrupted. The system mined 21,780 civil case records from the 2024 period at an extraction rate of 12 decisions per minute with a 75% success rate. This success rate was influenced by the website's responsiveness, which required a 120-second read timeout and persistent retries. Descriptive analysis using the Pandas library identified unlawful acts, breach of contract, and land disputes as the most prevalent civil litigation categories. This research provides a scalable model for legal informatics and offers foundational data for future analyses, such as Natural Language Processing (NLP) on judicial texts.
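
The retry and CSV persistence steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the column names in FIELDS, the function names, and the retry count are assumptions made for illustration, and the fetch argument is a stand-in for the Selenium and BeautifulSoup step that loads a decision page and returns its metadata as a dictionary. The sketch uses only the standard library.

```python
import csv
import os
import time

# Hypothetical column names; the abstract does not list the CSV schema.
FIELDS = ["url", "case_number", "category"]


def fetch_with_retries(fetch, url, max_attempts=3, delay_s=1.0):
    """Call fetch(url), retrying when the site times out or drops the connection."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                time.sleep(delay_s)  # wait before the next attempt
    raise RuntimeError(f"gave up on {url}") from last_error


def already_saved(csv_path):
    """Return the set of URLs already present in the CSV (empty if no file)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["url"] for row in csv.DictReader(f)}


def scrape(urls, fetch, csv_path, max_attempts=3, delay_s=1.0):
    """Append one row per decision, skipping URLs saved by an earlier run.

    Returns (saved, failed) counts for this run.
    """
    done = already_saved(csv_path)
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    saved = failed = 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        for url in urls:
            if url in done:
                continue  # collected before an earlier interruption
            try:
                record = fetch_with_retries(fetch, url, max_attempts, delay_s)
            except RuntimeError:
                failed += 1  # all attempts failed; move on to the next URL
                continue
            writer.writerow({"url": url, **record})
            f.flush()  # push the row to the file before fetching the next page
            saved += 1
    return saved, failed
```

Writing and flushing each row as it arrives means an interrupted run loses at most the record in flight, and calling scrape again with the same URL list resumes where the previous run stopped. The ratio saved / (saved + failed) gives a per-run success rate comparable in spirit to the figure reported above.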

Published

2025-09-30

How to Cite

Fatahilah Hamdi, L., Aang Anwaruddin, & Aditya Maulana Rizqi. (2025). Kerangka Kerja Penambangan Data yang Skalabel untuk Analisis Hukum Komputasional: Implementasi Pipeline Python dan Selenium pada Putusan Perkara Perdata Mahkamah Agung Indonesia. Jurnal Publikasi Teknik Informatika, 4(3), 260–272. https://doi.org/10.55606/jupti.v4i3.5634