{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "94bbe043",
   "metadata": {},
   "source": [
    "# Document RAG System."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "367a3c60",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install the libraries this notebook depends on\n",
    "!pip -q install langchain langchain-google-genai langchain-community google-genai faiss-cpu tiktoken python-dotenv pypdf langchain-huggingface sentence-transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "276de997",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Google Colab environment setup (uncomment when running on Colab)\n",
    "# import os\n",
    "# from google.colab import userdata\n",
    "\n",
    "# # Set environment variables for Google and Hugging Face\n",
    "# os.environ['GOOGLE_API_KEY'] = userdata.get(\"GOOGLE_API_KEY\")\n",
    "# os.environ['HUGGINGFACEHUB_ACCESS_TOKEN'] = userdata.get(\"HUGGINGFACEHUB_ACCESS_TOKEN\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2651b55f",
   "metadata": {},
   "source": [
    "### Load keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7673d4e2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OpenAI key loaded: True\n",
      "\n",
      "Gemini key loaded: True\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()\n",
    "\n",
    "openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
    "gemini_api_key = os.getenv(\"GEMINI_API_KEY\")\n",
    "\n",
    "print(\"OpenAI key loaded:\", bool(openai_api_key))\n",
    "# print(\"OpenAI key:\", openai_api_key)\n",
    "\n",
    "print(\"\\nGemini key loaded:\", bool(gemini_api_key))\n",
    "# print(\"Gemini key:\", gemini_api_key)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d1aa3528",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.12/dist-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "# Import the libraries used throughout the notebook\n",
    "\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI, GoogleGenerativeAI\n",
    "from langchain_community.vectorstores import FAISS\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_community.document_loaders import PyPDFLoader\n",
    "from langchain_huggingface import HuggingFaceEmbeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72887907",
   "metadata": {},
   "source": [
    "### Test key with prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "181c0959",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E0000 00:00:1759189263.070101  109561 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "content='Hello! How can I help you today?' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []} id='run--0c2a5214-4f37-43b5-b432-836a3abe7058-0' usage_metadata={'input_tokens': 2, 'output_tokens': 46, 'total_tokens': 48, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 37}}\n"
     ]
    }
   ],
   "source": [
    "from langchain_google_genai import ChatGoogleGenerativeAI\n",
    "\n",
    "LLM = ChatGoogleGenerativeAI(\n",
    "    model=\"gemini-2.5-flash\",\n",
    "    google_api_key=gemini_api_key\n",
    ")\n",
    "\n",
    "response = LLM.invoke(\"Hello\")\n",
    "print(response)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "83bf1f6f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello! How can I help you today?\n"
     ]
    }
   ],
   "source": [
    "print(response.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d42c604",
   "metadata": {},
   "source": [
    "## Load document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a39b0bfb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load and read PDF file\n",
    "\n",
    "load_document = PyPDFLoader(\"dataset/ChenZhang_cropmapping_ReviewPaper.pdf\")\n",
    "document = load_document.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "44f61c67",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "29"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(document)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "46d97e70",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Review\n",
      "Remote sensing for crop mapping: A perspective on current and future \n",
      "crop-specific land cover data products\n",
      "Chen Zhang\n",
      "a , *\n",
      ", Hannah Kerner\n",
      "b\n",
      ", Sherrie Wang\n",
      "c\n",
      ", Pengyu Hao\n",
      "d\n",
      ", Zhe Li\n",
      "e\n",
      ", Kevin A. Hunt\n",
      "e\n",
      ",  \n",
      "Jonathon Abernethy\n",
      "e\n",
      ", Haoteng Zhao\n",
      "f\n",
      ", Feng Gao\n",
      "f\n",
      ", Liping Di\n",
      "a , *\n",
      ", Claire Guo\n",
      "a , g\n",
      ", Ziao Liu\n",
      "a\n",
      ",  \n",
      "Zhengwei Yang\n",
      "e\n",
      ", Rick Mueller\n",
      "e\n",
      ", Claire Boryan\n",
      "e\n",
      ", Qi Chen\n",
      "h\n",
      ", Peter C. Beeson\n",
      "i\n",
      ", Hankui K. Zhang\n",
      "j\n",
      ",  \n",
      "Yu Shen\n",
      "j , k\n",
      "a\n",
      "Center for Spatial Information Science and Systems, George Mason University, Fairfax, VA 22030, USA\n",
      "b\n",
      "School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA\n",
      "c\n",
      "Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA\n",
      "d\n",
      "Food and Agriculture Organization of the United Nations, Viale delle Terme di Caracalla, 00153 Rome, Italy\n",
      "e\n",
      "U.S. Department of Agriculture, National Agricultural Statistics Service, Washington, DC 20250, USA\n",
      "f\n",
      "U.S. Department of Agriculture, Agricultural Research Service, Hydrology and Remote Sensing Laboratory, Beltsville, MD 20705, USA\n",
      "g\n",
      "Thomas Jefferson High School for Science and Technology, Alexandria, VA 22312, USA\n",
      "h\n",
      "Department of Geography & Environment, University of Hawai’i at Mānoa, Honolulu, HI 96822, USA\n",
      "i\n",
      "U.S. Department of Agriculture, Economic Research Service, Washington, DC 20250, USA\n",
      "j\n",
      "Geospatial Sciences Center of Excellence, Department of Geography and Geospatial Sciences, South Dakota State University, Brookings, SD 57007, USA\n",
      "k\n",
      "Nicholas School of the Environment, Duke University, Durham, NC 27708, USA\n",
      "ARTICLE INFO\n",
      "Edited by Dr. Marie Weiss\n",
      "Keywords:\n",
      "Crop mapping\n",
      "Land use land cover\n",
      "Geospatial data product\n",
      "Systematic literature review\n",
      "Cropland data layer\n",
      "ABSTRACT\n",
      "Crop mapping is an indispensable application in agricultural and environmental remote sensing. Over the last \n",
      "few decades, the exponential growth of open Earth Observation (EO) data has significantly enhanced crop \n",
      "mapping and enabled the production of detailed crop-specific land cover data at national and regional scales. \n",
      "These data have served multiple purposes across a wide range of applications and research initiatives. However, \n",
      "there is currently no comprehensive summary of the crop mapping data products, nor is there a detailed discussion \n",
      "of their uses in remote sensing studies. This paper provides the first in-depth review of remote sensing for \n",
      "crop mapping from the perspective of crop-specific land cover data by evaluating over 60 open-access operational \n",
      "products, archival crop type map datasets, single-crop extent map datasets, cropping pattern datasets, and \n",
      "crop mapping platforms and systems. Using the Cropland Data Layer (CDL) – one of the most widely used \n",
      "products with over 25 years of continuous monitoring of U.S. croplands – as a case study, we also conduct a \n",
      "systematic literature review on the application of crop type maps in remote sensing science. Our analysis synthesizes \n",
      "129 research articles through three core research questions: (1) What EO data are used with CDL; (2) \n",
      "What scientific problems and technologies are explored using CDL; and (3) What role does CDL play in remote \n",
      "sensing applications. Furthermore, we delve into the implications of our vision for new data products and \n",
      "propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products \n",
      "to improving global mapping reliability and developing operational in-season crop mapping systems. This review \n",
      "paper not only serves as a reference for stakeholders seeking to utilize crop-specific land cover data in their work, \n",
      "but also outlines the directions for future geospatial data product development.\n",
      "* Corresponding authors.\n",
      "E-mail addresses: czhang11@gmu.edu (C. Zhang), hkerner@asu.edu (H. Kerner), sherwang@mit.edu (S. Wang), pengyu.hao@fao.org (P. Hao), zhe.li@usda.gov\n",
      "(Z. Li), kevin.a.hunt@usda.gov (K.A. Hunt), jake.abernethy@usda.gov (J. Abernethy), haoteng.zhao@usda.gov (H. Zhao), feng.gao@usda.gov (F. Gao), ldi@gmu.edu \n",
      "(L. Di), zliu23@gmu.edu (Z. Liu), zhengwei.yang@usda.gov (Z. Yang), rick.mueller@usda.gov (R. Mueller), claire.boryan@usda.gov (C. Boryan), qichen@hawaii.edu \n",
      "(Q. Chen), peter.beeson@usda.gov (P.C. Beeson), hankui.zhang@sdstate.edu (H.K. Zhang), yu.shen@duke.edu (Y. Shen). \n",
      "Contents lists available at ScienceDirect\n",
      "Remote Sensing of Environment\n",
      "journal homepage: www.elsevier.com/locate/rse\n",
      "https://doi.org/10.1016/j.rse.2025.114995\n",
      "Received 12 December 2024; Received in revised form 11 August 2025; Accepted 22 August 2025  \n",
      "Remote Sensing of Environment 330 (2025) 114995 \n",
      "0034-4257/© 2025 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).\n"
     ]
    }
   ],
   "source": [
    "# First page of PDF\n",
    "print(document[0].page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "fa96dcd3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the WoS database, including the publication title, abstract, or keywords. \n",
      "In our survey, we found that many papers introduced, discussed, or cited \n",
      "CDL, but did not directly use the data in their experiments. Therefore, \n",
      "IC1 could ensure that CDL has been applied in the selected publications, \n",
      "rather than simply mentioning it in passing.\n",
      "To narrow down the publications to those specifically related to \n",
      "remote sensing, IC2 states that the publication’s “Category” field in the \n",
      "WoS database must be labeled as “remote sensing”. However, many \n",
      "publications related to remote sensing were published in computer science, \n",
      "agricultural, or multidisciplinary journals, which were not categorized \n",
      "as “remote sensing”. To include these publications in this \n",
      "review, we added a rule that requires the presence of certain terms, such \n",
      "as “Remote Sensing”, “Earth observation”, “Landsat”, “Sentinel”, or \n",
      "“MODIS” in any of the title, keywords, or abstract of the publication.\n",
      "To ensure the selected publications reflected the up-to-date research \n",
      "trends and avoided duplicate research items, IC3 limits the document \n",
      "type to only peer-reviewed articles that were published in journals \n",
      "indexed by the WoS Core Collection. Focusing on these high-impact \n",
      "journal articles guarantees that our review reflects the most representative \n",
      "studies within the remote sensing field.\n",
      "The query string of inclusion criteria in the WoS data database is: \n",
      "ALL = (“Cropland Data Layer” OR “CDL”) AND (WC = “Remote Sensing” \n",
      "OR ALL = (“Remote Sensing” OR “Earth observation” OR “Landsat” OR \n",
      "“Sentinel” OR “MODIS”)) AND DT = “Article”, where ALL represents all \n",
      "fields (title, abstract, keywords), WC represents WoS categories, and DT \n",
      "represents document type. After the initial screening process, we \n",
      "manually applied the three exclusion criteria to exclude publications \n",
      "where the full term “CDL” was not related to “Cropland Data Layer”, \n",
      "studies that did not use remote sensing data, and any review articles. \n",
      "These exclusion criteria were essential for ensuring the reliability of our \n",
      "selection results and for eliminating any irrelevant literature. The \n",
      "literature selection process from the CDL citations on the USDA NASS \n",
      "website adheres to the same inclusion and exclusion criteria. The \n",
      "eligible documents were combined with the screening results of WoS \n",
      "database, and any duplicate records were removed.\n",
      "3.3. Results\n",
      "The result of the literature screening process is illustrated in Fig. 4 . \n",
      "Applying the inclusion criteria, we screened 162 and 43 articles from the \n",
      "WoS database and the USDA NASS CDL website, respectively. We then \n",
      "excluded 48 and 8 non-qualified articles from the two sources. After \n",
      "removing the 20 duplicated records, we identified 129 qualified articles \n",
      "for use in this systematic literature review. The full literature list and \n",
      "surveyed features per selected publication are summarized in Table A1 .\n",
      "Table 8 summarizes the publication distribution of 129 qualified \n",
      "articles across over 40 scientific journals. It should be noted that these \n",
      "screening results only encompass representative articles related to the \n",
      "CDL in remote sensing science. As documents are searched based on \n",
      "Table 7 \n",
      "Document search criteria.\n",
      "ID Description\n",
      "Inclusion Criteria 1 (IC1) “Cropland Data Layer” OR “CDL” contained in any fields\n",
      "Inclusion Criteria 2 (IC2) Category in “Remote Sensing” or in other categories but contain “Remote Sensing” or “Earth observation” or “Landsat” or “Sentinel” or “MODIS” in any \n",
      "fields\n",
      "Inclusion Criteria 3 (IC3) Publication is a journal article\n",
      "Exclusion Criteria 1 \n",
      "(EC1)\n",
      "The full term of “CDL” is not related to “Cropland Data Layer”\n",
      "Exclusion Criteria 2 \n",
      "(EC2)\n",
      "No remote sensing data is used in the study\n",
      "Exclusion Criteria 3 \n",
      "(EC3)\n",
      "Publication is a review paper\n",
      "Fig. 4. Literature screening process.\n",
      "Table 8 \n",
      "Publication distribution of the qualified CDL-related remote sensing studies by \n",
      "journals.\n",
      "Journal Record \n",
      "Count\n",
      "Remote Sensing of Environment 27\n",
      "Remote Sensing 26\n",
      "International Journal of Applied Earth Observation and \n",
      "Geoinformation\n",
      "9\n",
      "ISPRS Journal of Photogrammetry and Remote Sensing 9\n",
      "Photogrammetric Engineering and Remote Sensing 4\n",
      "Agronomy Journal 3\n",
      "Computers and Electronics in Agriculture 3\n",
      "Remote Sensing Letters 3\n",
      "Agricultural Systems 2\n",
      "Agricultural Water Management 2\n",
      "Canadian Journal of Remote Sensing 2\n",
      "Earth System Science Data 2\n",
      "European Journal of Remote Sensing 2\n",
      "IEEE Journal of Selected Topics in Applied Earth Observations and \n",
      "Remote Sensing\n",
      "2\n",
      "International Journal of Remote Sensing 2\n",
      "Science of Remote Sensing 2\n",
      "Sensors 2\n",
      "Others (only one paper) 27\n",
      "Total 129\n",
      "C. Zhang et al.                                                                                                                                                                                                                                   Remote Sensing of Environment 330 (2025) 114995 \n",
      "10\n"
     ]
    }
   ],
   "source": [
    "# 10th page of PDF\n",
    "print(document[9].page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "31c4df34",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "et al., 2013 ). CDL data also have been used to delineate and stratify \n",
      "regions, such as U.S. soybean growing areas ( Song et al., 2017 ), which \n",
      "helps in understanding field size patterns for more effective agricultural \n",
      "resource management.\n",
      "Training samples: Beyond a crop type map, CDL is widely utilized \n",
      "as an authoritative geospatial benchmark to support field-level crop \n",
      "spectral signature training. The ML models trained with high-confidence \n",
      "pixels in CDL and associated products (e.g., CSB, Confidence Layer) can \n",
      "be applied to extend land cover classification while adjusting for factors \n",
      "such as hemisphere seasonality and evolving farming trends, which is \n",
      "invaluable for global crop monitoring. As discussed in RQ2, ML and DL \n",
      "are the main technologies in remote sensing studies, which rely on high- \n",
      "quality training data. Due to the extensive crop-specific land cover in -\n",
      "formation, CDL has been extensively used to label training samples in EO \n",
      "data. This enables the further supervised-learning-based training pro -\n",
      "cess for semantic segmentation models ( Du et al., 2022a ), ML models \n",
      "( Momm et al., 2020 ), DL models ( Cai et al., 2018 ; Xu et al., 2020 ), and \n",
      "transfer learning models ( Hao et al., 2020 ; Wei et al., 2022 ). Instead of \n",
      "directly using CDL as training samples, some works further optimized \n",
      "the training sample selection process by modeling crop rotation patterns \n",
      "in the historical CDL ( Zhang et al., 2022a ). Zhang et al. (2021) and Lin \n",
      "et al. (2022b) used DNNs to automatically recognize training samples \n",
      "from CDL time series to label Landsat and Sentinel-2 data for early and \n",
      "in-season crop mapping.\n",
      "Benchmark data: CDL is often adopted as benchmark data or \n",
      "reference data to validate new crop mapping methodologies and algo -\n",
      "rithms. The traditional ground-truthing process is usually labor- \n",
      "intensive, particularly when surveying extensive geographic areas. By \n",
      "comparing results against the CDL, researchers can efficiently assess \n",
      "model performance, detect areas for improvement, and refine their \n",
      "strategies to achieve optimal outcomes. However, despite its widespread \n",
      "use, it should be noted that CDL only represents a high-quality classifi -\n",
      "cation map rather than ground truth. Several studies have examined the \n",
      "uncertainty and potential biases associated with using CDL as bench -\n",
      "mark data for result validation. For example, Lark et al. (2021) found the \n",
      "average accuracy for all crop classes has improved from 87 % in 2008 to \n",
      "92 % in 2016. Kerner et al. (2022) showed 2019–2020 CDL had 89 % \n",
      "accuracy evaluated with independent ground truth data within the \n",
      "central US Corn Belt.\n",
      "Other uses: CDL and its derivative data products have been applied \n",
      "in addressing broader applications and scientific problems. Boryan et al. \n",
      "(2014) developed a stratification method for agricultural area sampling \n",
      "frame construction based on CDL. Gao et al. (2014) used CDL to assist in \n",
      "the creation of Bidirectional Reflectance Distribution Function (BRDF) \n",
      "look-up maps. Harmonic analysis techniques, such as linear and non- \n",
      "linear harmonic models, have been employed with CDL to model peri -\n",
      "odic patterns in time series data ( Roy and Yan, 2020 ; Wang et al., \n",
      "2020a ). Shao et al. (2016a) evaluated different time-series smoothing \n",
      "algorithms. Duveiller et al. (2015) developed a signal-to-noise ratio \n",
      "method to identify spatially homogeneous vegetation cover. CDL has \n",
      "also been utilized in GIS education ( Han et al., 2014 ) and as compared \n",
      "dataset for particular purposes ( Wickham et al., 2014 ; Kokkinidis et al., \n",
      "2017 ; Shi et al., 2018 ; Kraatz et al., 2023 ; Wang and Mountrakis, 2023 ).\n",
      "4. Visions for future data products\n",
      "As science and technology in remote sensing advances, the demand \n",
      "for enhanced crop-specific land cover data products becomes increas -\n",
      "ingly evident. This section explores vision and progress in improving \n",
      "spatiotemporal coverage and resolution of the current data products \n",
      "(Section 4.1), achieving reliable global mapping through robust training \n",
      "datasets and cropland extent data (Section 4.2 and 4.3), incorporating \n",
      "more crop-specific information (Section 4.4 and 4.5), and the develop -\n",
      "ment of operational in-season crop mapping systems (Section 4.6).\n",
      "4.1. Progress on enhanced coverage and resolution of current product\n",
      "Enhanced spatial coverage and resolution significantly benefit crop \n",
      "mapping, area estimation, and field size quantification by enabling more \n",
      "accurate identification of land cover features. Advancement in geo -\n",
      "spatial cloud computing platforms (e.g., GEE) and increasing availabil -\n",
      "ity of higher spatiotemporal resolution open EO data (e.g., Sentinel-1, \n",
      "Sentinel-2, HLS) have improved the efficiency and accuracy for pro -\n",
      "ducing regional and national crop type map data with resolution of 10-m \n",
      "or even higher ( Tran et al., 2022 ; Li et al., 2025 ). Such detailed field- \n",
      "level crop cover information will not only facilitate a more precise \n",
      "distinction between different types of vegetation and crops, but also \n",
      "provide opportunities for improved agricultural monitoring, better \n",
      "resource management, and informed decision-making to support sus -\n",
      "tainable agriculture and food security.\n",
      "As highlighted in the Section 3, the 30-m CDL has traditionally been \n",
      "essential for scientific problem solving with various EO data. However, \n",
      "the increasing availability of higher-resolution EO data from both open- \n",
      "access and commercial satellites requires more detailed crop mapping \n",
      "products. To meet this evolving need, the USDA NASS has been \n",
      "enhancing data accuracy and usability by implementing a 10-m reso -\n",
      "lution CDL. These improvements are vital, particularly given the \n",
      "increasing vulnerability of agriculture to natural disasters and extreme \n",
      "weather events. By utilizing the RF algorithm, enhanced stratified \n",
      "random sampling approaches, and localized image processing, the 10-m \n",
      "CDL provides a more accurate representation of diverse crop types for \n",
      "CONUS, particularly in regions with unique or specialty crops. This \n",
      "methodology reduces labor and workload while improving classification \n",
      "accuracy and spatial clarity for small-area and specialty crops compared \n",
      "to 30-m CDL. Fig. 8 shows the improvement achieved with the new 10-m \n",
      "CDL compared to the current 30-m CDL on croplands with complex \n",
      "landscapes.\n",
      "Currently, the CDL is available only for the CONUS. However, efforts \n",
      "are underway to extend coverage to other regions, such as Hawaii and U. \n",
      "S. territories like Puerto Rico and the U.S. Virgin Islands. Enhanced CDLs \n",
      "for these areas include the 2022 Beta version and the official 2024 \n",
      "release of the 10-m resolution CDL for CONUS ( Li et al., 2024b ), and the \n",
      "inaugural Hawaii Cropland Data Layer (HCDL) 2023 and 2024 ( Li et al., \n",
      "2024a ). These products leverage gap-filled 10-day image composites \n",
      "from Sentinel and Landsat sensors, processed through GEE. In devel -\n",
      "oping the HCDL, assorted ML and DL algorithms were evaluated, \n",
      "including RF, U-Net, ResNet50, VGG19, and DeepLabV3. The RF algo -\n",
      "rithm achieved the best results for mapping major and specialty crops in \n",
      "Hawaii. Fig. 9 illustrates the 10-m resolution HCDL 2023 V1.0 Beta, \n",
      "which utilizes a RF algorithm with 100 trees for mapping crops, \n",
      "including coffee, pineapple, macadamia nuts, commercial forest, citrus, \n",
      "papaya, and tropical fruits. The official release of HCDL 2023 and 2024 \n",
      "is anticipated in summer 2025. Future efforts will focus on creating a 10- \n",
      "m resolution annual CDL for CONUS and potentially extending to Puerto \n",
      "Rico and the U.S. Virgin Islands.\n",
      "4.2. Developing training dataset in data-sparse regions\n",
      "Lack of training data is a major barrier for developing crop type maps \n",
      "like CDL in regions outside of the United States or other countries that \n",
      "have instituted operational mapping programs (e.g., programs in \n",
      "Table 1 ). Researchers aim to overcome this barrier in two main ways: (1) \n",
      "developing more globally representative training datasets, and (2) \n",
      "developing algorithms that learn more efficiently from small amounts of \n",
      "training data.\n",
      "Globally representative training datasets: Globally representative \n",
      "reference data is essential for training modern data-hungry DL models \n",
      "and has been identified as a key priority in advancing AI applications in \n",
      "remote sensing ( Zhang et al., 2025a ). Collecting crop type data for \n",
      "training ML classifiers for crop mapping is challenging because col -\n",
      "lecting high-quality data typically requires ground-truthing ( Nakalembe \n",
      "C. Zhang et al.                                                                                                                                                                                                                                   Remote Sensing of Environment 330 (2025) 114995 \n",
      "14\n"
     ]
    }
   ],
   "source": [
    "# 14th page of PDF\n",
    "print(document[13].page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "0b15e82e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "and Kerner, 2023 ). Ground-truthing involves physically visiting agri -\n",
      "cultural fields and recording the type of crop growing in the field. This \n",
      "process is prohibitively expensive and logistically challenging for many \n",
      "organizations and regions.\n",
      "Currently available public reference samples are largely regional in \n",
      "scope ( Dufourg et al., 2023 ; Kondmann et al., 2021 ). Recent work has \n",
      "proposed novel methods of collecting ground-truth crop labels that \n",
      "reduce the cost of data collection. Paliyam et al. (2021) proposed a \n",
      "method called Street2Sat that uses computer vision (CV) techniques to \n",
      "transform roadside images of fields collected with car- and motorcycle \n",
      "helmet-mounted cameras into geo-referenced crop type labels of those \n",
      "fields. d’Andrimont et al. (2022) used CV techniques to extract crop type \n",
      "and phenology information from street-level images of fields taken with \n",
      "car-mounted cameras in the Netherlands. Yan and Ryu (2021) and Soler \n",
      "et al. (2024) used DL models to automatically create crop type labels \n",
      "from Google Street View images in California and Thailand, \n",
      "respectively.\n",
      "Other work leveraged crowd-sourced data from online and mobile \n",
      "platforms to collect ground-truth crop data. Wang et al. (2020b) used \n",
      "crop type data crowd-sourced from the Plantix mobile app (used to help \n",
      "farmers diagnose crop disease) for crop mapping in India. Fraisl et al. \n",
      "(2022) demonstrated the use of the mobile app Picture Pile to engage \n",
      "citizen scientists to annotate crop type labels in crowdsourced street- \n",
      "level images from Mapillary, which could later be converted to geo- \n",
      "referenced crop type labels for training crop mapping models. The \n",
      "CropObserve app facilitated the process on crop-specific ground truth -\n",
      "ing (e.g., crop types, phenological stage, visible damage, management \n",
      "practices) anywhere in the world ( IIASA, 2023 ).\n",
      "In parallel with data collection efforts, increasing attention is being \n",
      "paid to making crop type reference data more Findable, Accessible, \n",
      "Interoperable, and Reusable (FAIR). Major research initiatives (e.g., \n",
      "CropHarvest, WorldCereal, and EuroCrops) are actively working on \n",
      "harmonizing, standardizing, and openly publishing training datasets to \n",
      "enhance the FAIRness of crop reference data within the remote sensing \n",
      "and agricultural monitoring communities.\n",
      "Algorithms that learn more efficiently from small amounts of \n",
      "training data: To reduce the need for large labeled datasets to train \n",
      "effective crop mapping models, researchers have proposed methods for \n",
      "learning from a small amount of training data for a given location. Many \n",
      "of these methods involve learning from labeled data in locations other \n",
      "than the target region to supplement training. The WorldCereal project \n",
      "trained a CatBoost classifier with expert-designed features extracted \n",
      "from multiple satellite datasets using a reference database of globally \n",
      "distributed crop type labels ( Van Tricht et al., 2023 ). Other work has \n",
      "leveraged transfer learning, in which models are first β€œpre-trained” on a \n",
      "large labeled dataset for one task (e.g., crop mapping in region A) and \n",
      "then further trained (β€œfine-tuned”) on a smaller dataset for the target \n",
      "task (e.g., crop mapping in region B). Meta-learning algorithms are also \n",
      "used to learn efficiently from a small number of crop type examples in a \n",
      "new target region by learning from many globally-distributed crop type \n",
      "classification tasks in the CropHarvest dataset ( Tseng et al., 2021a, \n",
      "2022 ).\n",
      "Researchers have developed methods for learning generic features \n",
      "that are useful in diverse tasks (e.g., crop mapping, land cover mapping, \n",
      "tree species classification) from a large amount of unlabeled satellite EO \n",
      "data in a process called self-supervised learning. Similar to transfer \n",
      "learning discussed previously, after a model is pre-trained using self- \n",
      "supervised learning, it can be fine-tuned for a specific crop mapping \n",
      "task. For example, Tseng et al. (2024) proposed a self-supervised model \n",
      "called Presto (which stands for Pre-trained remote sensing transformer) \n",
      "that learns from unlabeled EO data from multiple satellite platforms and \n",
      "derived products. They showed that fine-tuning Presto on the Kenya \n",
      "maize classification task and Brazil coffee classification task in Cro -\n",
      "pHarvest achieved state-of-the-art performance. Both tasks required \n",
      "learning from small training data sizes of 1345 in Kenya and 203 in \n",
      "Brazil. In Phase II of the ESA WorldCereal project (2024-2026) ( ESA, \n",
      "2024 ), Presto was adopted for feature extraction for crop type mapping \n",
      "in place of the expert-designed features used to train a CatBoost classifier \n",
      "in Van Tricht et al. (2023) . With Presto’s robust algorithm for improving \n",
      "spatiotemporal transferability, this integration is key to WorldCereal’s \n",
      "aim of establishing a generic and customizable global crop mapping \n",
      "system.\n",
      "In recent years, foundation models have emerged to address the \n",
      "scarcity of labeled training data in remote sensing applications ( Jakubik \n",
      "et al., 2023 ; Xiao et al., 2025 ). For example, Google recently introduced \n",
      "AlphaEarth Foundations (AEF) for global mapping from sparse label \n",
      "data ( Brown et al., 2025 ). As a geospatial foundation model, AEF in -\n",
      "tegrates multi-source, multi-modal EO and geoinformation data into a \n",
      "time-continuous embedding space, and the resulting global dataset of \n",
      "analysis-ready embedding field layers could enable a wide range of \n",
      "mapping tasks. Such foundation models and analysis-ready data offer a \n",
      "promising solution for efficient production of cropland and crop type \n",
      "maps at a global scale.\n",
      "4.3. Improving consistency of cropland extent mask for global crop \n",
      "mapping\n",
      "From the perspective of global crop mapping, a reliable and consis -\n",
      "tent cropland extent map serves as the fundamental land cover category \n",
      "in the crop-specific land cover data production, which are crucial for the \n",
      "subsequent crop type classification process especially over the data- \n",
      "sparse regions. Various cropland extent mask data derived from EO \n",
      "data have been widely developed and validated over the past years. \n",
      "However, selecting the most appropriate cropland extent mask and \n",
      "conducting local validation of these data tailored to the specific re -\n",
      "quirements of the study remains challenging due to inconsistency and \n",
      "variability in their reported accuracies and cropland definitions.\n",
      "To improve consistency and transparency of cropland extent, \n",
      "Table 9 \n",
      "FAO land use categories for cropland.\n",
      "Land Use Category Definition\n",
      "Cropland Land used for cultivation of crops. The total of areas under Arable land and Permanent crops.\n",
      "Arable land Land used for cultivation of crops in rotation with fallow, meadows and pastures within cycles of up to 5 years. The total of areas under Temporary \n",
      "crops, temporary meadows and pastures, and temporary fallow. Arable land does not include land that is potentially cultivable but is not cultivated.\n",
      "Temporary crops Land used for crops with a less than 1-year growing cycle, which must be newly sown or planted for further production after the harvest. Some crops \n",
      "remaining in the field for more than 1 year may also be considered as temporary crops (e.g., asparagus, strawberries, pineapples, bananas, and sugar \n",
      "cane). Multiple-cropped areas are counted only once.\n",
      "Temporary fallow Land that is not seeded for one or more growing seasons. The maximum idle period is usually less than 5 years. This land may be in the form sown for \n",
      "the exclusive production of green manure. Land remaining fallow for too long may acquire characteristics requiring it to be reclassified as, for \n",
      "instance, permanent meadows and pastures if used for grazing or haying.\n",
      "Temporary meadows and \n",
      "pastures\n",
      "Land temporarily cultivated with herbaceous forage crops for mowing or pasture, as part of crop rotation periods of less than 5 years.\n",
      "Permanent crops Land cultivated with long-term crops which do not have to be replanted for several years (e.g., cocoa and coffee), land under trees and shrubs \n",
      "producing flowers (e.g., roses and jasmine), and nurseries (except those for forest trees, which should be classified under β€œforestry”). Permanent \n",
      "meadows and pastures are excluded from permanent crops.\n",
      "C. Zhang et al.                                                                                                                                                                                                                                   Remote Sensing of Environment 330 (2025) 114995 \n",
      "16\n"
     ]
    }
   ],
   "source": [
    "# 16th page of PDF\n",
    "print(document[15].page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ec8c146",
   "metadata": {},
   "source": [
    "## Split texts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "1f22bdbf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# split into chunks\n",
    "\n",
    "doc_split= RecursiveCharacterTextSplitter(\n",
    "    chunk_size=1000,\n",
    "    chunk_overlap=200,\n",
    ")\n",
    "chunks = doc_split.split_documents(document)"
   ]
  },
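  {
   "cell_type": "markdown",
   "id": "7c1e4a90",
   "metadata": {},
   "source": [
    "To make the roles of `chunk_size` and `chunk_overlap` concrete, the next cell is a minimal sketch using a plain fixed-width sliding window. It is an illustration only: `toy_sliding_window` and the other `toy_` names are hypothetical, and `RecursiveCharacterTextSplitter` itself is separator-aware (it prefers to break on paragraph and line boundaries), so its chunks will not match this simplified version exactly.\n",
    "\n",
    "With a 26-letter string, a window of 10, and an overlap of 4, each printed piece starts 6 characters after the previous one, so neighbouring pieces share 4 characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d9b2f16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustration only: a fixed-width sliding window, not the separator-aware\n",
    "# algorithm used by RecursiveCharacterTextSplitter. All toy_* names are hypothetical.\n",
    "def toy_sliding_window(text, chunk_size, chunk_overlap):\n",
    "    if not 0 <= chunk_overlap < chunk_size:\n",
    "        raise ValueError('chunk_overlap must be >= 0 and smaller than chunk_size')\n",
    "    step = chunk_size - chunk_overlap  # how far each new window advances\n",
    "    pieces = []\n",
    "    start = 0\n",
    "    while start < len(text):\n",
    "        pieces.append(text[start:start + chunk_size])\n",
    "        if start + chunk_size >= len(text):\n",
    "            break  # the current window already reaches the end of the text\n",
    "        start += step\n",
    "    return pieces\n",
    "\n",
    "toy_text = 'abcdefghijklmnopqrstuvwxyz'\n",
    "for toy_piece in toy_sliding_window(toy_text, chunk_size=10, chunk_overlap=4):\n",
    "    print(toy_piece)"
   ]
  },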
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "94a200e7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "268"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(chunks)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b519f56f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "crop mapping from the perspective of crop-specific land cover data by evaluating over 60 open-access opera -\n",
      "tional products, archival crop type map datasets, single-crop extent map datasets, cropping pattern datasets, and \n",
      "crop mapping platforms and systems. Using the Cropland Data Layer (CDL) – one of the most widely used \n",
      "products with over 25 years of continuous monitoring of U.S. croplands – as a case study, we also conduct a \n",
      "systematic literature review on the application of crop type maps in remote sensing science. Our analysis syn -\n",
      "thesizes 129 research articles through three core research questions: (1) What EO data are used with CDL; (2) \n",
      "What scientific problems and technologies are explored using CDL; and (3) What role does CDL play in remote \n",
      "sensing applications. Furthermore, we delve into the implications of our vision for new data products and \n",
      "propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products\n"
     ]
    }
   ],
   "source": [
    "# display 4th chunk\n",
    "print(chunks[3].page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "a62f7966",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products \n",
      "to improving global mapping reliability and developing operational in-season crop mapping systems. This review \n",
      "paper not only serves as a reference for stakeholders seeking to utilize crop-specific land cover data in their work, \n",
      "but also outlines the directions for future geospatial data product development.\n",
      "* Corresponding authors.\n",
      "E-mail addresses: czhang11@gmu.edu (C. Zhang), hkerner@asu.edu (H. Kerner), sherwang@mit.edu (S. Wang), pengyu.hao@fao.org (P. Hao), zhe.li@usda.gov\n",
      "(Z. Li), kevin.a.hunt@usda.gov (K.A. Hunt), jake.abernethy@usda.gov (J. Abernethy), haoteng.zhao@usda.gov (H. Zhao), feng.gao@usda.gov (F. Gao), ldi@gmu. \n",
      "edu (L. Di), zliu23@gmu.edu (Z. Liu), zhengwei.yang@usda.gov (Z. Yang), rick.mueller@usda.gov (R. Mueller), claire.boryan@usda.gov (C. Boryan), qichen@\n"
     ]
    }
   ],
   "source": [
    "# display 5th chunk\n",
    "print(chunks[4].page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d13d1af5",
   "metadata": {},
   "source": [
    "### Vector Store Creation.\n",
    "\n",
    "Generate document embeddings and build a FAISS vector store for efficient similarity-based retrieval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "fd8ae71d",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/wills/.local/lib/python3.12/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).\n",
      "  from pandas.core import (\n",
      "2025-09-30 00:19:35.389072: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.\n",
      "2025-09-30 00:19:37.212948: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "/home/wills/.local/lib/python3.12/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.\n",
      "  warnings.warn(\"Unable to import Axes3D. This may be due to multiple versions of \"\n"
     ]
    }
   ],
   "source": [
    "embeds = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
    "vector_store = FAISS.from_documents(chunks, embeds)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb690b06",
   "metadata": {},
   "source": [
    "### Retrieval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "ae03425b",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vector_store.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 5})"
   ]
  },
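  {
   "cell_type": "markdown",
   "id": "e5a8c071",
   "metadata": {},
   "source": [
    "As a rough picture of what a `similarity` search with `k` does, the next cell ranks a few hand-made vectors by Euclidean distance to a query vector and keeps the closest `k`. The vectors, texts, and `toy_` names are invented for illustration. In this notebook the vectors come from the embedding model, and the assumption that the FAISS store ranks by L2 distance reflects LangChain's default and is worth confirming for the installed version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f24b6d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy illustration of top-k nearest-neighbour retrieval.\n",
    "# The texts and 3-dimensional vectors are invented; real embeddings have many more dimensions.\n",
    "import math\n",
    "\n",
    "toy_index = {\n",
    "    'crop mapping with Sentinel-2': [0.9, 0.1, 0.0],\n",
    "    'training data in sparse regions': [0.7, 0.3, 0.1],\n",
    "    'recipe for banana bread': [0.0, 0.2, 0.9],\n",
    "}\n",
    "\n",
    "def toy_top_k(query_vector, index, k):\n",
    "    # Rank texts by Euclidean distance to the query (smaller means more similar).\n",
    "    ranked = sorted(index.items(), key=lambda item: math.dist(query_vector, item[1]))\n",
    "    return [text for text, _ in ranked[:k]]\n",
    "\n",
    "toy_top_k([1.0, 0.0, 0.0], toy_index, k=2)"
   ]
  },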
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "c691f892",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x76fcba598e00>, search_kwargs={'k': 5})"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "d4345de2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(id='50bfaa32-168f-4e5e-94b6-27d2668d4ef5', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='fields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we \\nmanually applied the three exclusion criteria to exclude publications \\nwhere the full term β€œ CDL ” was not related to β€œ Cropland Data Layer ” , \\nstudies that did not use remote sensing data, and any review articles. \\nThese exclusion criteria were essential for ensuring the reliability of our \\nselection results and for eliminating any irrelevant literature. The \\nliterature selection process from the CDL citations on the USDA NASS \\nwebsite adheres to the same inclusion and exclusion criteria. The \\neligible documents were combined with the screening results of WoS \\ndatabase, and any duplicate records were removed.\\n3.3. Results\\nThe result of the literature screening process is illustrated in Fig. 4 . \\nApplying the inclusion criteria, we screened 162 and 43 articles from the \\nWoS database and the USDA NASS CDL website, respectively. We then'),\n",
       " Document(id='864cf8ce-bfa8-426b-9bfe-390c8679f13a', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='as β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β€œ Cropland Data Layer ” OR β€œ CDL ” ) AND (WC = β€œ Remote Sensing ” \\nOR ALL = ( β€œ Remote Sensing ” OR β€œ Earth observation ” OR β€œ Landsat ” OR \\nβ€œ Sentinel ” OR β€œ MODIS ” )) AND DT = β€œ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we'),\n",
       " Document(id='b700e905-8222-4ec0-8131-bee3ac9f51ca', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication ’ s β€œ Category ” field in the \\nWoS database must be labeled as β€œ remote sensing ” . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as β€œ remote sensing ” . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.'),\n",
       " Document(id='d6273845-9651-4f71-a5b0-3d3b72037a37', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 7, 'page_label': '8'}, page_content='preliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description'),\n",
       " Document(id='b58a38d1-950a-4e9e-b7f7-d1985985c0dd', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 8, 'page_label': '9'}, page_content='Fig. 3. The number of publications indexed by Scopus and Google Scholar (data accessed by January, 2024). The publications are filtered based on combined \\nkeywords β€œ Cropland Data Layer ” AND β€œ Remote Sensing ” and the single keyword β€œ Cropland Data Layer ” .\\nTable 6 \\nResearch questions.\\nID Research Question Objective Description\\nRQ1 What EO data are used with CDL? Identify common and suitable EO data in conjunction with crop type maps in remote sensing field\\nRQ2 What scientific problems and technologies are explored \\nusing CDL?\\nUnderstand the state of the science and main technologies in remote sensing that are applied with crop type \\nmaps\\nRQ3 What role does CDL play in remote sensing applications? Help researchers to recognize the significance of crop type maps and consider how to incorporate into these \\ndata their own work')]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# test retriever \n",
    "retriever.invoke(\"what is the main topic of the document?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45f8260f",
   "metadata": {},
   "source": [
    "## Augmentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb4d1435",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E0000 00:00:1759188738.019868  109561 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.\n"
     ]
    }
   ],
   "source": [
    "LLM_gen = GoogleGenerativeAI(model=\"models/gemini-1.5-flash\", google_api_key=gemini_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b6ebacf",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = PromptTemplate(\n",
    "    template = \"\"\"\n",
    "    You are a helpful assistant.\n",
    "    Answer ONLY from the provided transcript context.\n",
    "    If the context IS INSUFFICIENT, say you don't know.\n",
    "\n",
    "    {context}\n",
    "\n",
    "    Question: {question}\n",
    "    \"\"\",\n",
    "    input_variables=[\"context\",\"question\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "17adfd08",
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"Is the aspect of stars mentioned in this document provided? If yes, explain what was discussed?\"\n",
    "retrieved_documents = retriever.invoke(question)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "0c21e23b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(id='b700e905-8222-4ec0-8131-bee3ac9f51ca', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication ’ s β€œ Category ” field in the \\nWoS database must be labeled as β€œ remote sensing ” . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as β€œ remote sensing ” . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.'),\n",
       " Document(id='864cf8ce-bfa8-426b-9bfe-390c8679f13a', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='as β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β€œ Cropland Data Layer ” OR β€œ CDL ” ) AND (WC = β€œ Remote Sensing ” \\nOR ALL = ( β€œ Remote Sensing ” OR β€œ Earth observation ” OR β€œ Landsat ” OR \\nβ€œ Sentinel ” OR β€œ MODIS ” )) AND DT = β€œ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we'),\n",
       " Document(id='cf0e5d6a-656a-4073-b1ee-1e7fce2b5952', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 17, 'page_label': '18'}, page_content='early growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 – 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping'),\n",
       " Document(id='c4a9043a-fe10-459d-b714-4b1083160bac', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 23, 'page_label': '24'}, page_content='jag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. J. Remote. Sens. 36, \\n750 – 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm Β΄ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 – 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,'),\n",
       " Document(id='d6273845-9651-4f71-a5b0-3d3b72037a37', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 7, 'page_label': '8'}, page_content='preliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description')]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retrieved_documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "7993ff0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Join the retrieved chunks into a single context string, separated by blank lines\n",
    "content_texts = \"\\n\\n\".join(document.page_content for document in retrieved_documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "d923bfda",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication ’ s β€œ Category ” field in the \\nWoS database must be labeled as β€œ remote sensing ” . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as β€œ remote sensing ” . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.\\n\\nas β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. 
Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β€œ Cropland Data Layer ” OR β€œ CDL ” ) AND (WC = β€œ Remote Sensing ” \\nOR ALL = ( β€œ Remote Sensing ” OR β€œ Earth observation ” OR β€œ Landsat ” OR \\nβ€œ Sentinel ” OR β€œ MODIS ” )) AND DT = β€œ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nearly growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 – 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping\\n\\njag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. 
J. Remote. Sens. 36, \\n750 – 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm Β΄ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 – 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description'"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "content_texts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "32fe4db1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill the prompt template with the retrieved context and the user question\n",
    "final_prompt = prompt.invoke({\"context\": content_texts, \"question\": question})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "49c857d3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StringPromptValue(text=\"\\n    You are a helpful assistant.\\n    Answer ONLY from the provided transcript context.\\n    If the context IS INSUFFICIENT, just say you don't know and probably need more information.\\n\\n    the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication ’ s β€œ Category ” field in the \\nWoS database must be labeled as β€œ remote sensing ” . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as β€œ remote sensing ” . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.\\n\\nas β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. 
Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β€œ Cropland Data Layer ” OR β€œ CDL ” ) AND (WC = β€œ Remote Sensing ” \\nOR ALL = ( β€œ Remote Sensing ” OR β€œ Earth observation ” OR β€œ Landsat ” OR \\nβ€œ Sentinel ” OR β€œ MODIS ” )) AND DT = β€œ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nearly growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 – 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping\\n\\njag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. 
J. Remote. Sens. 36, \\n750 – 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm Β΄ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 – 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description\\n\\n    Question: Is the aspect of stars mentioned in this document provided? If yes, explain what was discussed?\\n    \")"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "final_prompt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f838d0a6",
   "metadata": {},
   "source": [
    "## Answer Generation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "d8730f3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Send the rendered prompt to the model; the reply text is read from response.content\n",
    "response = LLM.invoke(final_prompt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "a6989473",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"I don't know and probably need more information, as the provided transcript does not mention the aspect of stars.\""
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response.content"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5299ae94",
   "metadata": {},
   "source": [
    "## Build chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "cf902ef7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the runnable building blocks and the string output parser used to assemble the chain\n",
    "from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda\n",
    "from langchain_core.output_parsers import StrOutputParser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "def reformat_doc(retrieved_documents):\n",
    "    \"\"\"Join the page_content of each retrieved document, separated by blank lines.\"\"\"\n",
    "    return \"\\n\\n\".join(document.page_content for document in retrieved_documents)"
   ]
  },
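  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "834931d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build both prompt inputs from the same question string:\n",
    "# \"context\" runs the retriever and joins the chunks; \"question\" passes the input through unchanged\n",
    "parallel_chain = RunnableParallel({\n",
    "    \"context\": retriever | RunnableLambda(reformat_doc),\n",
    "    \"question\": RunnablePassthrough(),\n",
    "})"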
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "91125159",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'context': 'as β€œ Remote Sensing ” , β€œ Earth observation ” , β€œ Landsat ” , β€œ Sentinel ” , or \\nβ€œ MODIS ” in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β€œ Cropland Data Layer ” OR β€œ CDL ” ) AND (WC = β€œ Remote Sensing ” \\nOR ALL = ( β€œ Remote Sensing ” OR β€œ Earth observation ” OR β€œ Landsat ” OR \\nβ€œ Sentinel ” OR β€œ MODIS ” )) AND DT = β€œ Article ” , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we \\nmanually applied the three exclusion criteria to exclude publications \\nwhere the full term β€œ CDL ” was not related to β€œ Cropland Data Layer ” , \\nstudies that did not use remote sensing data, and any review articles. \\nThese exclusion criteria were essential for ensuring the reliability of our \\nselection results and for eliminating any irrelevant literature. The \\nliterature selection process from the CDL citations on the USDA NASS \\nwebsite adheres to the same inclusion and exclusion criteria. The \\neligible documents were combined with the screening results of WoS \\ndatabase, and any duplicate records were removed.\\n3.3. Results\\nThe result of the literature screening process is illustrated in Fig. 4 . 
\\nApplying the inclusion criteria, we screened 162 and 43 articles from the \\nWoS database and the USDA NASS CDL website, respectively. We then\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe article’s title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description\\n\\ncompag.2022.106866 .\\nDanielson, P., Yang, L., Jin, S., Homer, C., Napton, D., 2016. An assessment of the \\ncultivated cropland class of NLCD 2006 using a multi-source and multi-criteria \\napproach. Remote Sens 8, 101. https://doi.org/10.3390/rs8020101 .\\nDefourny, P., Bontemps, S., Bellemans, N., Cara, C., Dedieu, G., Guzzonato, E., \\nHagolle, O., Inglada, J., Nicola, L., Rabaute, T., Savinaud, M., Udroiu, C., Valero, S., \\nB Β΄egu Β΄e, A., Dejoux, J.-F., El Harti, A., Ezzahar, J., Kussul, N., Labbassi, K., \\nLebourgeois, V., Miao, Z., Newby, T., Nyamugama, A., Salh, N., Shelestov, A., \\nSimonneaux, V., Traore, P.S., Traore, S.S., Koetz, B., 2019. 
Near real-time agriculture \\nmonitoring at national scale at parcel resolution: performance assessment of the \\nSen2-Agri automated system in various cropping systems around the world. Remote \\nSens. Environ. 221, 551 – 568. https://doi.org/10.1016/j.rse.2018.11.007 .\\n\\nCRediT authorship contribution statement\\nChen Zhang: Writing – original draft, Project administration, \\nMethodology, Conceptualization. Hannah Kerner: Writing – original \\ndraft. Sherrie Wang: Writing – original draft. Pengyu Hao: Writing – \\noriginal draft. Zhe Li: Writing – original draft. Kevin A. Hunt: Writing – \\noriginal draft. Jonathon Abernethy: Writing – original draft. Haoteng \\nZhao: Writing – original draft. Feng Gao: Writing – original draft. \\nLiping Di: Writing – review & editing, Supervision, Funding acquisition. \\nClaire Guo: Writing – review & editing, Validation, Investigation. Ziao \\nLiu: Writing – review & editing, Investigation. Zhengwei Yang: Writing \\n– review & editing, Resources. Rick Mueller: Writing – review & edit -\\ning, Resources. Claire Boryan: Writing – review & editing, Resources. \\nQi Chen: Writing – review & editing, Resources. Peter C. Beeson: \\nWriting – review & editing, Resources. Hankui K. Zhang: Writing –',\n",
       " 'question': 'Quickly and briefly summarize the document'}"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parallel_chain.invoke('Quickly and briefly summarize the document')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "9fa7a8aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parser that extracts the model reply as a plain string\n",
    "parse = StrOutputParser()"
   ]
  },
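  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "f16763a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Full RAG pipeline: build inputs -> fill the prompt -> call the model -> return the reply as a string\n",
    "main_chain = parallel_chain | prompt | LLM | parse"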
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "8a92eb28",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This document outlines a methodology for selecting relevant literature on \"Cropland Data Layer\" (CDL) within the remote sensing field. It details specific inclusion and exclusion criteria, keywords used for searching (\"Remote Sensing\", \"Earth observation\", \"Landsat\", \"Sentinel\", \"MODIS\", \"Cropland Data Layer\", \"CDL\"), and the databases utilized (Web of Science Core Collection and USDA NASS CDL website). The process involved screening, manually applying exclusion criteria, and removing duplicate records, ultimately identifying a specific number of articles from each source.\n"
     ]
    }
   ],
   "source": [
    "print(main_chain.invoke(\"Quickly and briefly summarize the document\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "7133b4da",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "*   The document lists numerous authors and their specific contributions to the work, including writing, project administration, methodology, supervision, funding acquisition, validation, and providing resources.\n",
      "*   It describes the methodology for screening qualified publications related to the Cropland Data Layer (CDL) in the remote sensing field.\n",
      "*   The literature screening used the Web of Science (WoS) Core Collection and the USDA NASS website.\n",
      "*   Inclusion criteria required specific keywords related to \"Cropland Data Layer\" or \"CDL\" and remote sensing terms (e.g., \"Remote Sensing\", \"Landsat\", \"MODIS\") in the title, abstract, or keywords, focusing on peer-reviewed articles.\n",
      "*   Exclusion criteria were applied to remove irrelevant publications, such as those where \"CDL\" was not related to \"Cropland Data Layer,\" studies not using remote sensing data, or review articles.\n",
      "*   The initial screening process yielded 162 articles from the WoS database and 43 from the USDA NASS CDL website.\n"
     ]
    }
   ],
   "source": [
    "print(main_chain.invoke(\"Quickly and briefly summarize the document. Put them in bullet format now.\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}