{
"cells": [
{
"cell_type": "markdown",
"id": "94bbe043",
"metadata": {},
"source": [
"# Document RAG System."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "367a3c60",
"metadata": {},
"outputs": [],
"source": [
"# install needed libraries to use\n",
"!pip -q install langchain langchain-google-genai langchain-community google-genai faiss-cpu tiktoken python-dotenv pypdf langchain-huggingface sentence-transformers"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "276de997",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"Google Colab environment setup\"\"\"\n",
"# import os\n",
"\n",
"# # set environment variables for google and huggingface\n",
"\n",
"# os.environ['GOOGLE_API_KEY'] = userdata.get(\"GOOGLE_API_KEY\")\n",
"# os.environ['HUGGINGFACEHUB_ACCESS_TOKEN'] = userdata.get(\"HUGGINGFACEHUB_ACCESS_TOKEN\")"
]
},
{
"cell_type": "markdown",
"id": "2651b55f",
"metadata": {},
"source": [
"### Load keys."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7673d4e2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI key loaded: True\n",
"\n",
"Gemini key loaded: True\n"
]
}
],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()\n",
"\n",
"openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"gemini_api_key = os.getenv(\"GEMINI_API_KEY\")\n",
"\n",
"print(\"OpenAI key loaded:\", bool(openai_api_key))\n",
"# print(\"OpenAI key:\", openai_api_key)\n",
"\n",
"print(\"\\nGemini key loaded:\", bool(gemini_api_key))\n",
"# print(\"Gemini key:\", gemini_api_key)\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d1aa3528",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.12/dist-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [
"# import necessary libraries\n",
"\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_google_genai import GoogleGenerativeAIEmbeddings,ChatGoogleGenerativeAI,GoogleGenerativeAI\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_core.prompts import PromptTemplate\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"from langchain_huggingface import HuggingFaceEmbeddings"
]
},
{
"cell_type": "markdown",
"id": "72887907",
"metadata": {},
"source": [
"### Test key with prompt."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "181c0959",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"E0000 00:00:1759189263.070101 109561 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"content='Hello! How can I help you today?' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []} id='run--0c2a5214-4f37-43b5-b432-836a3abe7058-0' usage_metadata={'input_tokens': 2, 'output_tokens': 46, 'total_tokens': 48, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 37}}\n"
]
}
],
"source": [
"from langchain_google_genai import ChatGoogleGenerativeAI\n",
"\n",
"LLM = ChatGoogleGenerativeAI(\n",
" model=\"gemini-2.5-flash\",\n",
" google_api_key=gemini_api_key\n",
")\n",
"\n",
"response = LLM.invoke(\"Hello\")\n",
"print(response)\n"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "83bf1f6f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello! How can I help you today?\n"
]
}
],
"source": [
"print(response.content)"
]
},
{
"cell_type": "markdown",
"id": "7d42c604",
"metadata": {},
"source": [
"## Load document."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a39b0bfb",
"metadata": {},
"outputs": [],
"source": [
"# load and read PDF file\n",
"\n",
"load_document = PyPDFLoader(\"dataset/ChenZhang_cropmapping_ReviewPaper.pdf\")\n",
"document = load_document.load()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "44f61c67",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"29"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(document)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "46d97e70",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Review\n",
"Remote sensing for crop mapping: A perspective on current and future \n",
"crop-specific land cover data products\n",
"Chen Zhang\n",
"a , *\n",
", Hannah Kerner\n",
"b\n",
", Sherrie Wang\n",
"c\n",
", Pengyu Hao\n",
"d\n",
", Zhe Li\n",
"e\n",
", Kevin A. Hunt\n",
"e\n",
", \n",
"Jonathon Abernethy\n",
"e\n",
", Haoteng Zhao\n",
"f\n",
", Feng Gao\n",
"f\n",
", Liping Di\n",
"a , *\n",
", Claire Guo\n",
"a , g\n",
", Ziao Liu\n",
"a\n",
", \n",
"Zhengwei Yang\n",
"e\n",
", Rick Mueller\n",
"e\n",
", Claire Boryan\n",
"e\n",
", Qi Chen\n",
"h\n",
", Peter C. Beeson\n",
"i\n",
", Hankui K. Zhang\n",
"j\n",
", \n",
"Yu Shen\n",
"j , k\n",
"a\n",
"Center for Spatial Information Science and Systems, George Mason University, Fairfax, VA 22030, USA\n",
"b\n",
"School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA\n",
"c\n",
"Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA\n",
"d\n",
"Food and Agriculture Organization of the United Nations, Viale delle Terme di Caracalla, 00153 Rome, Italy\n",
"e\n",
"U.S. Department of Agriculture, National Agricultural Statistics Service, Washington, DC 20250, USA\n",
"f\n",
"U.S. Department of Agriculture, Agricultural Research Service, Hydrology and Remote Sensing Laboratory, Beltsville, MD 20705, USA\n",
"g\n",
"Thomas Jefferson High School for Science and Technology, Alexandria, VA 22312, USA\n",
"h\n",
"Department of Geography & Environment, University of Hawai β i at M Β―anoa, Honolulu, HI 96822, USA\n",
"i\n",
"U.S. Department of Agriculture, Economic Research Service, Washington, DC 20250, USA\n",
"j\n",
"Geospatial Sciences Center of Excellence, Department of Geography and Geospatial Sciences, South Dakota State University, Brookings, SD 57007, USA\n",
"k\n",
"Nicholas School of the Environment, Duke University, Durham, NC 27708, USA\n",
"ARTICLE INFO\n",
"Edited by Dr. Marie Weiss\n",
"Keywords:\n",
"Crop mapping\n",
"Land use land cover\n",
"Geospatial data product\n",
"Systematic literature review\n",
"Cropland data layer\n",
"ABSTRACT\n",
"Crop mapping is an indispensable application in agricultural and environmental remote sensing. Over the last \n",
"few decades, the exponential growth of open Earth Observation (EO) data has significantly enhanced crop \n",
"mapping and enabled the production of detailed crop-specific land cover data at national and regional scales. \n",
"These data have served multiple purposes across a wide range of applications and research initiatives. However, \n",
"there is currently no comprehensive summary of the crop mapping data products, nor is there a detailed dis -\n",
"cussion of their uses in remote sensing studies. This paper provides the first in-depth review of remote sensing for \n",
"crop mapping from the perspective of crop-specific land cover data by evaluating over 60 open-access opera -\n",
"tional products, archival crop type map datasets, single-crop extent map datasets, cropping pattern datasets, and \n",
"crop mapping platforms and systems. Using the Cropland Data Layer (CDL) β one of the most widely used \n",
"products with over 25 years of continuous monitoring of U.S. croplands β as a case study, we also conduct a \n",
"systematic literature review on the application of crop type maps in remote sensing science. Our analysis syn -\n",
"thesizes 129 research articles through three core research questions: (1) What EO data are used with CDL; (2) \n",
"What scientific problems and technologies are explored using CDL; and (3) What role does CDL play in remote \n",
"sensing applications. Furthermore, we delve into the implications of our vision for new data products and \n",
"propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products \n",
"to improving global mapping reliability and developing operational in-season crop mapping systems. This review \n",
"paper not only serves as a reference for stakeholders seeking to utilize crop-specific land cover data in their work, \n",
"but also outlines the directions for future geospatial data product development.\n",
"* Corresponding authors.\n",
"E-mail addresses: czhang11@gmu.edu (C. Zhang), hkerner@asu.edu (H. Kerner), sherwang@mit.edu (S. Wang), pengyu.hao@fao.org (P. Hao), zhe.li@usda.gov\n",
"(Z. Li), kevin.a.hunt@usda.gov (K.A. Hunt), jake.abernethy@usda.gov (J. Abernethy), haoteng.zhao@usda.gov (H. Zhao), feng.gao@usda.gov (F. Gao), ldi@gmu. \n",
"edu (L. Di), zliu23@gmu.edu (Z. Liu), zhengwei.yang@usda.gov (Z. Yang), rick.mueller@usda.gov (R. Mueller), claire.boryan@usda.gov (C. Boryan), qichen@ \n",
"hawaii.edu (Q. Chen), peter.beeson@usda.gov (P.C. Beeson), hankui.zhang@sdstate.edu (H.K. Zhang), yu.shen@duke.edu (Y. Shen). \n",
"Contents lists available at ScienceDirect\n",
"Remote Sensing of Environment\n",
"journal homepage: www.else vier.com/loc ate/rse\n",
"https://doi.org/10.1016/j.rse.2025.114995\n",
"Received 12 December 2024; Received in revised form 11 August 2025; Accepted 22 August 2025 \n",
"Remote Sensing of Environment 330 (2025) 114995 \n",
"0034-4257/Β© 2025 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by- \n",
"nc-nd/4.0/ ).\n"
]
}
],
"source": [
"# First page of PDF\n",
"print(document[0].page_content) "
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "fa96dcd3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the WoS database, including the publication title, abstract, or keywords. \n",
"In our survey, we found that many papers introduced, discussed, or cited \n",
"CDL, but did not directly use the data in their experiments. Therefore, \n",
"IC1 could ensure that CDL has been applied in the selected publications, \n",
"rather than simply mentioning it in passing.\n",
"To narrow down the publications to those specifically related to \n",
"remote sensing, IC2 states that the publication β s β Category β field in the \n",
"WoS database must be labeled as β remote sensing β . However, many \n",
"publications related to remote sensing were published in computer sci -\n",
"ence, agricultural, or multidisciplinary journals, which were not cate -\n",
"gorized as β remote sensing β . To include these publications in this \n",
"review, we added a rule that requires the presence of certain terms, such \n",
"as β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \n",
"β MODIS β in any of the title, keywords, or abstract of the publication.\n",
"To ensure the selected publications reflected the up-to-date research \n",
"trends and avoided duplicate research items, IC3 limits the document \n",
"type to only peer-reviewed articles that were published in journals \n",
"indexed by the WoS Core Collection. Focusing on these high-impact \n",
"journal articles guarantees that our review reflects the most represen -\n",
"tative studies within the remote sensing field.\n",
"The query string of inclusion criteria in the WoS data database is: \n",
"ALL = ( β Cropland Data Layer β OR β CDL β ) AND (WC = β Remote Sensing β \n",
"OR ALL = ( β Remote Sensing β OR β Earth observation β OR β Landsat β OR \n",
"β Sentinel β OR β MODIS β )) AND DT = β Article β , where ALL represents all \n",
"fields (title, abstract, keywords), WC represents WoS categories, and DT \n",
"represents document type. After the initial screening process, we \n",
"manually applied the three exclusion criteria to exclude publications \n",
"where the full term β CDL β was not related to β Cropland Data Layer β , \n",
"studies that did not use remote sensing data, and any review articles. \n",
"These exclusion criteria were essential for ensuring the reliability of our \n",
"selection results and for eliminating any irrelevant literature. The \n",
"literature selection process from the CDL citations on the USDA NASS \n",
"website adheres to the same inclusion and exclusion criteria. The \n",
"eligible documents were combined with the screening results of WoS \n",
"database, and any duplicate records were removed.\n",
"3.3. Results\n",
"The result of the literature screening process is illustrated in Fig. 4 . \n",
"Applying the inclusion criteria, we screened 162 and 43 articles from the \n",
"WoS database and the USDA NASS CDL website, respectively. We then \n",
"excluded 48 and 8 non-qualified articles from the two sources. After \n",
"removing the 20 duplicated records, we identified 129 qualified articles \n",
"for use in this systematic literature review. The full literature list and \n",
"surveyed features per selected publication are summarized in Table A1 .\n",
"Table 8 summarizes the publication distribution of 129 qualified \n",
"articles across over 40 scientific journals. It should be noted that these \n",
"screening results only encompass representative articles related to the \n",
"CDL in remote sensing science. As documents are searched based on \n",
"Table 7 \n",
"Document search criteria.\n",
"ID Description\n",
"Inclusion Criteria 1 (IC1) β Cropland Data Layer β OR β CDL β contained in any fields\n",
"Inclusion Criteria 2 (IC2) Category in β Remote Sensing β or in other categories but contain β Remote Sensing β or β Earth observation β or β Landsat β or β Sentinel β or β MODIS β in any \n",
"fields\n",
"Inclusion Criteria 3 (IC3) Publication is a journal article\n",
"Exclusion Criteria 1 \n",
"(EC1)\n",
"The full term of β CDL β is not related to β Cropland Data Layer β\n",
"Exclusion Criteria 2 \n",
"(EC2)\n",
"No remote sensing data is used in the study\n",
"Exclusion Criteria 3 \n",
"(EC3)\n",
"Publication is a review paper\n",
"Fig. 4. Literature screening process.\n",
"Table 8 \n",
"Publication distribution of the qualified CDL-related remote sensing studies by \n",
"journals.\n",
"Journal Record \n",
"Count\n",
"Remote Sensing of Environment 27\n",
"Remote Sensing 26\n",
"International Journal of Applied Earth Observation and \n",
"Geoinformation\n",
"9\n",
"ISPRS Journal of Photogrammetry and Remote Sensing 9\n",
"Photogrammetric Engineering and Remote Sensing 4\n",
"Agronomy Journal 3\n",
"Computers and Electronics in Agriculture 3\n",
"Remote Sensing Letters 3\n",
"Agricultural Systems 2\n",
"Agricultural Water Management 2\n",
"Canadian Journal of Remote Sensing 2\n",
"Earth System Science Data 2\n",
"European Journal of Remote Sensing 2\n",
"IEEE Journal of Selected Topics in Applied Earth Observations and \n",
"Remote Sensing\n",
"2\n",
"International Journal of Remote Sensing 2\n",
"Science of Remote Sensing 2\n",
"Sensors 2\n",
"Others (only one paper) 27\n",
"Total 129\n",
"C. Zhang et al. Remote Sensing of Environment 330 (2025) 114995 \n",
"10\n"
]
}
],
"source": [
"# 10th page of PDF\n",
"print(document[9].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "31c4df34",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"et al., 2013 ). CDL data also have been used to delineate and stratify \n",
"regions, such as U.S. soybean growing areas ( Song et al., 2017 ), which \n",
"helps in understanding field size patterns for more effective agricultural \n",
"resource management.\n",
"Training samples: Beyond a crop type map, CDL is widely utilized \n",
"as an authoritative geospatial benchmark to support field-level crop \n",
"spectral signature training. The ML models trained with high-confidence \n",
"pixels in CDL and associated products (e.g., CSB, Confidence Layer) can \n",
"be applied to extend land cover classification while adjusting for factors \n",
"such as hemisphere seasonality and evolving farming trends, which is \n",
"invaluable for global crop monitoring. As discussed in RQ2, ML and DL \n",
"are the main technologies in remote sensing studies, which rely on high- \n",
"quality training data. Due to the extensive crop-specific land cover in -\n",
"formation, CDL has been extensively used to label training samples in EO \n",
"data. This enables the further supervised-learning-based training pro -\n",
"cess for semantic segmentation models ( Du et al., 2022a ), ML models \n",
"( Momm et al., 2020 ), DL models ( Cai et al., 2018 ; Xu et al., 2020 ), and \n",
"transfer learning models ( Hao et al., 2020 ; Wei et al., 2022 ). Instead of \n",
"directly using CDL as training samples, some works further optimized \n",
"the training sample selection process by modeling crop rotation patterns \n",
"in the historical CDL ( Zhang et al., 2022a ). Zhang et al. (2021) and Lin \n",
"et al. (2022b) used DNNs to automatically recognize training samples \n",
"from CDL time series to label Landsat and Sentinel-2 data for early and \n",
"in-season crop mapping.\n",
"Benchmark data: CDL is often adopted as benchmark data or \n",
"reference data to validate new crop mapping methodologies and algo -\n",
"rithms. The traditional ground-truthing process is usually labor- \n",
"intensive, particularly when surveying extensive geographic areas. By \n",
"comparing results against the CDL, researchers can efficiently assess \n",
"model performance, detect areas for improvement, and refine their \n",
"strategies to achieve optimal outcomes. However, despite its widespread \n",
"use, it should be noted that CDL only represents a high-quality classifi -\n",
"cation map rather than ground truth. Several studies have examined the \n",
"uncertainty and potential biases associated with using CDL as bench -\n",
"mark data for result validation. For example, Lark et al. (2021) found the \n",
"average accuracy for all crop classes has improved from 87 % in 2008 to \n",
"92 % in 2016. Kerner et al. (2022) showed 2019β2020 CDL had 89 % \n",
"accuracy evaluated with independent ground truth data within the \n",
"central US Corn Belt.\n",
"Other uses: CDL and its derivative data products have been applied \n",
"in addressing broader applications and scientific problems. Boryan et al. \n",
"(2014) developed a stratification method for agricultural area sampling \n",
"frame construction based on CDL. Gao et al. (2014) used CDL to assist in \n",
"the creation of Bidirectional Reflectance Distribution Function (BRDF) \n",
"look-up maps. Harmonic analysis techniques, such as linear and non- \n",
"linear harmonic models, have been employed with CDL to model peri -\n",
"odic patterns in time series data ( Roy and Yan, 2020 ; Wang et al., \n",
"2020a ). Shao et al. (2016a) evaluated different time-series smoothing \n",
"algorithms. Duveiller et al. (2015) developed a signal-to-noise ratio \n",
"method to identify spatially homogeneous vegetation cover. CDL has \n",
"also been utilized in GIS education ( Han et al., 2014 ) and as compared \n",
"dataset for particular purposes ( Wickham et al., 2014 ; Kokkinidis et al., \n",
"2017 ; Shi et al., 2018 ; Kraatz et al., 2023 ; Wang and Mountrakis, 2023 ).\n",
"4. Visions for future data products\n",
"As science and technology in remote sensing advances, the demand \n",
"for enhanced crop-specific land cover data products becomes increas -\n",
"ingly evident. This section explores vision and progress in improving \n",
"spatiotemporal coverage and resolution of the current data products \n",
"(Section 4.1), achieving reliable global mapping through robust training \n",
"datasets and cropland extent data (Section 4.2 and 4.3), incorporating \n",
"more crop-specific information (Section 4.4 and 4.5), and the develop -\n",
"ment of operational in-season crop mapping systems (Section 4.6).\n",
"4.1. Progress on enhanced coverage and resolution of current product\n",
"Enhanced spatial coverage and resolution significantly benefit crop \n",
"mapping, area estimation, and field size quantification by enabling more \n",
"accurate identification of land cover features. Advancement in geo -\n",
"spatial cloud computing platforms (e.g., GEE) and increasing availabil -\n",
"ity of higher spatiotemporal resolution open EO data (e.g., Sentinel-1, \n",
"Sentinel-2, HLS) have improved the efficiency and accuracy for pro -\n",
"ducing regional and national crop type map data with resolution of 10-m \n",
"or even higher ( Tran et al., 2022 ; Li et al., 2025 ). Such detailed field- \n",
"level crop cover information will not only facilitate a more precise \n",
"distinction between different types of vegetation and crops, but also \n",
"provide opportunities for improved agricultural monitoring, better \n",
"resource management, and informed decision-making to support sus -\n",
"tainable agriculture and food security.\n",
"As highlighted in the Section 3, the 30-m CDL has traditionally been \n",
"essential for scientific problem solving with various EO data. However, \n",
"the increasing availability of higher-resolution EO data from both open- \n",
"access and commercial satellites requires more detailed crop mapping \n",
"products. To meet this evolving need, the USDA NASS has been \n",
"enhancing data accuracy and usability by implementing a 10-m reso -\n",
"lution CDL. These improvements are vital, particularly given the \n",
"increasing vulnerability of agriculture to natural disasters and extreme \n",
"weather events. By utilizing the RF algorithm, enhanced stratified \n",
"random sampling approaches, and localized image processing, the 10-m \n",
"CDL provides a more accurate representation of diverse crop types for \n",
"CONUS, particularly in regions with unique or specialty crops. This \n",
"methodology reduces labor and workload while improving classification \n",
"accuracy and spatial clarity for small-area and specialty crops compared \n",
"to 30-m CDL. Fig. 8 shows the improvement achieved with the new 10-m \n",
"CDL compared to the current 30-m CDL on croplands with complex \n",
"landscapes.\n",
"Currently, the CDL is available only for the CONUS. However, efforts \n",
"are underway to extend coverage to other regions, such as Hawaii and U. \n",
"S. territories like Puerto Rico and the U.S. Virgin Islands. Enhanced CDLs \n",
"for these areas include the 2022 Beta version and the official 2024 \n",
"release of the 10-m resolution CDL for CONUS ( Li et al., 2024b ), and the \n",
"inaugural Hawaii Cropland Data Layer (HCDL) 2023 and 2024 ( Li et al., \n",
"2024a ). These products leverage gap-filled 10-day image composites \n",
"from Sentinel and Landsat sensors, processed through GEE. In devel -\n",
"oping the HCDL, assorted ML and DL algorithms were evaluated, \n",
"including RF, U-Net, ResNet50, VGG19, and DeepLabV3. The RF algo -\n",
"rithm achieved the best results for mapping major and specialty crops in \n",
"Hawaii. Fig. 9 illustrates the 10-m resolution HCDL 2023 V1.0 Beta, \n",
"which utilizes a RF algorithm with 100 trees for mapping crops, \n",
"including coffee, pineapple, macadamia nuts, commercial forest, citrus, \n",
"papaya, and tropical fruits. The official release of HCDL 2023 and 2024 \n",
"is anticipated in summer 2025. Future efforts will focus on creating a 10- \n",
"m resolution annual CDL for CONUS and potentially extending to Puerto \n",
"Rico and the U.S. Virgin Islands.\n",
"4.2. Developing training dataset in data-sparse regions\n",
"Lack of training data is a major barrier for developing crop type maps \n",
"like CDL in regions outside of the United States or other countries that \n",
"have instituted operational mapping programs (e.g., programs in \n",
"Table 1 ). Researchers aim to overcome this barrier in two main ways: (1) \n",
"developing more globally representative training datasets, and (2) \n",
"developing algorithms that learn more efficiently from small amounts of \n",
"training data.\n",
"Globally representative training datasets: Globally representative \n",
"reference data is essential for training modern data-hungry DL models \n",
"and has been identified as a key priority in advancing AI applications in \n",
"remote sensing ( Zhang et al., 2025a ). Collecting crop type data for \n",
"training ML classifiers for crop mapping is challenging because col -\n",
"lecting high-quality data typically requires ground-truthing ( Nakalembe \n",
"C. Zhang et al. Remote Sensing of Environment 330 (2025) 114995 \n",
"14\n"
]
}
],
"source": [
"# 14th page of PDF\n",
"print(document[13].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "0b15e82e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"and Kerner, 2023 ). Ground-truthing involves physically visiting agri -\n",
"cultural fields and recording the type of crop growing in the field. This \n",
"process is prohibitively expensive and logistically challenging for many \n",
"organizations and regions.\n",
"Currently available public reference samples are largely regional in \n",
"scope ( Dufourg et al., 2023 ; Kondmann et al., 2021 ). Recent work has \n",
"proposed novel methods of collecting ground-truth crop labels that \n",
"reduce the cost of data collection. Paliyam et al. (2021) proposed a \n",
"method called Street2Sat that uses computer vision (CV) techniques to \n",
"transform roadside images of fields collected with car- and motorcycle \n",
"helmet-mounted cameras into geo-referenced crop type labels of those \n",
"fields. dβAndrimont et al. (2022) used CV techniques to extract crop type \n",
"and phenology information from street-level images of fields taken with \n",
"car-mounted cameras in the Netherlands. Yan and Ryu (2021) and Soler \n",
"et al. (2024) used DL models to automatically create crop type labels \n",
"from Google Street View images in California and Thailand, \n",
"respectively.\n",
"Other work leveraged crowd-sourced data from online and mobile \n",
"platforms to collect ground-truth crop data. Wang et al. (2020b) used \n",
"crop type data crowd-sourced from the Plantix mobile app (used to help \n",
"farmers diagnose crop disease) for crop mapping in India. Fraisl et al. \n",
"(2022) demonstrated the use of the mobile app Picture Pile to engage \n",
"citizen scientists to annotate crop type labels in crowdsourced street- \n",
"level images from Mapillary, which could later be converted to geo- \n",
"referenced crop type labels for training crop mapping models. The \n",
"CropObserve app facilitated the process on crop-specific ground truth -\n",
"ing (e.g., crop types, phenological stage, visible damage, management \n",
"practices) anywhere in the world ( IIASA, 2023 ).\n",
"In parallel with data collection efforts, increasing attention is being \n",
"paid to making crop type reference data more Findable, Accessible, \n",
"Interoperable, and Reusable (FAIR). Major research initiatives (e.g., \n",
"CropHarvest, WorldCereal, and EuroCrops) are actively working on \n",
"harmonizing, standardizing, and openly publishing training datasets to \n",
"enhance the FAIRness of crop reference data within the remote sensing \n",
"and agricultural monitoring communities.\n",
"Algorithms that learn more efficiently from small amounts of \n",
"training data: To reduce the need for large labeled datasets to train \n",
"effective crop mapping models, researchers have proposed methods for \n",
"learning from a small amount of training data for a given location. Many \n",
"of these methods involve learning from labeled data in locations other \n",
"than the target region to supplement training. The WorldCereal project \n",
"trained a CatBoost classifier with expert-designed features extracted \n",
"from multiple satellite datasets using a reference database of globally \n",
"distributed crop type labels ( Van Tricht et al., 2023 ). Other work has \n",
"leveraged transfer learning, in which models are first βpre-trainedβ on a \n",
"large labeled dataset for one task (e.g., crop mapping in region A) and \n",
"then further trained (βfine-tunedβ) on a smaller dataset for the target \n",
"task (e.g., crop mapping in region B). Meta-learning algorithms are also \n",
"used to learn efficiently from a small number of crop type examples in a \n",
"new target region by learning from many globally-distributed crop type \n",
"classification tasks in the CropHarvest dataset ( Tseng et al., 2021a, \n",
"2022 ).\n",
"Researchers have developed methods for learning generic features \n",
"that are useful in diverse tasks (e.g., crop mapping, land cover mapping, \n",
"tree species classification) from a large amount of unlabeled satellite EO \n",
"data in a process called self-supervised learning. Similar to transfer \n",
"learning discussed previously, after a model is pre-trained using self- \n",
"supervised learning, it can be fine-tuned for a specific crop mapping \n",
"task. For example, Tseng et al. (2024) proposed a self-supervised model \n",
"called Presto (which stands for Pre-trained remote sensing transformer) \n",
"that learns from unlabeled EO data from multiple satellite platforms and \n",
"derived products. They showed that fine-tuning Presto on the Kenya \n",
"maize classification task and Brazil coffee classification task in Cro -\n",
"pHarvest achieved state-of-the-art performance. Both tasks required \n",
"learning from small training data sizes of 1345 in Kenya and 203 in \n",
"Brazil. In Phase II of the ESA WorldCereal project (2024-2026) ( ESA, \n",
"2024 ), Presto was adopted for feature extraction for crop type mapping \n",
"in place of the expert-designed features used to train a CatBoost classifier \n",
"in Van Tricht et al. (2023) . With Prestoβs robust algorithm for improving \n",
"spatiotemporal transferability, this integration is key to WorldCerealβs \n",
"aim of establishing a generic and customizable global crop mapping \n",
"system.\n",
"In recent years, foundation models have emerged to address the \n",
"scarcity of labeled training data in remote sensing applications ( Jakubik \n",
"et al., 2023 ; Xiao et al., 2025 ). For example, Google recently introduced \n",
"AlphaEarth Foundations (AEF) for global mapping from sparse label \n",
"data ( Brown et al., 2025 ). As a geospatial foundation model, AEF in -\n",
"tegrates multi-source, multi-modal EO and geoinformation data into a \n",
"time-continuous embedding space, and the resulting global dataset of \n",
"analysis-ready embedding field layers could enable a wide range of \n",
"mapping tasks. Such foundation models and analysis-ready data offer a \n",
"promising solution for efficient production of cropland and crop type \n",
"maps at a global scale.\n",
"4.3. Improving consistency of cropland extent mask for global crop \n",
"mapping\n",
"From the perspective of global crop mapping, a reliable and consis -\n",
"tent cropland extent map serves as the fundamental land cover category \n",
"in the crop-specific land cover data production, which are crucial for the \n",
"subsequent crop type classification process especially over the data- \n",
"sparse regions. Various cropland extent mask data derived from EO \n",
"data have been widely developed and validated over the past years. \n",
"However, selecting the most appropriate cropland extent mask and \n",
"conducting local validation of these data tailored to the specific re -\n",
"quirements of the study remains challenging due to inconsistency and \n",
"variability in their reported accuracies and cropland definitions.\n",
"To improve consistency and transparency of cropland extent, \n",
"Table 9 \n",
"FAO land use categories for cropland.\n",
"Land Use Category Definition\n",
"Cropland Land used for cultivation of crops. The total of areas under Arable land and Permanent crops.\n",
"Arable land Land used for cultivation of crops in rotation with fallow, meadows and pastures within cycles of up to 5 years. The total of areas under Temporary \n",
"crops, temporary meadows and pastures, and temporary fallow. Arable land does not include land that is potentially cultivable but is not cultivated.\n",
"Temporary crops Land used for crops with a less than 1-year growing cycle, which must be newly sown or planted for further production after the harvest. Some crops \n",
"remaining in the field for more than 1 year may also be considered as temporary crops (e.g., asparagus, strawberries, pineapples, bananas, and sugar \n",
"cane). Multiple-cropped areas are counted only once.\n",
"Temporary fallow Land that is not seeded for one or more growing seasons. The maximum idle period is usually less than 5 years. This land may be in the form sown for \n",
"the exclusive production of green manure. Land remaining fallow for too long may acquire characteristics requiring it to be reclassified as, for \n",
"instance, permanent meadows and pastures if used for grazing or haying.\n",
"Temporary meadows and \n",
"pastures\n",
"Land temporarily cultivated with herbaceous forage crops for mowing or pasture, as part of crop rotation periods of less than 5 years.\n",
"Permanent crops Land cultivated with long-term crops which do not have to be replanted for several years (e.g., cocoa and coffee), land under trees and shrubs \n",
"producing flowers (e.g., roses and jasmine), and nurseries (except those for forest trees, which should be classified under βforestryβ). Permanent \n",
"meadows and pastures are excluded from permanent crops.\n",
"C. Zhang et al. Remote Sensing of Environment 330 (2025) 114995 \n",
"16\n"
]
}
],
"source": [
"# 16th page of PDF\n",
"print(document[15].page_content)"
]
},
{
"cell_type": "markdown",
"id": "2ec8c146",
"metadata": {},
"source": [
"## Split texts."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1f22bdbf",
"metadata": {},
"outputs": [],
"source": [
"# split into chunks\n",
"\n",
"doc_split= RecursiveCharacterTextSplitter(\n",
" chunk_size=1000,\n",
" chunk_overlap=200,\n",
")\n",
"chunks = doc_split.split_documents(document)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "94a200e7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"268"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(chunks)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b519f56f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"crop mapping from the perspective of crop-specific land cover data by evaluating over 60 open-access opera -\n",
"tional products, archival crop type map datasets, single-crop extent map datasets, cropping pattern datasets, and \n",
"crop mapping platforms and systems. Using the Cropland Data Layer (CDL) β one of the most widely used \n",
"products with over 25 years of continuous monitoring of U.S. croplands β as a case study, we also conduct a \n",
"systematic literature review on the application of crop type maps in remote sensing science. Our analysis syn -\n",
"thesizes 129 research articles through three core research questions: (1) What EO data are used with CDL; (2) \n",
"What scientific problems and technologies are explored using CDL; and (3) What role does CDL play in remote \n",
"sensing applications. Furthermore, we delve into the implications of our vision for new data products and \n",
"propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products\n"
]
}
],
"source": [
"# display 4th chunk\n",
"print(chunks[3].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "a62f7966",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"propose emerging research topics, ranging from extending the spatiotemporal coverage of current data products \n",
"to improving global mapping reliability and developing operational in-season crop mapping systems. This review \n",
"paper not only serves as a reference for stakeholders seeking to utilize crop-specific land cover data in their work, \n",
"but also outlines the directions for future geospatial data product development.\n",
"* Corresponding authors.\n",
"E-mail addresses: czhang11@gmu.edu (C. Zhang), hkerner@asu.edu (H. Kerner), sherwang@mit.edu (S. Wang), pengyu.hao@fao.org (P. Hao), zhe.li@usda.gov\n",
"(Z. Li), kevin.a.hunt@usda.gov (K.A. Hunt), jake.abernethy@usda.gov (J. Abernethy), haoteng.zhao@usda.gov (H. Zhao), feng.gao@usda.gov (F. Gao), ldi@gmu. \n",
"edu (L. Di), zliu23@gmu.edu (Z. Liu), zhengwei.yang@usda.gov (Z. Yang), rick.mueller@usda.gov (R. Mueller), claire.boryan@usda.gov (C. Boryan), qichen@\n"
]
}
],
"source": [
"# display 5th chunk\n",
"print(chunks[4].page_content)"
]
},
{
"cell_type": "markdown",
"id": "d13d1af5",
"metadata": {},
"source": [
"### Vector Store Creation.\n",
"\n",
"Generate document embeddings and build a FAISS vector store for efficient similarity-based retrieval."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "fd8ae71d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/wills/.local/lib/python3.12/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).\n",
" from pandas.core import (\n",
"2025-09-30 00:19:35.389072: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.\n",
"2025-09-30 00:19:37.212948: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
"To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
"/home/wills/.local/lib/python3.12/site-packages/matplotlib/projections/__init__.py:63: UserWarning: Unable to import Axes3D. This may be due to multiple versions of Matplotlib being installed (e.g. as a system package and as a pip package). As a result, the 3D projection is not available.\n",
" warnings.warn(\"Unable to import Axes3D. This may be due to multiple versions of \"\n"
]
}
],
"source": [
"embeds = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
"vector_store = FAISS.from_documents(chunks, embeds)"
]
},
{
"cell_type": "markdown",
"id": "cb690b06",
"metadata": {},
"source": [
"### Retrieval."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "ae03425b",
"metadata": {},
"outputs": [],
"source": [
"retriever = vector_store.as_retriever(search_type=\"similarity\", search_kwargs={\"k\": 5})"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "c691f892",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x76fcba598e00>, search_kwargs={'k': 5})"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "d4345de2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(id='50bfaa32-168f-4e5e-94b6-27d2668d4ef5', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='fields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we \\nmanually applied the three exclusion criteria to exclude publications \\nwhere the full term β CDL β was not related to β Cropland Data Layer β , \\nstudies that did not use remote sensing data, and any review articles. \\nThese exclusion criteria were essential for ensuring the reliability of our \\nselection results and for eliminating any irrelevant literature. The \\nliterature selection process from the CDL citations on the USDA NASS \\nwebsite adheres to the same inclusion and exclusion criteria. The \\neligible documents were combined with the screening results of WoS \\ndatabase, and any duplicate records were removed.\\n3.3. Results\\nThe result of the literature screening process is illustrated in Fig. 4 . \\nApplying the inclusion criteria, we screened 162 and 43 articles from the \\nWoS database and the USDA NASS CDL website, respectively. We then'),\n",
" Document(id='864cf8ce-bfa8-426b-9bfe-390c8679f13a', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='as β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β Cropland Data Layer β OR β CDL β ) AND (WC = β Remote Sensing β \\nOR ALL = ( β Remote Sensing β OR β Earth observation β OR β Landsat β OR \\nβ Sentinel β OR β MODIS β )) AND DT = β Article β , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we'),\n",
" Document(id='b700e905-8222-4ec0-8131-bee3ac9f51ca', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication β s β Category β field in the \\nWoS database must be labeled as β remote sensing β . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as β remote sensing β . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.'),\n",
" Document(id='d6273845-9651-4f71-a5b0-3d3b72037a37', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 7, 'page_label': '8'}, page_content='preliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe articleβs title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description'),\n",
" Document(id='b58a38d1-950a-4e9e-b7f7-d1985985c0dd', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 8, 'page_label': '9'}, page_content='Fig. 3. The number of publications indexed by Scopus and Google Scholar (data accessed by January, 2024). The publications are filtered based on combined \\nkeywords β Cropland Data Layer β AND β Remote Sensing β and the single keyword β Cropland Data Layer β .\\nTable 6 \\nResearch questions.\\nID Research Question Objective Description\\nRQ1 What EO data are used with CDL? Identify common and suitable EO data in conjunction with crop type maps in remote sensing field\\nRQ2 What scientific problems and technologies are explored \\nusing CDL?\\nUnderstand the state of the science and main technologies in remote sensing that are applied with crop type \\nmaps\\nRQ3 What role does CDL play in remote sensing applications? Help researchers to recognize the significance of crop type maps and consider how to incorporate into these \\ndata their own work')]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# test retriever \n",
"retriever.invoke(\"what is the main topic of the document?\")"
]
},
{
"cell_type": "markdown",
"id": "45f8260f",
"metadata": {},
"source": [
"## Augmentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb4d1435",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"E0000 00:00:1759188738.019868 109561 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.\n"
]
}
],
"source": [
"LLM_gen = GoogleGenerativeAI(model=\"models/gemini-1.5-flash\", google_api_key=gemini_api_key)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b6ebacf",
"metadata": {},
"outputs": [],
"source": [
"prompt = PromptTemplate(\n",
" template = \"\"\"\n",
" You are a helpful assistant.\n",
" Answer ONLY from the provided transcript context.\n",
" If the context IS INSUFFICIENT, say you don't know.\n",
"\n",
" {context}\n",
"\n",
" Question: {question}\n",
" \"\"\",\n",
" input_variables=[\"context\",\"question\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "17adfd08",
"metadata": {},
"outputs": [],
"source": [
"question = \"Is the aspect of stars mentioned in this document provided? If yes, explain what was discussed?\"\n",
"retrieved_documents = retriever.invoke(question)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "0c21e23b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(id='b700e905-8222-4ec0-8131-bee3ac9f51ca', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication β s β Category β field in the \\nWoS database must be labeled as β remote sensing β . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as β remote sensing β . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.'),\n",
" Document(id='864cf8ce-bfa8-426b-9bfe-390c8679f13a', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 9, 'page_label': '10'}, page_content='as β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β Cropland Data Layer β OR β CDL β ) AND (WC = β Remote Sensing β \\nOR ALL = ( β Remote Sensing β OR β Earth observation β OR β Landsat β OR \\nβ Sentinel β OR β MODIS β )) AND DT = β Article β , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we'),\n",
" Document(id='cf0e5d6a-656a-4073-b1ee-1e7fce2b5952', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 17, 'page_label': '18'}, page_content='early growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 β 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping'),\n",
" Document(id='c4a9043a-fe10-459d-b714-4b1083160bac', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 23, 'page_label': '24'}, page_content='jag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. J. Remote. Sens. 36, \\n750 β 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm Β΄ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 β 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,'),\n",
" Document(id='d6273845-9651-4f71-a5b0-3d3b72037a37', metadata={'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'creator': 'Elsevier', 'creationdate': '2025-09-02T19:45:23+00:00', 'crossmarkdomains[1]': 'elsevier.com', 'creationdate--text': '2nd September 2025', 'robots': 'noindex', 'elsevierwebpdfspecifications': '7.0.1', 'moddate': '2025-09-02T20:26:19+00:00', 'doi': '10.1016/j.rse.2025.114995', 'title': 'Remote sensing for crop mapping: A perspective on current and future crop-specific land cover data products', 'keywords': 'Crop mapping,Land use land cover,Geospatial data product,Systematic literature review,Cropland data layer', 'subject': 'Remote Sensing of Environment, 330 (2025) 114995. doi:10.1016/j.rse.2025.114995', 'crossmarkdomains[2]': 'sciencedirect.com', 'author': 'Chen Zhang', 'source': 'ChenZhang_cropmapping_ReviewPaper.pdf', 'total_pages': 29, 'page': 7, 'page_label': '8'}, page_content='preliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe articleβs title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description')]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retrieved_documents"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "7993ff0d",
"metadata": {},
"outputs": [],
"source": [
"content_texts = \"\\n\\n\".join(document.page_content for document in retrieved_documents)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "d923bfda",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication β s β Category β field in the \\nWoS database must be labeled as β remote sensing β . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as β remote sensing β . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.\\n\\nas β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. 
Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β Cropland Data Layer β OR β CDL β ) AND (WC = β Remote Sensing β \\nOR ALL = ( β Remote Sensing β OR β Earth observation β OR β Landsat β OR \\nβ Sentinel β OR β MODIS β )) AND DT = β Article β , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nearly growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 β 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping\\n\\njag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. J. Remote. Sens. 
36, \\n750 β 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm Β΄ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 β 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe articleβs title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description'"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"content_texts"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "32fe4db1",
"metadata": {},
"outputs": [],
"source": [
"# Fill the prompt template with the retrieved context and the user's question\n",
"final_prompt = prompt.invoke({\"context\": content_texts, \"question\": question})"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "49c857d3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"StringPromptValue(text=\"\\n You are a helpful assistant.\\n Answer ONLY from the provided transcript context.\\n If the context IS INSUFFICIENT, just say you don't know and probably need more information.\\n\\n the WoS database, including the publication title, abstract, or keywords. \\nIn our survey, we found that many papers introduced, discussed, or cited \\nCDL, but did not directly use the data in their experiments. Therefore, \\nIC1 could ensure that CDL has been applied in the selected publications, \\nrather than simply mentioning it in passing.\\nTo narrow down the publications to those specifically related to \\nremote sensing, IC2 states that the publication β s β Category β field in the \\nWoS database must be labeled as β remote sensing β . However, many \\npublications related to remote sensing were published in computer sci -\\nence, agricultural, or multidisciplinary journals, which were not cate -\\ngorized as β remote sensing β . To include these publications in this \\nreview, we added a rule that requires the presence of certain terms, such \\nas β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.\\n\\nas β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. 
Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β Cropland Data Layer β OR β CDL β ) AND (WC = β Remote Sensing β \\nOR ALL = ( β Remote Sensing β OR β Earth observation β OR β Landsat β OR \\nβ Sentinel β OR β MODIS β )) AND DT = β Article β , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nearly growth stages, with a mean difference of three days and a mean \\nabsolute difference of one week ( Gao et al., 2024 ).\\nWISE has been extended to five Corn Belt states (i.e., Iowa, Illinois, \\nIndiana, Minnesota, and Nebraska) for routine mapping of crop emer -\\ngence using HLS (30 m, 3 β 4 day revisit) data ( Gao et al., 2021 ). As \\nillustrated in Fig. 10 , benefiting from the frequent revisits of HLS, WISE \\ndetected the majority of fields across CONUS and provided detailed \\nspatial variability within each field. Recent high temporal and spatial \\nresolution satellite datasets (e.g., HLS, PlanetScope) are making it \\nfeasible for mapping within-season crop emergence over the CONUS \\n( Gao et al., 2024 ) and have great potential for integration with in-season \\ncrop mapping data products and operational crop monitoring systems \\n( Zhang et al., 2022b ; Zhang et al., 2023a ).\\n4.5. Advancing national-scale crop-specific field boundary mapping\\n\\njag.2023.103390 .\\nESA, 2024. Webinar: WorldCereal Phase II [WWW Document]. https://esa-worldcereal. \\norg/en/events/webinar-worldcereal-phase-ii-32 .\\nFalkowski, M.J., Manning, J.A., 2010. Parcel-based classification of agricultural crops via \\nmultitemporal Landsat imagery for monitoring habitat availability of western \\nburrowing owls in the Imperial Valley agro-ecosystem. Can. J. Remote. Sens. 
36, \\n750 β 762. https://doi.org/10.5589/m11-011 .\\nFAOSTAT, 2024. Definitions and standards used in FAOSTAT [WWW Document]. \\nhttps://www.fao.org/faostat/en/#definitions .\\nFarmonov, N., Amankulova, K., Khan, S.N., Abdurakhimova, M., Szatm Β΄ari, J., \\nKhabiba, T., Makhliyo, R., Khodicha, M., Mucsi, L., 2023. Effectiveness of machine \\nlearning and deep learning models at county-level soybean yield forecasting. \\nHungarian Geogr. Bull. 72, 383 β 398. https://doi.org/10.15201/hungeobull.72.4.4 .\\nFisette, T., Rollin, P., Aly, Z., Campbell, L., Daneshfar, B., Filyer, P., Smith, A.,\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe articleβs title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description\\n\\n Question: Is the aspect of stars mentioned in this document provided? If yes, explain what was discussed?\\n \")"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_prompt"
]
},
{
"cell_type": "markdown",
"id": "f838d0a6",
"metadata": {},
"source": [
"## Answer Generation."
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "d8730f3c",
"metadata": {},
"outputs": [],
"source": [
"response = LLM.invoke(final_prompt)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "a6989473",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"I don't know and probably need more information, as the provided transcript does not mention the aspect of stars.\""
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response.content"
]
},
{
"cell_type": "markdown",
"id": "5299ae94",
"metadata": {},
"source": [
"## Build chain."
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "cf902ef7",
"metadata": {},
"outputs": [],
"source": [
"# import libraries for chain building\n",
"from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough\n",
"from langchain_core.output_parsers import StrOutputParser"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"def reformat_doc(retrieved_documents):\n",
"    \"\"\"Join the page_content of the retrieved documents into one context string.\"\"\"\n",
"    return \"\\n\\n\".join(document.page_content for document in retrieved_documents)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "834931d5",
"metadata": {},
"outputs": [],
"source": [
"# Fetch context with the retriever while passing the question through unchanged\n",
"parallel_chain = RunnableParallel({\n",
"    \"context\": retriever | RunnableLambda(reformat_doc),\n",
"    \"question\": RunnablePassthrough(),\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "91125159",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'context': 'as β Remote Sensing β , β Earth observation β , β Landsat β , β Sentinel β , or \\nβ MODIS β in any of the title, keywords, or abstract of the publication.\\nTo ensure the selected publications reflected the up-to-date research \\ntrends and avoided duplicate research items, IC3 limits the document \\ntype to only peer-reviewed articles that were published in journals \\nindexed by the WoS Core Collection. Focusing on these high-impact \\njournal articles guarantees that our review reflects the most represen -\\ntative studies within the remote sensing field.\\nThe query string of inclusion criteria in the WoS data database is: \\nALL = ( β Cropland Data Layer β OR β CDL β ) AND (WC = β Remote Sensing β \\nOR ALL = ( β Remote Sensing β OR β Earth observation β OR β Landsat β OR \\nβ Sentinel β OR β MODIS β )) AND DT = β Article β , where ALL represents all \\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we\\n\\nfields (title, abstract, keywords), WC represents WoS categories, and DT \\nrepresents document type. After the initial screening process, we \\nmanually applied the three exclusion criteria to exclude publications \\nwhere the full term β CDL β was not related to β Cropland Data Layer β , \\nstudies that did not use remote sensing data, and any review articles. \\nThese exclusion criteria were essential for ensuring the reliability of our \\nselection results and for eliminating any irrelevant literature. The \\nliterature selection process from the CDL citations on the USDA NASS \\nwebsite adheres to the same inclusion and exclusion criteria. The \\neligible documents were combined with the screening results of WoS \\ndatabase, and any duplicate records were removed.\\n3.3. Results\\nThe result of the literature screening process is illustrated in Fig. 4 . 
\\nApplying the inclusion criteria, we screened 162 and 43 articles from the \\nWoS database and the USDA NASS CDL website, respectively. We then\\n\\npreliminary results published in conference papers or abstracts, with the \\nfull research later published in journal papers. Several publications only \\nincluded the relevant keywords in their reference sections, rather than in \\nthe articleβs title, abstract, or keywords.\\nTo screen the representative studies among all publications, we \\nchose the Web of Science (WoS) Core Collection as the database and \\nestablished a set of query criteria to screen the qualified publications \\nthat used CDL as the data source in the remote sensing field. To include \\nmore qualified literature, we expanded the literature database by \\nincorporating articles from the CDL citations featured on the official \\nUSDA NASS website ( USDA NASS, 2023 ).\\n3.2.3. Search criteria\\nTable 7 lists the search criteria to screen qualified publications \\nwithin the database. To set clear boundaries for the review and ensure \\nTable 5 \\nExamples of crop mapping systems and platforms.\\nProduct Link Description\\n\\ncompag.2022.106866 .\\nDanielson, P., Yang, L., Jin, S., Homer, C., Napton, D., 2016. An assessment of the \\ncultivated cropland class of NLCD 2006 using a multi-source and multi-criteria \\napproach. Remote Sens 8, 101. https://doi.org/10.3390/rs8020101 .\\nDefourny, P., Bontemps, S., Bellemans, N., Cara, C., Dedieu, G., Guzzonato, E., \\nHagolle, O., Inglada, J., Nicola, L., Rabaute, T., Savinaud, M., Udroiu, C., Valero, S., \\nB Β΄egu Β΄e, A., Dejoux, J.-F., El Harti, A., Ezzahar, J., Kussul, N., Labbassi, K., \\nLebourgeois, V., Miao, Z., Newby, T., Nyamugama, A., Salh, N., Shelestov, A., \\nSimonneaux, V., Traore, P.S., Traore, S.S., Koetz, B., 2019. 
Near real-time agriculture \\nmonitoring at national scale at parcel resolution: performance assessment of the \\nSen2-Agri automated system in various cropping systems around the world. Remote \\nSens. Environ. 221, 551 β 568. https://doi.org/10.1016/j.rse.2018.11.007 .\\n\\nCRediT authorship contribution statement\\nChen Zhang: Writing β original draft, Project administration, \\nMethodology, Conceptualization. Hannah Kerner: Writing β original \\ndraft. Sherrie Wang: Writing β original draft. Pengyu Hao: Writing β \\noriginal draft. Zhe Li: Writing β original draft. Kevin A. Hunt: Writing β \\noriginal draft. Jonathon Abernethy: Writing β original draft. Haoteng \\nZhao: Writing β original draft. Feng Gao: Writing β original draft. \\nLiping Di: Writing β review & editing, Supervision, Funding acquisition. \\nClaire Guo: Writing β review & editing, Validation, Investigation. Ziao \\nLiu: Writing β review & editing, Investigation. Zhengwei Yang: Writing \\nβ review & editing, Resources. Rick Mueller: Writing β review & edit -\\ning, Resources. Claire Boryan: Writing β review & editing, Resources. \\nQi Chen: Writing β review & editing, Resources. Peter C. Beeson: \\nWriting β review & editing, Resources. Hankui K. Zhang: Writing β',\n",
" 'question': 'Quickly and briefly summarize the document'}"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"parallel_chain.invoke('Quickly and briefly summarize the document')"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "9fa7a8aa",
"metadata": {},
"outputs": [],
"source": [
"parse = StrOutputParser()"
]
},
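{
"cell_type": "code",
"execution_count": 61,
"id": "f16763a7",
"metadata": {},
"outputs": [],
"source": [
"# Full pipeline: retrieve context, fill the prompt, call the model, parse the reply to a string\n",
"main_chain = parallel_chain | prompt | LLM | parse"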
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "8a92eb28",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This document outlines a methodology for selecting relevant literature on \"Cropland Data Layer\" (CDL) within the remote sensing field. It details specific inclusion and exclusion criteria, keywords used for searching (\"Remote Sensing\", \"Earth observation\", \"Landsat\", \"Sentinel\", \"MODIS\", \"Cropland Data Layer\", \"CDL\"), and the databases utilized (Web of Science Core Collection and USDA NASS CDL website). The process involved screening, manually applying exclusion criteria, and removing duplicate records, ultimately identifying a specific number of articles from each source.\n"
]
}
],
"source": [
"print(main_chain.invoke(\"Quickly and briefly summarize the document\"))"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "7133b4da",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* The document lists numerous authors and their specific contributions to the work, including writing, project administration, methodology, supervision, funding acquisition, validation, and providing resources.\n",
"* It describes the methodology for screening qualified publications related to the Cropland Data Layer (CDL) in the remote sensing field.\n",
"* The literature screening used the Web of Science (WoS) Core Collection and the USDA NASS website.\n",
"* Inclusion criteria required specific keywords related to \"Cropland Data Layer\" or \"CDL\" and remote sensing terms (e.g., \"Remote Sensing\", \"Landsat\", \"MODIS\") in the title, abstract, or keywords, focusing on peer-reviewed articles.\n",
"* Exclusion criteria were applied to remove irrelevant publications, such as those where \"CDL\" was not related to \"Cropland Data Layer,\" studies not using remote sensing data, or review articles.\n",
"* The initial screening process yielded 162 articles from the WoS database and 43 from the USDA NASS CDL website.\n"
]
}
],
"source": [
"print(main_chain.invoke(\"Quickly and briefly summarize the document. Put them in bullet format now.\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}