| CHECK_HEADERS_PROMPT = """ | |
| Analyze the following DataFrame columns and identify any columns without names or with invalid names. | |
| Return only a JSON list of column indices (0-based) that need attention, without any explanation. | |
| Columns: {columns} | |
| """ | |
| NORMALIZE_HEADERS_PROMPT = """ | |
| Analyze the following DataFrame column names and normalize them according to these rules: | |
| 1. Convert to lowercase | |
| 2. Replace spaces with underscores, and replace an empty name with a single underscore | |
| 3. Remove any invalid characters (keep only letters, numbers, and underscores) | |
| Return only a JSON object where keys are the original column names and values are the normalized names, without any explanation. | |
| Column names: {columns} | |
| """ | |
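The three normalization rules are deterministic, so the model's answer can be cross-checked locally. The following is a minimal sketch, not part of the original module: `normalize_header` and `normalize_headers` are hypothetical helper names, "letters" is assumed to mean ASCII letters, and an empty name is assumed to become a single underscore.

```python
import re


def normalize_header(name: str) -> str:
    """Apply the three rules from NORMALIZE_HEADERS_PROMPT to one column name."""
    lowered = str(name).lower()                        # rule 1: lowercase
    underscored = lowered.replace(" ", "_")            # rule 2: spaces -> underscores
    cleaned = re.sub(r"[^a-z0-9_]", "", underscored)   # rule 3: drop other characters
    # Assumption: a name that ends up empty is represented by one underscore.
    return cleaned or "_"


def normalize_headers(columns):
    """Build the same original -> normalized mapping the prompt asks the model for."""
    return {column: normalize_header(column) for column in columns}
```

Comparing this mapping with the parsed model response is one way to flag answers that drift from the stated rules. Note that two different original names can normalize to the same value, which neither this sketch nor the prompt resolves.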
| CHECK_COLUMN_CONTENT_PROMPT = """ | |
| Analyze the following sample of values from the column '{column_name}' and determine: | |
| 1. The most appropriate data type (float, integer, string, or date) | |
| 2. Indices of empty or blank values | |
| 3. Indices of values that don't conform to the determined data type | |
| Sample values: | |
| {sample_values} | |
| Return only a JSON object with the following structure, without any explanation: | |
| {{ | |
| "data_type": "detected_type", | |
| "empty_indices": [list of indices of empty or blank values], | |
| "invalid_indices": [list of indices of values that don't conform to the detected type] | |
| }} | |
| """ | |
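The doubled braces in these templates are `str.format` escapes: `{{` and `}}` render as literal `{` and `}`, while single-brace fields such as `{column_name}` are substituted. A small sketch of how a template of this shape might be rendered follows; `TEMPLATE` is a shortened stand-in for the real prompt and `render_prompt` is a hypothetical helper, not something defined in this module.

```python
import json

# Shortened stand-in with the same brace conventions as the prompts in this file.
TEMPLATE = """
Analyze the following sample of values from the column '{column_name}'.
Sample values:
{sample_values}
Return only a JSON object with the following structure:
{{
"data_type": "detected_type",
"empty_indices": []
}}
"""


def render_prompt(column_name, sample_values):
    """Fill the named fields; the doubled braces collapse to single ones."""
    return TEMPLATE.format(
        column_name=column_name,
        # Assumption: samples are serialized as a JSON list so blanks stay visible.
        sample_values=json.dumps(sample_values),
    )


prompt = render_prompt("age", ["31", "", "forty"])
```

Only the template itself is parsed for fields, so braces that appear inside the substituted values pass through unchanged.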
| CHECK_TYPOS_PROMPT = """ | |
| Analyze the following sample of values from the column '{column_name}' and identify any potential typos or misspellings. | |
| For each identified typo, suggest a correction. | |
| Sample values: | |
| {sample_values} | |
| Return only a JSON object with the following structure, without any explanation: | |
| {{ | |
| "typos": {{ | |
| "original_value1": "corrected_value1", | |
| "original_value2": "corrected_value2", | |
| ... | |
| }} | |
| }} | |
| If no typos are found, return an empty object for "typos". | |
| """ | |
| ENCODE_STRING_PROMPT = """ | |
| Analyze the following unique values from the column '{column_name}' and create an encoding scheme. | |
| Assign a unique integer to each unique string value, starting from 0. | |
| Unique values: | |
| {unique_values} | |
| Return only a JSON object with the following structure, without any explanation: | |
| {{ | |
| "string_value1": 0, | |
| "string_value2": 1, | |
| "string_value3": 2, | |
| ... | |
| }} | |
| Ensure that each unique string value is assigned a unique integer. | |
| """ | |
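Because the prompt only asks the model for unique integers, it can be worth verifying the returned scheme before applying it. This is a hedged sketch of such a check; `validate_encoding` is a hypothetical helper, and it assumes the response is bare JSON and that all unique values are strings.

```python
import json


def validate_encoding(response_text, unique_values):
    """Parse the model's encoding and check it against the prompt's constraints."""
    mapping = json.loads(response_text)
    if set(mapping) != set(unique_values):
        raise ValueError("encoded keys do not match the unique values")
    codes = list(mapping.values())
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if any(isinstance(code, bool) or not isinstance(code, int) for code in codes):
        raise ValueError("every code must be an integer")
    if len(set(codes)) != len(codes):
        raise ValueError("codes are not unique")
    if codes and min(codes) != 0:
        raise ValueError("codes do not start from 0")
    return mapping
```

For example, `validate_encoding('{"red": 0, "green": 1}', ["red", "green"])` returns the parsed mapping, while a response that assigns `0` to both values raises `ValueError`.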
| DETERMINE_DTYPE_PROMPT = """ | |
| Analyze the following sample values from a column and determine the most appropriate data type. | |
| Possible types are: float, integer, string, or date. | |
| If more than 80% of the values conform to a specific type, choose that type. | |
| Otherwise, default to string. | |
| Sample values: | |
| {sample_values} | |
| Return only a JSON object with the following structure, without any explanation: | |
| {{ | |
| "column_type": "detected_type", | |
| "invalid_indices": [list of indices that do not conform to the detected type] | |
| }} | |
| """ | |
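The 80% rule in this prompt can also be approximated without a model, which gives a baseline to compare responses against. The sketch below is one possible reading, under stated assumptions: sample values are strings, dates are ISO `YYYY-MM-DD` only, and types are tried in the order integer, float, date. `determine_dtype` and the `_is_*` helpers are hypothetical names.

```python
import re
from datetime import datetime


def _is_integer(value):
    return re.fullmatch(r"[+-]?\d+", value.strip()) is not None


def _is_float(value):
    # Note: float() also accepts strings such as "nan" and "inf".
    try:
        float(value)
    except ValueError:
        return False
    return True


def _is_date(value):
    # Assumption: only ISO dates count; real data may need more formats.
    try:
        datetime.strptime(value.strip(), "%Y-%m-%d")
    except ValueError:
        return False
    return True


# Integer is tried before float because every integer string also parses as a float.
CHECKS = [("integer", _is_integer), ("float", _is_float), ("date", _is_date)]


def determine_dtype(sample_values, threshold=0.8):
    """Pick the first type that strictly more than `threshold` of values conform to."""
    total = len(sample_values)
    for type_name, check in CHECKS:
        invalid = [i for i, value in enumerate(sample_values) if not check(value)]
        if total and (total - len(invalid)) / total > threshold:
            return {"column_type": type_name, "invalid_indices": invalid}
    # Fallback from the prompt: default to string, where nothing is invalid.
    return {"column_type": "string", "invalid_indices": []}
```

The comparison is strict, matching "more than 80%": a sample in which exactly four of five values are integers falls back to string.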
| TRANSFORM_STRING_PROMPT = """ | |
| Transform the following unique string values from the column '{column_name}' to lowercase. | |
| If a value is a variation of "nan" (case-insensitive), map it to "nan". | |
| Unique values: | |
| {unique_values} | |
| Return only a JSON object with the following structure, without any explanation: | |
| {{ | |
| "original_value1": "transformed_value1", | |
| "original_value2": "transformed_value2", | |
| ... | |
| }} | |
| """ | |
| CHECK_LOW_COUNT_VALUES_PROMPT = """ | |
| Analyze the following value counts from the column '{column_name}' and identify values with a count lower than 2. | |
| Value counts: | |
| {value_counts} | |
| Return only a JSON list of values that have a count lower than 2, without any explanation. | |
| """ | |
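Finding values that occur fewer than two times is plain counting, so a local computation can serve as a reference for the model's answer. A minimal sketch using the standard library follows; `low_count_values` is a hypothetical helper and the threshold of 2 mirrors the prompt.

```python
from collections import Counter


def low_count_values(values, min_count=2):
    """Return the values whose number of occurrences is below min_count."""
    counts = Counter(values)
    # Counter keeps first-seen order, so the result follows the input order.
    return [value for value, count in counts.items() if count < min_count]
```

For the input `["a", "b", "a", "c"]` this returns `["b", "c"]`, since only `"a"` appears at least twice.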
| CHECK_SCHEMA_CONFORMITY_PROMPT = """ | |
| Analyze the following sample of values from the column '{column_name}' and check if they conform to the determined data type '{data_type}'. | |
| Sample values: | |
| {sample_values} | |
| Return only a JSON object with the following structure, without any explanation: | |
| {{ | |
| "conforming_indices": [list of indices of values that conform to the data type], | |
| "nonconforming_indices": [list of indices of values that do not conform to the data type] | |
| }} | |
| """ |