diff --git a/graphrag_construct.html b/graphrag_construct.html
new file mode 100644
index 0000000000000000000000000000000000000000..90509da9aa7b1e24521b1518f14bc149e0a0270e
--- /dev/null
+++ b/graphrag_construct.html
@@ -0,0 +1,444 @@
diff --git a/graphrag_demo.html b/graphrag_demo.html
new file mode 100644
index 0000000000000000000000000000000000000000..a4741e84738c6db2f33da452e2d6b831751f5057
--- /dev/null
+++ b/graphrag_demo.html
@@ -0,0 +1,459 @@
diff --git a/graphrag_readme.md b/graphrag_readme.md
new file mode 100644
index 0000000000000000000000000000000000000000..8bd0a030339a2f1a90a2e3570c51be7e48c11aed
--- /dev/null
+++ b/graphrag_readme.md
@@ -0,0 +1,351 @@
+# GraphRAG README
+
+## Some fundamental concepts
+
+### Data Ingestion
+
+NOTE: mermaid.js diagrams below are based on some inspiring content from the [Connected Data London 2024: Entity Resolved Knowledge Graphs](https://github.com/DerwenAI/cdl2024_masterclass/blob/main/README.md) masterclass.
+
+```mermaid
+graph TD
+ %% Database shapes with consistent styling
+ SDS[(Structured Data Sources)]
+ UDS[(Unstructured Data Sources)]
+ LG[(lexical graph)]
+ SG[(semantic graph)]
+ VD[(vector database)]
+
+ %% Flow from structured data
+ SDS -->|PII features| ER[entity resolution]
+ SDS -->|data records| SG
+ SG -->|PII updates| ER
+ ER -->|semantic overlay| SG
+
+ %% Schema and ontology
+ ONT[schema, ontology, taxonomy, controlled vocabularies, etc.]
+ ONT --> SG
+
+ %% Flow from unstructured data
+ UDS --> K[text chunking function]
+ K --> NLP[NLP parse]
+ K --> EM[embedding model]
+ NLP --> E[NER, RE]
+ E --> LG
+ LG --> EL[entity linking]
+ EL <--> SG
+
+ %% Vector elements connections
+ EM --> VD
+ VD -.->|capture source chunk WITHIN references| SG
+
+ %% Thesaurus connection
+ ER -.->T[thesaurus]
+ T --> EL
+
+ %% Styling classes
+ classDef dataSource fill:#f4f4f4,stroke:#666,stroke-width:2px;
+ classDef storage fill:#e6f3ff,stroke:#4a90e2,stroke-width:2px;
+ classDef embedding fill:#fff3e6,stroke:#f5a623,stroke-width:2px;
+ classDef lexical fill:#f0e6ff,stroke:#4a90e2,stroke-width:2px;
+ classDef semantic fill:#f0e6ff,stroke:#9013fe,stroke-width:2px;
+ classDef reference fill:#e6ffe6,stroke:#417505,stroke-width:2px;
+
+ %% Apply styles by layer/type
+ class SDS,UDS dataSource;
+    class VD storage;
+    class EM embedding;
+    class LG lexical;
+    class SG semantic;
+    class ONT,T reference;
+```
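The unstructured path above splits raw text into chunks before NER, relation extraction, and embedding. As a rough sketch of that chunking step (the chunk size, overlap, and character-based slicing here are illustrative assumptions, not the exact logic in `demo.py`):

```python
def make_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Sizes are illustrative; real pipelines typically chunk on token
    boundaries rather than raw characters.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # advance by chunk minus overlap
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

chunks = make_chunks("a" * 2500)
```

The overlap keeps entity mentions that straddle a chunk boundary visible to the NER pass in at least one chunk.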
+
+### Augment LLM Inference
+
+```mermaid
+graph LR
+ %% Define database and special shapes
+ P[prompt]
+ SG[(semantic graph)]
+ VD[(vector database)]
+ LLM[LLM]
+ Z[response]
+
+ %% Main flow paths
+ P --> Q[generated query]
+ P --> EM[embedding model]
+
+ %% Upper path through graph elements
+ Q --> SG
+ SG --> W[semantic random walk]
+ T[thesaurus] --> W
+ W --> GA[graph analytics]
+
+ %% Lower path through vector elements
+ EM --> SS[vector similarity search]
+ SS --> VD
+
+ %% Node embeddings and chunk references
+ SG -.-|chunk references| VD
+ SS -->|node embeddings| SG
+
+ %% Final convergence
+ GA --> RI[ranked index]
+ VD --> RI
+ RI --> LLM
+ LLM --> Z
+
+ %% Styling classes
+ classDef dataSource fill:#f4f4f4,stroke:#666,stroke-width:2px;
+ classDef storage fill:#e6f3ff,stroke:#4a90e2,stroke-width:2px;
+ classDef embedding fill:#fff3e6,stroke:#f5a623,stroke-width:2px;
+ classDef lexical fill:#f0e6ff,stroke:#4a90e2,stroke-width:2px;
+ classDef semantic fill:#f0e6ff,stroke:#9013fe,stroke-width:2px;
+ classDef reference fill:#e6ffe6,stroke:#417505,stroke-width:2px;
+
+ %% Apply styles by layer/type
+    class VD storage;
+    class EM embedding;
+    class SG semantic;
+    class T reference;
+```
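One way to read the convergence step above, where graph analytics and vector similarity feed a single ranked index, is as a score blend. The sketch below min-max normalizes both score sets and mixes them with a weight `alpha`; the function name and weighting scheme are hypothetical, not the scoring used in the actual code:

```python
def ranked_index(graph_scores: dict[str, float],
                 vector_scores: dict[str, float],
                 alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend graph-analytics and vector-similarity scores into one ranking.

    alpha weights the graph side; both inputs are min-max normalized
    first so the two scales are comparable. Hypothetical sketch only.
    """
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    g, v = normalize(graph_scores), normalize(vector_scores)
    keys = set(g) | set(v)
    blended = {k: alpha * g.get(k, 0.0) + (1 - alpha) * v.get(k, 0.0)
               for k in keys}
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)
```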
+
+## Sequence Diagram - covering the current `strwythura` (structure) repo
+
+- the diagram below is largely based on the `demo.py` functions
+- I used [Prefect](https://www.prefect.io/) to dig in and reverse-engineer the flow...
+  - [graphrag_demo.py](./graphrag_demo.py) is my simple update to [Paco's original Python code](./demo.py)
+  - I stuck to using Prefect function decorators based on the existing structure, but I'm looking forward to abstracting some of the concepts further and thinking agentically.
+- Telemetry and instrumentation can often demystify complex processes without the headache of wading through long print statements. Great insight often comes from seeing how individual functions and components interact.
+  - this repo features a large and distinguished cast of open-source models (GLiNER, GLiREL), open-source embeddings (BGE, Word2Vec), and a vector store (LanceDB) for improved entity recognition and relationship extraction.
+- For a deeper dive, [Paco's YouTube video and associated diagrams](https://senzing.com/gph-graph-rag-llm-knowledge-graphs/) highlight real-world use cases where effective knowledge graph construction provides deeper meaning and insight.
+
+
+```mermaid
+sequenceDiagram
+ participant Main as Main Script
+ participant ConstructKG as construct_kg Flow
+ participant InitNLP as init_nlp Task
+ participant ScrapeHTML as scrape_html Task
+ participant MakeChunk as make_chunk Task
+ participant ParseText as parse_text Task
+ participant MakeEntity as make_entity Task
+ participant ExtractEntity as extract_entity Task
+ participant ExtractRelations as extract_relations Task
+ participant ConnectEntities as connect_entities Task
+ participant RunTextRank as run_textrank Task
+ participant AbstractOverlay as abstract_overlay Task
+ participant GenPyvis as gen_pyvis Task
+
+ Main->>ConstructKG: Start construct_kg flow
+ ConstructKG->>InitNLP: Initialize NLP pipeline
+ InitNLP-->>ConstructKG: Return NLP object
+
+ loop For each URL in url_list
+ ConstructKG->>ScrapeHTML: Scrape HTML content
+ ScrapeHTML->>MakeChunk: Create text chunks
+ MakeChunk-->>ScrapeHTML: Return chunk list
+ ScrapeHTML-->>ConstructKG: Return chunk list
+
+ loop For each chunk in chunk_list
+ ConstructKG->>ParseText: Parse text and build lex_graph
+ ParseText->>MakeEntity: Create entities from spans
+ MakeEntity-->>ParseText: Return entity
+ ParseText->>ExtractEntity: Extract and add entities to lex_graph
+ ExtractEntity-->>ParseText: Entity added to graph
+ ParseText->>ExtractRelations: Extract relations between entities
+ ExtractRelations-->>ParseText: Relations added to graph
+ ParseText->>ConnectEntities: Connect co-occurring entities
+ ConnectEntities-->>ParseText: Connections added to graph
+ ParseText-->>ConstructKG: Return parsed doc
+ end
+
+ ConstructKG->>RunTextRank: Run TextRank on lex_graph
+ RunTextRank-->>ConstructKG: Return ranked entities
+ ConstructKG->>AbstractOverlay: Overlay semantic graph
+ AbstractOverlay-->>ConstructKG: Overlay completed
+ end
+
+ ConstructKG->>GenPyvis: Generate Pyvis visualization
+ GenPyvis-->>ConstructKG: Visualization saved
+ ConstructKG-->>Main: Flow completed
+```
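The sequence above can be sketched as plain Python calls. Each step is stubbed here so only the orchestration shape remains; the real code wraps these functions in Prefect `@task`/`@flow` decorators and does actual scraping, NER/RE, and graph building:

```python
def init_nlp():
    return "nlp-pipeline"  # stub: spaCy pipeline with GLiNER + GLiREL

def scrape_html(url: str) -> list[str]:
    # stub: BeautifulSoup scrape followed by chunking
    return [f"chunk-1 of {url}", f"chunk-2 of {url}"]

def parse_text(nlp, lex_graph: dict, chunk: str) -> None:
    lex_graph["nodes"].append(chunk)  # stub: NER/RE into the lexical graph

def run_textrank(lex_graph: dict) -> None:
    lex_graph["ranked"] = sorted(lex_graph["nodes"])  # stub: entity ranking

def abstract_overlay(lex_graph: dict) -> None:
    lex_graph["overlay_done"] = True  # stub: semantic overlay

def construct_kg(url_list: list[str]) -> dict:
    nlp = init_nlp()
    lex_graph: dict = {"nodes": []}
    for url in url_list:
        for chunk in scrape_html(url):
            parse_text(nlp, lex_graph, chunk)
        run_textrank(lex_graph)
        abstract_overlay(lex_graph)
    return lex_graph  # the real flow also calls gen_pyvis here
```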
+
+## Run the code
+
+1. Set up a local Python environment and install the Python dependencies
+
+ - I used Python 3.11, but 3.10 should work as well
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+2. Start the local Prefect server
+
+ - follow the [self-hosted instructions](https://docs.prefect.io/v3/get-started/quickstart#connect-to-a-prefect-api) to launch the `Prefect UI`
+
+   ```bash
+   prefect server start
+   ```
+
+3. Run the `graphrag_demo.py` script
+
+   ```bash
+   python graphrag_demo.py
+   ```
+
+## Appendix: Code Overview and Purpose
+
+- The code forms part of a talk for **GraphGeeks.org** about constructing **knowledge graphs** from **unstructured data sources**, such as web content.
+- It integrates web scraping, natural language processing (NLP), graph construction, and interactive visualization.
+
+---
+
+### **Key Components and Flow**
+
+#### **1. Model and Parameter Settings**
+- **Core Configuration**: Establishes the foundational settings like chunk size, embedding models (`BAAI/bge-small-en-v1.5`), and database URIs.
+- **NER Labels**: Defines entity categories such as `Person`, `Organization`, `Publication`, and `Technology`.
+- **Relation Types**: Configures relationships like `works_at`, `developed_by`, and `authored_by` for connecting entities.
+- **Scraping Parameters**: Sets user-agent headers for web requests.
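Pulled together, these settings might look like the following dictionary. The embedding model name, NER labels, and relation types come from the description above; the chunk size, database URI, and user-agent string are placeholder assumptions:

```python
# Illustrative configuration mirroring the settings described above.
# chunk_size, lancedb_uri, and the User-Agent value are placeholders.
CONFIG = {
    "chunk_size": 1024,
    "embed_model": "BAAI/bge-small-en-v1.5",
    "lancedb_uri": "data/lancedb",
    "ner_labels": ["Person", "Organization", "Publication", "Technology"],
    "relation_types": ["works_at", "developed_by", "authored_by"],
    "headers": {"User-Agent": "graphrag-demo/0.1 (demo scraper)"},
}
```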
+
+#### **2. Data Validation**
+- **Classes**:
+ - `TextChunk`: Represents segmented text chunks with their embeddings.
+ - `Entity`: Tracks extracted entities, their attributes, and relationships.
+- **Purpose**: Ensures data is clean and well-structured for downstream processing.
+
+#### **3. Data Collection**
+- **Functions**:
+ - `scrape_html`: Fetches and parses webpage content.
+ - `uni_scrubber`: Cleans Unicode and formatting issues.
+ - `make_chunk`: Segments long text into manageable chunks for embedding.
+- **Role**: Prepares raw, unstructured data for structured analysis.
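As an illustration of the kind of cleanup `uni_scrubber` performs, the sketch below normalizes common Unicode punctuation and collapses whitespace. The exact replacement rules in `demo.py` may differ:

```python
import unicodedata

def scrub_unicode(text: str) -> str:
    """Normalize curly quotes, dashes, and stray whitespace.

    Illustrative rules only; the actual uni_scrubber may differ.
    """
    text = unicodedata.normalize("NFKC", text)  # fold compatibility forms
    replacements = {"\u2018": "'", "\u2019": "'", "\u201c": '"',
                    "\u201d": '"', "\u2013": "-", "\u2014": "-"}
    for bad, good in replacements.items():
        text = text.replace(bad, good)
    return " ".join(text.split())  # collapse runs of whitespace
```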
+
+#### **4. Lexical Graph Construction**
+- **Initialization**:
+ - `init_nlp`: Sets up NLP pipelines with spaCy, GLiNER (NER), and GLiREL (RE).
+- **Graph Parsing**:
+ - `parse_text`: Creates lexical graphs using TextRank algorithms.
+ - `make_entity`: Extracts and integrates entities into the graph.
+ - `connect_entities`: Links entities co-occurring in the same context.
+- **Purpose**: Converts text into a structured, connected graph of entities and relationships.
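The co-occurrence linking done by `connect_entities` can be sketched with a pairwise counter. This simplified stand-in counts edges between entities seen in the same chunk; the real code adds weighted edges to a NetworkX lexical graph:

```python
from itertools import combinations
from collections import Counter

def connect_cooccurring(chunks_entities: list[list[str]]) -> Counter:
    """Count co-occurrence edges between entities in the same chunk.

    Simplified stand-in for connect_entities: each sorted entity pair
    gets one count per chunk it shares.
    """
    edges: Counter = Counter()
    for entities in chunks_entities:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges
```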
+
+#### **5. Numerical Processing**
+- **Functions**:
+ - `calc_quantile_bins`: Creates quantile bins for numerical data.
+ - `root_mean_square`: Computes RMS for normalization.
+ - `stripe_column`: Applies quantile binning to data columns.
+- **Role**: Provides statistical operations to refine and rank graph components.
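A stdlib-only sketch of two of these helpers; the signatures and binning method here are assumptions, since the actual implementations may use NumPy/pandas operations:

```python
import math
from statistics import quantiles
from bisect import bisect_left

def root_mean_square(values: list[float]) -> float:
    """RMS of a list of values, used for normalization."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def stripe_column(values: list[float], num_bins: int = 4) -> list[int]:
    """Assign each value a quantile-bin index.

    Simplified stand-in: statistics.quantiles yields num_bins - 1 cut
    points; bisect maps each value to its bin.
    """
    cuts = quantiles(values, n=num_bins)
    return [bisect_left(cuts, v) for v in values]
```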
+
+#### **6. TextRank Implementation**
+- **Functions**:
+ - `run_textrank`: Ranks entities in the graph based on a PageRank-inspired algorithm.
+- **Purpose**: Identifies and prioritizes key entities for knowledge graph construction.
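`run_textrank` ranks entities with a PageRank-style algorithm. Below is a minimal, unweighted power-iteration sketch; the real implementation operates on the weighted NetworkX lexical graph, so treat this as a conceptual stand-in:

```python
def pagerank(adj: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Minimal PageRank by power iteration over an adjacency dict."""
    nodes = set(adj) | {m for targets in adj.values() for m in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # teleport term
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, targets in adj.items():
            if targets:
                share = damping * rank[n] / len(targets)
                for m in targets:
                    new[m] += share
        # dangling nodes: redistribute their rank uniformly
        dangling = damping * sum(rank[n] for n in nodes if not adj.get(n))
        for n in nodes:
            new[n] += dangling / len(nodes)
        rank = new
    return rank
```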
+
+#### **7. Semantic Overlay**
+- **Functions**:
+ - `abstract_overlay`: Abstracts a semantic layer from the lexical graph.
+ - Connects entities to their originating text chunks for context preservation.
+- **Role**: Enhances the graph with higher-order relationships and semantic depth.
+
+#### **8. Visualization**
+- **Tool**: `pyvis`
+- **Functions**:
+ - `gen_pyvis`: Creates an interactive visualization of the knowledge graph.
+- **Features**:
+ - Node sizing reflects entity importance.
+ - Physics-based layout supports intuitive exploration.
+
+#### **9. Orchestration**
+- **Function**:
+ - `construct_kg`: Orchestrates the full pipeline from data collection to visualization.
+- **Purpose**: Ensures the seamless integration of all layers and components.
+
+---
+
+### **Notable Implementation Details**
+
+- **Multi-Layer Graph Representation**: Combines lexical and semantic graphs for layered analysis.
+- **Vector Embedding Integration**: Enhances entity representation with embeddings.
+- **Error Handling and Debugging**: Includes robust logging and debugging features.
+- **Scalability**: Designed for handling diverse and large datasets with dynamic relationships.
+
+---
+
+## Appendix: Architectural Workflow
+
+### **1. Architectural Workflow: A Layered Approach to Knowledge Graph Construction**
+
+#### **1.1 Workflow Layers**
+
+**Data Ingestion:**
+- Role: Extract raw data from structured and unstructured sources for downstream processing.
+- Responsibilities: Handle diverse data formats, ensure quality, and standardize for analysis.
+- Requirements: Reliable scraping, parsing, and chunking mechanisms to prepare data for embedding and analysis.
+
+**Lexical Graph Construction:**
+- Role: Build a foundational graph by integrating tokenized data and semantic relationships.
+- Responsibilities: Identify key entities through tokenization and ranking (e.g., TextRank).
+- Requirements: Efficient methods for integrating named entities and relationships into a coherent graph structure.
+
+**Entity and Relation Extraction:**
+- Role: Identify and label entities, along with their relationships, to enrich the graph structure.
+- Responsibilities: Extract domain-specific entities (NER) and relationships (RE) to add connectivity.
+- Requirements: Domain-tuned models and algorithms for accurate extraction.
+
+**Graph Construction and Visualization:**
+- Role: Develop and display the knowledge graph to facilitate analysis and decision-making.
+- Responsibilities: Create a graph structure using tools like NetworkX and enable exploration with interactive visualizations (e.g., PyVis).
+- Requirements: Scalable graph-building frameworks and intuitive visualization tools.
+
+**Semantic Overlay:**
+- Role: Enhance the graph with additional context and reasoning capabilities.
+- Responsibilities: Integrate ontologies, taxonomies, and domain-specific knowledge to provide depth and precision.
+- Requirements: Mechanisms to map structured data into graph elements and ensure consistency with existing knowledge bases.
+
+
+### **2. Visualized Workflow**
+
+#### **2.1 Logical Data Flow**
+
+```mermaid
+graph TD
+A[Raw Data] -->|Scrape| B[Chunks]
+B -->|Lexical Parsing| C[Lexical Graph]
+C -->|NER + RE| D[Entities and Relations]
+D -->|Construct KG| E[Knowledge Graph]
+E -->|Overlay Ontologies| F[Enriched Graph]
+F -->|Visualize| G[Interactive View]
+```
+
+---
+
+### **3. Glossary**
+
+| **Participant** | **Description** | **Workflow Layer** |
+|--------------------------------|---------------------------------------------------------------------------------------------------|-------------------------------------|
+| **HTML Scraper (BeautifulSoup)** | Fetches unstructured text data from web sources. | Data Ingestion |
+| **Text Chunker** | Breaks raw text into manageable chunks (e.g., 1024 tokens) and prepares them for embedding. | Data Ingestion |
+| **SpaCy Pipeline** | Processes chunks and integrates GLiNER and GLiREL for entity and relation extraction. | Entity and Relation Extraction |
+| **Embedding Model (bge-small-en-v1.5)** | Captures lower-level lexical meanings of text and translates them into machine-readable vector representations. | Data Ingestion |
+| **GLiNER** | Identifies domain-specific entities and returns labeled outputs. | Entity and Relation Extraction |
+| **GLiREL** | Extracts relationships between identified entities, adding connectivity to the graph. | Entity and Relation Extraction |
+| **Vector Database (LanceDB)** | Stores chunk embeddings for efficient querying in downstream tasks. | Data Ingestion |
+| **Word2Vec (Gensim)**          | Generates entity embeddings based on graph co-occurrence for additional analysis.                 | Semantic Overlay                    |
+| **Graph Constructor (NetworkX)** | Builds and analyzes the knowledge graph, ranking entities using TextRank. | Graph Construction and Visualization |
+| **Graph Visualizer (PyVis)** | Provides an interactive visualization of the knowledge graph for interpretability. | Graph Construction and Visualization |
+
+## Citations: giving credit where credit is due...
+
+Inspired by the great work of the many individuals who created the [Connected Data London 2024: Entity Resolved Knowledge Graphs](https://github.com/donbr/cdl2024_masterclass/blob/main/README.md) masterclass, I created this document to highlight the areas that rang true for me.
+
+- [Paco Nathan](https://senzing.com/consult-entity-resolution-paco/)
+- [Clair Sullivan](https://clairsullivan.com/)
+- [Louis Guitton](https://guitton.co/)
+- [Jeff Butcher](https://github.com/jbutcher21)
+- [Michael Dockter](https://github.com/docktermj)
+
+The code to use GLiNER and GLiREL started as a fork of one of four repos that make up the masterclass.
diff --git a/gtp_aurk_inhibitors.html b/gtp_aurk_inhibitors.html
new file mode 100644
index 0000000000000000000000000000000000000000..d85a481687d7ea344f73a848880f4c27f1664eef
--- /dev/null
+++ b/gtp_aurk_inhibitors.html
@@ -0,0 +1,222 @@
diff --git a/pmid_35559673_interactions_network.html b/pmid_35559673_interactions_network.html
new file mode 100644
index 0000000000000000000000000000000000000000..797170cb8562bb4ad5ecca1380c3b53c7f54e377
--- /dev/null
+++ b/pmid_35559673_interactions_network.html
@@ -0,0 +1,14 @@
diff --git a/winston_churchill_we_shall_fight_speech_june_1940_txt.html b/winston_churchill_we_shall_fight_speech_june_1940_txt.html
new file mode 100644
index 0000000000000000000000000000000000000000..f5d6debeee44a85ebb4c171ee5823150cb9e2265
--- /dev/null
+++ b/winston_churchill_we_shall_fight_speech_june_1940_txt.html
@@ -0,0 +1,189 @@