Document datasets of PDF files, usable with the pixparse libraries and tools.
			
	
AI & ML interests
Document and user interface (UI) parsing, understanding, and Q&A.
Organization Card
Multi-modal document, image, and text datasets and models for document understanding, OCR, and VQA tasks.
GitHub repos:
- Data loading: chug, https://github.com/huggingface/chug
- Modelling: pixparse (coming soon)
Models (0): none public yet.
Datasets (6):
- pixparse/pdfa-eng-wds (dataset viewer available): 7.1k rows, 3.84k downloads, 154 likes
- pixparse/idl-wds (dataset viewer available): 3.41M rows, 4.68k downloads, 187 likes
- pixparse/docvqa-wds: 231 downloads, 4 likes
- pixparse/docvqa-single-page-questions (dataset viewer available): 50k rows, 1.04k downloads, 10 likes
- pixparse/cc12m-wds (dataset viewer available): 11M rows, 23.7k downloads, 36 likes
- pixparse/cc3m-wds (dataset viewer available): 2.93M rows, 11.5k downloads, 42 likes
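Most of the datasets above carry a -wds suffix. Assuming this denotes WebDataset-style tar shards (the card does not state it, and the chug loading API is not shown here), the stdlib-only Python sketch below groups the files of one shard into samples by their shared key, which is the WebDataset naming convention: every file named `<key>.<ext>` belongs to the same sample. The helper names `split_key`, `iter_wds_samples`, and `make_shard`, and the file names `doc0001.pdf` / `doc0001.json`, are hypothetical illustrations, not part of chug or pixparse.

```python
import io
import tarfile
from typing import BinaryIO, Dict, Iterator, List, Tuple, Union

# Hypothetical helpers: a sketch of the WebDataset grouping convention,
# NOT the chug or pixparse API.
Sample = Dict[str, Union[str, bytes]]


def split_key(name: str) -> Tuple[str, str]:
    """Split 'shard/doc0001.ocr.json' into ('shard/doc0001', 'ocr.json').

    The key is everything before the first dot of the file's basename.
    """
    slash = name.rfind("/")
    dot = name.find(".", slash + 1)
    if dot == -1:
        return name, ""
    return name[:dot], name[dot + 1:]


def iter_wds_samples(fileobj: BinaryIO) -> Iterator[Sample]:
    """Yield one dict per sample from a tar shard read as a stream.

    Assumes the files belonging to one sample are stored next to each other.
    """
    sample: Sample = {}
    current_key = None
    # "r|*" reads the tar sequentially, with transparent decompression.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            key, ext = split_key(member.name)
            if key != current_key:
                if sample:
                    yield sample  # the previous sample is complete
                sample = {"__key__": key}
                current_key = key
            data = tar.extractfile(member)
            sample[ext] = data.read() if data is not None else b""
    if sample:
        yield sample  # flush the last sample in the shard


def make_shard(files: List[Tuple[str, bytes]]) -> io.BytesIO:
    """Build a small in-memory tar shard for demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    shard = make_shard([
        ("doc0001.pdf", b"%PDF-1.4"),
        ("doc0001.json", b"{}"),
        ("doc0002.pdf", b"%PDF-1.7"),
        ("doc0002.json", b"{}"),
    ])
    for s in iter_wds_samples(shard):
        print(s["__key__"], sorted(k for k in s if k != "__key__"))
```

Run as a script, this prints `doc0001 ['json', 'pdf']` followed by `doc0002 ['json', 'pdf']`. Reading the tar in stream mode (`"r|*"`) means a shard never has to be held in memory or seeked, which is the usual reason sample-grouped tar shards are used for large document corpora.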
				
