ARCH-3 P1 2026-06-15
Decompose research_ingestion.py (~3.7k lines — largest file in the audit)
Comments in the file show it was assembled by merging three services. It now combines at least four subsystems; seven areas are identifiable by line range:
- doc ingest (L2169-2502)
- embedding across backends (L230-318)
- pgvector backend, including SQL search (L853-1324)
- hybrid BM25+vector search engine (L2648-3212)
- GDPR forget/redact with HMAC (L1704-2087)
- streaming/sessions (L3213-3559)
- multimodal (L3560-3701)
It also carries two storage backends (Chroma and pgvector).
Split into ingest, search, forget-compliance, and streaming modules, plus shared embedding and storage libs. This file is also the site of ASYNC-4.
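A minimal sketch of how the shared embedding and storage libs might look after the split. Every name here (`Embedder`, `VectorStore`, `Hit`, `InMemoryStore`, `CharCountEmbedder`, `ingest`, `search`, `forget`) is hypothetical and not taken from research_ingestion.py. The in-memory store and dot-product scoring are stand-ins so the example runs without Chroma or Postgres; they do not represent the existing hybrid BM25+vector engine or the HMAC-based redaction.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Shared embedding lib: one interface regardless of backend (hypothetical)."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float


class VectorStore(Protocol):
    """Shared storage lib: Chroma and pgvector adapters would both satisfy this."""

    def upsert(self, doc_id: str, vector: list[float], text: str) -> None: ...
    def query(self, vector: list[float], k: int) -> list[Hit]: ...
    def delete(self, doc_id: str) -> None: ...


class InMemoryStore:
    """Stand-in backend so the sketch runs without external services."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], str]] = {}

    def upsert(self, doc_id: str, vector: list[float], text: str) -> None:
        self._rows[doc_id] = (vector, text)

    def query(self, vector: list[float], k: int) -> list[Hit]:
        # Dot-product similarity; a placeholder for the real search engine.
        scored = [
            Hit(doc_id, sum(a * b for a, b in zip(vector, stored)))
            for doc_id, (stored, _text) in self._rows.items()
        ]
        return sorted(scored, key=lambda h: h.score, reverse=True)[:k]

    def delete(self, doc_id: str) -> None:
        self._rows.pop(doc_id, None)


class CharCountEmbedder:
    """Toy embedder: counts of 'a', 'b', 'c' per text. Not a real model."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(t.count(c)) for c in "abc"] for t in texts]


# Would live in the ingest module: depends only on the two interfaces.
def ingest(doc_id: str, text: str, embedder: Embedder, store: VectorStore) -> None:
    store.upsert(doc_id, embedder.embed([text])[0], text)


# Would live in the search module.
def search(query: str, embedder: Embedder, store: VectorStore, k: int = 3) -> list[str]:
    return [hit.doc_id for hit in store.query(embedder.embed([query])[0], k)]


# Would live in the forget-compliance module.
def forget(doc_id: str, store: VectorStore) -> None:
    store.delete(doc_id)


if __name__ == "__main__":
    embedder, store = CharCountEmbedder(), InMemoryStore()
    ingest("d1", "aaa", embedder, store)
    ingest("d2", "bbb", embedder, store)
    print(search("ab a", embedder, store))
    forget("d1", store)
    print(search("a", embedder, store))
```

Because each module function takes the interfaces as parameters, the ingest, search, and forget-compliance code would not need to import a specific backend, which is the coupling the current single file makes hard to avoid.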