Overview: Extracting article text from HTML documents
tomazkovacic.com
MARCH 24, 2011
The majority of methods presented in research papers from that epoch are useless nowadays, because they rely on strong assumptions and heuristics that don’t apply to today’s web development practices. In the following chapters I’ll try to review some article text extraction methods that are applicable to today’s websites.