Overview: Extracting article text from HTML documents
tomazkovacic.com
MARCH 24, 2011
Keep in mind that back then, folks were still making websites with Microsoft FrontPage. Most of the methods presented in research papers from that era are now useless, because they rely on strong assumptions and heuristics that don’t apply to today’s web development practices.