Hi,
I am building a binary text classifier in Python to classify sentences in research papers (cell culture medium research papers), testing common binary classification algorithms such as linear SVC and logistic regression. The problem I'm facing is that, although accuracy is high when the model is trained with the collected data, when I run it on a complete research paper many of the sentences are irrelevant (they do not belong to either class), yet the model still assigns each of them one of the two classes. How should I handle these irrelevant sentences?
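To make the situation concrete, here is a minimal sketch of the setup, assuming scikit-learn with a TF-IDF and logistic regression pipeline. The sentences, the class names `composition` and `condition`, and the variable names are all made up for illustration and are not the real data or labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; the real sentences and labels differ.
train_sentences = [
    "The medium was supplemented with 10% fetal bovine serum.",
    "Glucose was added to the basal medium at 4.5 g/L.",
    "Cells were incubated at 37 C with 5% CO2.",
    "Cultures were passaged every three days.",
]
train_labels = ["composition", "composition", "condition", "condition"]

# TF-IDF features fed into a binary logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_sentences, train_labels)

# A sentence from a full paper that belongs to neither class.
irrelevant = ["The authors declare no conflict of interest."]

# predict() can only return one of the two trained labels,
# so the irrelevant sentence is still assigned a class.
prediction = model.predict(irrelevant)[0]
probabilities = model.predict_proba(irrelevant)[0]

print(prediction, probabilities)
```

Whatever the input, `prediction` is always one of the two trained labels, which is exactly the behaviour described above.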
Thanks in advance