I am working on an email extraction process using Java, Spring Boot, and IMAP to read emails from Gmail. The process works fine for most emails, extracting only the text content. However, one specific email template is causing an issue: it returns the entire HTML and CSS instead of just the text.
Current Setup:
- Using IMAP to fetch emails from Gmail.
- Processing emails with JavaMail API (javax.mail), MimeMessage, and extracting text using Multipart and BodyPart parsing.
- Other emails are processed correctly, extracting only the text content.
Issue:
- A specific email template always returns the full HTML & CSS instead of plain text.
- Other email templates extract text as expected.
- The email is structured in a way that seems different from the others, but I am unsure why the extraction process fails for this one.
Code Snippet:
private String extractTextFromMessage(Message message) throws Exception {
    if (message.isMimeType("text/plain")) {
        return message.getContent().toString();
    } else if (message.isMimeType("multipart/*")) {
        Multipart multipart = (Multipart) message.getContent();
        for (int i = 0; i < multipart.getCount(); i++) {
            BodyPart bodyPart = multipart.getBodyPart(i);
            if (bodyPart.isMimeType("text/plain")) {
                return bodyPart.getContent().toString();
            }
        }
    }
    return "No text content found";
}
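The snippet above only inspects the top-level parts and only looks for text/plain. A message whose text/plain part sits inside a nested multipart, or a message that ships text/html only, takes a different path. Whether the problematic template is built that way is an assumption; the logs do not confirm it. The following is a minimal sketch of a recursive walk that prefers text/plain and falls back to text/html. So that it runs without the JavaMail jar, it uses a hypothetical stand-in class `Node` instead of `javax.mail.Part`; `MimeWalkSketch`, `Node`, `Body`, and `findBody` are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.List;

final class MimeWalkSketch {

    /** Hypothetical stand-in for javax.mail.Part: a leaf carries text, a multipart carries children. */
    static final class Node {
        final String type;          // simplified MIME type, e.g. "text/plain" or "multipart/alternative"
        final String text;          // body of a text/* leaf, otherwise null
        final List<Node> children;  // sub-parts of a multipart/*, otherwise empty

        Node(String type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** The chosen body, plus a flag saying whether it is HTML that still needs stripping. */
    static final class Body {
        final String text;
        final boolean html;

        Body(String text, boolean html) {
            this.text = text;
            this.html = html;
        }
    }

    /** Depth-first search: the first text/plain found wins, the first text/html is kept as fallback. */
    static Body findBody(Node part) {
        if (part.type.equals("text/plain")) {
            return new Body(part.text, false);
        }
        if (part.type.equals("text/html")) {
            return new Body(part.text, true);
        }
        if (part.type.startsWith("multipart/")) {
            Body htmlFallback = null;
            for (Node child : part.children) {
                Body found = findBody(child);   // recurse into nested multiparts
                if (found == null) {
                    continue;                   // attachment, image, etc.
                }
                if (!found.html) {
                    return found;               // plain text found, stop searching
                }
                if (htmlFallback == null) {
                    htmlFallback = found;       // remember the first HTML body
                }
            }
            return htmlFallback;
        }
        return null; // not a text part and not a container
    }
}
```

With real JavaMail objects the same shape would use `Part.isMimeType(...)` for the type checks (it also handles case and parameters, which the string comparison here does not), and `Multipart.getCount()` with `Multipart.getBodyPart(i)` for the children. The `html` flag tells the caller when the body still has to go through an HTML-to-text step.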
What I Tried:
- Ensured the email content type is checked properly.
- Iterated through the Multipart to find text/plain.
- Debugged the email's raw content and found that the problematic email's text is embedded within HTML elements.
- Tried using Jsoup (Jsoup.parse(html).text()) to extract text from the HTML but still faced issues with unwanted CSS and formatting.
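For comparison with the Jsoup attempt, here is a rough standard-library sketch of reducing an HTML body to text: drop the head, style, and script blocks together with their content first, turn row and paragraph endings into line breaks, and only then strip the remaining tags. The names `HtmlTextSketch` and `htmlToText` are hypothetical. Regular expressions are not a real HTML parser, so this assumes markup about as regular as the template in the error log and decodes only a handful of common entities.

```java
import java.util.regex.Pattern;

final class HtmlTextSketch {

    // <head>, <style>, and <script> blocks including everything inside them (CSS, JS, meta tags).
    private static final Pattern NON_VISIBLE =
            Pattern.compile("(?is)<(style|script|head)\\b[^>]*>.*?</\\1\\s*>");

    // HTML comments, which often hold Outlook conditional markup.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");

    // Tags that end a visual line: <br> and the closing tags of common block elements.
    private static final Pattern LINE_END =
            Pattern.compile("(?i)<br\\s*/?>|</(p|div|tr|li|h[1-6]|table)\\s*>");

    // Any remaining tag, including <!DOCTYPE ...>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String htmlToText(String html) {
        String s = NON_VISIBLE.matcher(html).replaceAll(" ");
        s = COMMENT.matcher(s).replaceAll(" ");
        s = LINE_END.matcher(s).replaceAll("\n");
        s = TAG.matcher(s).replaceAll(" ");

        // Decode a few common entities; &amp; goes last so "&amp;lt;" is not decoded twice.
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&#39;", "'")
             .replace("&amp;", "&");

        // Collapse runs of spaces and tabs within each line and drop empty lines.
        StringBuilder out = new StringBuilder();
        for (String line : s.split("\n")) {
            String trimmed = line.replaceAll("[ \\t\\r\\f]+", " ").trim();
            if (!trimmed.isEmpty()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(trimmed);
            }
        }
        return out.toString();
    }
}
```

Applied to a table row like the one in the error log, each `<tr>` becomes one output line, so a label cell and its value cell end up together, for example `Name: example name`.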
Questions:
- Why is this particular email returning full HTML while others return only text?
- Is there a way to handle such emails differently and extract only meaningful text?
- Are there specific Gmail or IMAP quirks that could cause this issue?
Any insights or suggestions would be helpful and appreciated!
I have attached the log details for reference.
Success log:
15:13:49.801 [scheduling-1] INFO c.q.H.util.EmailReaderUtil - Processing message with subject: subject name
15:13:51.321 [scheduling-1] DEBUG c.q.H.templates.TradeIn - Full email content: temp name
------------------------------
Security Note: tempate details
------------------------------
*details*
125,688km, 2350cc, Automatic, Petrol
$12,345
Stock number: 12345
*View listing on Trade Me <image>*
------------------------------
*Member enquiry:*
*Type: * temp
*Name:* name name
*Email: * [email protected]
*Phone number: * 123456789
*Location: * adderesss
------------------------------
coments
------------------------------
*Trade-in vehicle details*
*Rego: * 12345
*Vehicle: * vh name
*Sub model: * 12344 S
*Colour: * Grey
*Vehicle type: * Ute
*Doors: * 4
*Seats: * 5
*Fuel type: * Diesel
*Transmission: * Manual
*Engine size: * 12345cc
*Origin: * N
------------------------------
*Vehicle images *
Contact
<response>
Contact enquirer
Error log:
07:28:32.416 [scheduling-1] INFO c.q.H.c.EmailProcessingController - Scheduler triggered: Processing unread emails.
07:28:35.497 [scheduling-1] INFO c.q.H.util.EmailReaderUtil - Processing message with subject: examp temp
07:28:36.638 [scheduling-1] DEBUG c.q.H.templates.temp - Full email content: <!DOCTYPE html>
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-16">
<style type="text/css">
td {
font-family: Verdana,Tahoma,Arial, "Sans Serif";
font-size: 10pt;
}
body {
font-family: Verdana,Tahoma,Arial, "Sans Serif";
font-size: 10pt;
}
table td {
border-collapse: collapse;
}
#footer-ad {
width: 700;
}
.msoFix {
mso-table-lspace:-1pt; mso-table-rspace:-1pt;
}
@media (max-width: 650px) {
table table table {
width: 100% !important;
}
table table table+table {
float: left !important;
}
table table table+table td {
text-align: center;
}
table table table {
width: 100% !important;
}
table table table+table {
float: left !important;
}
table table table+table td {
text-align: center;
}
#trademe-logo {
text-align: center;
}
body>table {
width: 100% !important;
}
body>table:last-of-type {
border: 1px solid white;
height: auto !important;
}
body>table:last-of-type img {
width: 100% !important;
height: auto !important;
}
}
@media screen and (min-width: 601px) {
.container {
width: 600px!important;
}
}
</style>
</head>
<tr>
<td width="30%"><b><span style="font-family:Arial,sans-serif;font-size:14px;color:#4C4646;letter-spacing:0.18px;line-height:23px;text-align:left;">
Name: </span></b></td>
<td><span style="font-family:Arial,sans-serif;font-size:14px;color:#4C4646;letter-spacing:0.18px;line-height:23px;text-align:left;">example name </span></td>
</tr>
**more....**
</table>
</td>
</tr>
</table>
<img width="1" height="1" alt="" src="http://link"></body>
</html>