Beyond Titles, Embracing Data Analysis in Every Tech Role


6 months ago
Tags: LLM, Programmers, Outlook, C#, Data Analysis, Automation

I use LLMs daily for personal projects, for work, and for my blog. They help me automate tasks, analyse data, and make decisions, which simplifies my life and makes me more efficient at work. Last week, my boss asked for a report on our current status, including the areas where we are investing the most time and money. I had already set up a system that emails him a daily log report after each stand-up meeting with my team. The catch was that, to prepare the report, I would have to sift through a vast number of emails in my inbox and summarise them by hand, which was cumbersome. This prompted me to think, “Why not leverage the logs I already have to generate the required report?” My curiosity once again proved beneficial.

[Cover image generated by DALL-E]

We use Outlook for our internal emails, and the daily logs I send follow a specific format designed to simplify data extraction. Since I don’t store the logs in a database, I needed a way to export the data from Outlook. While filtering my sent emails and forwarding them to myself, I discovered a useful feature: selecting a group of emails and dragging them into a new email attaches each one as a separate file in the ‘eml’ format. This format, supported by various email clients including Outlook, stores an email as plain text in MIME format. It holds the email’s headers and body, along with any attachments, which makes it easy to parse with the programming language of your choice.
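To make the structure concrete, here is a minimal TypeScript sketch of how a plain-text eml message breaks down into headers and a body. It is an illustration under simplifying assumptions, not a replacement for a real parser: it only handles a single-part message, and the function name `splitEml`, the `SimpleEml` type, and the sample message are all made up for this example.

```typescript
// Minimal sketch: split a raw .eml string into headers and body.
// Assumption: a simple single-part message. Real MIME (multipart bodies,
// encoded words, attachments) needs a proper parsing library.
interface SimpleEml {
  headers: Record<string, string>; // header names are lowercased
  body: string;
}

function splitEml(raw: string): SimpleEml {
  // Normalise line endings so the rest of the code only deals with "\n".
  const normalized = raw.replace(/\r\n/g, "\n");

  // The first blank line separates the header block from the body.
  const separator = normalized.indexOf("\n\n");
  const headerBlock = separator === -1 ? normalized : normalized.slice(0, separator);
  const body = separator === -1 ? "" : normalized.slice(separator + 2);

  // A header may be folded over several lines; continuation lines start
  // with a space or tab, so join them back onto the previous line.
  const unfolded = headerBlock.replace(/\n[ \t]+/g, " ");

  const headers: Record<string, string> = {};
  for (const line of unfolded.split("\n")) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip anything that is not "Name: value"
    const key = line.slice(0, colon).trim().toLowerCase();
    headers[key] = line.slice(colon + 1).trim();
  }
  return { headers, body };
}

// Made-up sample message, with a folded Subject header.
const sampleEml = [
  "From: me@example.com",
  "Subject: Daily log",
  " 2024-01-15",
  "Date: Mon, 15 Jan 2024 09:30:00 +0000",
  "",
  "Worked on the parser.",
].join("\r\n");

const simpleParsed = splitEml(sampleEml);
console.log(simpleParsed.headers["subject"]); // prints: Daily log 2024-01-15
console.log(simpleParsed.body); // prints: Worked on the parser.
```

Real messages are rarely this tidy, which is why a dedicated library is the safer choice for anything beyond a quick look at the file.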

Parsing the eml files

I initially attempted to parse the files using JavaScript, as there’s a library named eml-format specifically designed for parsing .eml files. I wrote a simple script to parse the files and extract the information I needed, managing to retrieve the date, the subject, and the body of the email:

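import fs from "node:fs";
import path from "node:path";
import emlformat from "eml-format";

// The original definition of ReadEml is not shown; these fields are an assumption.
type ReadEml = { date?: Date; subject?: string; text?: string; html?: string };

// Wrap the callback-based emlformat.read in a Promise.
const readEmlFormat = (eml: string): Promise<ReadEml> =>
  new Promise((resolve, reject) => {
    emlformat.read(eml, function (error, data) {
      if (error) return reject(error);
      resolve(data);
    });
  });

const files = fs.readdirSync(path.join(__dirname, "emails"), "utf-8");
for (const f of files) {
  // Bun.file returns a lazy file handle, so read its contents as a string first.
  const eml = await Bun.file(path.join(__dirname, "emails", f)).text();
  const parsed = await readEmlFormat(eml);
  ...
}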

However, for some emails the parser returned null. Given my proficiency in C# 😃, and since the task felt inherently Microsofty, I switched to C#. I was tempted to parse the files by hand, but tasks like this tend to hide edge cases. So I searched NuGet and, to my delight, found an excellent package for the job: MimeKit. With it, I successfully parsed all the files.

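using MimeKit;

var emails = new List<Email>();
var files = Directory.GetFiles(@"./emails");
foreach (var file in files)
{
    // MimeMessage.Load parses the whole .eml file, headers and MIME parts included.
    var message = MimeMessage.Load(file);
    var email = new Email(
        Path.GetFileName(file),
        message.Subject, message.HtmlBody, message.TextBody,
        message.Date.ToString("yyyy-MM-dd"));
    emails.Add(email);
}

// The original elides the remaining parameters; the names below are assumed from the constructor call above.
record Email(string FileName, string Subject, string? HtmlBody, string? TextBody, string Date);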

With parsing solved, the next step was to analyse the data and prepare the report. I had saved the parsed emails as one huge JSON file, so I iterated over it and passed the text of each email to my LLM, asking for a summary with a prompt like this:

var response = await client.PostAsJsonAsync("...", new
{
    // A blank line keeps the instruction separate from the email text.
    prompt = "Given the following email, summarise it. The priority is to get the main idea of the email. It should be short and concise.\n\n" + email.TextBody,
});

Now I had another JSON file with this structure:

[
  {
    "date": "...",
    "summary": "..."
  },
  ...
]
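The loop that produces this structure can be sketched roughly as follows. This is a TypeScript illustration rather than the C# I actually ran. The `summarise` callback is a hypothetical stand-in for the LLM call, so the sketch works without a network connection, and the names `EmailRecord`, `SummaryEntry`, and `summariseEmails` are made up for the example. Only the `date` and `summary` fields come from the structure above.

```typescript
// Input: one parsed email (field names assumed for this sketch).
interface EmailRecord {
  date: string; // formatted as yyyy-MM-dd
  textBody: string;
}

// Output: one entry of the summaries JSON file.
interface SummaryEntry {
  date: string;
  summary: string;
}

// Hypothetical stand-in for the LLM call: takes a prompt, resolves to text.
type Summarise = (prompt: string) => Promise<string>;

const SUMMARY_INSTRUCTION =
  "Given the following email, summarise it. The priority is to get the main idea of the email. It should be short and concise.";

async function summariseEmails(
  emails: EmailRecord[],
  summarise: Summarise
): Promise<SummaryEntry[]> {
  const entries: SummaryEntry[] = [];
  for (const email of emails) {
    // One request per email, sent sequentially to keep the sketch simple.
    const reply = await summarise(SUMMARY_INSTRUCTION + "\n\n" + email.textBody);
    entries.push({ date: email.date, summary: reply.trim() });
  }
  return entries;
}

// Usage with a fake summariser and made-up sample data.
const fakeSummarise: Summarise = (prompt) =>
  Promise.resolve("stub summary of a " + prompt.length + "-character prompt");

summariseEmails(
  [{ date: "2024-01-15", textBody: "Worked on the parser." }],
  fakeSummarise
).then((entries) => console.log(JSON.stringify(entries, null, 2)));
```

Passing the summariser in as a function keeps the loop independent of any particular LLM client, which also makes it easy to try out with a stub like the one above.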

The last step was to summarise the summaries with a prompt like this:

var response = await client.PostAsJsonAsync("...", new
{
    prompt = "Given this JSON file, provide me with a report in a human-readable format, using PowerPoint slides format. For example: Slide #1 ..., Slide #2 ...",
});
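For the model to work from the summaries, their JSON has to travel with the instruction. One way to assemble such a prompt is sketched below in TypeScript. The names `DailySummary` and `buildReportPrompt`, the chronological sort, and the sample data are assumptions made for illustration, not a record of the exact code I ran.

```typescript
// One entry of the summaries JSON file.
interface DailySummary {
  date: string; // formatted as yyyy-MM-dd
  summary: string;
}

const REPORT_INSTRUCTION =
  "Given this JSON file, provide me with a report in a human-readable format, using PowerPoint slides format. For example: Slide #1 ..., Slide #2 ...";

function buildReportPrompt(summaries: DailySummary[]): string {
  // Sort a copy chronologically; yyyy-MM-dd strings sort correctly as plain text.
  const ordered = summaries.slice().sort((a, b) => {
    if (a.date < b.date) return -1;
    if (a.date > b.date) return 1;
    return 0;
  });
  // Append the JSON after a blank line so the instruction stays readable.
  return REPORT_INSTRUCTION + "\n\n" + JSON.stringify(ordered, null, 2);
}

// Usage with made-up sample data.
const reportPrompt = buildReportPrompt([
  { date: "2024-01-16", summary: "Reviewed the deployment pipeline." },
  { date: "2024-01-15", summary: "Worked on the parser." },
]);
console.log(reportPrompt);
```

Sorting by date first is optional, but it gives the model the entries in the order the work happened, which tends to suit a status report.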

All that was left was to copy the result into a PowerPoint file. 🤩

PS: I did spend more time on this to satisfy my curiosity, but this was the entire journey in a nutshell. I hope you enjoyed it. 😃