Step 1: Extract Email Addresses
from Outlook
From within Outlook, select menu item File->Import and Export.
i) Run the wizard. Export to a file→Comma
Separated Values (Windows)→Inbox→C:\Temp\MyInbox.csv.
ii) Click on "Map Fields" button and remove all the fields
except From: (Name), From: (Address), To: (Name), To: (Address),
CC: (Name), CC: (Address), BCC: (Name) and BCC: (Address).
Step 2: Load C:\Temp\MyInbox.csv
Using your favorite ETL tool, load C:\Temp\MyInbox.csv. You
now have a table containing eight columns.
Step 3: Normalize the table
We need to rearrange eight columns into just two columns
- Names and Addresses. Within your ETL tool, break these eight
columns into four tables of two columns each and then run
a "Union" type transformation to append four tables into one.
Step 4: Split multiple recipients into individuals
It is common to see an email being copied to multiple recipients.
These recipients are separated by semicolons in the Outlook
generated file. We need to split multiple recipients into
individuals. Both the columns, Names as well as Addresses,
would need to be split.
Step 5: Merge names and addresses
We now need to merge all the columns from the previous steps
into a new table containing the name and the corresponding
email address. At the end of this step, we have a table containing
the name and email address of each recipient.
Step 6: Filter data with legal SMTP addresses
At this point, your table may contain some addresses that:
Use the following regular expression to filter out illegal
SMTP addresses:
^[\w-\.]{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,3}$
Step 7: Remove duplicates
After merging from, to, cc, and bcc list, you now will be
left with many duplicate entries. The last step is use the
mechanism provided by your ETL tool to remove such duplicates.
You now have a result that does not contain illegal email
addresses or duplicate records.
You can extend the mechanism to process your "Sent Mail"
folder as well as other folders.
|