Split a PDF at Each Change in Header

A colleague of mine receives a detailed 600-page PDF commission report showing all orders placed in each of many accounts, similar to a phone bill showing charges under each line of service. How can I split the PDF into a separate file for each account?

The report that needs to be split is pretty simple, just huge. The first page is a cover page, and every page that follows lists the orders placed in one account. The data is paginated by account number, which is also shown in the top-left corner of every page.

Automation intent: When I specify a PDF file, split it into a new file at every change in the customer ID in the top-left corner. Save each split file as "customer ID.pdf", and then email select customers a copy of their respective file.

Here’s basically what we want to achieve:

Long ago I demonstrated here how to append CSV files together into a single file. Turns out, Power Automate lets you treat PDFs nearly the same way. You can split parts of PDF documents into new files, append PDF files together (merge), extract text from pages and use it to make decisions, and more.

General Outline

The general process of the final flow I came up with is below.

  1. First, we prompt the user for the PDF file that needs to be split. This shows as a familiar File Open common dialog box. A quick “get special folder” step runs first so we can open the dialog in the user’s personal Desktop folder (which in this case is likely to be where the main file exists).
  2. Next, we use the name of the original file to create a folder with that same name next to it, which is where all the child files created will be stored. As a courtesy to the user we also open this folder in File Explorer so they can see files appearing as they are created. We also store this folder name as a variable so we can use it later if we send any of these attachments by email.
  3. Now, we enter the main loop. Starting with Page 1 of the main PDF, the flow extracts the text of the page and looks for the “Main Partner: ” text. If the text is not found, the page is simply skipped.
  4. If the “Main Partner: ” text is found, we first take note of the account number that follows it. Then we note the current page number as the starting range, and proceed to the next page and check again. If the partner number on the next page is the same as the prior page, we update our end range and continue to the following page.
  5. Once we come to a page where the main partner number is different, we know we’ve hit a new account. Note the prior page (which must be the last page for the prior customer) as the end range, then save the current Start-End page ranges noted as a separate file.
    • Oddly, there is no way to check how long a PDF is, so here I also deploy a check to tell if we’ve reached the final page of the file. If the current page is identical to the prior page, it means we received the same text extraction for both and we’re actually at the end of the document. So (in this case only) we export the current noted page range and end our flow.
  6. Return to step 3 for the now-current page number and continue again.
  7. Once we’re done looping through all pages of the PDF, the flow also provides a message to the user to inform them of the completion and to offer to send specific predetermined files to their final recipients (account holders). This is implemented with just a few “Send email through Outlook” steps.
The entire flow.
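
For readers who think better in code than in flow actions, steps 3 through 6 can be sketched in Python. This is only an illustration of the page-grouping logic, not an export of the actual flow. It assumes the text of each page has already been extracted into a list of strings, and that the account number is the run of non-space characters right after “Main Partner: ” (the real report format may differ). The names `account_on_page` and `group_pages` are made up for this example.

```python
import re
from typing import List, Optional, Tuple

# Assumption: the account number is whatever non-space text directly
# follows the "Main Partner: " label. Adjust the pattern for the real report.
PARTNER_RE = re.compile(r"Main Partner:\s*(\S+)")


def account_on_page(text: str) -> Optional[str]:
    """Return the account number on a page, or None if the label is missing."""
    match = PARTNER_RE.search(text)
    return match.group(1) if match else None


def group_pages(page_texts: List[str]) -> List[Tuple[str, int, int]]:
    """Return (account, first_page, last_page) tuples with 1-based page numbers."""
    ranges: List[Tuple[str, int, int]] = []
    current: Optional[str] = None  # account of the range being built, if any
    start = end = 0

    for page_no, text in enumerate(page_texts, start=1):
        account = account_on_page(text)
        if account == current:
            # Same account as the prior page: extend the end of the range.
            # (Two label-free pages in a row also land here and are skipped.)
            if current is not None:
                end = page_no
            continue
        # The account changed, so the prior page closed the previous range.
        if current is not None:
            ranges.append((current, start, end))
        current, start, end = account, page_no, page_no

    # Export whatever range is still open when the pages run out.
    if current is not None:
        ranges.append((current, start, end))
    return ranges


if __name__ == "__main__":
    # Hypothetical page texts mirroring a cover page plus three accounts.
    pages = [
        "Commission Report",
        "Main Partner: 1001 orders",
        "Main Partner: 1001 orders",
        "Main Partner: 1002 orders",
    ]
    for account, first, last in group_pages(pages):
        print(f"{account}: pages {first}-{last}")
```

Because the list of page texts has a known length here, the sketch does not need the “same text twice means end of document” check from step 5; it simply exports the last open range once the loop finishes. One design choice to be aware of: a page without the label in the middle of an account closes that account's range, which may or may not match how you want such pages handled.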

Example Case

Although the original file is over five hundred pages long, my example here is only 7 pages:

  • Page 1 is a cover.
  • The first customer account is on pages 2-4.
  • The second customer only has data on page 5.
  • The last customer has data on pages 6-7.
My sample file with order data for 3 fake customer accounts.

Here’s the output of the flow, showing the folder and files created. The file names also include the original page numbers from the larger document, but this is optional and we didn’t use it in our actual final version of this flow.

The output.
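
The naming scheme is easy to express outside the flow too. Here is a small Python sketch of how the output folder and file names could be built: the folder takes the source file's name and sits next to it, and each file is named for its account, with the page range as an optional extra. The function names and the `_p2-4` suffix format are my own choices for illustration; the screenshot's exact naming pattern may differ.

```python
from pathlib import Path
from typing import Optional, Tuple


def output_folder(source_pdf: Path) -> Path:
    """Folder named after the source file (minus .pdf), placed next to it."""
    return source_pdf.parent / source_pdf.stem


def output_file(
    source_pdf: Path,
    account: str,
    pages: Optional[Tuple[int, int]] = None,
) -> Path:
    """Path for one account's split file, optionally tagged with its page range."""
    name = account
    if pages is not None:
        first, last = pages
        # Hypothetical suffix format, chosen only for this example.
        name = f"{account}_p{first}-{last}"
    return output_folder(source_pdf) / f"{name}.pdf"


if __name__ == "__main__":
    source = Path("Desktop") / "commissions.pdf"  # hypothetical file name
    print(output_file(source, "1001"))
    print(output_file(source, "1001", pages=(2, 4)))
```

These helpers only compute paths. Before writing any files you would still need to create the folder, for example with `output_folder(source).mkdir(exist_ok=True)`.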

The team responsible has told me this saves them a ton of time manually splitting this file to get the right data pages for the select customers that receive a copy of it, and this flow was fun to build!
