Parse a String with PHP’s preg_match_all()

Several times I’ve run into scenarios where I needed to essentially parse a string in PHP that wasn’t in a common format (JSON, CSV, tab-separated, etc.). Early on in my career I avoided regular expressions (RegEx) like the plague but a few years back I decided the time was right to embrace RegEx. Good thing I did, too, because with PHP’s preg_match_all() function, solving this requirement was a breeze

My scenario was, I’ve got the following string: Package #1 Box name: Medium Box : 6x4x3: W=1.4: Value=199.99: SKU=1 *  Mobile Phone 1.4lb; Package #2 Box name: Large Box : 10x7x5: W=0.7: Value=39.99: SKU=1 *  Phone Case 2.1lb;. This string is related to an order on an ecommerce store and it tells me that the best way to ship this order is in two separate boxes, once called “Medium Box” and the other called “Large Box.”

This string is not in an easy-to-parse format but it is consistent! What I needed to do was get everything between every occurrence of “Box Name: ” and the subsequent ” : “. Initially I considered using PHP’s substr() function in conjunction with strpos(). I’d use strpos() to work out where “Box Name: ” was, use strpos() again with an offset to look for the subsequent ” : “, adjust both those numbers, and feed that back into substr() as the start and end. What a pain. And, when I have a string that contains multiple boxes as my example does, I’m forced to loop and keep iterating up the offset. That could work, but not really a good solution.

Enter preg_match_all()! Three lines of code is all it took to prove it worked:


$comment = “Package #1 Box name: Medium Box : 6x4x3: W=1.4: Value=: SKU=1 * Samsung Mobile Phone 1.4lbs 1.4lb; Package #2 Box name: Large Box : 10x7x5: W=1.4: Value=: SKU=1 * Samsung Mobile Phone 1.4lbs 1.4lb;”;

preg_match_all(“/Box name: (.*?) : /”, $comment, $boxNames);

print_r($boxNames[1]);

The result from the print_r() on the last row there is: Array ( [0] => Medium Box [1] => Large Box ). That’s exactly what I needed. I could, of course, use additional preg_match_all()’s to find other elements like the box dimensions, weight, etc.

preg_match_all() takes three parameters in this scenario: a regular expression that covers the start element to look for (“Box Name: “) as well as the end element to look for (” : “); a string, and an output variable. Note that this line isn’t written as $boxNames = preg_match_all(…);, instead $boxNames is the third parameter.

$boxNames becomes an array of values. The first element of that array ($boxNames[0]) is the full match (in this case $boxNames[0][0] = “Box Name: Medium Box : “ and $boxNames[0][1] = “Box Name: Large Box : “). The second element of the array ($boxNames[1]) is the internal strings ($boxNames[1][0] = “Medium Box” and $boxNames[1][1] = “Large Box”). That’s the one I wanted so that’s the one I used. The array will contain as many matches as preg_match_all() finds (hence the “_all”).

Note

A tool like RegExr is useful here when putting together your expression. I’ve covered this in more detail previously.

Regular Expressions for Shipment Tracking Code Verification

I set up a tracking page recently for orders and realized that since we ship via 3 carriers (UPS, UPSMI, and USPS) I needed to sort out which API to call based on the tracking number (I didn’t want to force the user to remember the tracking method themselves when I could figure it out for them). Using some tracking number regular expressions I found at StackOverflow I compiled the following list. I’ll add more as I find them. I’ve tested all of them and they all seem to work for what I’ve needed.


UPS (1Z numbers)
\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b

UPS Mail Innovations (UPSMI)
MI[0-9]{6}(ABC[0-9]{7}|XYZ[0-9]{4})

USPS (4 Types)
(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)
^E\D{1}\d{9}\D{2}$|^9\d{15,21}$
^91[0-9]+$
^[A-Za-z]{2}[0-9]+US$

Notes:

  1. UPSMI is a bit strange in the way tracking numbers are set up. The tracking number starts with “MI” followed by your 6 character numeric UPSMI account number (this is different than your UPS account number which normally ends with “TT”). After that, it’s up to you to determine your format for your tracking number. We’re using one of two 3-character alphabetical codes followed by either a 4 or 7 character numeric code (that’s the “(ABC[0-9]{7}|XYZ[0-9]{4})” you’ll see in the regex). While the “MI[0-9]{6}” part is standard, I recommend writing and testing your own suffix with RegExr.

UPS Mail Innovations Sorting Facilities Codes

UPS Mail Innovations (UPSMI) is a partnership between UPS and the USPS where UPS picks up from the shipper, sorts the packages at one of a number of sorting centers throughout the US, ships them to another sorting center closest to the package’s destination, then hands it off to the USPS at one of USPS’ BMC (Bulk Mail Centers, now called NDCs or Network Distribution Centers) or SCF (Sectional Center Facilities). USPS then handles delivery from there. The advantage is that for mid-sized to larger shippers, rates are equal to USPS Media Mail but delivery times are equal to First Class plus one day. I’ll likely write more about UPSMI integration with WorldShip and other applications later.

I recently set up a connection to UPS’ tracking API which works for UPSMI packages as well now but ran into an issue with the information being returned. For standard small-package tracking, UPS’ tracking API returns the city and state for each activity (pickup, sorting, destination, etc.) but for UPSMI I was simply receiving facility codes instead of city and state. It took a while to track down packages which went through each of UPSMI’s facilities, but I eventually did and managed to put together the following PHP array which can be used to convert the facility code to the correct city and state.

USPS BMCs and SCFs don’t return the facility code, so you’re stuck either simply saying “Bulk Mail Center” or “Sectional Center Facility”, or setting up another API to get tracking information from the USPS once the package leaves the UPSMI system.

  $miFacilities = Array(
“BMC” => “Bulk Mail Center”,
“SCF” => “Sectional Center Facility”,
“OHGRV” => “Urbancrest, OH”,
“GATLA” => “Atlanta, GA”,
“TNLVR” => “La Vergne, TN”,
“CAFNN” => “Fontana, CA”,
“NJLOG” => “Logan Township, NJ”,
“WAABU” => “Auburn, WA”,
“MNMEN” => “Mendota Heights, MN”,
“ILCST” => “Carol Stream, IL”,
“UTWVY” => “West Valley City, UT”,
“TXOLL” => “Coppell, TX”,
“NCDHM” => “Durham, NC”,
“CALEA” => “San Leandro, CA”,
“MOKCY” => “Kansas City, MO”,
“AZTOL” => “Tolleson, AZ”,
“CTWDS” => “Windsor, CT”,
“FLORO” => “Orlando, FL”,
“NYEDG” => “Edgewood, NY”
);

I hope it saves someone else out there some time!

Stripping Everything but Numbers in PHP using RegEx

Here’s a useful little regular expression that will strip everything but numbers from a string in PHP.

preg_replace(“/[^0-9]/”,””,”604-619-5135″);

This uses PHP’s preg_replace function to replace everything but numbers with nothing. I used it to strip the dashes from ZIP+4 codes and also reformat phone numbers without forcing the user to enter it in my preferred format (if possible, it’s always better to clean up user input yourself than yell at the user for not entering it your way).

I gleaned this solution from “Chris” on StackOverflow so all credit goes to him. I recommend using RegExr when building your expression. I’ve covered this in more detail previously.

Generating Barcodes in PHP

I’ve discovered some applications which will allow you to create barcodes in PHP but never came across one which had been issued under MIT license (which allows full use for free, including commercial use, with a few restrictions). All the other applications I had found were quite expensive or licensed only for not-for-profit use.

Then, I stumbled upon this post by David Scott Tufts who had developed a PHP script which will generate barcodes “in four barcode formats including Code 128, Code 39, Code 2of5, and Codabar”. I downloaded his code and tried it out and, sure enough, it worked perfectly for the application I was building! Thanks to David Scott Tufts for saving me a significant amount of time, or quite a bit of money! His solution is quick, accurate, and simple.

To use his script, download his PHP file (he uses the .phps extension on the file to allow download, I believe, so I just changed that to .php before using). Upload the script to your server and use it as follows in your HTML:

<img src=”/images/barcode.php” alt=”barcode” />

Available parameters are:


text=string (Default: “0”)

size=int (Default: “20”)

codetype=(Code128|Code39|Code25|Codabar) (Default: “Code128”)

orientation=(horizontal|vertical) (Default: “Horizontal”)

Text: Should be a url-escaped string (e.g. “%20″ for ” “).

Size: 50 worked well for me. Play around to find the best size for your application, especially if printing.

Code Type: I used Code 128 for my application. The code you use may depend on your application or the scanner you expect to read the barcode with.

Orientation: Self-explanatory. Just make sure your IMG tag widths and heights reflect the orientation correctly.

That’s all there is to it! It took a few minutes to play around with the settings and get something that worked for what I needed, but compared to writing my own script it was a breeze.

Notes:

  1. This script requires PHP’s GD Library to be installed and available on the server.