Quick & Easy Guide to Using Emoji in HTML

I truly realized I was an adult this morning when one of my younger colleagues had to explain to me what Snapchat is and how to use it 👴. Previously, my knowledge of Snapchat was limited to their technical background 💻, business story 📈, and IPO 💰. The spread of emoji into everyday language in the US has only been accelerated by visually oriented chat apps like Snapchat. So the time has come for me, a now almost middle-aged web developer, to embrace the emoji. But how to do it? It’s pretty easy once you understand a few key concepts.

HTML Entities

Anyone who’s done much with HTML will be familiar with HTML entities: &amp;, &nbsp;, &lt;, &gt;, &copy;, etc. are part of your vocabulary. Simply put, these let you render characters that either have special meaning in HTML or aren’t found on most keyboards. Most people only ever use the “named” entities, but you can use a similar syntax to express characters that have no name by using their Unicode codes.
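As a quick sanity check, PHP’s html_entity_decode() can show that a named entity and its numeric spellings decode to the same character. This is only a minimal sketch, assuming PHP 5.4 or later and UTF-8 output; the variable names are made up for illustration.

```php
<?php
// Three spellings of the copyright sign: named, decimal, and hexadecimal.
$forms = ['&copy;', '&#169;', '&#xA9;'];

$decoded = [];
foreach ($forms as $entity) {
    // ENT_HTML5 selects the HTML5 entity table; 'UTF-8' is the output encoding.
    $decoded[$entity] = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Going the other way, htmlspecialchars() escapes the characters that have
// special meaning in HTML.
$escaped = htmlspecialchars('<b>Tom & Jerry</b>', ENT_QUOTES, 'UTF-8');

print_r($decoded);
echo $escaped, PHP_EOL;
```

All three entries in $decoded should come back as the same © character, which is the point: the name is just a convenience on top of the number.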

Unicode

Unicode is a standard (and the organization behind the standard) that sets out how software should express characters in order to allow different systems to communicate clearly. If you sent an email to a friend asking them to pay you the $10 they owed you but the system they use interprets the “$” character as a “¥” then you’d be pretty upset when they send you 9 cents. Unicode seeks to eliminate these types of issues by making sure characters are consistent across systems. This is important for emoji because Unicode has defined codes for over 2,500 different emoji. These might be rendered differently on different systems (see the Apple gun vs. the more standard one used by Google) but they should be the same element just with different styles.

Unicode + HTML Entities = 😁

All you need to know to get emoji in your HTML is how to write non-named HTML entities using Unicode codes. It’s pretty easy, really:
1. Find the Unicode code of the emoji you want to use. For what Unicode calls the “grinning face” this is “U+1F600”.
2. Drop the “U+” from the front; that’s just an indicator that it’s a Unicode code. For the grinning face you’d get “1F600”. This is a hexadecimal number.
3. Add that to the HTML entity form for non-named entities with hexadecimal values, &#xNNNNN; (where NNNNN is the hexadecimal number), and you get &#x1F600; which renders as 😀
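The three steps above can be sketched as a small PHP function. The name emojiEntity() is hypothetical, and the sketch assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: turn a code point written as "U+1F600" into the
// hexadecimal HTML entity "&#x1F600;".
function emojiEntity(string $codePoint): string
{
    // Step 2: drop the "U+" prefix to leave only the hexadecimal digits.
    $hex = preg_replace('/^U\+/i', '', trim($codePoint));

    // Reject anything that is not 1 to 6 hexadecimal digits.
    if (!preg_match('/^[0-9A-Fa-f]{1,6}$/', $hex)) {
        throw new InvalidArgumentException("Not a Unicode code point: $codePoint");
    }

    // Step 3: wrap the digits in the non-named hexadecimal entity form.
    return '&#x' . strtoupper($hex) . ';';
}

echo emojiEntity('U+1F600'), PHP_EOL; // prints &#x1F600;
```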

Now all you have to worry about is making sure you don’t accidentally use the wrong combination and end up with a sexual innuendo (unless that’s what you’re going for 😉).

Note

  • Unicode’s Full Emoji List is a really useful guide for looking up the Unicode code of over 2,500 emoji.
  • Not all systems, software, or devices support every emoji, but Unicode’s list shows which systems support each one. The list may lag behind reality, but since emoji support keeps growing, it’s more likely to omit support that exists than to claim support that doesn’t, so it’s a solid guideline.
  • Since emoji styles vary between systems, if you want full control of the style used on your site you’ll probably want to grab a CSS emoji library that’s to your liking.

Parse a String with PHP’s preg_match_all()

Several times I’ve run into scenarios where I needed to parse a string in PHP that wasn’t in a common format (JSON, CSV, tab-separated, etc.). Early on in my career I avoided regular expressions (RegEx) like the plague, but a few years back I decided the time was right to embrace RegEx. Good thing I did, too, because with PHP’s preg_match_all() function, solving this requirement was a breeze.

My scenario was this: I’ve got the following string: Package #1 Box name: Medium Box : 6x4x3: W=1.4: Value=199.99: SKU=1 *  Mobile Phone 1.4lb; Package #2 Box name: Large Box : 10x7x5: W=0.7: Value=39.99: SKU=1 *  Phone Case 2.1lb;. This string is related to an order on an ecommerce store and it tells me that the best way to ship this order is in two separate boxes, one called “Medium Box” and the other called “Large Box.”

This string is not in an easy-to-parse format but it is consistent! What I needed to do was get everything between every occurrence of “Box name: ” and the subsequent “ : ”. Initially I considered using PHP’s substr() function in conjunction with strpos(). I’d use strpos() to work out where “Box name: ” was, use strpos() again with an offset to look for the subsequent “ : ”, adjust both those numbers, and feed them into substr() as the start and length. What a pain. And when I have a string that contains multiple boxes, as my example does, I’m forced to loop and keep moving the offset forward. That could work, but it’s not really a good solution.
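For comparison, the strpos()/substr() loop described above might look something like the following. It is only a sketch: boxNamesWithStrpos() is a hypothetical name, and the sample string is a shortened version of the one above.

```php
<?php
// Hypothetical helper showing the manual strpos()/substr() approach.
function boxNamesWithStrpos(string $comment): array
{
    $start = 'Box name: ';
    $end = ' : ';
    $names = [];
    $offset = 0;

    // Keep searching from the last position until no more markers are found.
    while (($from = strpos($comment, $start, $offset)) !== false) {
        $from += strlen($start);              // skip past the start marker
        $to = strpos($comment, $end, $from);  // find the end marker after it
        if ($to === false) {
            break;
        }
        // substr() takes a start and a length, so the end has to be converted.
        $names[] = substr($comment, $from, $to - $from);
        $offset = $to + strlen($end);
    }

    return $names;
}

$comment = 'Package #1 Box name: Medium Box : 6x4x3: W=1.4; Package #2 Box name: Large Box : 10x7x5: W=0.7;';
print_r(boxNamesWithStrpos($comment));
```

It works, but there are a lot of offsets to keep straight for what is really a simple question.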

Enter preg_match_all()! Three lines of code is all it took to prove it worked:


$comment = "Package #1 Box name: Medium Box : 6x4x3: W=1.4: Value=: SKU=1 * Samsung Mobile Phone 1.4lbs 1.4lb; Package #2 Box name: Large Box : 10x7x5: W=1.4: Value=: SKU=1 * Samsung Mobile Phone 1.4lbs 1.4lb;";

preg_match_all("/Box name: (.*?) : /", $comment, $boxNames);

print_r($boxNames[1]);

The result from the print_r() on the last line there is: Array ( [0] => Medium Box [1] => Large Box ). That’s exactly what I needed. I could, of course, use additional preg_match_all() calls to find other elements like the box dimensions, weight, etc.

preg_match_all() takes three parameters in this scenario: a regular expression that covers the start element to look for (“Box name: ”) as well as the end element to look for (“ : ”); the string to search; and an output variable. Note that this line isn’t written as $boxNames = preg_match_all(…);, instead $boxNames is the third parameter.

$boxNames becomes an array of arrays. The first element of that array ($boxNames[0]) holds the full matches (in this case $boxNames[0][0] = “Box name: Medium Box : ” and $boxNames[0][1] = “Box name: Large Box : ”). The second element of the array ($boxNames[1]) holds the captured inner strings ($boxNames[1][0] = “Medium Box” and $boxNames[1][1] = “Large Box”). That’s the one I wanted so that’s the one I used. Each of those inner arrays will contain as many entries as preg_match_all() finds matches (hence the “_all”).
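If you want the dimensions and weight as well, one option is a single pattern with several capture groups rather than several separate calls. The sketch below is my own extension, not the code I shipped: the named groups and the $boxes variable are made up for illustration, and it assumes the dimensions always look like 6x4x3 and the weight always follows “W=”.

```php
<?php
$comment = 'Package #1 Box name: Medium Box : 6x4x3: W=1.4: Value=199.99: SKU=1 * Mobile Phone 1.4lb; '
         . 'Package #2 Box name: Large Box : 10x7x5: W=0.7: Value=39.99: SKU=1 * Phone Case 2.1lb;';

// Named groups capture the box name, dimensions, and weight in one pass.
$pattern = '/Box name: (?<name>.*?) : (?<dims>\d+x\d+x\d+): W=(?<weight>[\d.]+)/';

// PREG_SET_ORDER groups the results per match instead of per capture group,
// so each element of $matches describes one box.
preg_match_all($pattern, $comment, $matches, PREG_SET_ORDER);

$boxes = [];
foreach ($matches as $m) {
    $boxes[] = [
        'name'   => $m['name'],
        'dims'   => $m['dims'],
        'weight' => (float) $m['weight'],
    ];
}

print_r($boxes);
```

With PREG_SET_ORDER the indexes flip compared to the default: $matches[0] is everything about the first box rather than all of the full matches.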

Note

A tool like RegExr is useful here when putting together your expression. I’ve covered this in more detail previously.

Regular Expressions for Shipment Tracking Code Verification

I set up a tracking page recently for orders and realized that since we ship via 3 carriers (UPS, UPSMI, and USPS) I needed to sort out which API to call based on the tracking number (I didn’t want to force the user to remember the tracking method themselves when I could figure it out for them). Using some tracking number regular expressions I found at StackOverflow I compiled the following list. I’ll add more as I find them. I’ve tested all of them and they all seem to work for what I’ve needed.


UPS (1Z numbers)
\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b

UPS Mail Innovations (UPSMI)
MI[0-9]{6}(ABC[0-9]{7}|XYZ[0-9]{4})

USPS (4 Types)
(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)
^E\D{1}\d{9}\D{2}$|^9\d{15,21}$
^91[0-9]+$
^[A-Za-z]{2}[0-9]+US$

Notes:

  1. UPSMI is a bit strange in the way tracking numbers are set up. The tracking number starts with “MI” followed by your 6-digit UPSMI account number (this is different from your UPS account number, which normally ends with “TT”). After that, it’s up to you to determine the format of your tracking number. We’re using one of two 3-letter codes followed by either a 4-digit or 7-digit number (that’s the “(ABC[0-9]{7}|XYZ[0-9]{4})” you’ll see in the regex). While the “MI[0-9]{6}” part is standard, I recommend writing and testing your own suffix with RegExr.
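To show how these expressions can drive the choice of API, here is a rough PHP sketch. The function name detectCarrier(), the order of the checks, and the anchors added to the UPSMI pattern are my assumptions; the sample tracking numbers you try it with should be your own.

```php
<?php
// Hypothetical helper: returns 'UPSMI', 'UPS', 'USPS', or null if nothing matches.
function detectCarrier(string $trackingNumber): ?string
{
    $number = strtoupper(trim($trackingNumber));

    $patterns = [
        // Checked first because it is the most specific. The ABC/XYZ suffix is
        // the shipper-defined part; ^ and $ are added here as an assumption.
        'UPSMI' => ['/^MI[0-9]{6}(ABC[0-9]{7}|XYZ[0-9]{4})$/'],
        'UPS'   => ['/\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b/'],
        'USPS'  => [
            '/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)/',
            '/^E\D{1}\d{9}\D{2}$|^9\d{15,21}$/',
            '/^91[0-9]+$/',
            '/^[A-Za-z]{2}[0-9]+US$/',
        ],
    ];

    foreach ($patterns as $carrier => $expressions) {
        foreach ($expressions as $expression) {
            if (preg_match($expression, $number) === 1) {
                return $carrier;
            }
        }
    }

    return null; // unknown format: ask the user or fall back to a default
}

var_dump(detectCarrier('1Z999AA10123456784'));
```

The patterns are in single-quoted strings so that PHP leaves \b and \d alone and passes them through to the regex engine.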

UPS Mail Innovations Sorting Facilities Codes

UPS Mail Innovations (UPSMI) is a partnership between UPS and the USPS where UPS picks up from the shipper, sorts the packages at one of a number of sorting centers throughout the US, ships them to another sorting center closest to the package’s destination, then hands it off to the USPS at one of USPS’ BMC (Bulk Mail Centers, now called NDCs or Network Distribution Centers) or SCF (Sectional Center Facilities). USPS then handles delivery from there. The advantage is that for mid-sized to larger shippers, rates are equal to USPS Media Mail but delivery times are equal to First Class plus one day. I’ll likely write more about UPSMI integration with WorldShip and other applications later.

I recently set up a connection to UPS’ tracking API which works for UPSMI packages as well now but ran into an issue with the information being returned. For standard small-package tracking, UPS’ tracking API returns the city and state for each activity (pickup, sorting, destination, etc.) but for UPSMI I was simply receiving facility codes instead of city and state. It took a while to track down packages which went through each of UPSMI’s facilities, but I eventually did and managed to put together the following PHP array which can be used to convert the facility code to the correct city and state.

USPS BMCs and SCFs don’t return the facility code, so you’re stuck either simply saying “Bulk Mail Center” or “Sectional Center Facility”, or setting up another API to get tracking information from the USPS once the package leaves the UPSMI system.

$miFacilities = array(
    "BMC" => "Bulk Mail Center",
    "SCF" => "Sectional Center Facility",
    "OHGRV" => "Urbancrest, OH",
    "GATLA" => "Atlanta, GA",
    "TNLVR" => "La Vergne, TN",
    "CAFNN" => "Fontana, CA",
    "NJLOG" => "Logan Township, NJ",
    "WAABU" => "Auburn, WA",
    "MNMEN" => "Mendota Heights, MN",
    "ILCST" => "Carol Stream, IL",
    "UTWVY" => "West Valley City, UT",
    "TXOLL" => "Coppell, TX",
    "NCDHM" => "Durham, NC",
    "CALEA" => "San Leandro, CA",
    "MOKCY" => "Kansas City, MO",
    "AZTOL" => "Tolleson, AZ",
    "CTWDS" => "Windsor, CT",
    "FLORO" => "Orlando, FL",
    "NYEDG" => "Edgewood, NY"
);
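Using the array is then a simple lookup. The sketch below copies only a few entries to stay short, and facilityLocation() is a hypothetical helper name; the fallback text for unknown codes is also just my choice.

```php
<?php
// A few entries copied from the full array; use the complete list in practice.
$miFacilities = [
    'BMC'   => 'Bulk Mail Center',
    'SCF'   => 'Sectional Center Facility',
    'GATLA' => 'Atlanta, GA',
    'TXOLL' => 'Coppell, TX',
];

// Hypothetical helper: map a facility code from the tracking response to a
// readable location, falling back to the raw code when it is not in the list.
function facilityLocation(string $code, array $facilities): string
{
    $key = strtoupper(trim($code));

    return $facilities[$key] ?? "Unknown facility ($key)";
}

echo facilityLocation('GATLA', $miFacilities), PHP_EOL; // prints Atlanta, GA
echo facilityLocation('ZZZZZ', $miFacilities), PHP_EOL; // prints Unknown facility (ZZZZZ)
```

Falling back to the raw code means a facility that isn’t in the list yet still shows something the customer service team can look up.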

I hope it saves someone else out there some time!

A Useful Tool for Testing Regular Expressions

Regular expressions are exceedingly useful, but also exceedingly painful to write. For me, at least. The variety of matching options and their myriad combinations makes it difficult to remember what does what, and trial and error can be slow. That’s where RegExr, a powerful tool built by gskinner.com, comes in.

gskinner.com's RegExr

To use RegExr, enter your regular expression in the field at the top and write some text containing the information you’re trying to match in the large field. If your regular expression matches text, it’ll be highlighted. You can also hover your mouse over your regular expression to see a description of what each character or set does. Additionally, you can use the provided samples on the right to quickly and easily build your regex.

In my case I was looking to identify various shipping carriers from their tracking numbers. On an ecommerce site I manage we’ve developed a tracking page where the user can simply enter the tracking number we sent them and track their package. The complication here is that we ship via USPS, UPS, and UPS Mail Innovations, all of which use varied tracking codes. In RegExr I wrote a list of various valid and invalid tracking numbers from each of the carriers and tested regular expressions to ensure they matched the correct tracking numbers (and only the correct tracking numbers). In a future post I’ll post the expressions I found or wrote for this purpose.

Replacing Unwanted Characters in MySQL Field

I recently ran into an issue where an automated import wasn’t clearing newline characters before inserting entries into my database. This resulted in thousands of entries in the database containing unwanted newlines in certain fields. To resolve this issue I could have looped through all the entries with a PHP script, but this was a one-time fix (I corrected the issue in the import so it won’t happen again) and I didn’t want to write what would be a resource-consuming looping script in PHP.

What’s the solution, then? Using MySQL’s built-in REPLACE() function. It works like this:

UPDATE table SET field = REPLACE(field, "\n", "") WHERE id > 10201;

Simple but effective. The REPLACE() function takes three parameters:

  1. The field to be searched.
  2. The string to be replaced.
  3. The replacement string.

In my case, I’m searching for the string “\n” in the field “field” and I’m replacing it with “” (nothing). I’m also restricting the replacements to only those rows with an id greater than 10201 since that’s where the import began to experience its issue. Why make MySQL do more work than it has to?
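If you’d like to try the statement on throwaway data before touching a real table, something like the following works as a rehearsal. Note the assumptions: it uses an in-memory SQLite database through PDO as a stand-in for MySQL (SQLite also has a three-argument REPLACE() function), it needs the pdo_sqlite extension, and the table name “entries” and the sample rows are invented.

```php
<?php
// In-memory SQLite stands in for MySQL here (an assumption for the demo).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table with one row at the cutoff id and one row after it.
$pdo->exec('CREATE TABLE entries (id INTEGER PRIMARY KEY, field TEXT)');
$insert = $pdo->prepare('INSERT INTO entries (id, field) VALUES (?, ?)');
$insert->execute([10201, "kept\nas is"]);
$insert->execute([10202, "broken\nline"]);

// Same shape as the UPDATE above, with the values bound as parameters:
// the search string, the replacement string, and the id cutoff.
$update = $pdo->prepare('UPDATE entries SET field = REPLACE(field, ?, ?) WHERE id > ?');
$update->execute(["\n", '', 10201]);

// Fetch the rows back as an id => field map to inspect the result.
$rows = $pdo->query('SELECT id, field FROM entries ORDER BY id')
            ->fetchAll(PDO::FETCH_KEY_PAIR);

print_r($rows);
```

Only the row with the higher id should lose its newline; the row at the cutoff is left untouched by the WHERE clause.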

So, that’s the solution to this issue.

Restarting a MySQL Server from the Command Line

While it doesn’t come up frequently, every once in a while you’ll need to restart a MySQL server from the command line. On my current webserver we’re running a VPS with WHM. What this means is that accessing the MySQL daemon is not as simple as using “mysqld” from the shell. So here’s how to get it to work, based on information found at this blog post, and my own experience:


# /etc/init.d/mysql start

# /etc/init.d/mysql stop

# /etc/init.d/mysql restart

Those commands start, stop, and restart the MySQL server, respectively, on our WHM server. The leading # is the root shell prompt, not part of the command.