Meandering through US states and territories with Julia, never looking back. Well, just their USPS codes.

This is not going to be as didactic (and thus not nearly as long) as my last post, but I thought I'd share my Julia code for this week's Riddler Classic.

The puzzle

The idea is to chain USPS 2-letter codes for US states & territories, overlapping 1 character in between, and not repeating any code in a chain. You can think of it as states/territories share a border if they share a letter in their 2-letter code. Or as they put it:

The challenge is to find the longest string of letters in which (1) every pair of consecutive letters is a two-letter state or territory abbreviation, and (2) no state abbreviation occurs more than once. For example, Guam, Utah and Texas can be combined into the valid four-letter string GUTX. Another valid string is ALAK (Alabama, Louisiana and Alaska), while ALAL (Alabama, Louisiana and Alabama) is invalid because it includes the same state, Alabama, twice.

Wissner-Gross, Z. https://fivethirtyeight.com/features/can-you-help-dakota-jones-raid-the-lost-arc/.

They also provided a link to a USPS page with a list of codes.

Julia solution

I first just copy/pasted the codes into a raw string, and parsed it. I also added the codes for reaching the armed forces.

tbl = """
       State/Possession

       Abbreviation

       Alabama

       AL

       Alaska

       AK
       …

       Wisconsin
       
       WI

       Wyoming

       WY
"""

codes = vcat(map(strip, split(tbl, "\n\n")[4:2:end]), ["AA", "AP", "AE"])

Because these are always 2-letter codes, the simplest solution is probably to use a Char-to-Char multimap. I opted to use a Trie/prefix tree:

using DataStructures

function addseqs!(results, prefix, t)
  top = prefix[1:2]
  if length(results[top]) < length(prefix)
    println("Replacing $(results[top]) -> $prefix.")
    results[top] = prefix
  end
  c = string(prefix[end])
  s = subtrie(t, c)
  s == nothing && return
  for k in keys(s)
    t = deepcopy(t)
    delete!(t, string(c,k))
    addseqs!(results, string(prefix, k[end]), t)
  end
end

This function takes results to keep track of the longest observed sequence, prefix — the current state of the string, and t, the Trie with the currently available codes. It builds a sub-trie based on the end of prefix, and recursively calls itself with a pruned trie.

The search gets kicked off with this function:

function getlongest(codes)
  results = DefaultDict{String,String}("")
  for first in codes
    t = Trie(codes)
    delete!(t, first)
    addseqs!(results, first, t)
  end
  results
end

Both of these functions rely on deleting a key from the Trie, which actually, is not provided in this library. In Julia however, it's easy to create new methods for types you don't own:

function Base.delete!(t::Trie{T}, k::AbstractString) where T
  node = subtrie(t, k)
  if node != nothing node.is_key = false end
  t
end

Solution

There's not much to do from here. Running:

@time results = getlongest(codes)
# Abridged output:
#
#  Replacing  -> FM.
#  Replacing FM -> FMT.
#  Replacing FMT -> FMTX.
#  Replacing FMTX -> FMTNM.
#  Replacing FMTNM -> FMTNMP.
#  Replacing FMTNMP -> FMTNMPA.
#  Replacing FMTNMPA -> FMTNMPAP.
#  Replacing FMTNMPAP -> FMTNMPAPR.
#  Replacing FMTNMPAPR -> FMTNMPAPRI.
#  Replacing FMTNMPAPRI -> FMTNMPAPRID.
#  Replacing FMTNMPAPRID -> FMTNMPAPRIDE.
#  Replacing FMTNMPAPRIDE -> FMTNMPAPRIDCA.
#  Replacing FMTNMPAPRIDCA -> FMTNMPAPRIDCAK.
#  Replacing FMTNMPAPRIDCAK -> FMTNMPAPRIDCAKY.
#  Replacing FMTNMPAPRIDCAKY -> FMTNMPAPRIDCAKSD.
#  Replacing FMTNMPAPRIDCAKSD -> FMTNMPAPRIDCAKSCT.
#  Replacing FMTNMPAPRIDCAKSCT -> FMTNMPAPRIDCAKSCOK.
#  Replacing FMTNMPAPRIDCAKSCOK -> FMTNMPAPRIDCAKSCOHI.
#  Replacing FMTNMPAPRIDCAKSCOHI -> FMTNMPAPRIDCAKSCOHIA.
#  Replacing FMTNMPAPRIDCAKSCOHIA -> FMTNMPAPRIDCAKSCOHIAA.
#  Replacing FMTNMPAPRIDCAKSCOHIAA -> FMTNMPAPRIDCAKSCOHIAAL.
#  Replacing FMTNMPAPRIDCAKSCOHIAAL -> FMTNMPAPRIDCAKSCOHIAALA.
#  Replacing FMTNMPAPRIDCAKSCOHIAALA -> FMTNMPAPRIDCAKSCOHIAALAE.
#  Replacing FMTNMPAPRIDCAKSCOHIAALAE -> FMTNMPAPRIDCAKSCOHINVAALA.
#  Replacing FMTNMPAPRIDCAKSCOHINVAALA -> FMTNMPAPRIDCAKSCOHINVAALAE.
#  Replacing FMTNMPAPRIDCAKSCOHINVAALAE -> FMTNMPAPRIAKSDCAALASCOHINVA.
#  Replacing FMTNMPAPRIAKSDCAALASCOHINVA -> FMTNMPAPWAKSDCAALARIASCOHINJ.
#  Replacing FMTNMPAPWAKSDCAALARIASCOHINJ -> FMTNMPAPWAKSDCAALARIASCOHINVA.
#  Replacing FMTNMPAPWAKSDCAALARIASCOHINVA -> FMTNMPAPWVAKSDCAALARIASCOHINVI.
#  Replacing FMTNMPAPWVAKSDCAALARIASCOHINVI -> FMPAPWAKSDCAALARIASCTNMNVINCOHI.
#  Replacing FMPAPWAKSDCAALARIASCTNMNVINCOHI -> FMPAPWVAKSDCAALARIASCTNMNVINCOHI.
#
#  1631.254820 seconds (4.47 G allocations: 591.410 GiB, 11.13% gc time)

You can see this was probably not a very good approach, with 4 billion allocations (probably 1 for each deepcopy). Note that actual memory usage did not exceed ~1GB.

As you might have guessed, I only showed the output for the actual longest walk starting from FM (Federated States of Micronesia, and provider of the TLD of your favorite podcast's domain, I'm sure).

To find the longest sequence:

maximum(v -> (length(v), v), values(results))
# Output:
#  (32, "FMPAPWVAKSDCAALARIASCTNMNVINCOHI")

Itinerary

This is quite the mileage run:

  1. FM: Federated Sates of Micronesia
  2. MP: Northern Mariana Islands
  3. PA: Pennsylvania
  4. AP: Armed Forces Pacific
  5. PW: Palau
  6. WV: West Virginia
  7. VA: Virginia
  8. AK: Alaska
  9. KS: Kansas
  10. SD: South Dakota
  11. DC: District of Columbia
  12. CA: California
  13. AA: Armed Forces Americas
  14. AL: Alabama
  15. LA: Louisiana
  16. AR: Arkansas
  17. RI: Rhode Island
  18. IA: Iowa
  19. AS: American Samoa
  20. SC: South Carolina
  21. CT: Connecticut
  22. TN: Tennessee
  23. NM: New Mexico
  24. MN: Minnesota
  25. NV: Nevada
  26. VI: Virgin Islands
  27. IN: Indiana
  28. NC: North Carolina
  29. CO: Colorado
  30. OH: Ohio
  31. HI: Hawaii

If this doesn't get you elite status …