Riddler Featured

The longest interstate walk

Meandering through US states and territories with Julia, never looking back. Well, just their USPS codes.

Sujeet Akula

16 Sep 2019 • 3 min read

Meandering through US states and territories with Julia, never looking back. Well, just their USPS codes.

This is not going to be as didactic (and thus not nearly as long) as my last post, but I thought I'd share my Julia code for this week's Riddler Classic.

The puzzle

The idea is to chain USPS 2-letter codes for US states & territories, overlapping 1 character in between, and not repeating any code in a chain. You can think of it as states/territories share a border if they share a letter in their 2-letter code. Or as they put it:

The challenge is to find the longest string of letters in which (1) every pair of consecutive letters is a two-letter state or territory abbreviation, and (2) no state abbreviation occurs more than once. For example, Guam, Utah and Texas can be combined into the valid four-letter string GUTX. Another valid string is ALAK (Alabama, Louisiana and Alaska), while ALAL (Alabama, Louisiana and Alabama) is invalid because it includes the same state, Alabama, twice.

Wissner-Gross, Z. https://fivethirtyeight.com/features/can-you-help-dakota-jones-raid-the-lost-arc/.

They also provided a link to a USPS page with a list of codes.

Julia solution

I first just copy/pasted the codes into a raw string, and parsed it. I also added the codes for reaching the armed forces.

tbl = """
       State/Possession

       Abbreviation

       Alabama

       AL

       Alaska

       AK
       …

       Wisconsin
       
       WI

       Wyoming

       WY
"""

codes = vcat(map(strip, split(tbl, "\n\n")[4:2:end]), ["AA", "AP", "AE"])

Because these are always 2-letter codes, the simplest solution is probably to use a Char-to-Char multimap. I opted to use a Trie/prefix tree:

using DataStructures

function addseqs!(results, prefix, t)
  top = prefix[1:2]
  if length(results[top]) < length(prefix)
    println("Replacing $(results[top]) -> $prefix.")
    results[top] = prefix
  end
  c = string(prefix[end])
  s = subtrie(t, c)
  s == nothing && return
  for k in keys(s)
    t = deepcopy(t)
    delete!(t, string(c,k))
    addseqs!(results, string(prefix, k[end]), t)
  end
end

This function takes results to keep track of the longest observed sequence, prefix — the current state of the string, and t, the Trie with the currently available codes. It builds a sub-trie based on the end of prefix, and recursively calls itself with a pruned trie.

The search gets kicked off with this function:

function getlongest(codes)
  results = DefaultDict{String,String}("")
  for first in codes
    t = Trie(codes)
    delete!(t, first)
    addseqs!(results, first, t)
  end
  results
end

Both of these functions rely on deleting a key from the Trie, which actually, is not provided in this library. In Julia however, it's easy to create new methods for types you don't own:

function Base.delete!(t::Trie{T}, k::AbstractString) where T
  node = subtrie(t, k)
  if node != nothing node.is_key = false end
  t
end

Solution

There's not much to do from here. Running:

@time results = getlongest(codes)
# Abridged output:
#
#  Replacing  -> FM.
#  Replacing FM -> FMT.
#  Replacing FMT -> FMTX.
#  Replacing FMTX -> FMTNM.
#  Replacing FMTNM -> FMTNMP.
#  Replacing FMTNMP -> FMTNMPA.
#  Replacing FMTNMPA -> FMTNMPAP.
#  Replacing FMTNMPAP -> FMTNMPAPR.
#  Replacing FMTNMPAPR -> FMTNMPAPRI.
#  Replacing FMTNMPAPRI -> FMTNMPAPRID.
#  Replacing FMTNMPAPRID -> FMTNMPAPRIDE.
#  Replacing FMTNMPAPRIDE -> FMTNMPAPRIDCA.
#  Replacing FMTNMPAPRIDCA -> FMTNMPAPRIDCAK.
#  Replacing FMTNMPAPRIDCAK -> FMTNMPAPRIDCAKY.
#  Replacing FMTNMPAPRIDCAKY -> FMTNMPAPRIDCAKSD.
#  Replacing FMTNMPAPRIDCAKSD -> FMTNMPAPRIDCAKSCT.
#  Replacing FMTNMPAPRIDCAKSCT -> FMTNMPAPRIDCAKSCOK.
#  Replacing FMTNMPAPRIDCAKSCOK -> FMTNMPAPRIDCAKSCOHI.
#  Replacing FMTNMPAPRIDCAKSCOHI -> FMTNMPAPRIDCAKSCOHIA.
#  Replacing FMTNMPAPRIDCAKSCOHIA -> FMTNMPAPRIDCAKSCOHIAA.
#  Replacing FMTNMPAPRIDCAKSCOHIAA -> FMTNMPAPRIDCAKSCOHIAAL.
#  Replacing FMTNMPAPRIDCAKSCOHIAAL -> FMTNMPAPRIDCAKSCOHIAALA.
#  Replacing FMTNMPAPRIDCAKSCOHIAALA -> FMTNMPAPRIDCAKSCOHIAALAE.
#  Replacing FMTNMPAPRIDCAKSCOHIAALAE -> FMTNMPAPRIDCAKSCOHINVAALA.
#  Replacing FMTNMPAPRIDCAKSCOHINVAALA -> FMTNMPAPRIDCAKSCOHINVAALAE.
#  Replacing FMTNMPAPRIDCAKSCOHINVAALAE -> FMTNMPAPRIAKSDCAALASCOHINVA.
#  Replacing FMTNMPAPRIAKSDCAALASCOHINVA -> FMTNMPAPWAKSDCAALARIASCOHINJ.
#  Replacing FMTNMPAPWAKSDCAALARIASCOHINJ -> FMTNMPAPWAKSDCAALARIASCOHINVA.
#  Replacing FMTNMPAPWAKSDCAALARIASCOHINVA -> FMTNMPAPWVAKSDCAALARIASCOHINVI.
#  Replacing FMTNMPAPWVAKSDCAALARIASCOHINVI -> FMPAPWAKSDCAALARIASCTNMNVINCOHI.
#  Replacing FMPAPWAKSDCAALARIASCTNMNVINCOHI -> FMPAPWVAKSDCAALARIASCTNMNVINCOHI.
#
#  1631.254820 seconds (4.47 G allocations: 591.410 GiB, 11.13% gc time)

You can see this was probably not a very good approach, with 4 billion allocations (probably 1 for each deepcopy). Note that actual memory usage did not exceed ~1GB.

As you might have guessed, I only showed the output for the actual longest walk starting from FM (Federated States of Micronesia, and provider of the TLD of your favorite podcast's domain, I'm sure).

To find the longest sequence:

maximum(v -> (length(v), v), values(results))
# Output:
#  (32, "FMPAPWVAKSDCAALARIASCTNMNVINCOHI")

Itinerary

This is quite the mileage run:

FM: Federated Sates of Micronesia
MP: Northern Mariana Islands
PA: Pennsylvania
AP: Armed Forces Pacific
PW: Palau
WV: West Virginia
VA: Virginia
AK: Alaska
KS: Kansas
SD: South Dakota
DC: District of Columbia
CA: California
AA: Armed Forces Americas
AL: Alabama
LA: Louisiana
AR: Arkansas
RI: Rhode Island
IA: Iowa
AS: American Samoa
SC: South Carolina
CT: Connecticut
TN: Tennessee
NM: New Mexico
MN: Minnesota
NV: Nevada
VI: Virgin Islands
IN: Indiana
NC: North Carolina
CO: Colorado
OH: Ohio
HI: Hawaii

If this doesn't get you elite status …