Elm Robots and Humans
Humans build websites using a vast array of tools and technologies. Elm is a functional programming language for building reliable websites. Robots crawl websites and gather useful information for search engines.
In this note we will try to understand and build the contents of a robots.txt file and a humans.txt file using Elm.

Humans are odd. They think order and chaos are somehow opposites and try to control what won't be. But there is grace in their failings.
-- Vision, Avengers: Age of Ultron
Robots.txt
Robots.txt is a public file on our websites where we can define policies, sitemaps and the host name, to let crawlers know where, when and what they can access and index in search engines.
A robots.txt file can look something like this:
User-agent: *
Allow: *
Sitemap: https://marcodaniels.com/sitemap.xml
Host: https://marcodaniels.com
This information allows all robots (User-agent: *) to access all pages (Allow: *) on this website. It also points to where the sitemap can be found.
Policies
In our robots.txt we can define multiple policies for multiple user-agents.
In the example below we disallow the page /search just for Googlebot, while for Bingbot we disallow all pages on our website. At the end we make sure to allow all other crawlers on all our pages.
User-agent: Googlebot
Disallow: /search
User-agent: Bingbot
Disallow: /
User-agent: *
Allow: *
Crawl-Delay
Crawl-delay informs crawlers to wait a specific amount of time (typically interpreted in seconds) between requests while crawling our pages.
User-agent: Googlebot
Crawl-delay: 120
Disallow: /search
This directive can be used per user-agent.
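For example, different crawlers can be given different delays in the same file (the bot names below are only illustrative):
User-agent: Bingbot
Crawl-delay: 60
User-agent: Yandex
Crawl-delay: 30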
Clean-Param
Clean-param, a directive recognized mainly by Yandex, tells crawlers to drop the given parameter(s) from the URL's query string, so that URLs differing only by those parameters are not treated as separate pages.
User-agent: *
Allow: *
Clean-param: id /user
This will remove the id parameter from the URL .../user?id=1234
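Several parameters can also be listed at once, separated by &. The sketch below (with illustrative parameter names) would strip both id and ref from /user URLs:
User-agent: *
Clean-param: id&ref /user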
Humans.txt
Humans.txt is a fun initiative where we can introduce the people behind our website, state which technologies and standards we follow, and leave acknowledgements and greetings for the humans who built the website.
/* Team */
Engineer: Marco Martins
/* Technology */
Elm, Terraform, Nix
The humans.txt file does not have a strictly defined structure: it can take many formats and carry different information, as it is a file from humans to humans.
elm-robots-humans
elm-robots-humans is an Elm package that allows us to write robots.txt and humans.txt file contents in a structured and typed manner.
The Robots module exposes multiple functions and types that allow us to write policies per user-agent with all the needed directives.
import Robots

-- robots.txt content: a single catch-all policy, plus sitemap and host
robots : String
robots =
    Robots.robots
        { sitemap = Robots.SingleValue "/sitemap.xml"
        , host = "https://marcodaniels.com"
        , policies =
            [ Robots.policy
                { userAgent = Robots.SingleValue "*"
                , allow = Just (Robots.SingleValue "*")
                , disallow = Nothing
                }
                |> Robots.withCrawlDelay 120
            ]
        }
Because some properties (sitemap, userAgent, allow, disallow) can be either a single or a multiple valued string entry, we use a Value custom type that allows us to be more expressive about our policy needs. The example above will generate the following string:
User-agent: *
Allow: *
Crawl-delay: 120
Sitemap: /sitemap.xml
Host: https://marcodaniels.com
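As a sketch of the multiple valued side, assuming the Value custom type also offers a MultipleValue constructor that takes a list of strings (the definition name and the paths below are purely illustrative), a policy with several disallowed paths next to a catch-all policy could look like this:
import Robots

robotsStrict : String
robotsStrict =
    Robots.robots
        { sitemap = Robots.SingleValue "/sitemap.xml"
        , host = "https://marcodaniels.com"
        , policies =
            [ -- several paths for one user-agent via MultipleValue
              Robots.policy
                { userAgent = Robots.SingleValue "Googlebot"
                , allow = Nothing
                , disallow = Just (Robots.MultipleValue [ "/search", "/drafts" ])
                }

            -- every other crawler keeps full access
            , Robots.policy
                { userAgent = Robots.SingleValue "*"
                , allow = Just (Robots.SingleValue "*")
                , disallow = Nothing
                }
            ]
        }
Under that assumption, the Googlebot policy would render as two Disallow lines, one for /search and one for /drafts.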
For the humans.txt side, the Humans module allows us to write each "section" with just a headline and content:
import Humans

-- humans.txt content: one record per section
humans : String
humans =
    Humans.humans
        [ { headline = "Team"
          , content = [ "Engineer: Marco Martins" ]
          }
        , { headline = "Technology"
          , content = [ "Elm, Terraform, Nix" ]
          }
        ]
Since the humans.txt content does not require much structure, this gives us an easy way to create the humans content. The example above would generate the string:
/* Team */
Engineer: Marco Martins
/* Technology */
Elm, Terraform, Nix
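Because content is a list, a section can also carry several lines. As a small sketch (the extra Site section and its lines are purely illustrative), extending the file is just a matter of appending records:
import Humans

humansExtended : String
humansExtended =
    Humans.humans
        [ { headline = "Team"
          , content = [ "Engineer: Marco Martins" ]
          }
        , { headline = "Site"
          , content = [ "Language: English", "Standards: HTML, CSS" ]
          }
        ]
Each content item would then presumably end up on its own line under the /* Site */ headline.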
You're still here?
Thank you so much for reading this!
Go ahead and check the elm-robots-humans package and its source code on GitHub.
You can also see it in action on this website at /robots.txt, with the respective source code on GitHub.
Until then!