
How To Validate URLs With C#

Learn how to validate URLs in C# with some of my code snippets in this C# tutorial, and see which option performs best.

Christian Schou Køster

Have you ever stumbled upon invalid URLs? Well... you are not the first to encounter that issue, but luckily it's easy to deal with in C#.

When working with software, we often use URLs to send requests to other services. When doing so, it's crucial that we know these addresses are correct and won't cause us any trouble, e.g. exceptions caused by malformed URLs.

By the end of this tutorial, you will have a good understanding of how to validate URLs in C# in multiple ways. If you are ready, let's dig in and see how we can validate URLs in C#, and which option is the best in terms of performance.

How Are URLs Constructed?

Before we get to the awesome part (you are welcome to scroll down), I would like to show you how a URL is made up. This is to make sure that you have a good understanding of URL formats and know what to look for.

URL is short for Uniform Resource Locator and is an address for resources on the internet. Inside a URL you have several components making up the actual URL. Let's have a look at those components.

  • The scheme - This is the protocol, e.g. http or https. Other schemes exist as well, such as ftp.
  • The hostname - This is the address of the server hosting the resource, optionally followed by a port number (e.g. :8080). If the port is omitted, the scheme's default is used (443 for https).
  • The path - This is optional but used in almost all cases (except for home pages). It specifies the location of the resource on the host.
  • Query parameters - These start with ? and are written as key=value pairs separated by &. We often see them used with forms, redirects, etc. They are optional in most cases and pass extra values to the page you navigate to.
  • Fragments - These start with # and usually point to a section on a page. For instance, the fragment for this section is #HowareURLSConstructed.


Alright, with that in place, let's take a real-world example. This page is a good one: https://christian-schou.dk/blog/how-to-validate-urls-with-csharp/.

The scheme is https, the hostname is christian-schou.dk, and the path is /blog/how-to-validate-urls-with-csharp/. There is no explicit port, so the https default (443) applies. This URL has no query parameters or fragment, but a page such as the account page could easily use query parameters.
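The same breakdown can be done in code with System.Uri, which exposes each component as a property. This is a small sketch; the ?page=2 query and the #HowareURLSConstructed fragment are added to the article URL purely for illustration.

```csharp
using System;

// The article URL plus a made-up query string and fragment, for illustration only.
var uri = new Uri("https://christian-schou.dk/blog/how-to-validate-urls-with-csharp/?page=2#HowareURLSConstructed");

Console.WriteLine($"Scheme:   {uri.Scheme}");       // https
Console.WriteLine($"Host:     {uri.Host}");         // christian-schou.dk
Console.WriteLine($"Port:     {uri.Port}");         // 443 (the default port for https)
Console.WriteLine($"Path:     {uri.AbsolutePath}"); // /blog/how-to-validate-urls-with-csharp/
Console.WriteLine($"Query:    {uri.Query}");        // ?page=2
Console.WriteLine($"Fragment: {uri.Fragment}");     // #HowareURLSConstructed
```

Note that Query and Fragment keep their leading ? and # characters.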

With a basic understanding of how URLs are constructed in place, let's move on to the first of the four validation options for URLs I will teach you in this tutorial.

How To Validate A URL With Regex In C#

The regex method is fast and, as the benchmark at the end of this tutorial shows, allocation-free. C# has a built-in Regex class for working with regular expressions, which lets us declare a pattern for URLs and match strings against it.

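In the code below, I have declared a regex that matches URLs with an http, https, ftp, or ftps scheme, a hostname/domain, an optional port, and an optional path that can include a query string and a fragment (a section on a page). I will explain the pattern below. The regex is created with the [GeneratedRegex] source generator (.NET 7 and later), which emits the matching code at compile time, so nothing has to be parsed or compiled at run time. Because of that, the source generator ignores the RegexOptions.Compiled flag in the attribute; it is harmless but has no effect.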

using System.Text.RegularExpressions;

namespace UrlValidationDemo;

/// <summary>
///     An extension class to make URL validations
/// </summary>
public static partial class UrlValidators
{
    [GeneratedRegex(@"^(https?|ftps?):\/\/(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}(?::(?:0|[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))?(?:\/(?:[-a-zA-Z0-9@%_\+.~#?&=]+\/?)*)?$", RegexOptions.IgnoreCase | RegexOptions.Compiled, "en-US")]
    private static partial Regex TheMasterUrlRegex();
    
    // Cache the source-generated instance (the generated method already returns a singleton)
    private static readonly Regex UrlRegex = TheMasterUrlRegex();

    /// <summary>
    ///     Validates a URL using a regular expression.
    /// </summary>
    /// <param name="url">The URL to validate.</param>
    /// <returns>true if the URL is valid; otherwise, false.</returns>
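    public static bool IsValidUrl(this string? url)
    {
        // Regex.IsMatch throws ArgumentNullException for null, so reject null or blank input first
        return !string.IsNullOrWhiteSpace(url) && UrlRegex.IsMatch(url);
    }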
}

Let's break it down. I know that a regex can be quite complex to read and understand, so I will do my best to explain it to you without making you bored.

  • ^ - This asserts the start of the input. The URL must start with the following pattern.
  • (https?|ftps?) - This is a group that matches either "http" or "https", or "ftp" or "ftps". The ? makes the "s" optional.
  • :\/\/ - This matches :// literally. The forward slashes are escaped out of habit from languages such as JavaScript, where / delimits a regex literal. In .NET, neither : nor / has a special meaning, so the backslashes are harmless but not required.
  • (?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+ - This matches the domain name, one label at a time. Each label must start with a letter or number, may contain letters, numbers, or hyphens in between, and cannot end with a hyphen. The {0,61} allows up to 61 characters in the middle, so a label is 1 to 63 characters long. The inner ? makes everything after the first character optional, which allows single-character labels. The \. matches a period literally, and the + at the end means there must be one or more labels, each followed by a period.
  • [a-zA-Z]{2,} - This matches the top-level domain, like "com" or "org" (the preceding period was already consumed by the previous group). It must be at least two letters.
  • (?::(?:0|[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))? - This matches the port number if it's present: a colon followed by a number. The ?: at the start makes it a non-capturing group. The ? at the end makes the whole group optional. Together, the alternatives only accept port numbers between 0 and 65535.
  • (?:\/(?:[-a-zA-Z0-9@%_\+.~#?&=]+\/?)*)? - This matches the path. It starts with a /, followed by zero or more segments built from the listed characters, each optionally followed by a /. Since ?, &, = and # are in the character class, query strings and fragments are matched here as well. An empty segment, such as the // in /files//coolFile, is rejected. The ? at the end makes the whole group optional.
  • $ - This asserts the end of the input. Without RegexOptions.Multiline, $ also matches just before a trailing newline, so use \z instead if a value like "https://example.com\n" must be rejected.
💡 A non-capturing group groups a subpattern, allowing you to apply a quantifier to the entire group or use disjunctions within it. Source: mozilla.org.

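This regular expression will match URLs that start with "http://", "https://", "ftp://", or "ftps://", followed by a valid domain name, an optional port number, and an optional path. Because it requires a dotted hostname ending in an alphabetic top-level domain, hosts such as localhost and raw IP addresses like 192.168.0.1 are rejected.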
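To see how the pieces of the breakdown behave, here is a quick experiment. It is a sketch that copies the same pattern into a plain run-time Regex (using CultureInvariant instead of the en-US culture name), and the sample URLs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The article's pattern in a plain run-time Regex, for experimenting only.
var urlRegex = new Regex(
    @"^(https?|ftps?):\/\/(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}(?::(?:0|[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))?(?:\/(?:[-a-zA-Z0-9@%_\+.~#?&=]+\/?)*)?$",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

// Made-up samples, each exercising one part of the breakdown.
string[] samples =
{
    "https://example.com:8080/path",                   // port within 0-65535
    "https://example.com:65535",                       // highest allowed port
    "https://example.com:65536",                       // port out of range
    "https://-bad.example.com",                        // label starts with a hyphen
    "https://localhost",                               // no period, so no top-level domain
    "https://192.168.0.1",                             // last label is not letters
    "https://example.com/a//b",                        // empty path segment
    "https://example.com/search?q=url&page=2#results", // query and fragment absorbed by the path group
};

foreach (var sample in samples)
{
    Console.WriteLine($"{sample,-50} {(urlRegex.IsMatch(sample) ? "valid" : "invalid")}");
}
```

Running it prints valid for the first two samples and the last one, and invalid for the rest.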

Here is how to use it.

using UrlValidationDemo;

var url = "https://www.christian-schou.dk";
var url2 = "ftp:////christian-schou.dk//files////coolFile?param=true";

Console.WriteLine($"The URL '{url}' is {(url.IsValidUrl() ? "valid" : "invalid")}.");
Console.WriteLine($"The URL '{url2}' is {(url2.IsValidUrl() ? "valid" : "invalid")}.");

Let's see the final result from running the program in the console.

The URL 'https://www.christian-schou.dk' is valid.
The URL 'ftp:////christian-schou.dk//files////coolFile?param=true' is invalid.

Just as I expected. The regex validator accepted my first URL as valid, and the second URL, which contains too many slashes, was reported as invalid. 💪
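Because the path group nests quantifiers, (?:[...]+\/?)*, the backtracking engine can spend a long time on long inputs that almost match. A match timeout is a cheap safeguard. The sketch below is an assumption, not part of the demo: it builds the same pattern with the Regex constructor (where RegexOptions.Compiled does take effect), adds a 100 ms timeout, and treats a timeout as invalid. IsValidUrlWithTimeout is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as TheMasterUrlRegex in the article, copied into a run-time Regex.
const string UrlPattern = @"^(https?|ftps?):\/\/(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}(?::(?:0|[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))?(?:\/(?:[-a-zA-Z0-9@%_\+.~#?&=]+\/?)*)?$";

// With the Regex constructor, RegexOptions.Compiled does take effect.
// The 100 ms timeout is an assumed value; tune it for your inputs.
var urlRegex = new Regex(
    UrlPattern,
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled,
    TimeSpan.FromMilliseconds(100));

// Hypothetical helper: false for null/blank input and when matching times out.
bool IsValidUrlWithTimeout(string? url)
{
    if (string.IsNullOrWhiteSpace(url))
    {
        return false;
    }

    try
    {
        return urlRegex.IsMatch(url);
    }
    catch (RegexMatchTimeoutException)
    {
        // Input that takes too long to evaluate is treated as invalid
        return false;
    }
}

Console.WriteLine(IsValidUrlWithTimeout("https://www.christian-schou.dk"));                            // True
Console.WriteLine(IsValidUrlWithTimeout("ftp:////christian-schou.dk//files////coolFile?param=true")); // False
```

The trade-off is that a legitimate but pathological URL could be reported as invalid if it hits the timeout, so pick a limit that is generous for your real inputs.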

Validating A URL Using The Uri Class

Instead of reinventing the wheel and writing our own pattern, we can rely on the System.Uri class, which has built-in methods for validating URLs. In this section, I will present two options for validating URLs using the Uri class in .NET.

Using The TryCreate Method

The Uri class has a method named TryCreate(). It is very easy to use and provides a straightforward way of telling whether a URL is valid. Below is the code, followed by an explanation.

public static bool ValidateUrlWithUriTryCreate(this string url)
{
    return Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out Uri? _);
}

The method is an extension method for the string class, as indicated by the this keyword before the string url parameter. This means that you can call this method directly on any string object, like this: "https://christian-schou.dk".ValidateUrlWithUriTryCreate().

Inside the method, we're using the Uri.TryCreate method to try to create a new Uri object from the string. Uri.TryCreate takes three parameters:

  1. The string to try to create the Uri from.
  2. The kind of Uri to create (in this case, either a relative or an absolute Uri).
  3. An out parameter that will hold the created Uri if the method succeeds.

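The Uri.TryCreate method returns true if it successfully creates a Uri from the string, and false otherwise. We're returning this value directly from our method. Keep in mind that with UriKind.RelativeOrAbsolute, a string that is only a valid relative reference, such as blog/how-to-validate-urls-with-csharp/, also returns true; pass UriKind.Absolute if you only want to accept complete URLs.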

The out Uri? _ part of the code is a discard, a C# language feature that tells the compiler (and the reader) that we are not going to use the value. We're not interested in the Uri that Uri.TryCreate creates; we only care about whether it could be created.

Let's test it. Here is an example of how to use it.

var url = "https://www.christian-schou.dk";
var url2 = "ftp:////christian-schou.dk//files////coolFile?param=true";

Console.WriteLine($"The URL '{url}' is {(url.ValidateUrlWithUriTryCreate() ? "valid" : "invalid")}.");
Console.WriteLine($"The URL '{url2}' is {(url2.ValidateUrlWithUriTryCreate() ? "valid" : "invalid")}.");

Here are the results when running the code.

The URL 'https://www.christian-schou.dk' is valid.
The URL 'ftp:////christian-schou.dk//files////coolFile?param=true' is invalid.

Awesome, just as our regex method behaved. ✌️
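If the URL is going to be handed to an HTTP client, accepting relative references is usually not what you want. Here is a minimal sketch of a stricter check, assuming only absolute http and https URLs should pass. IsAbsoluteHttpUrl is a hypothetical helper written as a local function so the snippet runs on its own, and the article's RelativeOrAbsolute version is repeated next to it for comparison.

```csharp
using System;

// Hypothetical stricter check: absolute URLs with an http or https scheme only.
static bool IsAbsoluteHttpUrl(string? url) =>
    Uri.TryCreate(url, UriKind.Absolute, out var uri)
    && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);

// The article's check, re-declared as a local function for comparison.
static bool ValidateUrlWithUriTryCreate(string? url) =>
    Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out _);

const string relative = "blog/how-to-validate-urls-with-csharp/";

Console.WriteLine(ValidateUrlWithUriTryCreate(relative));                  // True: accepted as a relative URI
Console.WriteLine(IsAbsoluteHttpUrl(relative));                            // False: not absolute
Console.WriteLine(IsAbsoluteHttpUrl("https://www.christian-schou.dk"));    // True
Console.WriteLine(IsAbsoluteHttpUrl("ftp://christian-schou.dk/files"));    // False: scheme not allowed
```

Checking the scheme also matters because UriKind.Absolute on its own accepts any scheme, including ftp and file.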

Using The IsWellFormedUriString Method

If you are in the market for a stricter way of validating your URLs, you are in luck today! The Uri class also provides a method named IsWellFormedUriString().

What do I mean by "a stricter way"? Besides checking that the string can be parsed, this method checks that it complies with the official URI syntax defined in RFC 3986, for instance that characters which must be escaped actually are escaped.

This is another great way of making sure that your URLs are valid and also follow the specification for what a URL should look like. Here is how I have implemented it in my demo.

public static bool ValidateUrlWithWellFormedString(this string url)
{
    return Uri.IsWellFormedUriString(url, UriKind.RelativeOrAbsolute);
}

And what is going on here? 🤔

Inside the method, we're using the Uri.IsWellFormedUriString method to check if the string is a well-formed URI. Uri.IsWellFormedUriString takes two parameters:

  1. The string to check 😅
  2. The kind of Uri to check for (in this case, it's also either a relative or absolute Uri like in the TryCreate method.)

The Uri.IsWellFormedUriString method returns true if the string is a well-formed URI according to the rules defined by the RFC 3986 specification, and false otherwise. We're returning this value directly from our method. Here is how to call it, followed by the output of running that code.

var url = "https://www.christian-schou.dk";
var url2 = "ftp:////christian-schou.dk//files////coolFile?param=true";

Console.WriteLine($"The URL '{url}' is {(url.ValidateUrlWithWellFormedString() ? "valid" : "invalid")}.");
Console.WriteLine($"The URL '{url2}' is {(url2.ValidateUrlWithWellFormedString() ? "valid" : "invalid")}.");

The URL 'https://www.christian-schou.dk' is valid.
The URL 'ftp:////christian-schou.dk//files////coolFile?param=true' is invalid.

Great, we are still returning the same results. ✅
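Both test URLs get the same verdict from TryCreate and IsWellFormedUriString, so here is a sketch of where the two actually differ. The URL with a space in its path is a made-up example: TryCreate accepts it and escapes the space, while IsWellFormedUriString rejects it because the original string is not correctly escaped.

```csharp
using System;

// Made-up URL with an unescaped space in the path.
const string withSpace = "https://christian-schou.dk/my file";

bool parses = Uri.TryCreate(withSpace, UriKind.RelativeOrAbsolute, out var parsed);
bool wellFormed = Uri.IsWellFormedUriString(withSpace, UriKind.RelativeOrAbsolute);

Console.WriteLine($"TryCreate:             {parses}");              // True
Console.WriteLine($"IsWellFormedUriString: {wellFormed}");          // False
Console.WriteLine($"Escaped form:          {parsed?.AbsoluteUri}"); // https://christian-schou.dk/my%20file
```

So TryCreate answers "can this be turned into a Uri?", while IsWellFormedUriString answers "is this string already a correctly written URI?".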

Last But Not Least - Validate URLs Using The HttpClient

This option is a little more cumbersome, as we have to handle exceptions and check the response from the client to determine whether the URL is valid. Strictly speaking, it checks whether the URL is reachable, not just whether it is well formed.

This is by far the slowest option. It sends a HEAD request to the URL and evaluates the response. I would not recommend this option in a production environment: sending requests to arbitrary, possibly user-supplied URLs is a security concern (server-side request forgery), every validation depends on network connectivity, and your environment needs access to the internet. But maybe you need it anyway? 😅 Here is my implementation in the demo.

// Requires: using System.Net.Sockets; at the top of the file (SocketException, SocketError)

/// <summary>
///     An HttpClient for making URL validation using a HEAD request to the target.
/// </summary>
private static readonly HttpClient Client = new();

public static async Task<bool> ValidateUrlWithHttpClient(this string url)
{
    try
    {
        // The HttpRequestMessage constructor throws UriFormatException for an unparsable URL
        using var request = new HttpRequestMessage(HttpMethod.Head, url);
        using var response = await Client.SendAsync(request);

        // SendAsync does not throw for error status codes, so check for 5xx here:
        // a server error means the host was reached, so the URL itself is not the problem
        return response.IsSuccessStatusCode || (int)response.StatusCode >= 500;
    }
    catch (HttpRequestException httpRequestException)
        when (httpRequestException.InnerException is SocketException
        {
            SocketErrorCode: SocketError.HostNotFound
        })
    {
        // The DNS lookup failed, so the host does not exist
        return false;
    }
    catch (UriFormatException)
    {
        return false;
    }
}

So what is going on in this code?

Inside the method, we're using the HttpClient.SendAsync method to send a HEAD request to the URL. A HEAD request is like a GET request, but it only asks for the headers of the response, not the body. This can be faster and use less bandwidth than a GET request.

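If the HEAD request is successful (i.e., the server returns a 2xx status code), the method returns true. If the server returns a 5xx status code (500 or above), the method also returns true, because that indicates a server error, not a problem with the URL itself. That is important to understand. Any other status code, such as 404 Not Found, makes the method return false; note that this includes servers that answer HEAD requests with 405 Method Not Allowed.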

If HttpClient.SendAsync throws an HttpRequestException whose inner exception is a SocketException with a SocketErrorCode of HostNotFound, the DNS lookup failed, so the method returns false because the URL is not valid. Note that SendAsync does not throw for error status codes; 4xx and 5xx responses come back as a regular HttpResponseMessage, so the 5xx check belongs on the response rather than in a catch block.

If the string cannot be parsed as a URI, the HttpRequestMessage constructor throws a UriFormatException, and the method returns false, because this means the URL is not valid. Let's test it out.

var url = "https://www.christian-schou.dk";
var url2 = "ftp:////christian-schou.dk//files////coolFile?param=true";

Console.WriteLine($"The URL '{url}' is {(await url.ValidateUrlWithHttpClient() ? "valid" : "invalid")}.");
Console.WriteLine($"The URL '{url2}' is {(await url2.ValidateUrlWithHttpClient() ? "valid" : "invalid")}.");

The URL 'https://www.christian-schou.dk' is valid.
The URL 'ftp:////christian-schou.dk//files////coolFile?param=true' is invalid.

Wohaa! 🎉 The same result as earlier is returned, just as I expected!
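Since HttpClient's default timeout is 100 seconds, a single unreachable URL can block validation for a long time. Here is a sketch of a more defensive variant under a few assumptions: it rejects anything that is not an absolute http or https URL before touching the network, applies its own per-call timeout through a CancellationTokenSource, and treats any HttpRequestException (not only HostNotFound) or cancellation as invalid, which is broader than the version above. IsReachableUrlAsync and the 5-second timeout are hypothetical choices.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// One shared client, as in the article.
var client = new HttpClient();

// Hypothetical helper: true if the URL answers a HEAD request with 2xx or 5xx within the timeout.
async Task<bool> IsReachableUrlAsync(string? url, TimeSpan timeout)
{
    // HttpClient can only send absolute http/https requests, so reject everything else up front
    if (!Uri.TryCreate(url, UriKind.Absolute, out var uri)
        || (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
    {
        return false;
    }

    using var cts = new CancellationTokenSource(timeout);
    try
    {
        using var request = new HttpRequestMessage(HttpMethod.Head, uri);
        using var response = await client.SendAsync(request, cts.Token);
        return response.IsSuccessStatusCode || (int)response.StatusCode >= 500;
    }
    catch (HttpRequestException)
    {
        // DNS failures, refused connections, TLS errors, etc.: treat as unreachable
        return false;
    }
    catch (OperationCanceledException)
    {
        // Our per-call timeout (or HttpClient.Timeout) elapsed
        return false;
    }
}

// Rejected by the scheme check, so no request is sent.
Console.WriteLine(await IsReachableUrlAsync("ftp:////christian-schou.dk//files////coolFile?param=true", TimeSpan.FromSeconds(5))); // False
```

The up-front check keeps obviously unusable input off the network entirely, which also narrows the set of exceptions SendAsync can throw.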

Benchmarking - What Is The Best Option?

Now that we have four different ways of doing URL validation in C#/.NET, how can we make sure that we are using the quickest method in terms of performance? As a developer, I always want my code to perform as well as possible.

Fortunately, we have a tool for running benchmark tests in .NET. It's called BenchmarkDotNet and is available for free. I won't go into details in this post about how that works, but I have created a benchmark test for the four methods.

BenchmarkDotNet on GitHub: dotnet/BenchmarkDotNet

Below are the results, formatted as a table by the tool.

BenchmarkDotNet v0.13.12, Ubuntu 22.04.3 LTS (Jammy Jellyfish)
12th Gen Intel Core i7-1265U, 1 CPU, 12 logical and 10 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2


| Method                | Mean              | Error             | StdDev            | Median            | Gen0   | Allocated |
|---------------------- |------------------:|------------------:|------------------:|------------------:|-------:|----------:|
| Regex                 |          96.41 ns |          1.518 ns |          1.268 ns |          96.95 ns |      - |         - |
| UriTryCreate          |          71.35 ns |          0.596 ns |          0.557 ns |          71.42 ns | 0.0088 |      56 B |
| IsWellFormedUriString |         216.11 ns |          1.908 ns |          1.785 ns |         216.54 ns | 0.0215 |     136 B |
| HttpClientValidation  | 384,080,024.11 ns | 11,388,124.432 ns | 33,399,397.601 ns | 404,238,734.00 ns |      - |   10344 B |

I think it's pretty clear that the HttpClientValidation method is not the one to use in an environment with high demands. It took about 0.4 seconds (384 ms on average) to validate a single URL 😅 The reason is that it relies on a network round trip to an external service, and that's what makes this method slow. I would not call 0.4 seconds slow for a website to load, but compared to the other methods here, it is super slow! 🐌

On the other hand, we have the UriTryCreate option, which takes about 71 nanoseconds on average to validate our URL. That is fast (if you ask me)! At the same time, it only allocates 56 bytes per call, making it even better! ⚡ The Regex option comes in second at about 96 nanoseconds and allocates nothing at all. If you need the RFC syntax check, IsWellFormedUriString is about 3 times slower than UriTryCreate (216 ns vs. 71 ns). 🔥

Summary

In this C# tutorial on how to validate URLs using C#/.NET, you learned four different ways of doing it. You are now able to implement URL validation in your applications by copying and pasting my code above, as the methods are just extension methods on string and will work anywhere.

At the same time, we had a look at the performance of each method and concluded that UriTryCreate is the fastest. I would recommend it if you don't need to comply with RFC 3986; otherwise, choose the IsWellFormedUriString method for your validations.

If you have any questions or issues, please reach out in the comments below. As always, I would be more than happy to help with issues arising from my code examples. Until next time, happy coding! ✌️
