Class Utilities

java.lang.Object
com.soklet.Utilities

@ThreadSafe public final class Utilities extends Object
A non-instantiable collection of utility methods.
Author:
Mark Allen
  • Method Details

    • extractQueryParametersFromQuery

      Parses a query string such as "a=1&b=2&c=%20" into a multimap of names to values.

      Decodes percent-escapes using UTF-8, which is usually what you want (see extractQueryParametersFromQuery(String, QueryFormat, Charset) if you need to specify a different charset).

      Pairs missing a name are ignored.

      Multiple occurrences of the same name are collected into a Set in insertion order (duplicates are de-duplicated).

      Parameters:
      query - a raw query string such as "a=1&b=2&c=%20"
      queryFormat - how to decode: application/x-www-form-urlencoded or "strict" RFC 3986
      Returns:
      a map of parameter names to their distinct values, preserving first-seen name order; empty if none
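
      The parsing rules above can be illustrated with a small JDK-only sketch. This is not Soklet's implementation: the class QueryParsingSketch and its parse method are hypothetical names, decoding is assumed to be form-style via URLDecoder (so '+' becomes a space), and a name that appears without '=' is assumed to map to an empty value.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Soklet.
final class QueryParsingSketch {
  // Parses "a=1&b=2" into name -> distinct values, preserving first-seen order.
  static Map<String, Set<String>> parse(String query) {
    Map<String, Set<String>> parameters = new LinkedHashMap<>();

    if (query == null || query.isEmpty())
      return parameters;

    for (String pair : query.split("&")) {
      if (pair.isEmpty())
        continue;

      int separator = pair.indexOf('=');
      String rawName = separator == -1 ? pair : pair.substring(0, separator);
      // Assumption: a bare name with no '=' gets an empty value
      String rawValue = separator == -1 ? "" : pair.substring(separator + 1);

      // Pairs missing a name are ignored
      if (rawName.isEmpty())
        continue;

      String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
      String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);

      // LinkedHashSet de-duplicates values while keeping insertion order
      parameters.computeIfAbsent(name, ignored -> new LinkedHashSet<>()).add(value);
    }

    return parameters;
  }
}
```

      With this sketch, "a=1&b=2&a=1&a=3" produces a with values 1 and 3 (the repeated 1 is dropped) and b with value 2, in that order.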
    • extractQueryParametersFromQuery

      Parses a query string such as "a=1&b=2&c=%20" into a multimap of names to values.

      Decodes percent-escapes using the specified charset.

      Pairs missing a name are ignored.

      Multiple occurrences of the same name are collected into a Set in insertion order (duplicates are de-duplicated).

      Parameters:
      query - a raw query string such as "a=1&b=2&c=%20"
      queryFormat - how to decode: application/x-www-form-urlencoded or "strict" RFC 3986
      charset - the charset to use when decoding percent-escapes
      Returns:
      a map of parameter names to their distinct values, preserving first-seen name order; empty if none
    • extractQueryParametersFromUrl

      Parses query strings from relative or absolute URLs such as "/example?a=1&b=2&c=%20" or "https://www.soklet.com/example?a=1&b=2&c=%20" into a multimap of names to values.

      Decodes percent-escapes using UTF-8, which is usually what you want (see extractQueryParametersFromUrl(String, QueryFormat, Charset) if you need to specify a different charset).

      Pairs missing a name are ignored.

      Multiple occurrences of the same name are collected into a Set in insertion order (duplicates are de-duplicated).

      Parameters:
      url - a relative or absolute URL/URI string
      queryFormat - how to decode: application/x-www-form-urlencoded or "strict" RFC 3986
      Returns:
      a map of parameter names to their distinct values, preserving first-seen name order; empty if none/invalid
    • extractQueryParametersFromUrl

      Parses query strings from relative or absolute URLs such as "/example?a=1&b=2&c=%20" or "https://www.soklet.com/example?a=1&b=2&c=%20" into a multimap of names to values.

      Decodes percent-escapes using the specified charset.

      Pairs missing a name are ignored.

      Multiple occurrences of the same name are collected into a Set in insertion order (duplicates are de-duplicated).

      Parameters:
      url - a relative or absolute URL/URI string
      queryFormat - how to decode: application/x-www-form-urlencoded or "strict" RFC 3986
      charset - the charset to use when decoding percent-escapes
      Returns:
      a map of parameter names to their distinct values, preserving first-seen name order; empty if none/invalid
    • extractCookiesFromHeaders

      Parses Cookie request headers into a map of cookie names to values.

      Header name matching is case-insensitive ("Cookie" vs "cookie"), but cookie names are case-sensitive. Values are parsed per the following liberal rules:

      • Components are split on ';' unless inside a quoted string.
      • Quoted values have surrounding quotes removed and common backslash escapes unescaped.
      • Percent-escapes are decoded as UTF-8. '+' is not treated specially.
      Multiple occurrences of the same cookie name are collected into a Set in insertion order.
      Parameters:
      headers - request headers as a multimap of header name to values (must be non-null)
      Returns:
      a map of cookie name to distinct values; empty if no valid cookies are present
    • extractPathFromUrl

      @Nonnull public static String extractPathFromUrl(@Nonnull String url, @Nonnull Boolean performDecoding)
      Normalizes a URL or path into a canonical request path and optionally performs percent-decoding on the path.

      For example, "https://www.soklet.com/ab%20c?one=two" would be normalized to "/ab c".

      The OPTIONS * special case returns "*".

      Behavior:

      • If input starts with http:// or https://, the path portion is extracted.
      • Ensures the result begins with '/'.
      • Removes any trailing '/' (except for the root path '/').
      • Safely normalizes path traversals; for example, '/a/../b' is normalized to '/b'.
      • Strips any query string.
      • Applies aggressive trimming of Unicode whitespace.
      Parameters:
      url - a URL or path to normalize
      performDecoding - true if decoding should be performed on the path (e.g. replace %20 with a space character), false otherwise
      Returns:
      the normalized path, "/" for empty input
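
      The normalization steps listed above can be sketched with the JDK alone. This is an approximation under stated assumptions, not Soklet's code: PathNormalizationSketch and normalize are hypothetical names, String.strip() stands in for the aggressive Unicode trimming, and edge cases such as percent-encoded separators or dot segments are not handled.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;

// Hypothetical illustration only; not part of Soklet.
final class PathNormalizationSketch {
  static String normalize(String url, boolean performDecoding) {
    // Assumption: strip() approximates the aggressive trimming described above
    String path = url == null ? "" : url.strip();

    // The OPTIONS * special case
    if (path.equals("*"))
      return "*";

    // Strip any query string
    int queryStart = path.indexOf('?');
    if (queryStart != -1)
      path = path.substring(0, queryStart);

    // For absolute http(s) URLs, keep only the path portion
    String lower = path.toLowerCase(Locale.ROOT);
    if (lower.startsWith("http://") || lower.startsWith("https://")) {
      int authorityStart = path.indexOf("://") + 3;
      int pathStart = path.indexOf('/', authorityStart);
      path = pathStart == -1 ? "/" : path.substring(pathStart);
    }

    // Resolve "." and ".." segments; empty segments (including a trailing '/') are dropped
    Deque<String> segments = new ArrayDeque<>();
    for (String segment : path.split("/")) {
      if (segment.isEmpty() || segment.equals("."))
        continue;
      if (segment.equals("..")) {
        segments.pollLast(); // never climbs above the root
        continue;
      }
      segments.addLast(segment);
    }

    // Always begins with '/'; an empty input yields "/"
    String normalized = "/" + String.join("/", segments);

    if (!performDecoding)
      return normalized;

    // URLDecoder treats '+' as a space, which is wrong for paths, so protect it first
    return URLDecoder.decode(normalized.replace("+", "%2B"), StandardCharsets.UTF_8);
  }
}
```

      Under these assumptions, normalize("https://www.soklet.com/ab%20c?one=two", true) yields "/ab c" and normalize("/a/../b", false) yields "/b".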
    • extractRawQueryFromUrl

      Extracts the raw (un-decoded) query component from a URL.

      For example, "/path?a=b&c=d%20e" would return "a=b&c=d%20e".

      Parameters:
      url - a raw URL or path
      Returns:
      the raw query component, or Optional.empty() if none
    • encodeQueryParameters

      @Nonnull public static String encodeQueryParameters(@Nonnull Map<String, Set<String>> queryParameters, @Nonnull QueryFormat queryFormat)
      Encodes decoded query parameters into a raw query string.

      For example, given {a=[b], c=[d e]} and QueryFormat.RFC_3986_STRICT, returns "a=b&c=d%20e".

      Parameters:
      queryParameters - the decoded query parameters
      queryFormat - the encoding strategy
      Returns:
      the encoded query string, or the empty string if no parameters
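
      As a rough illustration of strict RFC 3986 encoding, the sketch below adapts the JDK's form-style URLEncoder output. It is not Soklet's implementation, and QueryEncodingSketch, encodeStrict and encodeComponent are hypothetical names. It assumes that "strict" means only the RFC 3986 unreserved characters (letters, digits, '-', '.', '_', '~') are left unescaped.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical illustration only; not part of Soklet.
final class QueryEncodingSketch {
  // Joins name=value pairs with '&', emitting one pair per value
  static String encodeStrict(Map<String, Set<String>> queryParameters) {
    StringJoiner joiner = new StringJoiner("&");

    for (Map.Entry<String, Set<String>> entry : queryParameters.entrySet())
      for (String value : entry.getValue())
        joiner.add(encodeComponent(entry.getKey()) + "=" + encodeComponent(value));

    // An empty map yields the empty string
    return joiner.toString();
  }

  static String encodeComponent(String component) {
    // URLEncoder produces form encoding: space -> '+', '*' left as-is, '~' -> "%7E".
    // A literal '+' in the input is already "%2B" at this point, so the replacements are safe.
    return URLEncoder.encode(component, StandardCharsets.UTF_8)
        .replace("+", "%20")
        .replace("*", "%2A")
        .replace("%7E", "~");
  }
}
```

      Given an insertion-ordered map {a=[b], c=[d e]}, encodeStrict returns "a=b&c=d%20e", matching the example above.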
    • extractLocalesFromAcceptLanguageHeaderValue

      @Nonnull public static List<Locale> extractLocalesFromAcceptLanguageHeaderValue(@Nonnull String acceptLanguageHeaderValue)
      Parses an Accept-Language header value into a best-effort ordered list of Locales.

      Quality weights are honored by Locale.LanguageRange.parse(String); results are then mapped to available JVM locales. Unknown or unavailable language ranges are skipped. On parse failure, an empty list is returned.

      Parameters:
      acceptLanguageHeaderValue - the raw header value (must be non-null)
      Returns:
      locales in descending preference order; empty if none could be resolved
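
      One way to approximate this behavior with the JDK is shown below. It is a sketch, not Soklet's implementation: AcceptLanguageSketch and resolve are hypothetical names, and using Locale.lookup to map each range to a single available locale is an assumption about how "mapped to available JVM locales" might work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of Soklet.
final class AcceptLanguageSketch {
  static List<Locale> resolve(String acceptLanguageHeaderValue) {
    List<Locale.LanguageRange> ranges;

    try {
      // Returns ranges sorted by descending quality weight
      ranges = Locale.LanguageRange.parse(acceptLanguageHeaderValue);
    } catch (IllegalArgumentException e) {
      // On parse failure, return an empty list
      return List.of();
    }

    List<Locale> availableLocales = Arrays.asList(Locale.getAvailableLocales());
    Set<Locale> resolved = new LinkedHashSet<>();

    for (Locale.LanguageRange range : ranges) {
      // Assumption: pick the single best match per range; unknown ranges yield null and are skipped
      Locale match = Locale.lookup(List.of(range), availableLocales);
      if (match != null)
        resolved.add(match);
    }

    return new ArrayList<>(resolved);
  }
}
```

      For a header such as "en;q=0.5, en-US;q=0.9", the higher-weighted en-US range is resolved first, followed by en.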
    • extractClientUrlPrefixFromHeaders

      Best-effort attempt to determine a client's URL prefix by examining request headers.

      A URL prefix in this context is defined as <scheme>://<host><:optional port>, with no path or query components.

      Soklet is generally the "last hop" behind a load balancer/reverse proxy and is generally not accessed directly by clients.

      Normally a load balancer/reverse proxy/other upstream proxies will provide information about the true source of the request through headers like the following:

      • Host
      • Forwarded
      • Origin
      • X-Forwarded-Proto
      • X-Forwarded-Protocol
      • X-Url-Scheme
      • Front-End-Https
      • X-Forwarded-Ssl
      • X-Forwarded-Host
      • X-Forwarded-Port

      This method may take these and other headers into account when determining URL prefix.

      For example, the following would be legal URL prefixes returned from this method:

      • https://www.soklet.com
      • http://www.fake.com:1234

      The following would NOT be legal URL prefixes:

      • www.soklet.com (missing protocol)
      • https://www.soklet.com/ (trailing slash)
      • https://www.soklet.com/test (path)
      • https://www.soklet.com/test?abc=1234 (path, query)
      Parameters:
      headers - HTTP request headers
      Returns:
      the URL prefix, or Optional.empty() if it could not be determined
    • extractContentTypeFromHeaders

      Extracts the media type (without parameters) from the first Content-Type header.

      For example, "text/html; charset=UTF-8" → "text/html".

      Parameters:
      headers - request/response headers (must be non-null)
      Returns:
      the media type if present; otherwise Optional.empty()
    • extractContentTypeFromHeaderValue

      @Nonnull public static Optional<String> extractContentTypeFromHeaderValue(@Nullable String contentTypeHeaderValue)
      Extracts the media type (without parameters) from a Content-Type header value.

      For example, "application/json; charset=UTF-8" → "application/json".

      Parameters:
      contentTypeHeaderValue - the raw header value; may be null or blank
      Returns:
      the media type if present; otherwise Optional.empty()
    • extractCharsetFromHeaders

      Extracts the Charset from the first Content-Type header, if present and valid.

      Tolerates additional parameters and arbitrary whitespace. Invalid or unknown charset tokens yield Optional.empty().

      Parameters:
      headers - request/response headers (must be non-null)
      Returns:
      the charset declared by the header; otherwise Optional.empty()
    • extractCharsetFromHeaderValue

      @Nonnull public static Optional<Charset> extractCharsetFromHeaderValue(@Nullable String contentTypeHeaderValue)
      Extracts the charset=... parameter from a Content-Type header value.

      Parsing is forgiving: parameters may appear in any order and with arbitrary spacing. If a charset is found, it is validated via Charset.forName(String); invalid names result in Optional.empty().

      Parameters:
      contentTypeHeaderValue - the raw header value; may be null or blank
      Returns:
      the resolved charset if present and valid; otherwise Optional.empty()
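
      The media type and charset extraction described above can be sketched as follows. This is a simplified illustration rather than Soklet's parser: ContentTypeSketch, mediaType and charset are hypothetical names, and the sketch assumes parameters are separated by ';' with no ';' inside quoted values.

```java
import java.nio.charset.Charset;
import java.util.Optional;

// Hypothetical illustration only; not part of Soklet.
final class ContentTypeSketch {
  // Returns the media type without parameters, e.g. "text/html"
  static Optional<String> mediaType(String contentTypeHeaderValue) {
    if (contentTypeHeaderValue == null)
      return Optional.empty();

    String mediaType = contentTypeHeaderValue.split(";", 2)[0].strip();
    return mediaType.isEmpty() ? Optional.empty() : Optional.of(mediaType);
  }

  // Returns the charset parameter if present and known to this JVM
  static Optional<Charset> charset(String contentTypeHeaderValue) {
    if (contentTypeHeaderValue == null)
      return Optional.empty();

    String[] parts = contentTypeHeaderValue.split(";");

    // Index 0 is the media type; parameters may follow in any order
    for (int i = 1; i < parts.length; i++) {
      String[] nameAndValue = parts[i].split("=", 2);

      if (nameAndValue.length != 2 || !nameAndValue[0].strip().equalsIgnoreCase("charset"))
        continue;

      // Tolerate optional quotes and surrounding whitespace
      String charsetName = nameAndValue[1].strip().replace("\"", "");

      try {
        return Optional.of(Charset.forName(charsetName));
      } catch (IllegalArgumentException e) {
        // Covers both illegal and unsupported charset names
        return Optional.empty();
      }
    }

    return Optional.empty();
  }
}
```

      Catching IllegalArgumentException is sufficient here because both IllegalCharsetNameException and UnsupportedCharsetException extend it.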
    • trimAggressively

      A "stronger" version of String.trim() which discards any kind of whitespace or invisible separator.

      In a web environment with user-supplied inputs, this is the behavior we want the vast majority of the time. For example, users copy-paste URLs from Microsoft Word or Outlook and it's easy to accidentally include a U+202F "Narrow No-Break Space (NNBSP)" character at the end, which might break parsing.

      See https://www.compart.com/en/unicode/U+202F for details.

      Parameters:
      string - the string to trim
      Returns:
      the trimmed string, or null if the input string is null or the trimmed representation is of length 0
    • trimAggressivelyToNull

      Aggressively trims Unicode whitespace from the given string and returns null if the result is empty.

      See trimAggressively(String) for details on which code points are removed.

      Parameters:
      string - the input string; may be null
      Returns:
      a trimmed, non-empty string; or null if input was null or trimmed to empty
    • trimAggressivelyToEmpty

      Aggressively trims Unicode whitespace from the given string and returns "" if the input is null.

      See trimAggressively(String) for details on which code points are removed.

      Parameters:
      string - the input string; may be null
      Returns:
      a trimmed string (never null); "" if input was null
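
      The trimming variants above can be approximated with the sketch below. The exact set of code points Soklet removes is not listed here, so treating anything matched by Character.isWhitespace or Character.isSpaceChar as trimmable is an assumption, and AggressiveTrimSketch and its method names are hypothetical.

```java
// Hypothetical illustration only; not part of Soklet.
final class AggressiveTrimSketch {
  // Assumption: Unicode whitespace plus space/line/paragraph separators (covers U+00A0 and U+202F)
  static boolean isTrimmable(int codePoint) {
    return Character.isWhitespace(codePoint) || Character.isSpaceChar(codePoint);
  }

  // Returns the trimmed string, or null if the input is null or trims to empty
  static String trimToNull(String string) {
    if (string == null)
      return null;

    int start = 0;
    int end = string.length();

    // Advance past leading trimmable code points
    while (start < end) {
      int codePoint = string.codePointAt(start);
      if (!isTrimmable(codePoint))
        break;
      start += Character.charCount(codePoint);
    }

    // Retreat past trailing trimmable code points
    while (end > start) {
      int codePoint = string.codePointBefore(end);
      if (!isTrimmable(codePoint))
        break;
      end -= Character.charCount(codePoint);
    }

    return start == end ? null : string.substring(start, end);
  }

  // Returns the trimmed string, or "" where trimToNull would return null
  static String trimToEmpty(String string) {
    String trimmed = trimToNull(string);
    return trimmed == null ? "" : trimmed;
  }
}
```

      Unlike String.trim(), which only removes characters at or below U+0020, this sketch also removes a trailing U+202F or U+00A0.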
    • extractHeadersFromRawHeaderLines

      Converts a list of raw HTTP header lines into a normalized, case-insensitive, order-preserving map, "inflating" comma-separated headers into distinct values where RFC 7230/9110 permits.

      For example, given these raw header lines:

      List<String> lines = List.of(
        "Cache-Control: no-cache, no-store",
        "Set-Cookie: a=b; Path=/; HttpOnly",
        "Set-Cookie: c=d; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Path=/"
      );
      The result of parsing would look like this:
      result.get("cache-control") -> [
        "no-cache",
        "no-store"
      ]
      result.get("set-cookie") -> [
        "a=b; Path=/; HttpOnly",
        "c=d; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Path=/"
      ]

      Keys in the returned map are case-insensitive and are guaranteed to be in the same order as encountered in rawHeaderLines.

      Values in the returned map are guaranteed to be in the same order as encountered in rawHeaderLines.

      Parameters:
      rawHeaderLines - the raw HTTP header lines to parse
      Returns:
      a normalized mapping of header name keys to values
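
      The "inflation" of comma-separated values can be sketched as below. This is a simplified illustration and not Soklet's parser: HeaderLinesSketch and parse are hypothetical names, case-insensitivity is approximated by lowercasing keys, the Set value type and the short NOT_COMMA_SPLITTABLE list are assumptions, and commas inside quoted strings are not handled.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Soklet.
final class HeaderLinesSketch {
  // Assumption: a small illustrative list of headers whose values may legitimately contain commas
  private static final Set<String> NOT_COMMA_SPLITTABLE =
      Set.of("set-cookie", "cookie", "date", "expires", "last-modified", "user-agent");

  static Map<String, Set<String>> parse(List<String> rawHeaderLines) {
    // LinkedHashMap and LinkedHashSet preserve the order in which names and values are encountered
    Map<String, Set<String>> headers = new LinkedHashMap<>();

    for (String line : rawHeaderLines) {
      int colon = line.indexOf(':');

      // Skip lines without a header name
      if (colon <= 0)
        continue;

      String name = line.substring(0, colon).strip().toLowerCase(Locale.ROOT);
      String value = line.substring(colon + 1).strip();
      Set<String> values = headers.computeIfAbsent(name, ignored -> new LinkedHashSet<>());

      if (NOT_COMMA_SPLITTABLE.contains(name)) {
        values.add(value); // keep the whole value, e.g. a Set-Cookie with an Expires date
        continue;
      }

      // Inflate "no-cache, no-store" into two distinct values
      for (String part : value.split(",")) {
        String trimmed = part.strip();
        if (!trimmed.isEmpty())
          values.add(trimmed);
      }
    }

    return headers;
  }
}
```

      Applied to the three raw header lines in the example above, this sketch yields two values for "cache-control" and two intact values for "set-cookie".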