Class Utilities
- Author:
- Mark Allen
-
Method Summary
Modifier and Type   Method   Description
static String encodeQueryParameters(Map<String, Set<String>> queryParameters, QueryFormat queryFormat) - Encodes decoded query parameters into a raw query string.
extractCharsetFromHeaders(Map<String, Set<String>> headers) - Extracts the Charset from the first Content-Type header, if present and valid.
extractCharsetFromHeaderValue(String contentTypeHeaderValue) - Extracts the charset=... parameter from a Content-Type header value.
extractClientUrlPrefixFromHeaders(Map<String, Set<String>> headers) - Best-effort attempt to determine a client's URL prefix by examining request headers.
extractContentTypeFromHeaders(Map<String, Set<String>> headers) - Extracts the media type (without parameters) from the first Content-Type header.
extractContentTypeFromHeaderValue(String contentTypeHeaderValue) - Extracts the media type (without parameters) from a Content-Type header value.
extractCookiesFromHeaders(Map<String, Set<String>> headers) - Parses Cookie request headers into a map of cookie names to values.
extractHeadersFromRawHeaderLines(List<String> rawHeaderLines) - Given a list of raw HTTP header lines, converts them into a normalized case-insensitive, order-preserving map which "inflates" comma-separated headers into distinct values where permitted according to RFC 7230/9110.
extractLocalesFromAcceptLanguageHeaderValue(String acceptLanguageHeaderValue) - Parses an Accept-Language header value into a best-effort ordered list of Locales.
static String extractPathFromUrl(String url, Boolean performDecoding) - Normalizes a URL or path into a canonical request path and optionally performs percent-decoding on the path.
extractQueryParametersFromQuery(String query, QueryFormat queryFormat) - Parses a query string such as "a=1&b=2&c=%20" into a multimap of names to values.
extractQueryParametersFromQuery(String query, QueryFormat queryFormat, Charset charset) - Parses a query string such as "a=1&b=2&c=%20" into a multimap of names to values.
extractQueryParametersFromUrl(String url, QueryFormat queryFormat) - Parses query strings from relative or absolute URLs such as "/example?a=1&b=2&c=%20" or "https://www.soklet.com/example?a=1&b=2&c=%20" into a multimap of names to values.
extractQueryParametersFromUrl(String url, QueryFormat queryFormat, Charset charset) - Parses query strings from relative or absolute URLs such as "/example?a=1&b=2&c=%20" or "https://www.soklet.com/example?a=1&b=2&c=%20" into a multimap of names to values.
extractRawQueryFromUrl(String url) - Extracts the raw (un-decoded) query component from a URL.
static String trimAggressively(String string) - A "stronger" version of String.trim() which discards any kind of whitespace or invisible separator.
static String trimAggressivelyToEmpty(String string) - Aggressively trims Unicode whitespace from the given string and returns "" if the input is null.
static String trimAggressivelyToNull(String string) - Aggressively trims Unicode whitespace from the given string and returns null if the result is empty.
-
Method Details
-
extractQueryParametersFromQuery
@Nonnull public static Map<String, Set<String>> extractQueryParametersFromQuery(@Nonnull String query, @Nonnull QueryFormat queryFormat)
Parses a query string such as "a=1&b=2&c=%20" into a multimap of names to values.
Decodes percent-escapes using UTF-8, which is usually what you want (see extractQueryParametersFromQuery(String, QueryFormat, Charset) if you need to specify a different charset).
Pairs missing a name are ignored.
Multiple occurrences of the same name are collected into a Set in insertion order (duplicate values are collapsed).
- Parameters:
query - a raw query string such as "a=1&b=2&c=%20"
queryFormat - how to decode: application/x-www-form-urlencoded or "strict" RFC 3986
- Returns:
- a map of parameter names to their distinct values, preserving first-seen name order; empty if none
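The parsing rules above can be illustrated with a minimal JDK-only sketch. The class `QueryParsingSketch` and its `parseQuery` method are hypothetical names, not part of Soklet, and the sketch assumes form-urlencoded decoding (where `'+'` becomes a space) with UTF-8; it does not model the "strict" RFC 3986 mode.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Soklet's implementation.
final class QueryParsingSketch {
    // Parses "a=1&b=2" style input into an insertion-ordered multimap.
    static Map<String, Set<String>> parseQuery(String query) {
        Map<String, Set<String>> parameters = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int separator = pair.indexOf('=');
            String rawName = separator < 0 ? pair : pair.substring(0, separator);
            String rawValue = separator < 0 ? "" : pair.substring(separator + 1);
            // URLDecoder applies form-urlencoded rules: '+' decodes to a space
            String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            if (name.isEmpty()) continue; // pairs missing a name are ignored
            String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
            // LinkedHashSet keeps first-seen order and drops repeated values
            parameters.computeIfAbsent(name, k -> new LinkedHashSet<>()).add(value);
        }
        return parameters;
    }
}
```

With this sketch, `parseQuery("a=1&b=2&c=%20&a=1&a=3&=x")` yields `a` mapped to `1` and `3`, `b` mapped to `2`, and `c` mapped to a single space; the nameless `=x` pair is skipped.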
-
extractQueryParametersFromQuery
@Nonnull public static Map<String, Set<String>> extractQueryParametersFromQuery(@Nonnull String query, @Nonnull QueryFormat queryFormat, @Nonnull Charset charset)
Parses a query string such as "a=1&b=2&c=%20" into a multimap of names to values.
Decodes percent-escapes using the specified charset.
Pairs missing a name are ignored.
Multiple occurrences of the same name are collected into a Set in insertion order (duplicate values are collapsed).
- Parameters:
query - a raw query string such as "a=1&b=2&c=%20"
queryFormat - how to decode: application/x-www-form-urlencoded or "strict" RFC 3986
charset - the charset to use when decoding percent-escapes
- Returns:
- a map of parameter names to their distinct values, preserving first-seen name order; empty if none
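To see why the charset matters, the following JDK-only sketch decodes the same percent-escape under two charsets. `CharsetDecodingSketch` and `decodeComponent` are hypothetical names used for illustration; the sketch relies on `java.net.URLDecoder` rather than on Soklet itself.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;

// Hypothetical illustration only; not Soklet's implementation.
final class CharsetDecodingSketch {
    // Decodes one percent-encoded component using the supplied charset.
    static String decodeComponent(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }
}
```

For instance, `decodeComponent("caf%E9", StandardCharsets.ISO_8859_1)` produces `café`, whereas decoding the same input as UTF-8 does not, because the lone byte `0xE9` is not a complete UTF-8 sequence.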
-
extractQueryParametersFromUrl
@Nonnull public static Map<String, Set<String>> extractQueryParametersFromUrl(@Nonnull String url, @Nonnull QueryFormat queryFormat)
Parses query strings from relative or absolute URLs such as "/example?a=1&b=2&c=%20" or "https://www.soklet.com/example?a=1&b=2&c=%20" into a multimap of names to values.
Decodes percent-escapes using UTF-8, which is usually what you want (see extractQueryParametersFromUrl(String, QueryFormat, Charset) if you need to specify a different charset).
Pairs missing a name are ignored.
Multiple occurrences of the same name are collected into a Set in insertion order (duplicate values are collapsed).
- Parameters:
url - a relative or absolute URL/URI string
queryFormat - how to decode: application/x-www-form-urlencoded or "strict" RFC 3986
- Returns:
- a map of parameter names to their distinct values, preserving first-seen name order; empty if none/invalid
-
extractQueryParametersFromUrl
@Nonnull public static Map<String, Set<String>> extractQueryParametersFromUrl(@Nonnull String url, @Nonnull QueryFormat queryFormat, @Nonnull Charset charset)
Parses query strings from relative or absolute URLs such as "/example?a=1&b=2&c=%20" or "https://www.soklet.com/example?a=1&b=2&c=%20" into a multimap of names to values.
Decodes percent-escapes using the specified charset.
Pairs missing a name are ignored.
Multiple occurrences of the same name are collected into a Set in insertion order (duplicate values are collapsed).
- Parameters:
url - a relative or absolute URL/URI string
queryFormat - how to decode: application/x-www-form-urlencoded or "strict" RFC 3986
charset - the charset to use when decoding percent-escapes
- Returns:
- a map of parameter names to their distinct values, preserving first-seen name order; empty if none/invalid
-
extractCookiesFromHeaders
@Nonnull public static Map<String, Set<String>> extractCookiesFromHeaders(@Nonnull Map<String, Set<String>> headers)
Parses Cookie request headers into a map of cookie names to values.
Header name matching is case-insensitive ("Cookie" vs "cookie"), but cookie names are case-sensitive. Values are parsed per the following liberal rules:
- Components are split on ';' unless inside a quoted string.
- Quoted values have surrounding quotes removed and common backslash escapes unescaped.
- Percent-escapes are decoded as UTF-8.
- '+' is not treated specially.
Multiple occurrences of the same cookie name are collected into a Set in insertion order.
- Parameters:
headers - request headers as a multimap of header name to values (must be non-null)
- Returns:
- a map of cookie name to distinct values; empty if no valid cookies are present
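The quote-aware splitting described for this method can be sketched with the JDK alone. `CookieParsingSketch`, `splitComponents`, and `parseCookies` are hypothetical names; the sketch operates on a single `Cookie` header value, handles only the `\"` and `\\` escapes, and omits percent-decoding, so it is a simplification rather than Soklet's actual parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Soklet's implementation.
final class CookieParsingSketch {
    // Splits on ';' but leaves semicolons inside double quotes untouched.
    static List<String> splitComponents(String headerValue) {
        List<String> components = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < headerValue.length(); i++) {
            char c = headerValue.charAt(i);
            if (c == '\\' && inQuotes && i + 1 < headerValue.length()) {
                current.append(c).append(headerValue.charAt(++i)); // keep the escape pair intact
            } else if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ';' && !inQuotes) {
                components.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        components.add(current.toString());
        return components;
    }

    // Builds a case-sensitive, insertion-ordered map of cookie names to values.
    static Map<String, Set<String>> parseCookies(String headerValue) {
        Map<String, Set<String>> cookies = new LinkedHashMap<>();
        for (String component : splitComponents(headerValue)) {
            int separator = component.indexOf('=');
            if (separator <= 0) continue;
            String name = component.substring(0, separator).trim();
            String value = component.substring(separator + 1).trim();
            if (name.isEmpty()) continue;
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                // Strip surrounding quotes, then unescape \" and \\ (simplified)
                value = value.substring(1, value.length() - 1)
                        .replace("\\\"", "\"")
                        .replace("\\\\", "\\");
            }
            cookies.computeIfAbsent(name, k -> new LinkedHashSet<>()).add(value);
        }
        return cookies;
    }
}
```

Given `a=b; c="d;e"; a=f; A=g`, the sketch keeps `d;e` as one value, groups `b` and `f` under `a`, and treats `A` as a separate cookie name.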
-
extractPathFromUrl
@Nonnull public static String extractPathFromUrl(@Nonnull String url, @Nonnull Boolean performDecoding)
Normalizes a URL or path into a canonical request path and optionally performs percent-decoding on the path.
For example, with decoding enabled, "https://www.soklet.com/ab%20c?one=two" would be normalized to "/ab c".
The OPTIONS * special case returns "*".
Behavior:
- If input starts with http:// or https://, the path portion is extracted.
- Ensures the result begins with '/'.
- Removes any trailing '/' (except for the root path '/').
- Safely normalizes path traversals, e.g. path '/a/../b' would be normalized to '/b'.
- Strips any query string.
- Applies aggressive trimming of Unicode whitespace.
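The behavior listed above can be approximated with `java.net.URI`. In this sketch, `PathNormalizationSketch` and `normalizePath` are hypothetical names, `String.strip()` stands in for the aggressive trimming, and input that `URI.create` rejects (for example, a path containing a literal space) will throw; Soklet's own handling of such input may differ.

```java
import java.net.URI;

// Hypothetical illustration only; not Soklet's implementation.
final class PathNormalizationSketch {
    static String normalizePath(String url, boolean performDecoding) {
        String trimmed = url.strip();
        if (trimmed.isEmpty()) return "/";
        if (trimmed.equals("*")) return "*"; // OPTIONS * special case
        // normalize() collapses "." and ".." segments such as "/a/../b"
        URI uri = URI.create(trimmed).normalize();
        // getPath() percent-decodes; getRawPath() leaves escapes as-is
        String path = performDecoding ? uri.getPath() : uri.getRawPath();
        if (path == null || path.isEmpty()) return "/";
        if (!path.startsWith("/")) path = "/" + path;
        // Drop trailing slashes, but keep the root path intact
        while (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path;
    }
}
```

Because the query string is a separate URI component, it is excluded from the result without any extra work.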
- Parameters:
url - a URL or path to normalize
performDecoding - true if decoding should be performed on the path (e.g. replace %20 with a space character), false otherwise
- Returns:
- the normalized path, "/" for empty input
-
extractRawQueryFromUrl
Extracts the raw (un-decoded) query component from a URL.
For example, "/path?a=b&c=d%20e" would return "a=b&c=d%20e".
- Parameters:
url - a raw URL or path
- Returns:
- the raw query component, or Optional.empty() if none
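A minimal sketch of this extraction, using plain string operations, follows. `RawQuerySketch` and `rawQuery` are hypothetical names; treating an empty query (a URL ending in `?`) as absent and stripping a `#fragment` are assumptions of the sketch, not documented Soklet behavior.

```java
import java.util.Optional;

// Hypothetical illustration only; not Soklet's implementation.
final class RawQuerySketch {
    static Optional<String> rawQuery(String url) {
        int questionMark = url.indexOf('?');
        if (questionMark < 0) return Optional.empty();
        String query = url.substring(questionMark + 1);
        // Assumption: anything after '#' is a fragment, not part of the query
        int fragment = query.indexOf('#');
        if (fragment >= 0) query = query.substring(0, fragment);
        // Assumption: an empty query is reported as absent
        return query.isEmpty() ? Optional.empty() : Optional.of(query);
    }
}
```

No decoding is applied, so `%20` in the input is still `%20` in the result.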
-
encodeQueryParameters
@Nonnull public static String encodeQueryParameters(@Nonnull Map<String, Set<String>> queryParameters, @Nonnull QueryFormat queryFormat)
Encodes decoded query parameters into a raw query string.
For example, given {a=[b], c=[d e]} and QueryFormat.RFC_3986_STRICT, returns "a=b&c=d%20e".
- Parameters:
queryParameters - the decoded query parameters
queryFormat - the encoding strategy
- Returns:
- the encoded query string, or the empty string if no parameters
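As a rough illustration of strict-style encoding, the sketch below post-processes the output of `java.net.URLEncoder`, which natively emits form-urlencoded text. `QueryEncodingSketch`, `encodeComponent`, and `encode` are hypothetical names, and the substitutions shown are one common way to approximate RFC 3986 output, not necessarily what Soklet does internally.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical illustration only; not Soklet's implementation.
final class QueryEncodingSketch {
    // URLEncoder writes spaces as '+', leaves '*' bare, and escapes '~';
    // these replacements move the output toward RFC 3986 conventions.
    static String encodeComponent(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    // Joins every name/value pair with '&' in the map's iteration order.
    static String encode(Map<String, Set<String>> parameters) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, Set<String>> entry : parameters.entrySet()) {
            for (String value : entry.getValue()) {
                joiner.add(encodeComponent(entry.getKey()) + "=" + encodeComponent(value));
            }
        }
        return joiner.toString();
    }
}
```

A literal `+` in a value is emitted by `URLEncoder` as `%2B`, so the `"+"` replacement only affects encoded spaces.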
-
extractLocalesFromAcceptLanguageHeaderValue
@Nonnull public static List<Locale> extractLocalesFromAcceptLanguageHeaderValue(@Nonnull String acceptLanguageHeaderValue)
Parses an Accept-Language header value into a best-effort ordered list of Locales.
Quality weights are honored by Locale.LanguageRange.parse(String); results are then mapped to available JVM locales. Unknown or unavailable language ranges are skipped. On parse failure, an empty list is returned.
- Parameters:
acceptLanguageHeaderValue - the raw header value (must be non-null)
- Returns:
- locales in descending preference order; empty if none could be resolved
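The weight-ordered parsing can be sketched directly on top of `Locale.LanguageRange.parse(String)`. `AcceptLanguageSketch` and `parseLocales` are hypothetical names; unlike the method documented here, the sketch converts each range with `Locale.forLanguageTag` and does not check it against the JVM's available locales.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not Soklet's implementation.
final class AcceptLanguageSketch {
    static List<Locale> parseLocales(String headerValue) {
        List<Locale> locales = new ArrayList<>();
        try {
            // parse() returns ranges in descending order of quality weight
            for (Locale.LanguageRange range : Locale.LanguageRange.parse(headerValue)) {
                if (range.getRange().contains("*")) continue; // wildcards name no locale
                Locale locale = Locale.forLanguageTag(range.getRange());
                if (!locale.getLanguage().isEmpty() && !locales.contains(locale)) {
                    locales.add(locale);
                }
            }
        } catch (IllegalArgumentException e) {
            return List.of(); // malformed header: report no locales
        }
        return locales;
    }
}
```

For `en-US;q=0.8,fr;q=0.9`, French is listed first because it carries the higher weight.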
-
extractClientUrlPrefixFromHeaders
@Nonnull public static Optional<String> extractClientUrlPrefixFromHeaders(@Nonnull Map<String, Set<String>> headers)
Best-effort attempt to determine a client's URL prefix by examining request headers.
A URL prefix in this context is defined as <scheme>://host<:optional port>, but no path or query components.
Soklet is generally the "last hop" behind a load balancer/reverse proxy, though it can also be accessed directly by clients.
Normally a load balancer/reverse proxy/other upstream proxies will provide information about the true source of the request through headers like the following:
- Host
- Forwarded
- Origin
- X-Forwarded-Proto
- X-Forwarded-Protocol
- X-Url-Scheme
- Front-End-Https
- X-Forwarded-Ssl
- X-Forwarded-Host
- X-Forwarded-Port
This method may take these and other headers into account when determining URL prefix.
For example, the following would be legal URL prefixes returned from this method:
- https://www.soklet.com
- http://www.fake.com:1234
The following would NOT be legal URL prefixes:
- www.soklet.com (missing protocol)
- https://www.soklet.com/ (trailing slash)
- https://www.soklet.com/test (trailing slash, path)
- https://www.soklet.com/test?abc=1234 (trailing slash, path, query)
- Parameters:
headers - HTTP request headers
- Returns:
- the URL prefix, or Optional.empty() if it could not be determined
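To make the idea concrete, here is a deliberately small sketch that consults only three of the headers listed above. `UrlPrefixSketch`, `first`, and `urlPrefix` are hypothetical names, and the precedence shown (`X-Forwarded-Host` before `Host`, scheme taken only from `X-Forwarded-Proto`) is an assumption made for illustration; the documentation does not specify the order Soklet actually uses.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not Soklet's implementation.
final class UrlPrefixSketch {
    // Returns the first non-blank value for a header, if any.
    static Optional<String> first(Map<String, Set<String>> headers, String name) {
        Set<String> values = headers.get(name);
        if (values == null) return Optional.empty();
        return values.stream().map(String::trim).filter(v -> !v.isEmpty()).findFirst();
    }

    static Optional<String> urlPrefix(Map<String, Set<String>> rawHeaders) {
        // Copy into a case-insensitive map so "host" and "Host" both match
        Map<String, Set<String>> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.putAll(rawHeaders);
        // Assumed precedence: proxy-supplied host first, then Host
        Optional<String> host = first(headers, "X-Forwarded-Host").or(() -> first(headers, "Host"));
        Optional<String> scheme = first(headers, "X-Forwarded-Proto");
        if (host.isEmpty() || scheme.isEmpty()) return Optional.empty();
        return Optional.of(scheme.get().toLowerCase(Locale.ROOT) + "://" + host.get());
    }
}
```

The result never carries a trailing slash, path, or query, which matches the shape of the legal prefixes listed above.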
-
extractContentTypeFromHeaders
@Nonnull public static Optional<String> extractContentTypeFromHeaders(@Nonnull Map<String, Set<String>> headers)
Extracts the media type (without parameters) from the first Content-Type header.
For example, "text/html; charset=UTF-8" → "text/html".
- Parameters:
headers - request/response headers (must be non-null)
- Returns:
- the media type if present; otherwise Optional.empty()
- See Also:
-
extractContentTypeFromHeaderValue
@Nonnull public static Optional<String> extractContentTypeFromHeaderValue(@Nullable String contentTypeHeaderValue)
Extracts the media type (without parameters) from a Content-Type header value.
For example, "application/json; charset=UTF-8" → "application/json".
- Parameters:
contentTypeHeaderValue - the raw header value; may be null or blank
- Returns:
- the media type if present; otherwise Optional.empty()
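The media-type extraction amounts to taking everything before the first `';'`. A minimal sketch, with the hypothetical names `ContentTypeSketch` and `mediaType`, might look as follows; whether Soklet also normalizes case is not stated here, so the sketch leaves the text as written.

```java
import java.util.Optional;

// Hypothetical illustration only; not Soklet's implementation.
final class ContentTypeSketch {
    static Optional<String> mediaType(String headerValue) {
        if (headerValue == null) return Optional.empty();
        int semicolon = headerValue.indexOf(';');
        // Everything before the first ';' is the media type; parameters follow it
        String mediaType = (semicolon < 0 ? headerValue : headerValue.substring(0, semicolon)).trim();
        return mediaType.isEmpty() ? Optional.empty() : Optional.of(mediaType);
    }
}
```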
-
extractCharsetFromHeaders
@Nonnull public static Optional<Charset> extractCharsetFromHeaders(@Nonnull Map<String, Set<String>> headers)
Extracts the Charset from the first Content-Type header, if present and valid.
Tolerates additional parameters and arbitrary whitespace. Invalid or unknown charset tokens yield Optional.empty().
- Parameters:
headers - request/response headers (must be non-null)
- Returns:
- the charset declared by the header; otherwise Optional.empty()
- See Also:
-
extractCharsetFromHeaderValue
@Nonnull public static Optional<Charset> extractCharsetFromHeaderValue(@Nullable String contentTypeHeaderValue)
Extracts the charset=... parameter from a Content-Type header value.
Parsing is forgiving: parameters may appear in any order and with arbitrary spacing. If a charset is found, it is validated via Charset.forName(String); invalid names result in Optional.empty().
- Parameters:
contentTypeHeaderValue - the raw header value; may be null or blank
- Returns:
- the resolved charset if present and valid; otherwise Optional.empty()
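The forgiving parse plus `Charset.forName(String)` validation can be sketched as below. `CharsetSketch` and `charset` are hypothetical names, and stripping optional double quotes around the charset name is an assumption of the sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;

// Hypothetical illustration only; not Soklet's implementation.
final class CharsetSketch {
    static Optional<Charset> charset(String headerValue) {
        if (headerValue == null) return Optional.empty();
        String[] parts = headerValue.split(";");
        // parts[0] is the media type; parameters start at index 1
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            int equals = part.indexOf('=');
            if (equals < 0 || !part.substring(0, equals).trim().equalsIgnoreCase("charset")) continue;
            String name = part.substring(equals + 1).trim();
            // Assumption: tolerate a quoted value such as charset="utf-8"
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            try {
                return Optional.of(Charset.forName(name));
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                return Optional.empty(); // invalid or unknown charset token
            }
        }
        return Optional.empty();
    }
}
```

Parameter order and spacing do not matter here: `text/html; boundary=x ; charset = utf-8` still resolves to UTF-8.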
-
trimAggressively
A "stronger" version of String.trim() which discards any kind of whitespace or invisible separator.
In a web environment with user-supplied inputs, this is the behavior we want the vast majority of the time. For example, users copy-paste URLs from Microsoft Word or Outlook and it's easy to accidentally include a U+202F "Narrow No-Break Space (NNBSP)" character at the end, which might break parsing.
See https://www.compart.com/en/unicode/U+202F for details.
- Parameters:
string - the string to trim
- Returns:
- the trimmed string, or null if the input string is null or the trimmed representation is of length 0
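The difference from `String.trim()` can be demonstrated with a small sketch. `TrimSketch`, `isTrimmable`, and `trimAll` are hypothetical names, and the set of code points removed here (anything matching `Character.isWhitespace` or `Character.isSpaceChar`) is an assumption; the exact set Soklet removes may be broader.

```java
// Hypothetical illustration only; not Soklet's implementation.
final class TrimSketch {
    // isSpaceChar covers no-break spaces such as U+00A0 and U+202F,
    // which isWhitespace deliberately excludes.
    static boolean isTrimmable(int codePoint) {
        return Character.isWhitespace(codePoint) || Character.isSpaceChar(codePoint);
    }

    static String trimAll(String string) {
        if (string == null) return null;
        int start = 0;
        int end = string.length();
        while (start < end) {
            int codePoint = string.codePointAt(start);
            if (!isTrimmable(codePoint)) break;
            start += Character.charCount(codePoint);
        }
        while (end > start) {
            int codePoint = string.codePointBefore(end);
            if (!isTrimmable(codePoint)) break;
            end -= Character.charCount(codePoint);
        }
        // Mirrors the documented contract: an empty result is reported as null
        return start == end ? null : string.substring(start, end);
    }
}
```

`String.trim()` only removes characters at or below U+0020, so a trailing U+202F survives it but is removed by the sketch.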
-
trimAggressivelyToNull
Aggressively trims Unicode whitespace from the given string and returns null if the result is empty.
See trimAggressively(String) for details on which code points are removed.
- Parameters:
string - the input string; may be null
- Returns:
- a trimmed, non-empty string; or null if input was null or trimmed to empty
-
trimAggressivelyToEmpty
Aggressively trims Unicode whitespace from the given string and returns "" if the input is null.
See trimAggressively(String) for details on which code points are removed.
- Parameters:
string - the input string; may be null
- Returns:
- a trimmed string (never null); "" if input was null
-
extractHeadersFromRawHeaderLines
@Nonnull public static Map<String, Set<String>> extractHeadersFromRawHeaderLines(@Nonnull List<String> rawHeaderLines)
Given a list of raw HTTP header lines, converts them into a normalized case-insensitive, order-preserving map which "inflates" comma-separated headers into distinct values where permitted according to RFC 7230/9110.
For example, given these raw header lines:
    List<String> lines = List.of(
        "Cache-Control: no-cache, no-store",
        "Set-Cookie: a=b; Path=/; HttpOnly",
        "Set-Cookie: c=d; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Path=/"
    );
The result of parsing would look like this:
    result.get("cache-control") -> [ "no-cache", "no-store" ]
    result.get("set-cookie") -> [ "a=b; Path=/; HttpOnly", "c=d; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Path=/" ]
Keys in the returned map are case-insensitive and are guaranteed to be in the same order as encountered in rawHeaderLines.
Values in the returned map are guaranteed to be in the same order as encountered in rawHeaderLines.
- Parameters:
rawHeaderLines - the raw HTTP header lines to parse
- Returns:
- a normalized mapping of header name keys to values
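A simplified sketch of this "inflation" follows. `HeaderLinesSketch` and `parse` are hypothetical names, and two shortcuts are assumed: only `Set-Cookie` is exempted from comma splitting, and case-insensitivity is approximated by lowercasing keys rather than by a case-insensitive map. The real method may exempt more headers and handle quoted commas.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Soklet's implementation.
final class HeaderLinesSketch {
    static Map<String, Set<String>> parse(List<String> rawHeaderLines) {
        Map<String, Set<String>> headers = new LinkedHashMap<>();
        for (String line : rawHeaderLines) {
            int colon = line.indexOf(':');
            if (colon <= 0) continue; // not a "Name: value" line
            // Assumption: lowercase keys stand in for case-insensitive lookup
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            Set<String> values = headers.computeIfAbsent(name, k -> new LinkedHashSet<>());
            if (name.equals("set-cookie")) {
                // Set-Cookie values contain commas (e.g. in Expires dates), so keep them whole
                values.add(value);
                continue;
            }
            for (String part : value.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) values.add(trimmed);
            }
        }
        return headers;
    }
}
```

Run against the three example lines shown for this method, the sketch yields two values under `cache-control` and two intact values under `set-cookie`.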
-