
Codes strings (e.g., organizational names) based on certain patterns

Pattern types (values for patterns_type):

  • fixed - Match the pattern string as is, anywhere within the target string

  • begins - Match the pattern string as is at the beginning of the target string

  • trim_begins - Match the pattern string as is at the beginning of the target string, ignoring leading whitespace

  • ends - Match the pattern string as is at the end of the target string

  • trim_ends - Match the pattern string as is at the end of the target string, ignoring trailing whitespace

  • exact - Match the pattern string exactly (i.e., match equal strings)

  • trim_exact - Match the pattern string exactly (i.e., match equal strings), ignoring surrounding whitespace

  • regex - Match a regex pattern

Merge options (values for merge_existing_codes):

  • replace_all - Replace everything (the entire column) with new codes

  • replace_empty - Code only records (i.e., rows) for which existing codes are empty (i.e., empty string, NA, empty list)

  • append_to_existing - Merge with existing codes, appending new ones to the end

  • prepend_to_existing - Merge with existing codes, prepending new ones to the front
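
To make the pattern types above concrete, here is a minimal base R sketch of what each type is described as matching. It is an illustration only and not the package's implementation: the helper match_pattern and the sample names are hypothetical, and the regex branch uses base R's PCRE engine, which may differ from the engine the package uses.

```r
# Hypothetical helper: one base-R predicate per documented pattern type.
match_pattern <- function(x, pattern, type = "fixed") {
  switch(type,
    fixed       = grepl(pattern, x, fixed = TRUE),
    begins      = startsWith(x, pattern),
    trim_begins = startsWith(trimws(x, which = "left"), pattern),
    ends        = endsWith(x, pattern),
    trim_ends   = endsWith(trimws(x, which = "right"), pattern),
    exact       = x == pattern,
    trim_exact  = trimws(x) == pattern,
    regex       = grepl(pattern, x, perl = TRUE),
    stop("unknown pattern type: ", type)
  )
}

# Made-up organizational names, one with surrounding whitespace
org_names <- c("Acme Ltd", "  Acme Ltd  ", "Ltd Acme", "Acme")

match_pattern(org_names, "Ltd", "fixed")      # matches anywhere in the string
match_pattern(org_names, "Ltd", "ends")       # trailing spaces prevent a match
match_pattern(org_names, "Ltd", "trim_ends")  # trailing spaces are ignored
match_pattern(org_names, "Acme", "exact")     # only the identical string matches
```

The trim_ variants differ from their plain counterparts only in whether whitespace on the relevant side is stripped before the comparison.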

Usage

detect_patterns(
  x,
  patterns,
  patterns_col = 1,
  patterns_codes_col = 2,
  patterns_type = "fixed",
  patterns_type_col = NULL,
  patterns_codes = NULL,
  output_codes_col_name = "{col_name_}coded",
  codes_omitted_rows_value = NULL,
  no_match_code = NULL,
  merge_existing_codes = "replace_all",
  return_only_codes = FALSE,
  return_only_first_detected_code = FALSE,
  return_merge_codes_description = FALSE,
  ...
)

Arguments

x

Vector or table to detect in.

patterns

A vector or a table. If patterns is a table, it can also include a column of codes (see patterns_codes_col).

patterns_col

If patterns is a table, specifies which column holds the patterns. Default is 1.

patterns_codes_col

If patterns is a table, specifies which column to use as the codes column. Default is 2.

patterns_type

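Specifies the kind(s) of patterns. Default is "fixed" (calling stringi::stri_replace_all_fixed). The other options, described in the list at the top of this page, are "begins", "trim_begins", "ends", "trim_ends", "exact", "trim_exact" and "regex".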

patterns_type_col

Column in the patterns table that specifies the pattern type for each pattern. If set, patterns_type is ignored. Default is NULL.

patterns_codes

If provided, used as the codes. Should be the same length as patterns. Default is NULL.

output_codes_col_name

If provided, used as the name of the codes column in the results (a new column if it does not exist, otherwise the one to update).

codes_omitted_rows_value

If rows is set, use this value to fill the omitted rows. Default is NULL, which means that when an existing codes column is updated, the initial codes are kept for the omitted rows. If there is no codes column to update, omitted rows are filled with NA.

no_match_code

If provided, records that did not match any pattern are coded with this value.

merge_existing_codes

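How to merge newly detected codes with existing ones. The options, described in the list at the top of this page, are "replace_all" (the default), "replace_empty", "append_to_existing" and "prepend_to_existing".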
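
The four merge options (replace_all, replace_empty, append_to_existing, prepend_to_existing) can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the helper merge_codes is hypothetical, codes are held as a list with one element per record, and empty existing codes are assumed to be dropped before appending or prepending.

```r
# Hypothetical helper: merge newly detected codes with existing ones.
merge_codes <- function(existing, new, mode = "replace_all") {
  # "Empty" follows the documented meaning: empty string, NA, or empty list element
  is_empty <- function(codes) {
    length(codes) == 0 || all(is.na(codes) | codes == "")
  }
  Map(function(old, detected) {
    # Assumption: empty existing codes contribute nothing when merging
    kept <- if (is_empty(old)) character(0) else old
    switch(mode,
      replace_all         = detected,
      replace_empty       = if (is_empty(old)) detected else old,
      append_to_existing  = c(kept, detected),
      prepend_to_existing = c(detected, kept),
      stop("unknown merge mode: ", mode)
    )
  }, existing, new)
}

# Made-up existing and newly detected codes for three records
existing <- list("bank", NA_character_, character(0))
detected <- list("firm", "univ", "gov")

merge_codes(existing, detected, "replace_empty")       # only records 2 and 3 change
merge_codes(existing, detected, "append_to_existing")  # record 1 becomes c("bank", "firm")
```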

return_only_codes

If TRUE, return only the codes vector. Default is FALSE.

return_only_first_detected_code

If TRUE, return only the code for the first detected pattern. If FALSE, return a list of all matched codes. Default is FALSE. (Currently does affect performance)
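
The difference between the two return shapes can be sketched in base R. This is a hedged illustration, not the package's code: the pattern and code vectors, the sample names, and the helpers detect_all and detect_first are hypothetical, and "first" is assumed to mean first in the order of the patterns.

```r
# Made-up fixed patterns and their codes, in matching order
patterns <- c("Ltd", "Acme", "Univ")
codes    <- c("firm", "acme", "university")

# All matched codes per record: a list with one character vector per element of x
detect_all <- function(x) {
  lapply(x, function(s) {
    codes[vapply(patterns, grepl, logical(1), x = s, fixed = TRUE)]
  })
}

# Only the first matched code per record: a plain character vector
detect_first <- function(x) {
  vapply(
    detect_all(x),
    function(m) if (length(m) > 0) m[[1]] else NA_character_,
    character(1)
  )
}

org_names <- c("Acme Ltd", "Tilburg University", "IBM")
detect_all(org_names)    # list: two codes, one code, no codes
detect_first(org_names)  # character vector, NA where nothing matched
```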

return_merge_codes_description

Return a description of the choices for the merge_existing_codes parameter.

...

Arguments passed on to standardize_options

col

Column of interest (the one we need to standardize) in the x object (if it is data.frame like).

rows

Logical vector to filter records of interest. Default is NULL which means do not filter records.

omitted_rows_value

If the rows parameter is set, merge omitted_rows_value with the results (filtered by rows). Either a single string or a character vector of length nrow(x). If NULL (the default), the original values of col are merged with the results.
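
The interplay of rows and omitted_rows_value described above can be sketched in base R. The helper apply_to_rows is hypothetical and only mirrors the documented behaviour: selected rows are processed, and omitted rows keep their original value unless a fill value is given.

```r
# Hypothetical helper: process only the rows selected by a logical vector.
apply_to_rows <- function(col, rows, fun, omitted_rows_value = NULL) {
  # NULL keeps the original values; otherwise recycle the fill value to full length
  out <- if (is.null(omitted_rows_value)) {
    col
  } else {
    rep_len(omitted_rows_value, length(col))
  }
  out[rows] <- fun(col[rows])
  out
}

col  <- c("a", "b", "c")
rows <- c(TRUE, FALSE, TRUE)

apply_to_rows(col, rows, toupper)                            # omitted row keeps "b"
apply_to_rows(col, rows, toupper, omitted_rows_value = "-")  # omitted row becomes "-"
```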

output_placement

Where to insert the results (standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:

  • 'omit' :: do not write results back to table (usually used when append_output_copy is set for temporary values)

  • 'prepend_to_col' :: prepend to col

  • 'append_to_col' :: append to col

  • 'prepend_to_x' :: prepend to x data.frame like object

  • 'append_to_x' :: append to x data.frame like object

x_atomic_name

If x is a vector, use this name for the original column if it is included in the results. Default is "x". If x is a table, the name of col is used.

output_col_name

Use this name for the column with results (standardized values). Parts in curly brackets are substitution strings. Options for substitutions are:

append_output_copy

Whether to append a copy of the result vector to the x object.

output_copy_col_name

Name of the appended copy column.

Value

The updated x table with the codes column, or just the codes if return_only_codes is set.