Codes strings (e.g., organizational names) based on certain patterns

  • fixed - Match pattern string as it is within the target vector

  • begins - Match pattern string as it is in the beginning of the target vector

  • trim_begins - Match pattern string as it is in the beginning of the target vector ignoring preceding white-spaces

  • ends - Match pattern string as it is in the end of the target vector

  • trim_ends - Match pattern string as it is in the end of the target vector ignoring trailing white-spaces

  • exact - Match pattern string exactly (i.e., match equal strings)

  • trim_exact - Match pattern string exactly (i.e., match equal strings) ignoring surrounding white-spaces

  • regex - Match regex pattern
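
The match types above can be approximated in base R. This is a minimal sketch of the semantics only, not the package's implementation; the helper name `match_pattern` and the sample strings are hypothetical.

```r
# Illustrative base R equivalents of each match type (hypothetical helper).
match_pattern <- function(x, pattern, type = "fixed") {
  switch(type,
    fixed       = grepl(pattern, x, fixed = TRUE),                   # literal substring anywhere
    begins      = startsWith(x, pattern),                            # literal prefix
    trim_begins = startsWith(trimws(x, which = "left"), pattern),    # prefix after dropping leading spaces
    ends        = endsWith(x, pattern),                              # literal suffix
    trim_ends   = endsWith(trimws(x, which = "right"), pattern),     # suffix after dropping trailing spaces
    exact       = x == pattern,                                      # whole string equals pattern
    trim_exact  = trimws(x) == pattern,                              # equality after trimming both sides
    regex       = grepl(pattern, x, perl = TRUE),                    # regular expression
    stop("unknown pattern type: ", type)
  )
}

x <- c("Acme Ltd", "  Acme Ltd", "Ltd Acme  ", "Acme")
match_pattern(x, "Acme", "begins")       # TRUE FALSE FALSE TRUE
match_pattern(x, "Acme", "trim_begins")  # TRUE TRUE FALSE TRUE
match_pattern(x, "Acme", "trim_ends")    # FALSE FALSE TRUE TRUE
match_pattern(x, "Acme", "exact")        # FALSE FALSE FALSE TRUE
```

The trim variants differ from their plain counterparts only in which white-space is removed before comparing.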

  • replace_all - Replace everything (entire column) with new codes.

  • replace_empty - Code only records (i.e., rows) for which existing codes are empty (i.e., empty string, NA, empty list)

  • append_to_existing - Merge with existing codes appending new ones to the end

  • prepend_to_existing - Merge with existing codes prepending new ones to the front
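
The four merge modes above can be sketched in base R as follows. The helper `merge_codes` is hypothetical and only illustrates the described behaviour on lists of codes; how the package itself treats empty existing codes when appending or prepending is not covered here.

```r
# Illustrative sketch of the merge modes (hypothetical helper, base R only).
merge_codes <- function(existing, new, mode = "replace_all") {
  # "Empty" follows the description above: empty string, NA, or empty list element
  is_empty <- function(code) length(code) == 0 || all(is.na(code) | code == "")
  Map(function(old, new_code) {
    switch(mode,
      replace_all         = new_code,
      replace_empty       = if (is_empty(old)) new_code else old,
      append_to_existing  = c(old, new_code),
      prepend_to_existing = c(new_code, old),
      stop("unknown merge mode: ", mode)
    )
  }, existing, new)
}

existing <- list("A", NA_character_)
new      <- list("X", "Y")

merge_codes(existing, new, "replace_all")    # list("X", "Y")
merge_codes(existing, new, "replace_empty")  # list("A", "Y")
merge_codes(list("A"), list("X"), "append_to_existing")   # list(c("A", "X"))
merge_codes(list("A"), list("X"), "prepend_to_existing")  # list(c("X", "A"))
```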


  patterns_col = 1,
  patterns_codes_col = 2,
  patterns_type = "fixed",
  patterns_type_col = NULL,
  patterns_codes = NULL,
  output_codes_col_name = "{col_name_}coded",
  codes_omitted_rows_value = NULL,
  no_match_code = NULL,
  merge_existing_codes = "replace_all",
  return_only_codes = FALSE,
  return_only_first_detected_code = FALSE,
  return_merge_codes_description = FALSE,



Vector or table to detect in.


Accepts either a vector or a table. If patterns is a table, it can also include a replacements column.


If patterns is a table, this specifies which column holds the patterns. Default is 1.


If patterns is a table, this specifies which column to use as the codes column. Default is 2.


Specifies kind(s) of patterns. Default is "fixed" (calling stri_replace_all_fixed). Other options are the match types listed above: begins, trim_begins, ends, trim_ends, exact, trim_exact, and regex.


Column in the patterns table where you can specify the type of each pattern. If set, patterns_type is ignored. Default is NULL.


If provided, these values are used as the codes. Should be the same length as patterns. Default is NULL.


If provided, this is used as the name of the codes column in the results (a new column if it does not exist, otherwise the one to update).


Used only if rows is set: the value to fill the omitted rows with. When an existing codes column is updated, the default NULL means the initial code values are kept for omitted rows. If there is no codes column to update, omitted rows are filled with NA.


If provided, records that did not match any pattern are coded with this value.


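How to merge newly detected codes with existing ones. Options are the merge modes listed above: replace_all, replace_empty, append_to_existing, and prepend_to_existing.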


If TRUE, return only the codes vector.


If TRUE, return only the code for the first detected pattern. If FALSE, return a list of all matched codes. Default is FALSE. (Currently this does affect performance.)
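
The difference between returning all matched codes and only the first one, together with a fallback code for unmatched records, can be sketched in base R. The helpers `detect_all` and `first_code` and the sample data are hypothetical, and "first" is assumed here to mean first in pattern order; this is not the package's implementation.

```r
# Hypothetical patterns with one code per pattern
patterns <- c("Ltd", "Acme", "GmbH")
codes    <- c("limited", "acme", "gmbh")
x        <- c("Acme Ltd", "Foo GmbH", "Bar")

# All codes whose (fixed) pattern occurs in each string, in pattern order
detect_all <- function(x, patterns, codes) {
  lapply(x, function(s) {
    codes[vapply(patterns, grepl, logical(1), x = s, fixed = TRUE)]
  })
}

# Keep only the first matched code; use no_match_code where nothing matched
first_code <- function(all_codes, no_match_code = NA_character_) {
  vapply(all_codes,
         function(m) if (length(m) > 0) m[[1]] else no_match_code,
         character(1))
}

all_codes <- detect_all(x, patterns, codes)
all_codes                         # list(c("limited", "acme"), "gmbh", character(0))
first_code(all_codes)             # "limited" "gmbh" NA
first_code(all_codes, "unknown")  # "limited" "gmbh" "unknown"
```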


Return a description of the choices for the merge_existing_codes parameter.


Arguments passed on to standardize_options


Column of interest (the one we need to standardize) in the x object (if it is data.frame like).


Logical vector to filter records of interest. Default is NULL which means do not filter records.


If rows parameter is set then merge omitted_rows_value with the results (filtered by rows). Either single string or a character vector of length nrow(x). If NULL (the default) then original values of col are merged with results.
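
The interaction between rows and omitted_rows_value described above can be illustrated with a small base R sketch. The helper `apply_to_rows` is hypothetical and works on a plain character vector; it only mirrors the documented merging rule and is not the package's code.

```r
# Apply `fun` only to the records selected by `rows`; fill the rest either
# with the original values (omitted_rows_value = NULL) or with the given value.
apply_to_rows <- function(x, fun, rows = NULL, omitted_rows_value = NULL) {
  if (is.null(rows)) return(fun(x))  # NULL means do not filter records
  out <- if (is.null(omitted_rows_value)) {
    x                                           # keep original values
  } else {
    rep_len(omitted_rows_value, length(x))      # single string or full-length vector
  }
  out[rows] <- fun(x[rows])
  out
}

x    <- c("acme", "foo", "bar")
rows <- c(TRUE, FALSE, TRUE)

apply_to_rows(x, toupper, rows)                                  # "ACME" "foo" "BAR"
apply_to_rows(x, toupper, rows, omitted_rows_value = "skipped")  # "ACME" "skipped" "BAR"
```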


Where to insert results (the standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:

  • 'omit' :: do not write results back to table (usually used when append_output_copy is set for temporary values)

  • 'prepend_to_col' :: prepend to col

  • 'append_to_col' :: append to col

  • 'prepend_to_x' :: prepend to x data.frame like object

  • 'append_to_x' :: append to x data.frame like object


If x is a vector, use this name for the original column if it is included in the results. Default is "x". If x is a table, the name of col is used.


Use this name for the column with results (standardized values). Parts in curly brackets are substitute strings. Options for substitutions are:


Whether to append a copy of the result vector to the x object.


How the appended copy will be named.


The updated x table with a codes column, or just the codes if return_only_codes is set.