This is the original procedure, based on patterns identified in the original
PATSTAT data; that is, it is intended for reproducibility purposes. Use
magerman_detect_characters to identify similar patterns in your own data.
A simple illustration of what this procedure does:
Æ -> AE
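The idea can be sketched in base R without the package. This is only an illustration of fixed-string replacement driven by a two-column table, not the package's implementation: the table `toy_patterns` and the helper `replace_fixed_all` are hypothetical names, and the rows are made up for the example rather than taken from magerman_patterns_accented_characters.

```r
# Hypothetical pattern table: column 1 = pattern, column 2 = replacement.
# Illustrative rows only; the package ships its own table.
toy_patterns <- data.frame(
  pattern     = c("\u00c6", "\u00d8"),  # the characters AE-ligature and O-slash
  replacement = c("AE", "O"),
  stringsAsFactors = FALSE
)

# Replace every occurrence of each pattern, treating it as a fixed
# (non-regex) string. Patterns are applied one after another.
replace_fixed_all <- function(x, patterns) {
  for (i in seq_len(nrow(patterns))) {
    x <- gsub(patterns[[1]][i], patterns[[2]][i], x, fixed = TRUE)
  }
  x
}

result <- replace_fixed_all(c("\u00c6RO A/S", "S\u00d8REN"), toy_patterns)
print(result)
```

Column 1 holding the patterns and column 2 the replacements mirrors the defaults patterns_col = 1 and patterns_replacements_col = 2 shown under Usage.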
Usage
magerman_replace_accented_characters(
  x,
  patterns = magerman_patterns_accented_characters,
  patterns_col = 1,
  patterns_mode = "all",
  patterns_mode_col = NULL,
  patterns_type = "fixed",
  patterns_type_col = NULL,
  patterns_replacements_col = 2,
  replacements = if (is.atomic(patterns)) "" else NULL,
  ...
)
Arguments
- x
Vector or table to standardize.
- patterns
Either a vector or a table. If patterns is a table, it can also include a replacements column.
- patterns_col
If patterns is not a vector, which column to use. Default is 1.
- patterns_mode
Mode of matching. One of c("all", "first", "last"). The default is "all" (it is about 2x faster than "first" and "last" because of the handy stri_replace_all_* functions). A vector of the same length as patterns can also be passed.
- patterns_mode_col
Column in the patterns table with the mode of matching.
- patterns_type
Type of pattern for matching. Default is "fixed" (calls stri_replace_all_fixed). Other options are:
- patterns_type_col
Column with the pattern type, for cases where patterns have different types.
- patterns_replacements_col
If patterns is not a vector and includes replacements, which column to use for replacements. Default is 2.
- replacements
If patterns does not have a column with replacements, provide them here.
- ...
Arguments passed on to standardize_options:
  - col
    Column of interest (the one to standardize) in the x object (if x is data.frame-like).
  - rows
    Logical vector to filter records of interest. Default is NULL, which means records are not filtered.
  - omitted_rows_value
    If the rows parameter is set, merge omitted_rows_value with the results (filtered by rows). Either a single string or a character vector of length nrow(x). If NULL (the default), the original values of col are merged with the results.
  - output_placement
    Where to insert the results (standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:
      'omit' :: do not write results back to the table (usually used when append_output_copy is set, for temporary values)
      'prepend_to_col' :: prepend to col
      'append_to_col' :: append to col
      'prepend_to_x' :: prepend to the x data.frame-like object
      'append_to_x' :: append to the x data.frame-like object
  - x_atomic_name
    If x is a vector, use this name for the original column if it is in the results. Default is "x". If x is a table, the name of col will be used.
  - output_col_name
    Use this name for the column with results (standardized values). Parts in curly brackets are substitution strings. Options for substitutions are:
  - append_output_copy
    Whether to append a copy of the result vector to the x object.
  - output_copy_col_name
    How the appended copy will be named.
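The three values of patterns_mode can be illustrated directly with the stringi functions the argument description refers to. This is a sketch that assumes the stringi package is installed; it shows the matching modes in isolation and makes no claim about how the package dispatches to them internally. The input strings are made up.

```r
library(stringi)

name <- "A.B.C"  # made-up input with two occurrences of the pattern "."

# "all": every occurrence of the fixed pattern is replaced
mode_all   <- stri_replace_all_fixed(name, ".", " ")
# "first": only the leftmost occurrence is replaced
mode_first <- stri_replace_first_fixed(name, ".", " ")
# "last": only the rightmost occurrence is replaced
mode_last  <- stri_replace_last_fixed(name, ".", " ")

print(c(all = mode_all, first = mode_first, last = mode_last))

# With vectorize_all = FALSE, the "all" variant applies a whole vector of
# patterns to each string in a single call.
multi <- stri_replace_all_fixed(
  "\u00c6 \u00d8",
  pattern = c("\u00c6", "\u00d8"),
  replacement = c("AE", "O"),
  vectorize_all = FALSE
)
print(multi)
```

The single-call form with vectorize_all = FALSE exists only for the stri_replace_all_* family, which is one plausible reason the "all" mode is described as faster.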
Value
If nothing was specified to be bound (cbind) to the results, the function returns the standardized vector. If something needs to be bound, it returns a data.table.
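A call sketch for the two kinds of input follows. It assumes that the package exporting magerman_replace_accented_characters() is attached; the guard skips the calls otherwise, so the snippet runs either way. The sample names, the `org_names` and `org_table` objects, and the choice of `col = 2` (assumed here to be a column index) are illustrative assumptions, and no output is shown because it depends on the package's pattern table.

```r
# Made-up sample data for illustration only.
org_names <- c("\u00c6RO A/S", "ACME LTD")
org_table <- data.frame(id = 1:2, name = org_names, stringsAsFactors = FALSE)

# Assumption: the function is available in the current session.
if (exists("magerman_replace_accented_characters", mode = "function")) {
  # Vector input with default options.
  res_vector <- magerman_replace_accented_characters(org_names)
  str(res_vector)

  # Table input: `col` is forwarded through `...` to standardize_options
  # and selects the column to standardize.
  res_table <- magerman_replace_accented_characters(org_table, col = 2)
  str(res_table)
}
```

Inspecting the results with str() is a quick way to see which of the two return shapes a given combination of options produces.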
See also
replace_patterns
Other magerman:
cockburn_detect_corp(),
cockburn_detect_govt(),
cockburn_detect_hosp(),
cockburn_detect_indiv(),
cockburn_detect_inst_conds_1(),
cockburn_detect_inst(),
cockburn_detect_univ(),
cockburn_detect_uspto(),
cockburn_remove_standard_names(),
cockburn_remove_uspto(),
cockburn_replace_compustat_names(),
cockburn_replace_compustat(),
cockburn_replace_derwent(),
cockburn_replace_govt(),
cockburn_replace_univ(),
magerman_condense(),
magerman_detect_characters(),
magerman_detect_comma_period_irregularities(),
magerman_detect_legal_form_beginning(),
magerman_detect_legal_form_end(),
magerman_detect_legal_form_middle(),
magerman_detect_umlaut(),
magerman_remove_common_words_anywhere(),
magerman_remove_common_words_at_the_beginning(),
magerman_remove_common_words_at_the_end(),
magerman_remove_double_quotation_marks_beginning_end(),
magerman_remove_double_quotation_marks_irregularities(),
magerman_remove_double_spaces(),
magerman_remove_html_codes(),
magerman_remove_non_alphanumeric_at_the_beginning(),
magerman_remove_non_alphanumeric_at_the_end(),
magerman_remove_special_characters(),
magerman_replace_comma_period_irregularities_all(),
magerman_replace_comma_period_irregularities(),
magerman_replace_legal_form_beginning(),
magerman_replace_legal_form_end(),
magerman_replace_legal_form_middle(),
magerman_replace_proprietary_characters(),
magerman_replace_sgml_characters(),
magerman_replace_spelling_variation(),
standardize_eee_ppat()