Clean Up Government Organizations (Non-Corporates Group)
Source: R/cockburn.r (documented in cockburn_replace_govt.Rd)
Based on the non_corporates.do file from the Patent Data Project name standardization routines. Source: https://sites.google.com/site/patentdataproject/Home/posts/namestandardizationroutinesuploaded
Usage
cockburn_replace_govt(
x,
patterns = cockburn_patterns_govt_cleanup,
patterns_col = 1,
patterns_mode = "all",
patterns_mode_col = NULL,
patterns_type = "fixed",
patterns_type_col = NULL,
patterns_replacements_col = 2,
replacements = if (is.atomic(patterns)) "" else NULL,
...
)
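The defaults in the signature above suggest how patterns and replacements pair up. A plain character vector of patterns gets replacements = "", so matches are deleted. A table instead supplies its own replacements from column patterns_replacements_col (2 by default). The base-R sketch below mimics that lookup with fixed-string matching and applies the patterns one after another. The helper name apply_patterns_sketch and the example patterns are hypothetical. This is not the package's implementation and does not use the real cockburn_patterns_govt_cleanup table.

```r
# Hypothetical sketch (not the package's code): apply fixed-string patterns
# in order, taking replacements from a table column or defaulting to "".
apply_patterns_sketch <- function(x, patterns,
                                  patterns_col = 1,
                                  patterns_replacements_col = 2,
                                  replacements = if (is.atomic(patterns)) "" else NULL) {
  if (is.atomic(patterns)) {
    # A plain vector: every element is a pattern
    pat <- as.character(patterns)
  } else {
    # A table: patterns live in patterns_col
    pat <- as.character(patterns[[patterns_col]])
    # Without an explicit replacements argument, use the table's own column
    if (is.null(replacements)) {
      replacements <- as.character(patterns[[patterns_replacements_col]])
    }
  }
  # Recycle a single replacement (e.g. "") across all patterns
  replacements <- rep_len(replacements, length(pat))
  # Apply each pattern to the whole vector, in table order
  for (i in seq_along(pat)) {
    x <- gsub(pat[i], replacements[i], x, fixed = TRUE)
  }
  x
}

x <- c("US DEPT OF THE NAVY", "DEPT OF ENERGY")

# Vector of patterns: matches are removed
apply_patterns_sketch(x, c("THE ", "US "))
# returns c("DEPT OF NAVY", "DEPT OF ENERGY")

# Table of patterns with a replacements column
tbl <- data.frame(pattern = c("DEPT ", "US "),
                  replacement = c("DEPARTMENT ", "UNITED STATES "),
                  stringsAsFactors = FALSE)
apply_patterns_sketch(x, tbl)
# returns c("UNITED STATES DEPARTMENT OF THE NAVY", "DEPARTMENT OF ENERGY")
```

Because the patterns are applied sequentially, their order matters. An earlier replacement can create or destroy a match for a later pattern.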
Arguments
- x
Vector or table to standardize.
- patterns
A vector or a table of patterns. If patterns is a table, it can also include a column with replacements.
- patterns_col
If patterns is a table, the column that holds the patterns. Default is 1.
- patterns_mode
Mode of matching. One of c("all", "first", "last"). The default is "all", which is about 2x faster than "first" and "last" because it can use the vectorized stri_replace_all_* functions. A vector of modes (the same length as patterns) can also be passed.
- patterns_mode_col
Column in the patterns table that holds the matching mode for each pattern.
- patterns_type
Type of pattern for matching. Default is "fixed" (calling stri_replace_all_fixed). Other options are:
- patterns_type_col
Column in the patterns table with the pattern type, for when patterns have different types.
- patterns_replacements_col
If patterns is a table that includes replacements, the column that holds them. Default is 2.
- replacements
Replacements to use if patterns does not have a replacements column. By default this is "" (matches are removed) when patterns is a vector, and NULL otherwise.
- ...
Arguments passed on to standardize_options:
  - col
  Column of interest (the one to standardize) in the x object, if x is data.frame-like.
  - rows
  Logical vector to filter records of interest. Default is NULL, which means records are not filtered.
  - omitted_rows_value
  If rows is set, the results for the selected rows are merged with omitted_rows_value for the remaining rows. Either a single string or a character vector of length nrow(x). If NULL (the default), the original values of col are used for the omitted rows.
  - output_placement
  Where to insert the results (the standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:
    'omit' :: do not write results back to the table (usually used when append_output_copy is set, for temporary values)
    'prepend_to_col' :: prepend to col
    'append_to_col' :: append to col
    'prepend_to_x' :: prepend to the x data.frame-like object
    'append_to_x' :: append to the x data.frame-like object
  - x_atomic_name
  If x is a vector, the name to use for the original column if it is included in the results. Default is "x". If x is a table, the name of col is used.
  - output_col_name
  Name of the column with the results (standardized values). Parts in curly braces are substitution strings. Options for substitutions are:
  - append_output_copy
  Whether to append a copy of the result vector to the x object.
  - output_copy_col_name
  Name for the appended copy.
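The three patterns_mode values differ only in which occurrences of a pattern get replaced. The base-R sketch below illustrates them for fixed patterns. gsub(fixed = TRUE) replaces every occurrence, sub(fixed = TRUE) replaces only the first, and the last occurrence is located with gregexpr(). The helper replace_fixed_sketch is hypothetical. According to the description above, the package itself uses stringi's stri_replace_all_* functions for "all" mode. This sketch does not reproduce its code.

```r
# Hypothetical sketch: replace all, the first, or the last occurrence of a
# fixed pattern in each element of a character vector (base R only).
replace_fixed_sketch <- function(x, pattern, replacement,
                                 mode = c("all", "first", "last")) {
  mode <- match.arg(mode)
  switch(mode,
    all   = gsub(pattern, replacement, x, fixed = TRUE),
    first = sub(pattern, replacement, x, fixed = TRUE),
    last  = vapply(x, function(s) {
      if (is.na(s)) return(s)                 # leave missing values untouched
      pos <- gregexpr(pattern, s, fixed = TRUE)[[1]]
      if (pos[1] == -1) return(s)             # no match: keep string as is
      start <- pos[length(pos)]               # start of the last occurrence
      paste0(substr(s, 1, start - 1), replacement,
             substr(s, start + nchar(pattern), nchar(s)))
    }, character(1), USE.NAMES = FALSE)
  )
}

x <- "THE NAVY OF THE US"
replace_fixed_sketch(x, "THE ", "", "all")    # returns "NAVY OF US"
replace_fixed_sketch(x, "THE ", "", "first")  # returns "NAVY OF THE US"
replace_fixed_sketch(x, "THE ", "", "last")   # returns "THE NAVY OF US"
```

The "all" branch is a single vectorized call. "last" needs a per-string search, which is consistent with the note that "all" is the faster default.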
See also
replace_patterns
Other magerman: cockburn_detect_corp(), cockburn_detect_govt(), cockburn_detect_hosp(), cockburn_detect_indiv(), cockburn_detect_inst_conds_1(), cockburn_detect_inst(), cockburn_detect_univ(), cockburn_detect_uspto(), cockburn_remove_standard_names(), cockburn_remove_uspto(), cockburn_replace_compustat_names(), cockburn_replace_compustat(), cockburn_replace_derwent(), cockburn_replace_univ(), magerman_condense(), magerman_detect_characters(), magerman_detect_comma_period_irregularities(), magerman_detect_legal_form_beginning(), magerman_detect_legal_form_end(), magerman_detect_legal_form_middle(), magerman_detect_umlaut(), magerman_remove_common_words_anywhere(), magerman_remove_common_words_at_the_beginning(), magerman_remove_common_words_at_the_end(), magerman_remove_double_quotation_marks_beginning_end(), magerman_remove_double_quotation_marks_irregularities(), magerman_remove_double_spaces(), magerman_remove_html_codes(), magerman_remove_non_alphanumeric_at_the_beginning(), magerman_remove_non_alphanumeric_at_the_end(), magerman_remove_special_characters(), magerman_replace_accented_characters(), magerman_replace_comma_period_irregularities_all(), magerman_replace_comma_period_irregularities(), magerman_replace_legal_form_beginning(), magerman_replace_legal_form_end(), magerman_replace_legal_form_middle(), magerman_replace_proprietary_characters(), magerman_replace_sgml_characters(), magerman_replace_spelling_variation(), standardize_eee_ppat()