Cleanup Government Organizations (Non-Corporates group)
Source: R/cockburn.r
cockburn_replace_govt.Rd
From non_corporates.do file. Source - https://sites.google.com/site/patentdataproject/Home/posts/namestandardizationroutinesuploaded
Usage
cockburn_replace_govt(
  x,
  patterns = cockburn_patterns_govt_cleanup,
  patterns_col = 1,
  patterns_mode = "all",
  patterns_mode_col = NULL,
  patterns_type = "fixed",
  patterns_type_col = NULL,
  patterns_replacements_col = 2,
  replacements = if (is.atomic(patterns)) "" else NULL,
  ...
)
Arguments
- x
Vector or table to standardize.
- patterns
Accepts either a vector or a table. If patterns is a table, it can also include a replacements column.
- patterns_col
If patterns is not a vector, which column to use. Default is 1.
- patterns_mode
Mode of matching. One of c("all", "first", "last"). The default is "all" (it is 2x faster than "first" and "last" because of the handy stri_replace_all_* functions). A vector of modes (same length as patterns) can also be passed.
- patterns_mode_col
Column in patterns table with the mode of matching
- patterns_type
Type of pattern for matching. Default is "fixed" (calling stri_replace_all_fixed). Other options are:
- patterns_type_col
Column with the type of pattern, for cases where patterns have different types
- patterns_replacements_col
If patterns is not a vector and includes replacements, which column to use for replacements. Default is 2.
- replacements
If patterns does not have a column with replacements, provide them here.
- ...
Arguments passed on to standardize_options
  - col
  Column of interest (the one we need to standardize) in the x object (if it is data.frame like).
  - rows
  Logical vector to filter records of interest. Default is NULL which means do not filter records.
  - omitted_rows_value
  If the rows parameter is set then merge omitted_rows_value with the results (filtered by rows). Either a single string or a character vector of length nrow(x). If NULL (the default) then the original values of col are merged with the results.
  - output_placement
  Where to insert results (standardized vector) in the x object. The default option is 'replace_col', which overwrites the col in x with results. Other options:
    'omit' :: do not write results back to table (usually used when append_output_copy is set for temporary values)
    'prepend_to_col' :: prepend to col
    'append_to_col' :: append to col
    'prepend_to_x' :: prepend to x data.frame like object
    'append_to_x' :: append to x data.frame like object
  - x_atomic_name
  If x is a vector, use this name for the original column if it is in the results. Default is "x". If x is a table, the name of col will be used.
  - output_col_name
  Use this name for the column with results (standardized values). Parts in curly brackets are substitute strings. Options for substitutions are:
  - append_output_copy
  Whether to append a copy of the result vector to the x object
  - output_copy_col_name
  How the appended copy will be named
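The three patterns_mode values correspond to the stringi replacement families mentioned above. The following is a minimal sketch of that correspondence using stringi directly, not the package's own implementation; the helper replace_by_mode and the sample strings are hypothetical and exist only for illustration.

```r
library(stringi)

# Hypothetical helper (not part of the package): dispatch a fixed-pattern
# replacement to the stringi function that matches the requested mode.
replace_by_mode <- function(x, pattern, replacement,
                            mode = c("all", "first", "last")) {
  mode <- match.arg(mode)
  switch(mode,
         all   = stri_replace_all_fixed(x, pattern, replacement),
         first = stri_replace_first_fixed(x, pattern, replacement),
         last  = stri_replace_last_fixed(x, pattern, replacement))
}

# Made-up example string, not taken from the package's pattern tables
x <- "DEPT OF ENERGY DEPT"

replace_by_mode(x, "DEPT", "DEPARTMENT", "all")    # replaces both occurrences
replace_by_mode(x, "DEPT", "DEPARTMENT", "first")  # replaces only the leading one
replace_by_mode(x, "DEPT", "DEPARTMENT", "last")   # replaces only the trailing one

# Several patterns applied one after another to every string:
# vectorize_all = FALSE makes stringi loop over the pattern/replacement pairs.
stri_replace_all_fixed(
  c("US GOVT", "DEPT OF STATE"),
  pattern = c("GOVT", "DEPT"),
  replacement = c("GOVERNMENT", "DEPARTMENT"),
  vectorize_all = FALSE
)
```

With a vector of patterns, the vectorize_all = FALSE form handles all pairs in a single call, which is presumably part of why the "all" mode is described as the faster one.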
See also
replace_patterns
Other magerman:
cockburn_detect_corp(),
cockburn_detect_govt(),
cockburn_detect_hosp(),
cockburn_detect_indiv(),
cockburn_detect_inst_conds_1(),
cockburn_detect_inst(),
cockburn_detect_univ(),
cockburn_detect_uspto(),
cockburn_remove_standard_names(),
cockburn_remove_uspto(),
cockburn_replace_compustat_names(),
cockburn_replace_compustat(),
cockburn_replace_derwent(),
cockburn_replace_univ(),
magerman_condense(),
magerman_detect_characters(),
magerman_detect_comma_period_irregularities(),
magerman_detect_legal_form_beginning(),
magerman_detect_legal_form_end(),
magerman_detect_legal_form_middle(),
magerman_detect_umlaut(),
magerman_remove_common_words_anywhere(),
magerman_remove_common_words_at_the_beginning(),
magerman_remove_common_words_at_the_end(),
magerman_remove_double_quotation_marks_beginning_end(),
magerman_remove_double_quotation_marks_irregularities(),
magerman_remove_double_spaces(),
magerman_remove_html_codes(),
magerman_remove_non_alphanumeric_at_the_beginning(),
magerman_remove_non_alphanumeric_at_the_end(),
magerman_remove_special_characters(),
magerman_replace_accented_characters(),
magerman_replace_comma_period_irregularities_all(),
magerman_replace_comma_period_irregularities(),
magerman_replace_legal_form_beginning(),
magerman_replace_legal_form_end(),
magerman_replace_legal_form_middle(),
magerman_replace_proprietary_characters(),
magerman_replace_sgml_characters(),
magerman_replace_spelling_variation(),
standardize_eee_ppat()