Function reference
-
browse_dot_graph()
- Generates a temporary HTML file visualizing the given nstandr procedures and opens it in a browser (specified in options('browser')). This requires the DiagrammeR package to be installed. If either is unavailable, you can cat() the string returned by nstandr:::make_dot_graph() to the R console and copy it into a web tool for dot visualization (e.g. http://magjac.com/graphviz-visual-editor).
-
check_rows()
- Assumes that rows (if logical) has the same length as x
-
cockburn_combabbrev()
- Collapses single character sequences
-
cockburn_detect_corp()
- Detect Corporates (code - 'firm')
-
cockburn_detect_govt()
- Detect Government Organizations (Non-Corporates group)
-
cockburn_detect_hosp()
- Detect Hospitals (Non-Corporates group)
-
cockburn_detect_indiv()
- Detect Individuals (Non-Corporates group)
-
cockburn_detect_inst()
- Detect Non-profit Institutes (Non-Corporates group)
-
cockburn_detect_inst_conds()
- Detects Non-profit institutes with special conditions
-
cockburn_detect_inst_conds_1()
- Detects Non-profit institutes with special conditions
-
cockburn_detect_inst_conds_2()
- Detects Non-profit institutes with special conditions
-
cockburn_detect_inst_german()
- Detects German Non-profit institutes
-
cockburn_detect_type()
- Identifies Entity Type
-
cockburn_detect_univ()
- Detect Universities (Non-Corporates group)
-
cockburn_detect_uspto()
- Detects special USPTO codes. Codes them as 'indiv'
-
cockburn_remove_standard_names()
- Creates the so-called stem name (a name with all legal entity identifiers removed)
-
cockburn_remove_uspto()
- Removes special USPTO codes.
-
cockburn_replace_compustat()
- COMPUSTAT specific standardization for organizational names
-
cockburn_replace_compustat_names()
- COMPUSTAT specific standardization for organizational names. Full name replacements.
-
cockburn_replace_derwent()
- Performs Derwent standardization of organizational names
-
cockburn_replace_govt()
- Cleanup Government Organizations (Non-Corporates group)
-
cockburn_replace_punctuation()
- Removes punctuation and standardizes some symbols.
-
cockburn_replace_standard_names()
- Create standard name
-
cockburn_replace_type()
- Cleanup Entity Type
-
cockburn_replace_univ()
- Cleanup Universities (Non-Corporates group)
-
defactor()
- Defactor the object
-
defactor_vector()
- Converts factor to character
-
detect_legal_form()
- Detect legal form
-
detect_patterns()
- Codes strings (e.g., organizational names) based on certain patterns
-
escape_regex()
- Escapes characters that are special in regex
-
escape_regex_for_type()
- Escapes special characters for different types of pattern
-
escape_regex_for_types()
- Conditionally escapes characters that are special in regex
-
get_dots()
- Provides access to arguments of nested functions. An alternative mechanism to passing ... arguments, with more features.
-
get_standardize_options()
- Gets standardize_options at the current point, with updates applied consistently up through the calling stack.
-
get_target()
- Gets a target vector to standardize.
-
get_vector()
- Gets a vector by column and defactors it if needed. Optionally, a fallback_value can be provided, which is returned if col is not specified.
-
inset_target()
- Insets the target vector back into the input object (x)
-
is_empty()
- Checks if string has something to print
-
magerman_condense()
- Condensing names
-
magerman_detect_characters()
- Detect candidates for characters that need to be cleaned
-
magerman_detect_comma_period_irregularities()
- Detects comma period irregularities
-
magerman_detect_legal_form()
- Detect legal form
-
magerman_detect_legal_form_beginning()
- Detects legal form at the beginning of a name
-
magerman_detect_legal_form_end()
- Detects legal form at the end of a name
-
magerman_detect_legal_form_middle()
- Detects legal form in the middle of a name
-
magerman_detect_umlaut()
- Detect umlauts
-
magerman_remove_common_words()
- Remove common words
-
magerman_remove_common_words_anywhere()
- Removes common words anywhere in a name
-
magerman_remove_common_words_at_the_beginning()
- Removes common words at the beginning of a name
-
magerman_remove_common_words_at_the_end()
- Removes common words at the end of a name
-
magerman_remove_double_quotation_marks_beginning_end()
- Removes double quotation marks at the beginning and end of a name
-
magerman_remove_double_quotation_marks_irregularities()
- Removes double quotation irregularities
-
magerman_remove_double_spaces()
- Removes double spaces
-
magerman_remove_html_codes()
- Removes HTML codes
-
magerman_remove_legal_form()
- Removes legal form
-
magerman_remove_legal_form_and_clean()
- Removes legal form
-
magerman_remove_non_alphanumeric_at_the_beginning()
- Removes non alphanumeric characters at the beginning of a name
-
magerman_remove_non_alphanumeric_at_the_end()
- Removes non alphanumeric characters at the end of a name
-
magerman_remove_special_characters()
- Removes special characters
-
magerman_replace_accented_characters()
- Replaces accented characters
-
magerman_replace_comma_period_irregularities()
- Replaces comma period irregularities
-
magerman_replace_comma_period_irregularities_all()
- Replaces comma and period irregularities
-
magerman_replace_legal_form_beginning()
- Replaces legal form at the beginning of a name
-
magerman_replace_legal_form_end()
- Replaces legal form at the end of a name
-
magerman_replace_legal_form_middle()
- Replaces legal form in the middle of a name
-
magerman_replace_proprietary_characters()
- Replaces proprietary characters
-
magerman_replace_sgml_characters()
- Replaces SGML characters
-
magerman_replace_spelling_variation()
- Replaces spelling variation
-
magerman_replace_umlaut()
- Replaces Umlauts
-
make_dot_edges()
- Makes dot graph edges for visualizing arrows between sequence of procedures.
-
make_dot_graph()
- Generates graph description for visualizing list of procedures in dot format.
-
make_dot_nodes()
- Generates description of dot graph nodes.
-
paste_dot_node()
- Makes a dot node (as html table) from procedure's attributes.
-
paste_dot_node_tr_td()
- Makes TR TD record for dot node TABLE
-
replace_patterns()
- A wrapper for string replacement and cbinding some columns.
-
save_dot_graph_as()
- Saves the dot graph as a file using the system command 'dot' from Graphviz (https://graphviz.org/), if installed.
-
standardize()
make_std_names()
make_standard_names()
nstand()
- Standardizes organizational names. Takes either vector or column in the table.
-
standardize_cockburn()
- Standardizes strings using the exact procedures described in Cockburn et al. (2009)
-
standardize_dehtmlize()
- Converts HTML characters to UTF-8
-
standardize_detect_enc()
- Detects string encoding
-
standardize_is_data_empty()
- Checks if all elements in vector(s) are either "", NA, NULL or have zero length
-
standardize_magerman()
- Standardizes strings using the exact procedures described in Magerman et al. (2009)
-
standardize_make_procedures_list()
- Makes list of procedures calls from table.
-
standardize_omit_empty()
- Removes elements that are either "", NA, NULL or have zero length
-
standardize_options()
- Does nothing but store (as its own default arguments) the options that control vector handling throughout the standardization process. These options are available in most nstandr functions that accept the ... parameter.
-
standardize_remove_brackets()
- Removes brackets and content in brackets
-
standardize_remove_quotes()
- Removes double quotes
-
standardize_squish_spaces()
- Removes redundant whitespace
-
standardize_toascii()
- Translates non-ASCII symbols to their ASCII equivalents
-
standardize_toupper()
- Uppercases vector of interest in the object (table)
-
standardize_x_split()
- Splits the object (table) in chunks by rows
-
unlist_if_possible()
- If a column in the x table is a list, unlists it if possible
-
visualize()
- Visualizes list of procedures.
-
x_length()
- Gets lengths of the object
-
x_width()
- Gets width of the object