All functions

browse_dot_graph()
Generates a temporary HTML file with a visualization of the given nstandr procedures and opens it in a browser (specified in options('browser')). This requires the DiagrammeR package to be installed. If you do not have it, you can cat() the string returned from `nstandr:::make_dot_graph()` to the R console and copy it to a web tool for dot visualization (e.g. http://magjac.com/graphviz-visual-editor).
check_rows()
Assumes that rows (if logical) are same length as x
cockburn_combabbrev()
Collapses single character sequences
cockburn_detect_corp()
Detect Corporates (code - 'firm')
cockburn_detect_govt()
Detect Government Organizations (Non-Corporates group)
cockburn_detect_hosp()
Detect Hospitals (Non-Corporates group)
cockburn_detect_indiv()
Detect Individuals (Non-Corporates group)
cockburn_detect_inst()
Detect Non-profit Institutes (Non-Corporates group)
cockburn_detect_inst_conds()
Detects Non-profit institutes with special conditions
cockburn_detect_inst_conds_1()
Detects Non-profit institutes with special conditions
cockburn_detect_inst_conds_2()
Detects Non-profit institutes with special conditions
cockburn_detect_inst_german()
Detects German Non-profit institutes
cockburn_detect_type()
Identifies Entity Type
cockburn_detect_univ()
Detect Universities (Non-Corporates group)
cockburn_detect_uspto()
Special USPTO codes. Codes as 'indiv'
cockburn_remove_standard_names()
Creates a so-called stem name (a name with all legal entity identifiers removed)
cockburn_remove_uspto()
Removes special USPTO codes.
cockburn_replace_compustat()
COMPUSTAT specific standardization for organizational names
cockburn_replace_compustat_names()
COMPUSTAT specific standardization for organizational names. Full name replacements.
cockburn_replace_derwent()
Performs Derwent standardization of organizational names
cockburn_replace_govt()
Cleanup Government Organizations (Non-Corporates group)
cockburn_replace_punctuation()
Removes punctuation and standardises some symbols.
cockburn_replace_standard_names()
Create standard name
cockburn_replace_type()
Cleanup Entity Type
cockburn_replace_univ()
Cleanup Universities (Non-Corporates group)
defactor()
Defactor the object
defactor_vector()
Converts factor to character
detect_legal_form()
Detect legal form
detect_patterns()
Codes strings (e.g., organizational names) based on certain patterns
escape_regex()
Escapes characters that are special in regex
escape_regex_for_type()
Escapes special characters for different types of pattern
escape_regex_for_types()
Conditionally escapes characters that are special in regex
get_dots()
Provides access to arguments of nested functions. A sort of alternative mechanism to passing ... arguments, but with more features.
get_standardize_options()
Gets standardize_options at point with consistent updates up through calling stack.
get_target()
Gets a target vector to standardize.
get_vector()
Gets a vector by column and defactors it if needed. Optionally, one can provide a fallback_value, which will be returned if col is not specified.
inset_target()
Insets target vector back to input object (x)
is_empty()
Checks if string has something to print
magerman_condense()
Condensing names
magerman_detect_characters()
Detect candidates for characters that need to be cleaned
magerman_detect_comma_period_irregularities()
Detects comma period irregularities
magerman_detect_legal_form()
Detect legal form
magerman_detect_legal_form_beginning()
Detects legal form at the beginning of a name
magerman_detect_legal_form_end()
Detects legal form at the end of a name
magerman_detect_legal_form_middle()
Detects legal form in the middle of a name
magerman_detect_umlaut()
Detect umlauts
magerman_remove_common_words()
Remove common words
magerman_remove_common_words_anywhere()
Removes common words anywhere in a name
magerman_remove_common_words_at_the_beginning()
Removes common words at the beginning of a name
magerman_remove_common_words_at_the_end()
Removes common words at the end of a name
magerman_remove_double_quotation_marks_beginning_end()
Removes double quotation irregularities
magerman_remove_double_quotation_marks_irregularities()
Removes double quotation irregularities
magerman_remove_double_spaces()
Removes double spaces
magerman_remove_html_codes()
Removes html codes
magerman_remove_legal_form()
Removes legal form
magerman_remove_legal_form_and_clean()
Removes legal form
magerman_remove_non_alphanumeric_at_the_beginning()
Removes non alphanumeric characters at the beginning of a name
magerman_remove_non_alphanumeric_at_the_end()
Removes non alphanumeric characters at the end of a name
magerman_remove_special_characters()
Removes special characters
magerman_replace_accented_characters()
Replaces accented characters
magerman_replace_comma_period_irregularities()
Replaces comma period irregularities
magerman_replace_comma_period_irregularities_all()
Replaces comma and period irregularities
magerman_replace_legal_form_beginning()
Replaces legal form at the beginning of a name
magerman_replace_legal_form_end()
Replaces legal form at the end of a name
magerman_replace_legal_form_middle()
Replaces legal form in the middle of a name
magerman_replace_proprietary_characters()
Replaces proprietary characters
magerman_replace_sgml_characters()
Replaces sgml characters
magerman_replace_spelling_variation()
Replaces spelling variation
magerman_replace_umlaut()
Replaces Umlauts
make_dot_edges()
Makes dot graph edges for visualizing arrows between sequence of procedures.
make_dot_graph()
Generates graph description for visualizing list of procedures in dot format.
make_dot_nodes()
Generates description of dot graph nodes.
paste_dot_node()
Makes a dot node (as html table) from procedure's attributes.
paste_dot_node_tr_td()
Makes TR TD record for dot node TABLE
replace_patterns()
A wrapper for string replacement and cbinding some columns.
save_dot_graph_as()
Saves dot graph as file using system command 'dot' from GraphViz (https://graphviz.org/) if installed.
standardize() make_std_names() make_standard_names() nstand()
Standardizes organizational names. Takes either vector or column in the table.
standardize_cockburn()
Standardizes strings using exact procedures described in Cockburn, et al. (2009)
standardize_dehtmlize()
Converts HTML characters to UTF-8
standardize_detect_enc()
Detects string encoding
standardize_is_data_empty()
Checks if all elements in vector(s) are either "", NA, NULL or have zero length
standardize_magerman()
Standardizes strings using exact procedures described in Magerman et al. (2009)
standardize_make_procedures_list()
Makes list of procedures calls from table.
standardize_omit_empty()
Removes elements that are either "", NA, NULL or have zero length
standardize_options()
Does nothing but store (as its own default arguments) the options that control vector handling throughout the standardization process. These options are available in most nstandr functions that accept the ... parameter.
standardize_remove_brackets()
Removes brackets and content in brackets
standardize_remove_quotes()
Removes double quotes
standardize_squish_spaces()
Removes redundant whitespace
standardize_toascii()
Translates non-ASCII symbols to their ASCII equivalents
standardize_toupper()
Uppercases vector of interest in the object (table)
standardize_x_split()
Splits the object (table) in chunks by rows
unlist_if_possible()
If a column in the x table is a list, unlists it if possible
visualize()
Visualizes list of procedures.
x_length()
Gets lengths of the object
x_width()
Gets width of the object