Remove line breaks
I can't seem to find this function in the Library or Forum.
It's a matter of replacing line breaks with a space. This would transform a paragraph of broken sentences as normally found in a PDF to a continuous flow of words.
Ideally, it keeps paragraphs, which are usually double line breaks.
Any ideas?
This replaces all line breaks in the selected text with spaces (tries to preserve double breaks):
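Code:
Clipboard:="" ; empty the clipboard so ClipWait can detect the copy
Send, ^c ; copy the selected text
ClipWait ; wait until the clipboard contains data
string:=clipboard
stringreplace, string, string, `r`n`r`n, ♥, all ; mask double breaks with a placeholder
stringreplace, string, string, `r`n, %A_Space%, all ; replace single breaks with a space
stringreplace, string, string, ♥, `n`n, all ; restore the masked double breaks
SendInput, % string ; type the result over the selection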
I found this older post for removing line breaks from PDF text. See code below. One simple question: is this a macro, or where should I add it in FK to make it work? Also, are the little heart figures really part of the code?
Marko wrote (Jun 24th, ’22, 14:02): This replaces all line breaks in the selected text with spaces (tries to preserve double breaks):
Code:
Clipboard:=""
Send, ^c
ClipWait
string:=clipboard
stringreplace, string, string, `r`n`r`n, ♥, all
stringreplace, string, string, `r`n, %A_Space%, all
stringreplace, string, string, ♥, `n`n, all
SendInput, % string
This script replaces all line breaks in the selected text with spaces. It tries to preserve double breaks: the heart characters are really part of the code and serve as a temporary placeholder that masks the double breaks while the single breaks are replaced. If you need to replace all breaks, simply remove the two lines containing hearts.
Create a new Shortcut (Type: Command) and paste the code into the command field. To use it, first select the text you want to change in the editor and then press the shortcut.
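To see the masking step in isolation, here is a minimal sketch that runs the same three replacements on a fixed string instead of the clipboard. The sample text and the variable name sample are made up for illustration, and MsgBox is used only to display the result. It assumes AutoHotkey v1 syntax, like the script above. If the heart does not survive saving (for example in a script file that is not stored as UTF-8 with BOM), any other character that never occurs in your text should work as the placeholder.

```autohotkey
; Hypothetical sample: two hard-wrapped lines, then a paragraph break.
sample := "first line`r`nsecond line`r`n`r`nnew paragraph"

StringReplace, sample, sample, `r`n`r`n, ♥, All ; 1. mask the paragraph break
StringReplace, sample, sample, `r`n, %A_Space%, All ; 2. remaining single breaks become spaces
StringReplace, sample, sample, ♥, `n`n, All ; 3. put the paragraph break back

MsgBox, % sample ; "first line second line", an empty line, then "new paragraph"
```

The order matters: if step 2 ran first, it would also eat both halves of every double break and there would be nothing left to restore.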
Thanks, Tom for the clarification about the hearts and how to create the command. Success.
One last question which, if there's a solution, would greatly improve its usefulness. Often, PDF text includes double spaces scattered throughout it. Can the script be altered so that these are removed? I've included the original script below.
Clipboard:=""
Send, ^c
ClipWait
string:=clipboard
stringreplace, string, string, `r`n`r`n, ♥, all
stringreplace, string, string, `r`n, %A_Space%, all
stringreplace, string, string, ♥, `n`n, all
SendInput, % string
John
This script will replace any run of multiple spaces (double, triple etc.) in the selected text with a single space. Set it as a shortcut, Type: Command.
Code:
Clipboard:=""
Send, ^c
ClipWait
string:=clipboard
Loop
{
    StringReplace, string, string, %A_Space%%A_Space%, %A_Space%, UseErrorLevel
    if ErrorLevel = 0 ;no more replacements needed
        break
}
SendRaw, % string
Or a fancier version using a RegEx:
Code:
Clipboard:=""
Send, ^c
ClipWait
string:=clipboard
string:=Trim(RegExReplace(string, "\h\K\h+"))
SendRaw, % string
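For anyone curious what the RegEx does: \h matches one horizontal whitespace character (space or tab), \K drops it from the match, and \h+ matches any further ones, which are then replaced with nothing. The sketch below applies the same expression to a made-up string (the sample text and variable names are hypothetical) so the effect can be checked without touching the clipboard.

```autohotkey
; Hypothetical sample: uneven spacing plus one hard line break in the middle.
sample := "  one   two`r`nthree    four  "

; Same expression as in the shortcut: keep the first space of each run, drop the rest,
; then trim spaces and tabs from both ends.
result := Trim(RegExReplace(sample, "\h\K\h+"))

MsgBox, % "[" result "]" ; shows "one two", a line break, then "three four", in brackets
```

Note that \h does not match `r or `n, so this version only touches spaces and tabs. Any line breaks in the selection are left in place.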
Tom, thank you for the two updates you provided. I have to report that the original version worked, but the updates actually added spaces rather than removing them, and also added fresh line breaks. Below is a sample, with the output of the original script first and the output of the updates second. I don't know how to fix either update; is there a simple solution? If not, the original does work even if it leaves occasional double spaces:
ORIGINAL: The last generation has seen dramatic changes in psychiatric training and practice that have been caused by several interdependent forces. First, there has been a societal shift affecting not only psychiatry, but also the larger world, as resource limitation has emerged as a new worldview.
SECOND AND THIRD: The last generation has seen dramatic changes in psychiatric training
and practice that have been caused by several interdependent forces. First,
there has been a societal shift affecting not only psychiatry, but also the
John
I'm not sure I understand. Can you give me a quick example of original text and expected result?
Hi Tom,
I rudely and inadvertently didn't reply to your helpful scripts back on July 2 for resolving line break issues. You'd asked me for an example. I scratched my head about how to show you the differences, so I hope this is clear. As you'll see, all the scripts work in different ways. Obviously, it would be great to have a script that removes invisible line breaks, doesn't add spare lines, and removes all double spaces.
ORIGINAL (HAS INVISIBLE LINE BREAKS AT THE END)
Similarly, Deleuze and Guattari hold that there not only exists a social stratification of
“classes” organized around the binary organization of a line that distinguishes them
(man/women, upper class/lower class, city/country etc.) and that maintains each as a
rigid system. The same social groups can also be regarded as composed of masses that
constantly feed into each other.
FIRST SCRIPT CONVERTS THIS TO ONLY ONE LINE BREAK AT THE END BUT ADDS DOUBLE SPACES:
Similarly, Deleuze and Guattari hold that there not only exists a social stratification of “classes” organized around the binary organization of a line that distinguishes them (man/women, upper class/lower class, city/country etc.) and that maintains each as a rigid system. The same social groups can also be regarded as composed of masses that constantly feed into each other.
SECOND, 'FANCY' REGEX CONVERTS THE TEXT, ADDS DOUBLE LINE BREAKS BUT REMOVES ALL DOUBLE SPACES!:
Similarly, Deleuze and Guattari hold that there not only exists a social stratification of
“Classes” organized around the binary organization of a line that distinguishes them
(Man/Women, upper class/lower class, city/country etc.) and that maintains each as a
Rigid system. The same social groups can also be regarded as composed of masses that
Constantly feed into each other.
I don't know if you have a solution - I copied and pasted these examples using Word - but it does look close.
Regards,
John
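One possible way to get all three behaviours in a single shortcut is sketched below. It is untested in FK and rests on a few assumptions: AutoHotkey v1 syntax as in the scripts above, a RegEx lookaround in place of the heart placeholder (a swapped-in technique, not the method from the earlier posts), and pasting through the clipboard instead of SendInput or SendRaw. The last choice is deliberate: according to the AutoHotkey documentation, Send in normal and Raw mode turns each `r`n into two Enter keystrokes, which would explain the extra line breaks, and typed Enter keystrokes can also trigger Word's automatic capitalization, which may be where "Classes" and "Rigid" came from.

```autohotkey
Clipboard := "" ; empty the clipboard so ClipWait can detect the copy
Send, ^c ; copy the selected text
ClipWait ; wait until the clipboard contains data
string := Clipboard

string := RegExReplace(string, "\R", "`n") ; treat `r`n, `r and `n alike
string := RegExReplace(string, "(?<!\n)\n(?!\n)", " ") ; a lone line break becomes a space
string := RegExReplace(string, "\h{2,}", " ") ; runs of spaces or tabs become one space

Clipboard := Trim(string, " `t`n") ; drop leading and trailing blanks and breaks
Send, ^v ; paste the cleaned text over the selection
```

Double line breaks are left untouched because the lookbehind and lookahead only allow a match on a break that has no other break directly next to it. A line that ended in a space before its break would give two spaces after the second step, which is why the space cleanup runs last.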