K.S. Bhaskar on 18 Apr 2016 08:45:56 -0700



Re: [PLUG] >32K concurrent processes


Rohit --

Here's how you can replicate what I am attempting.

First, on your 64-bit Linux system, you will need to install the 64-bit GT.M V6.3-000 release (it must be the latest release, as older releases are limited to <32K processes). If you're running Debian Stretch (Testing) or Sid (Unstable), sudo apt-get install fis-gtm installs GT.M in /usr/lib/fis-gtm/<version>. If you're running some other contemporary version of a major Linux distribution, you can download and use the latest version of gtminstall, the GT.M installer (https://sourceforge.net/projects/fis-gtm/files/GT.M%20Installer/v0.13/gtminstall), which downloads and installs the tarball for you. Alternatively, you can download the binary distribution tarball (https://sourceforge.net/projects/fis-gtm/files/GT.M-amd64-Linux/V6.3-000/gtm_V63000_linux_x8664_pro.tar.gz) and run the configure script in the tarball with sudo ./configure, accepting the defaults and installing into a directory such as /opt/fis-gtm/V6.3-000_x86_64. (Yes, installing GT.M does require root, as it has a component that runs as suid root.)

I'd just use the GT.M installer: ./gtminstall --help gives you a list of options, and something like sudo ./gtminstall --verbose --utf8 default --installdir /opt/fis-gtm/V6.3-000_x86_64 should do what's needed.
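For example, the whole sequence from a shell is just the following (a sketch, untested exactly as written; the -O flag simply names the downloaded file):

wget -O gtminstall 'https://sourceforge.net/projects/fis-gtm/files/GT.M%20Installer/v0.13/gtminstall'
chmod +x gtminstall
sudo ./gtminstall --verbose --utf8 default --installdir /opt/fis-gtm/V6.3-000_x86_64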

Once GT.M is installed, create a default directory to use, point gtmdir to it, and set up environment variables. The first time you do this, GT.M will give you a bunch of output about creating and initializing a database, e.g.:

kbhaskar@bhaskark:~$ mkdir /testarea/ticktock
kbhaskar@bhaskark:~$ export gtmdir=/testarea/ticktock
kbhaskar@bhaskark:~$ source /usr/lib/fis-gtm/V6.3-000_x86_64/gtmprofile
%GDE-I-GDUSEDEFS, Using defaults for Global Directory 
/testarea/ticktock/V6.3-000_x86_64/g/gtm.gld

GDE> 
%GDE-I-EXECOM, Executing command file /usr/lib/fis-gtm/V6.3-000_x86_64/gdedefaults

GDE> 
%GDE-I-VERIFY, Verification OK

%GDE-I-GDCREATE, Creating Global Directory file 
/testarea/ticktock/V6.3-000_x86_64/g/gtm.gld
Created file /testarea/ticktock/V6.3-000_x86_64/g/gtm.dat
%GTM-I-JNLCREATE, Journal file /testarea/ticktock/V6.3-000_x86_64/g/gtm.mjl created for region DEFAULT with BEFORE_IMAGES
%GTM-I-JNLSTATE, Journaling state for region DEFAULT is now ON
kbhaskar@bhaskark:~$ tree $gtmdir
/testarea/ticktock
├── r
└── V6.3-000_x86_64
    ├── g
    │   ├── gtm.dat
    │   ├── gtm.gld
    │   └── gtm.mjl
    ├── o
    │   └── utf8
    └── r

6 directories, 3 files
kbhaskar@bhaskark:~$ 

Thereafter you can just set gtmdir and source gtmprofile.
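For instance, these two lines in a setup script (or your .bashrc) are all it takes thereafter, with the paths as in the transcript above:

export gtmdir=/testarea/ticktock
source /usr/lib/fis-gtm/V6.3-000_x86_64/gtmprofile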

Put the attached ticktock6.m file in the r subdirectory:

kbhaskar@bhaskark:~$ ls -l $gtmdir/r
total 12
-rw-r--r-- 1 kbhaskar gtc 8428 Apr 18 10:43 ticktock6.m
kbhaskar@bhaskark:~$ 

Delete the default database and journal file, and then change the "global directory" so that new database files are created with the appropriate parameters for the test. In the event the system crashes, you will likely need to power down the machine, delete the database files, and create new ones. The default environment sets up journaling and recovery so that the databases are recoverable after a crash, but since this test pushes the machine to its limits - and possibly beyond - we don't want to spend resources on journaling.

kbhaskar@bhaskark:~$ rm -f $gtmdir/$gtmver/g/gtm.{dat,mjl}
kbhaskar@bhaskark:~$ gde
%GDE-I-LOADGD, Loading Global Directory file 
/testarea/ticktock/V6.3-000_x86_64/g/gtm.gld
%GDE-I-VERIFY, Verification OK


GDE> change -region DEFAULT -qdbrundown -nojournal
GDE> change -segment DEFAULT -lock_space=65536
GDE> exit
%GDE-I-VERIFY, Verification OK

%GDE-I-GDUPDATE, Updating Global Directory file 
/testarea/ticktock/V6.3-000_x86_64/g/gtm.gld
kbhaskar@bhaskark:~$ 

Here is how you create a new database file:

kbhaskar@bhaskark:~$ mupip create
Created file /testarea/ticktock/V6.3-000_x86_64/g/gtm.dat
kbhaskar@bhaskark:~$ 

Verify that the ticktock6 program runs:

kbhaskar@bhaskark:~$ mumps -run ticktock6 --help
Ramp up load to test number of concurrent processes
Usage: mumps -run ticktock6 [options] where options are:
--proc[esses]=<processes>    - defaults to 100
--bloc[ksize]=<blocksize>    - defaults to processes, must not be greater than processes
--dur[ation]=<duration>      - defaults to 10 seconds
--hang[time]=<looptime>      - defaults to 1 second
--help       - prints helpful information and exits
--poll[time]=<polltime>      - defaults to 0 (see below)
--zdir[ectory]=<zdirectory>  - defaults to current directory
--verb[osity]=<verbosity>    - defaults to 1
or do call^ticktock6(processes,blocksize,duration,polltime,hangtime,zdirectory,verbosity).
Anything after the options (i.e., not starting with "--") on the command line is ignored.
Note: database must have enough lock space for one lock per process.
No claim of copyright is made with regard to this code.
kbhaskar@bhaskark:~$ 

Now you're ready to try ramping up the load, which creates --proc processes in blocks of --bloc processes, with each process performing a hang of --hang seconds between database accesses. Here's a good load to try that should run on just about any machine (the rm command removes stdout and stderr files of JOB'd processes from previous runs):

kbhaskar@bhaskark:~$ rm -f *.mj[oe] ; mumps -run ticktock6 --proc=4000 --bloc=500 --hang=10 --dur=20 --verb=2
2016-04-18 11:31:57.720383 Starting
64026,41519,230300,14400: ticktock6: Waiting for 500 processes
64026,41529,231246,14400: ticktock6: Found 500 processes
64026,41530,582977,14400: ticktock6: Waiting for 1000 processes
64026,41540,586466,14400: ticktock6: Found 1000 processes
64026,41541,975515,14400: ticktock6: Waiting for 1500 processes
64026,41551,978126,14400: ticktock6: Found 1500 processes
64026,41553,664803,14400: ticktock6: Waiting for 2000 processes
64026,41563,665889,14400: ticktock6: Found 2000 processes
64026,41565,405776,14400: ticktock6: Waiting for 2500 processes
64026,41575,410612,14400: ticktock6: Found 2500 processes
64026,41577,409844,14400: ticktock6: Waiting for 3000 processes
64026,41587,423150,14400: ticktock6: Found 3000 processes
64026,41595,305134,14400: ticktock6: Waiting for 3500 processes
64026,41605,341080,14400: ticktock6: Found 3500 processes
64026,41612,77441,14400: ticktock6: Waiting for 4000 processes
64026,41622,188309,14400: ticktock6: Found 4000 processes
64026,41622,188375,14400: ticktock6: Launched 4000 processes in 104.458633 seconds
64026,41642,763781,14400: ticktock6: Time over, waiting for children to quit
2016-04-18 11:34:42.28382 Done
2016-04-18 11:34:42.28405 Terminating
kbhaskar@bhaskark:~$ 

The four numbers at the beginning of each line are times (as reported by GT.M to application code): the number of days since December 31, 1840, the number of seconds since midnight, the number of microseconds within the current second, and the offset in seconds of the reported time from UTC. Note that when it says time is over, it can take up to --dur seconds for all processes to complete, and if the system gets heavily loaded and starts paging, it can take longer.
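For example, to decode the timestamp on the first "Waiting" line above, you can use the %XCMD utility that ships with GT.M (or type the same WRITE at a MUMPS prompt):

mumps -run %XCMD 'write $zdate("64026,41519","YYYY-MM-DD 24:60:SS"),!'

which should print 2016-04-18 11:31:59 - about two seconds after the run started at 11:31:57.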

To ramp up to >32K processes, try a command like rm -f *.mj[oe] ; mumps -run ticktock6 --proc=33000 --bloc=4125 --hang=10 --dur=20 --verb=2 (remember to set pid_max first, as well as any other settings you want to play with, and make sure you know where the power switch is or where to pull the cord!).
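For reference, here is roughly what I set before a big run (as root, except the ulimit, which goes in the shell that starts the test; 1000000 matches the pid_max value from my original message, and the other values are just generous guesses - tune to taste):

echo 1000000 > /proc/sys/kernel/pid_max       # default is 32768, hence the ~32K wall
echo 1000000 > /proc/sys/kernel/threads-max   # system-wide limit on tasks
ulimit -u 1000000                             # per-user process limit (nproc)

Also check /etc/security/limits.conf for an nproc line that might cap you lower.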

Thanks for your help. Any results you come up with will be much appreciated. Please send any questions my way (and I hope the above is correct and complete).

Regards
-- Bhaskar


On Sun, Apr 17, 2016 at 11:32 AM, Rohit Mehta <ro@paper-mill.com> wrote:
Sure, I don't have that kind of hardware to play with, but I'd be interested in looking at the application.

On Sun, Apr 17, 2016 at 11:12 AM, K.S. Bhaskar <bhaskar@bhaskars.com> wrote:
Thanks for the links Rohit. I'll check them out. The storage is SSD, the processes do minimal IO - I'm just trying to establish the ability to have a file open by more than 32K processes, and I'm clearly running into a system limit. This is a development machine (16 cores, 32GB RAM - the production machine has something like 64 cores and 512GB RAM), but I can't get you access to poke around because it is inside a corporate network.

However, as the software is all open source, I can easily help you get set up to poke around using your own system, if you want. Please let me know.

Regards
-- Bhaskar


On Sun, Apr 17, 2016 at 10:54 AM, Rohit Mehta <ro@paper-mill.com> wrote:
Some kernel parameters to research (which may not be right for your application)

Do these processes do a lot of IO?  Is your storage rotational media or SSD?  Can your application run off ramdisk storage?  Have you tried enabling hyperthreading?

Do you have the ability to test application loads on a non-production system?  If so I'd be interested in helping you poke around.  It might be an education for me.


On Sun, Apr 17, 2016 at 10:42 AM, Rohit Mehta <ro@paper-mill.com> wrote:
Back many years ago, I installed Oracle on my Debian workstation for fun, and I remember the guide had a lot of tweaks.  "ulimit" is one that I can think of, but I don't remember them all.  I'm poking around the internet to see if I can find the Oracle guide (although it might not be relevant on newer kernels).

On Sun, Apr 17, 2016 at 10:27 AM, K.S. Bhaskar <bhaskar@bhaskars.com> wrote:
Thanks Steve, but in this case we have a customer need to crank up the number of processes on Linux.

Regards
-- Bhaskar

On Sat, Apr 16, 2016 at 4:09 PM, Steve Litt <slitt@troubleshooters.com> wrote:
On Fri, 15 Apr 2016 17:40:09 -0400
"K.S. Bhaskar" <bhaskar@bhaskars.com> wrote:

> I am trying to crank up more than 32K concurrent processes (the
> processes themselves hang and do a teeny bit of activity every 10
> seconds). But the OS (64-bit Debian 8 - Jessie) stubbornly refuses to
> crank up beyond 32K-ish processes. pid_max is set to a very large
> number (1M), so that's not it. Any suggestions on what limits to look
> for appreciated. Thank you very much.

This is old information, but back in the day people who wanted lots and
lots of processes used one of the BSDs to host that server.

SteveT

Steve Litt
April 2016 featured book: Rapid Learning for the 21st Century
http://www.troubleshooters.com/rl21


ticktock6
	; Ramp up load to test number of concurrent processes
	; Usage: mumps -run ticktock6 [options] where options are:
	; --proc[esses]=<processes>    - defaults to 100
	; --bloc[ksize]=<blocksize>    - defaults to processes, must not be greater than processes
	; --dur[ation]=<duration>      - defaults to 10 seconds
	; --hang[time]=<looptime>      - defaults to 1 second
	; --help		       - prints helpful information and exits
	; --poll[time]=<polltime>      - defaults to 0 (see below)
	; --zdir[ectory]=<zdirectory>  - defaults to current directory
	; --verb[osity]=<verbosity>    - defaults to 1
	; or do call^ticktock6(processes,blocksize,duration,polltime,hangtime,zdirectory,verbosity).
	; Anything after the options (i.e., not starting with "--") on the command line is ignored.
	; Note: database must have enough lock space for one lock per process.
	; No claim of copyright is made with regard to this code.

	; In blocks of <blocksize> processes, the parent ticktock6 JOBs <processes> child
	; processes, with their stdout and stderr in directory <zdirectory>. It holds a lock that
	; blocks child processes from running. Each child process takes a lock indexed by its pid
	; that will block the parent from terminating, and then waits, polling in a loop, for the
	; parent to release the blocking lock. Once all child processes have started, the parent
	; releases the blocking lock, and hangs for <duration> seconds. Once that time is complete,
	; it sets a value in the database to tell child processes to terminate, and waits in a loop
	; polling for a lock that indicates all child processes have terminated. In both loops
	; polling for a lock, zero polltime means the LOCK is untimed.

	; Once each child process is released, it performs an activity in a loop that
	; executes every <looptime> seconds, until it sees the value in the database from the
	; parent indicating completion. At that point, it terminates, releasing its lock that
	; is blocking the parent.

	; <verbosity> of 1 (the default) produces some minimal output; higher values produce more
	; output; zero produces no output.

	; Set error handler to print error message and return error code to shell
	; (should write to stderr, but GT.M doesn't support that).
	; Warning: long line, to set a string to be Xecuted
	set $etrap="set $etrap=""use $principal write $zstatus,! zhalt 1"" set tmp1=$piece($ecode,"","",2),tmp2=$text(@tmp1) if $length(tmp2) write $text(+0),@$piece(tmp2,"";"",2),! zhalt +$extract(tmp1,2,$length(tmp1))"
	set:$stack $ecode=",U255,"	; top level entryref only supported when called with mumps -run
	use $principal:(ctrap=$char(3):ioexception="if $zjobexam()")

	new blocksize,cmdline,duration,nextparm,nextval,polltime,tmp,verbosity,zdirectory

	; Process command line
	set cmdline=$zcmdline
	for  quit:'$$trimleadingstr(.cmdline,"--")  do	; process command line options
	. set tmp=$$trimleadingpiece(.cmdline," ")
	. set nextparm=$piece(tmp,"=",1)
	. set nextval=$piece(tmp,"=",2)
	. if $$matchparm(nextparm,"blocksize","bloc") set blocksize=+nextval
	. else  if $$matchparm(nextparm,"duration","dur") set duration=+nextval
	. else  if $$matchparm(nextparm,"hangtime","hang") set hangtime=+nextval
	. else  if $$matchparm(nextparm,"help","help") do help ; process exits after printing help
	. else  if $$matchparm(nextparm,"polltime","poll") set polltime=+nextval
	. else  if $$matchparm(nextparm,"processes","proc") set processes=+nextval
	. else  if $$matchparm(nextparm,"verbosity","verb") set verbosity=+nextval
	. else  if $$matchparm(nextparm,"zdirectory","zdir") set zdirectory=nextval ; unlike other parms, not numeric
	. else  set $ecode=",U254," ; illegal command line parameter
	set:'$data(processes) processes=100
	set:'$data(blocksize)!(blocksize>processes) blocksize=processes
	set:'$data(duration) duration=10
	set:'$data(hangtime) hangtime=1
	set:'$data(polltime) polltime=0
	set:'$data(verbosity) verbosity=1
	set:'$data(zdirectory) zdirectory=$zdirectory
	set tmp=$zhorolog
	write:verbosity $zdate(tmp,"YYYY-MM-DD 24:60:SS"),".",$piece(tmp,",",3)," Starting",!
	zwrite:verbosity>2
	do call(processes,blocksize,duration,polltime,hangtime,zdirectory,verbosity)
	set tmp=$zhorolog
	write:verbosity $zdate(tmp,"YYYY-MM-DD 24:60:SS"),".",$piece(tmp,",",3)," Terminating",!
	quit

call(proc,blk,dur,poll,hang,dir,verb)	 ; call-in entry point
	zwrite:verb>2	; use the verb parameter so the call-in entry point works standalone
	new i,j,reg,startat,time,tmp
	kill ^child,^parent,^stats
	set ^count=0
	view "resetgvstats"	; reset database statistics
	lock +^parent	; children will run when parent releases this lock after launching all children
	; JOB child processes
	set startat=$zhorolog
	for i=1:1:proc do  do:'$zjob jobfailexit
	. set tmp=dir_$select($extract(dir,$length(dir))'="/":"/",1:"")_$text(+0)_"_"_$job_"_"_i_".mj"
	. set job="job^"_$text(+0)_"($job,hang,poll,dur):(out="""_tmp_"o"":err="""_tmp_"e"")"
	. set time=$zhorolog
	. job @job
	. do:$zjob
	. . set ^parent($job,time,$text(+0),$zjob)=""
	. . do:verb>2 msg("^parent","Launched "_i_" processes",time)
	. . do:'(i#blk)
	. . . do:verb msg("^parent","Waiting for "_i_" processes",time)
	. . . for  quit:(i=^count)  hang hang
	. . . do:verb>1 msg("^parent","Found "_i_" processes",$zhorolog)
	; child processes JOB'd; wait for them to get ready to run
	set time=$zhorolog
	; Caution long line below - single expression
	do msg("^parent","Launched "_proc_" processes in "_($piece(time,",",1)-$piece(startat,",",1)*86400+$piece(time,",",2)-$piece(startat,",",2)+(($piece(time,",",3)-$piece(startat,",",3))/1E6))_" seconds",time)
	lock -^parent		  ; release lock to let children run
	hang dur		  ; let the children run for duration, wait till children exit, then quit
;	set ^parent($job)="Q"
	set time=$zhorolog
	do shutdown
	lock -^child
	set reg="" for  set reg=$view("gvnext",reg) quit:""=reg  do
	. set tmp=$view("gvstat",reg)
	. set ^stats($job,reg)=tmp
	set tmp=$zhorolog
	write:verb $zdate(tmp,"YYYY-MM-DD 24:60:SS"),".",$piece(tmp,",",3)," Done",!
	quit

help	; print helpful text and exit
	new i,tmp
	for i=2:1 set tmp=$piece($text(+i),"; ",2) quit:""=tmp  write tmp,!
	halt

job(par,wait,poll,dur)	; entry point for JOB'd processes
	new i,dure6,msg,start
	lock +^child($job)	; get lock so parent won't exit till this child process terminates
	do msg("^child","Process "_$job_" started by "_par_", hang time "_wait_", waiting to proceed",$zhorolog)
	if $increment(^count)	; increment count to let parent know one more child process is ready
	; wait for parent to release lock to allow child to run
	set dure6=dur*1E6 ; dur is in seconds; convert to microseconds since $zut is in microseconds
	for  lock:'poll +^parent($job) quit:'poll  lock:poll +^parent($job):poll quit:$test  hang wait ; incremental timed lock sets $test without releasing ^child($job)
	set start=$zut
	for i=1:1 quit:$zut-start>dure6  set ^child($job,$zhorolog,par)=$R(2147483646) hang wait ; simulated workload
	do msg("^child","Stop after "_i_" operations",$zhorolog)
	if $increment(^count,-1)
	lock -^parent($job),-^child($job)
	quit

jobfailexit
	do shutdown
	set $ecode=",U253,"
	zhalt 254	; should never get here - previous line should terminate process


matchparm(s,x,y)
	; Return whether s matches a minimum abbreviation of x specified by y
	quit y=$extract(s,1,$length(y))&(s=$extract(x,1,$length(s)))

msg(var,msg,time)	; log a message
	set @var@($job,time,$text(+0))=msg
	write time,": ",$text(+0),": ",msg,!
	quit

shutdown
	; blk, poll, time & verb are inherited from the caller, not passed in
	new tmp
	do msg("^parent","Time over, waiting for children to quit",time)
	for  set tmp=^count lock:'poll +^child quit:'poll  lock:poll +^child:poll quit:$test  do
	. if '(tmp#blk) do:verb>2 msg("^parent",tmp_" processes still remain",$zhorolog)
	. else  do:verb>1 msg("^parent",tmp_" processes still remain",$zhorolog)
	quit

trimleadingpiece(s,x)
	; Remove and optionally return first piece of s with x as piece separator
	new tmp
	set tmp=$piece(s,x,1)
	set s=$piece(s,x,2,$length(s,x))
	quit:$quit tmp quit

trimleadingstr(s,x)
	; Return s without leading $length(x) characters; return 1/0 if called as function
	if x=$extract(s,1,$length(x)) set s=$extract(s,$length(x)+1,$length(s)) quit:$quit 1 quit
	else  quit:$quit 0 quit

;	Error message texts
M6	;" "_$zstatus
U253	;"-F-JOBFAIL Failed to start job "_i
U254	;"-F-BADPARM ""--"_nextparm_"="_nextval_""" Not a valid parameter"
U255	;"-F-BADINVOCATION Must invoke from shell as mumps -run "_$text(+0)
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug