K.S. Bhaskar on 18 Apr 2016 08:45:06 -0700
Re: [PLUG] >32K concurrent processes
Sure, I don't have that kind of hardware to play with, but I'd be interested in looking at the application.

On Sun, Apr 17, 2016 at 11:12 AM, K.S. Bhaskar <bhaskar@bhaskars.com> wrote:

Thanks for the links Rohit. I'll check them out. The storage is SSD, the processes do minimal IO - I'm just trying to establish the ability to have a file open by more than 32K processes, and I'm clearly running into a system limit. This is a development machine (16 cores, 32GB RAM - the production machine has something like 64 cores and 512GB RAM), but I can't get you access to poke around because it is inside a corporate network.

However, as the software is all open source, I can easily help you get set up to poke around using your own system, if you want. Please let me know.

Regards
-- Bhaskar

On Sun, Apr 17, 2016 at 10:54 AM, Rohit Mehta <ro@paper-mill.com> wrote:

Some kernel parameters to research (which may not be right for your application) and /etc/security.conf changes:
http://stackoverflow.com/questions/9361816/maximum-number-of-processes-in-linux

Do these processes do a lot of IO? Is your storage rotational media or SSD? Can your application run off ramdisk storage? Have you tried enabling hyperthreading?

Do you have the ability to test application loads on a non-production system? If so, I'd be interested in helping you poke around. It might be an education for me.

On Sun, Apr 17, 2016 at 10:42 AM, Rohit Mehta <ro@paper-mill.com> wrote:

Back many years ago, I installed Oracle on my Debian workstation for fun, and I remember the guide had a lot of tweaks. "ulimit" is one that I can think of, but I don't remember them all. I'm poking around the internet to see if I can find the Oracle guide (although it might not be relevant on newer kernels).

On Sun, Apr 17, 2016 at 10:27 AM, K.S. Bhaskar <bhaskar@bhaskars.com> wrote:

Thanks Steve, but in this case we have a customer need to crank up the number of processes on Linux.

Regards
-- Bhaskar

On Sat, Apr 16, 2016 at 4:09 PM, Steve Litt <slitt@troubleshooters.com> wrote:

On Fri, 15 Apr 2016 17:40:09 -0400
"K.S. Bhaskar" <bhaskar@bhaskars.com> wrote:
> I am trying to crank up more than 32K concurrent processes (the
> processes themselves hang and do a teeny bit of activity every 10
> seconds). But the OS (64-bit Debian 8 - Jessie) stubbornly refuses to
> crank up beyond 32K-ish processes. pid_max is set to a very large
> number (1M), so that's not it. Any suggestions on what limits to look
> for appreciated. Thank you very much.
This is old information, but back in the day people who wanted lots and
lots of processes used one of the BSDs to host that server.
Steve Litt
April 2016 featured book: Rapid Learning for the 21st Century
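
For anyone following the thread who wants to see which limits are in play on their own box, here is a minimal Python sketch that prints a few of the usual suspects: the pid_max setting mentioned above, two other system-wide sysctls, and the per-user limits that "ulimit" controls. Which of these (if any) is the limit being hit here is an assumption on my part; the thread does not settle it. The names SYSCTL_FILES, read_sysctl, fmt_rlimit and collect_limits are made up for this example.

```python
"""Report a few Linux limits that can cap the number of concurrent processes."""
import resource
from pathlib import Path

# Candidate system-wide settings. That any of these is the limit hit in this
# thread is an assumption; they are simply common places to look.
SYSCTL_FILES = {
    "kernel.pid_max": "/proc/sys/kernel/pid_max",
    "kernel.threads-max": "/proc/sys/kernel/threads-max",
    "fs.file-max": "/proc/sys/fs/file-max",
}


def read_sysctl(path):
    """Return the integer stored in a /proc/sys file, or None if unreadable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def fmt_rlimit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def collect_limits():
    """Gather the sysctl values and the (soft, hard) per-user resource limits."""
    limits = {name: read_sysctl(path) for name, path in SYSCTL_FILES.items()}
    for name in ("RLIMIT_NPROC", "RLIMIT_NOFILE"):
        rlimit = getattr(resource, name, None)  # not defined on every platform
        if rlimit is None:
            limits[name] = None
            continue
        soft, hard = resource.getrlimit(rlimit)
        limits[name] = (fmt_rlimit(soft), fmt_rlimit(hard))
    return limits


if __name__ == "__main__":
    for name, value in collect_limits().items():
        print(f"{name:20} {value if value is not None else 'not available'}")
```

Run it as the same user that launches the processes (for example "python3 check_limits.py", a hypothetical file name), since the RLIMIT values are per-user and can differ between a login shell and a service account.
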
ticktock6 ; Ramp up load to test number of concurrent processes
    ; Usage: mumps -run ticktock5 [options] where options are:
    ; --proc[esses]=<processes> - defaults to 100
    ; --bloc[ksize]=<blocksize> - defaults to processes, must be not greater than processes
    ; --dur[ation]=<duration> - defaults to 10 seconds
    ; --hang[time]=<looptime> - defaults to 1 second
    ; --help - prints helpful information and exits
    ; --poll[time]=<polltime> - defaults to 0 (see below)
    ; --zdir[ectory]=<zdirectory> - defaults to current directory
    ; --verb[osity]=<verbosity> - defaults to 1
    ; or do call^ticktock5(processes,blocksize,duration,polltime,hangtime,zdirectory,verbosity).
    ; Anything after the options (i.e., not starting with "--") on the command line is ignored.
    ; Note: database must have enough lock space for one lock per process.
    ; No claim of copyright is made with regard to this code.
    ; In blocks of <blocksize> processes, the parent ticktock5 JOBs <processes> child
    ; processes, with their stdout and stderr in directory <zdirectory>. It holds a lock that
    ; blocks child processes from running. Each child process takes a lock indexed by its pid
    ; that will block the parent from terminating, and then waits polling in a loop for the
    ; parent to release the blocking lock. Once all child processes have started, the parent
    ; releases the blocking lock, and hangs for <duration> seconds. Once that time is complete,
    ; it sets a value in the database to tell child processes to terminate, and waits in a loop
    ; polling for lock that indicates all child processes have terminated. In both loops
    ; polling for a lock, zero polltime means LOCK is untimed.
    ; Once each child process is released, it loops performing an activity in a loop that
    ; executes every <looptime> seconds, until it sees the value in the database from the
    ; parent indicating completion. At that point, it terminates, releasing its lock that is
    ; blocking the parent.
    ; <verbosity> of 1 (the default) produces some minimal output; higher values produce more
    ; output; zero produces no output.
    ; Set error handler to print error message and return error code to shell
    ; (should write to stderr, but GT.M doesn't support that).
    ; Warning: long line, to set a string to be Xecuted
    set $etrap="set $etrap=""use $principal write $zstatus,! zhalt 1"" set tmp1=$piece($ecode,"","",2),tmp2=$text(@tmp1) if $length(tmp2) write $text(+0),@$piece(tmp2,"";"",2),! zhalt +$extract(tmp1,2,$length(tmp1))"
    set:$stack $ecode=",U255," ; top level entryref only supported when called with mumps -run
    use $principal:(ctrap=$char(3):ioexception="if $zjobexam()")
    new blocksize,cmdline,duration,nextparm,nextval,polltime,tmp,verbosity,zdirectory
    ; Process command line
    set cmdline=$zcmdline
    for  quit:'$$trimleadingstr(.cmdline,"--")  do  ; process command line options
    . set tmp=$$trimleadingpiece(.cmdline," ")
    . set nextparm=$piece(tmp,"=",1)
    . set nextval=$piece(tmp,"=",2)
    . if $$matchparm(nextparm,"blocksize","bloc") set blocksize=+nextval
    . else  if $$matchparm(nextparm,"duration","dur") set duration=+nextval
    . else  if $$matchparm(nextparm,"hangtime","hang") set hangtime=+nextval
    . else  if $$matchparm(nextparm,"help","help") do help ; process exits after printing help
    . else  if $$matchparm(nextparm,"polltime","poll") set polltime=+nextval
    . else  if $$matchparm(nextparm,"processes","proc") set processes=+nextval
    . else  if $$matchparm(nextparm,"verbosity","verb") set verbosity=+nextval
    . else  if $$matchparm(nextparm,"zdirectory","zdir") set zdirectory=nextval ; unlike other parms, not numeric
    . else  set $ecode=",U254," ; illegal command line parameter
    set:'$data(processes) processes=100
    set:'$data(blocksize)!(blocksize>processes) blocksize=processes
    set:'$data(duration) duration=10
    set:'$data(hangtime) hangtime=1
    set:'$data(polltime) polltime=0
    set:'$data(processes) processes=100
    set:'$data(verbosity) verbosity=1
    set:'$data(zdirectory) zdirectory=$zdirectory
    set tmp=$zhorolog
    write:verbosity $zdate(tmp,"YYYY-MM-DD 24:60:SS"),".",$piece(tmp,",",3)," Starting",!
    zwrite:verbosity>2
    do call(processes,blocksize,duration,polltime,hangtime,zdirectory,verbosity)
    set tmp=$zhorolog
    write:verbosity $zdate(tmp,"YYYY-MM-DD 24:60:SS"),".",$piece(tmp,",",3)," Terminating",!
    quit
call(proc,blk,dur,poll,hang,dir,verb) ; call-in entry point
    zwrite:verbosity>2
    new i,j,reg,startat,time,tmp
    kill ^child,^parent,^stats
    set ^count=0
    view "resetgvstats" ; reset database statistics
    lock +^parent ; children will run when parent releases this lock after launching all children
    ; JOB child processes
    set startat=$zhorolog
    for i=1:1:proc do  do:'$zjob jobfailexit
    . set tmp=dir_$select($extract(dir,$length(dir))'="/":"/",1:"")_$text(+0)_"_"_$job_"_"_i_".mj"
    . set job="job^"_$text(+0)_"($job,hang,poll,dur):(out="""_tmp_"o"":err="""_tmp_"e"")"
    . set time=$zhorolog
    . job @job
    . do:$zjob
    . . set ^parent($job,time,$text(+0),$zjob)=""
    . . do:verb>2 msg("^parent","Launched "_i_" processes",time)
    . . do:'(i#blk)
    . . . do:verb msg("^parent","Waiting for "_i_" processes",time)
    . . . for  quit:(i=^count)  hang hang
    . . . do:verb>1 msg("^parent","Found "_i_" processes",$zhorolog)
    ; child processes JOB'd; wait for them to get ready to run
    set time=$zhorolog
    ; Caution long line below - single expression
    do msg("^parent","Launched "_proc_" processes in "_($piece(time,",",1)-$piece(startat,",",1)*86400+$piece(time,",",2)-$piece(startat,",",2)+(($piece(time,",",3)-$piece(startat,",",3))/1E6))_" seconds",time)
    lock -^parent ; release lock to let children run
    hang dur ; let the children run for duration, wait till children exit, then quit
    ; set ^parent($job)="Q"
    set time=$zhorolog
    do shutdown
    lock -^child
    set reg=""
    for  set reg=$view("gvnext",reg) quit:""=reg  do
    . set tmp=$view("gvstat",reg)
    . set ^stats($job,reg)=tmp
    set tmp=$zhorolog
    write:verbosity $zdate(tmp,"YYYY-MM-DD 24:60:SS"),".",$piece(tmp,",",3)," Done",!
    quit
help ; print helpful text and exit
    new i,tmp
    for i=2:1 set tmp=$piece($text(+i),"; ",2) quit:""=tmp  write tmp,!
    halt
job(par,wait,poll,dur) ; entry point for JOB'd processes
    new i,dure6,msg,start
    lock +^child($job) ; get lock so parent won't exit till this child process terminates
    do msg("^child","Process "_$job_" started by "_par_", hang time "_wait_", waiting to proceed",$zhorolog)
    if $increment(^count) ; increment count to let parent know one more child process is ready
    ; wait for parent to release lock to allow child to run
    set dure6=dur*1E6 ; dur is in seconds; convert to microseconds since $zut is in microseconds
    for  lock:'poll +^parent($job) quit:'poll  lock:poll ^parent($job) quit:$test  hang wait
    set start=$zut
    for i=1:1 quit:$zut-start>dure6  set ^child($job,$zhorolog,par)=$R(2147483646) hang wait ; simulated workload
    do msg("^child","Stop after "_i_" operations",$zhorolog)
    if $increment(^count,-1)
    lock -^parent($job),-^child($job)
    quit
jobfailexit
    do shutdown
    set $ecode=",U253,"
    zhalt 254 ; should never get here - previous line should terminate process
matchparm(s,x,y) ; Return whether s matches a minimum abbreviation of x specified by y
    quit y=$extract(s,1,$length(y))&(s=$extract(x,1,$length(s)))
msg(var,msg,time) ; log a message
    set @var@($job,time,$text(+0))=msg
    write time,": ",$text(+0),": ",msg,!
    quit
shutdown ; poll & verb are inherited from caller, not passed in
    new tmp
    do msg("^parent","Time over, waiting for children to quit",time)
    for  set tmp=^count lock:'poll +^child quit:'poll  lock:poll +^child:poll quit:$test  do
    . if '(tmp#blk) do:verb>2 msg("^parent",tmp_" processes still remain",$zhorolog)
    . else  do:verb>1 msg("^parent",tmp_" processes still remain",$zhorolog)
    quit
trimleadingpiece(s,x) ; Remove and optionally return first piece of s with x as piece separator
    new tmp
    set tmp=$piece(s,x,1)
    set s=$piece(s,x,2,$length(s,x))
    quit:$quit tmp quit
trimleadingstr(s,x) ; Return s without leading $length(x) characters; return 1/0 if called as function
    if x=$extract(s,1,$length(x)) set s=$extract(s,$length(x)+1,$length(s)) quit:$quit 1 quit
    else  quit:$quit 0 quit
    ; Error message texts
M6 ;" "_$zstatus
U253 ;"-F-JOBFAIL Failed to start job "_i
U254 ;"-F-BADPARM ""--"_nextparm_"="_nextval_""" Not a valid parameter"
U255 ;"-F-BADINVOCATION Must invoke from shell as mumps -run "_$text(+0)
