Walt Mankowski on 18 Apr 2016 12:38:33 -0700
Re: [PLUG] >32K concurrent processes
Aha. I found this page linked off of a Stack Overflow thread. It's 5 years
old so it might be a bit dated, but it seems to address many of the questions
you originally raised.

http://web.archive.org/web/20111209081734/http://research.cs.wisc.edu/condor/condorg/linux_scalability.html

Walt

On Mon, Apr 18, 2016 at 03:31:13PM -0400, K.S. Bhaskar wrote:
> 32K+ processes concurrently accessing the database is exactly what I am
> trying to accomplish. On the machine I have, I cannot have those processes
> do any real application work (so an update by each process once every 10
> seconds is all I'm attempting). On a production machine, those processes
> will be doing real work resulting in between 5 and 10 million database
> accesses per second (and yes, this throughput is demonstrated and proven
> with real application code, but with slightly fewer than 32K processes).
>
> So I'm not trying to prove how much work I can push through the database -
> I know that already. I'm just trying to figure out which Linux parameters I
> need to tweak to push the number of processes above 32K.
>
> Regards
> -- Bhaskar
>
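Besides pid_max, the limits most often implicated when process creation
stalls at a fixed count are kernel.threads-max and the per-user process limit
(RLIMIT_NPROC, i.e. ulimit -u). A quick checklist; the numbers below are
purely illustrative, not tuned recommendations:

    # current values
    cat /proc/sys/kernel/pid_max        # largest PID the kernel will hand out
    cat /proc/sys/kernel/threads-max    # system-wide cap on tasks
    ulimit -u                           # per-user process cap (RLIMIT_NPROC)

    # raising them (example values only)
    sysctl -w kernel.pid_max=1000000
    sysctl -w kernel.threads-max=1000000

    # per-user cap, applied by PAM at login via /etc/security/limits.conf:
    #   *   soft   nproc   1000000
    #   *   hard   nproc   1000000
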
> On Mon, Apr 18, 2016 at 3:14 PM, Gavin W. Burris <bug@wharton.upenn.edu>
> wrote:
>
> > So, are you running 32k clients to hammer the database??? What are you
> > trying to accomplish? Cheers.
> >
> > On Mon 04/18/16 02:48PM EDT, K.S. Bhaskar wrote:
> > > Thanks Gavin. With GT.M real-time database replication, there is no
> > > single point of failure. Furthermore, there are hooks to create and
> > > deploy applications that remain available not just in the face of
> > > unplanned events (like system crashes) but also planned events (such as
> > > application upgrades, even upgrades that involve schema changes). It is
> > > a proven architecture which first went into daily live production in
> > > 1999.
> > >
> > > Regards
> > > -- Bhaskar
> > >
> > > On Mon, Apr 18, 2016 at 12:36 PM, Gavin W. Burris <bug@wharton.upenn.edu>
> > > wrote:
> > >
> > > > Hi, Bhaskar.
> > > >
> > > > AH, OK. I should have asked, "What are you trying to accomplish?"
> > > > Don't run everything on one box! Scale horizontally, with at least
> > > > two user-facing nodes. You want to engineer in redundancy from square
> > > > one. If you don't, there will be no ability to sanely handle critical
> > > > patching/updates, or to deal with scaling up.
> > > >
> > > > With Grid Engine, that would be two master hosts, and at least two
> > > > compute nodes actually running the procs, all with an NFS-shared cell
> > > > directory. The secondary master is called the shadow master in Grid
> > > > Engine-speak. Grid Engine would be a good solution if you need to run
> > > > some existing command-line or batch code.
> > > >
> > > > If this is for web, strongly consider having a redundant API endpoint
> > > > to run functions. A good way to do this would be with Docker and
> > > > Swarm. Docker is a completely different approach, but one that is
> > > > correct for scaling web applications.
> > > >
> > > > Cheers.
> > > >
> > > > On Mon 04/18/16 11:49AM EDT, K.S. Bhaskar wrote:
> > > > > Thanks for the suggestions, Gavin, but batching the load won't work
> > > > > in this case. We're trying to run a workload that simulates a large
> > > > > number of concurrent users (as you might find at a large financial
> > > > > or healthcare institution), all of whom expect the system to respond
> > > > > immediately when they ask it to do something. I intend to play with
> > > > > the scheduler.
> > > > >
> > > > > Regards
> > > > > -- Bhaskar
> > > > >
> > > > > On Mon, Apr 18, 2016 at 9:13 AM, Gavin W. Burris <bug@wharton.upenn.edu>
> > > > > wrote:
> > > > >
> > > > > > Good morning, Bhaskar.
> > > > > >
> > > > > > Have you considered using /dev/shm aka tmpfs for shared memory on
> > > > > > Linux? Maybe stage all required files there and make sure you are
> > > > > > read-only where possible.
> > > > > >
> > > > > > With so many processes, your system is constantly switching
> > > > > > between threads. Assuming you are not oversubscribing RAM (32GB /
> > > > > > 32k is less than 1MB per), you will want to tune the kernel
> > > > > > scheduler.
> > > > > >
> > > > > > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Performance_Tuning_Guide/sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-CPU-Configuration_suggestions.html#sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-Configuration_suggestions-Tuning_scheduling_policy
> > > > > >
> > > > > > This very much sounds like an HPC (high-performance computing)
> > > > > > problem, so my initial reaction is: why not use a resource manager
> > > > > > tuned for high throughput? Take a look at Open Grid Scheduler
> > > > > > (http://gridscheduler.sourceforge.net/), an open source version of
> > > > > > Grid Engine. This will give you a layer of control, a job queue,
> > > > > > where you could then do a task array. Maybe you could launch 1000
> > > > > > jobs that iterate 320 times? The job queue could then be tuned to
> > > > > > not overload the system and to keep it optimally utilized; that
> > > > > > is, don't run everything at once, but place it in a queue that
> > > > > > runs through what you need as resources become available. I would
> > > > > > strongly consider using Grid Engine, especially given your
> > > > > > statement that the procs "do a teeny bit of activity every 10
> > > > > > seconds."
> > > > > >
> > > > > > Cheers.
> > > > > >
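A rough sketch of what those suggestions could look like in practice; the
staging path and the worker / worker.sh programs are hypothetical, and
SCHED_BATCH is just one of the policy options the Red Hat guide covers:

    # stage the read-only files on tmpfs (already mounted at /dev/shm)
    cp -r /path/to/readonly/files /dev/shm/

    # run a mostly-idle worker under SCHED_BATCH so the scheduler treats it
    # as non-interactive
    chrt --batch 0 ./worker

    # Grid Engine task array: 1000 tasks, each looping 320 times;
    # worker.sh would read $SGE_TASK_ID to pick its slice of the work
    qsub -t 1-1000 worker.sh
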
> > > > > > On Sun 04/17/16 11:12AM EDT, K.S. Bhaskar wrote:
> > > > > > > Thanks for the links, Rohit. I'll check them out. The storage
> > > > > > > is SSD, and the processes do minimal IO - I'm just trying to
> > > > > > > establish the ability to have a file open by more than 32K
> > > > > > > processes, and I'm clearly running into a system limit. This is
> > > > > > > a development machine (16 cores, 32GB RAM - the production
> > > > > > > machine has something like 64 cores and 512GB RAM), but I can't
> > > > > > > get you access to poke around because it is inside a corporate
> > > > > > > network.
> > > > > > >
> > > > > > > However, as the software is all open source, I can easily help
> > > > > > > you get set up to poke around using your own system, if you
> > > > > > > want. Please let me know.
> > > > > > >
> > > > > > > Regards
> > > > > > > -- Bhaskar
> > > > > > >
> > > > > > > On Sun, Apr 17, 2016 at 10:54 AM, Rohit Mehta <ro@paper-mill.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Some kernel parameters to research (which may not be right
> > > > > > > > for your application):
> > > > > > > >
> > > > > > > > https://www.debian-administration.org/article/656/Installing_Oracle11_and_Oracle12_on_Debian_Wheezy_Squeeze
> > > > > > > > and /etc/security.conf changes
> > > > > > > >
> > > > > > > > http://stackoverflow.com/questions/9361816/maximum-number-of-processes-in-linux
> > > > > > > >
> > > > > > > > Do these processes do a lot of IO? Is your storage rotational
> > > > > > > > media or SSD? Can your application run off ramdisk storage?
> > > > > > > > Have you tried enabling hyperthreading?
> > > > > > > >
> > > > > > > > Do you have the ability to test application loads on a
> > > > > > > > non-production system? If so, I'd be interested in helping you
> > > > > > > > poke around. It might be an education for me.
> > > > > > > >
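Since the specific goal is one file held open by more than 32K processes, the
file-handle ceilings are worth checking alongside the process ones. The usual
places to look (values illustrative; per-user limits normally live in
/etc/security/limits.conf):

    cat /proc/sys/fs/file-nr       # allocated / free / maximum file handles, system-wide
    sysctl -w fs.file-max=2000000  # raise the system-wide cap if file-nr is close to it
    ulimit -n                      # per-process open-file limit ("nofile" in limits.conf)
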
> > > > > > > > On Sun, Apr 17, 2016 at 10:42 AM, Rohit Mehta <ro@paper-mill.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Back many years ago, I installed Oracle on my Debian
> > > > > > > >> workstation for fun, and I remember the guide had a lot of
> > > > > > > >> tweaks. "ulimit" is one that I can think of, but I don't
> > > > > > > >> remember them all. I'm poking around the internet to see if
> > > > > > > >> I can find the Oracle guide (although it might not be
> > > > > > > >> relevant on newer kernels).
> > > > > > > >>
> > > > > > > >> On Sun, Apr 17, 2016 at 10:27 AM, K.S. Bhaskar <bhaskar@bhaskars.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> Thanks Steve, but in this case we have a customer need to
> > > > > > > >>> crank up the number of processes on Linux.
> > > > > > > >>>
> > > > > > > >>> Regards
> > > > > > > >>> -- Bhaskar
> > > > > > > >>>
> > > > > > > >>> On Sat, Apr 16, 2016 at 4:09 PM, Steve Litt <slitt@troubleshooters.com>
> > > > > > > >>> wrote:
> > > > > > > >>>
> > > > > > > >>>> On Fri, 15 Apr 2016 17:40:09 -0400
> > > > > > > >>>> "K.S. Bhaskar" <bhaskar@bhaskars.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>> > I am trying to crank up more than 32K concurrent processes
> > > > > > > >>>> > (the processes themselves hang around and do a teeny bit
> > > > > > > >>>> > of activity every 10 seconds). But the OS (64-bit Debian 8
> > > > > > > >>>> > - Jessie) stubbornly refuses to crank up beyond 32K-ish
> > > > > > > >>>> > processes. pid_max is set to a very large number (1M), so
> > > > > > > >>>> > that's not it. Any suggestions on what limits to look for
> > > > > > > >>>> > would be appreciated. Thank you very much.
> > > > > > > >>>>
> > > > > > > >>>> This is old information, but back in the day people who
> > > > > > > >>>> wanted lots and lots of processes used one of the BSDs to
> > > > > > > >>>> host that server.
> > > > > > > >>>>
> > > > > > > >>>> SteveT
> > > > > > > >>>>
> > > > > > > >>>> Steve Litt
> > > > > > > >>>> April 2016 featured book: Rapid Learning for the 21st Century
> > > > > > > >>>> http://www.troubleshooters.com/rl21
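One way to narrow down which ceiling is actually being hit is to spawn cheap
sleeper processes (not the real application) and note where the shell starts
reporting fork failures; "Resource temporarily unavailable" (EAGAIN) from
fork usually means a process-count limit such as RLIMIT_NPROC or
kernel.threads-max. A throwaway sketch, with the target count made up:

    # spawn sleepers until fork starts failing
    for i in $(seq 1 40000); do sleep 600 & done

    # total number of tasks that actually exist, system-wide
    ps -e --no-headers | wc -l
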
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug