Walt Mankowski on 18 Apr 2016 12:38:33 -0700



Re: [PLUG] >32K concurrent processes


Aha.

I found this page linked from a Stack Overflow thread. It's 5 years old,
so it might be a bit dated, but it seems to address many of the questions
you originally raised.

http://web.archive.org/web/20111209081734/http://research.cs.wisc.edu/condor/condorg/linux_scalability.html

Walt

On Mon, Apr 18, 2016 at 03:31:13PM -0400, K.S. Bhaskar wrote:
> 32K+ processes concurrently accessing the database is exactly what I am
> trying to accomplish. On the machine I have, I cannot have those processes
> do any real application work (so an update by each process once every 10
> seconds is all I'm attempting). On a production machine, those processes
> will be doing real work, resulting in between 5 and 10 million database
> accesses per second (and yes, this throughput is demonstrated and proven
> with real application code, but with slightly fewer than 32K processes).
> 
> So I'm not trying to prove how much work I can push through the database -
> I know that already. I'm just trying to figure out what Linux parameters I
> need to tweak to push the number of processes above 32K.
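> 
> For concreteness, a minimal (untested) Python sketch of the ceilings I know
> to check - since pid_max alone clearly isn't the whole story:
> 
>     #!/usr/bin/env python3
>     # Report the kernel and per-user ceilings that typically cap process count.
>     import resource
> 
>     def read_int(path):
>         with open(path) as f:
>             return int(f.read().strip())
> 
>     # System-wide ceiling on PIDs (threads count against this too).
>     print("kernel.pid_max     =", read_int("/proc/sys/kernel/pid_max"))
>     # System-wide ceiling on tasks (processes + threads).
>     print("kernel.threads-max =", read_int("/proc/sys/kernel/threads-max"))
>     # Per-user process limit, aka ulimit -u (soft, hard).
>     print("RLIMIT_NPROC       =", resource.getrlimit(resource.RLIMIT_NPROC))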
> 
> Regards
> -- Bhaskar
> 
> 
> On Mon, Apr 18, 2016 at 3:14 PM, Gavin W. Burris <bug@wharton.upenn.edu>
> wrote:
> 
> > So, are you running 32k clients to hammer the database???  What are you
> > trying to accomplish?  Cheers.
> >
> > On Mon 04/18/16 02:48PM EDT, K.S. Bhaskar wrote:
> > > Thanks Gavin. With GT.M real-time database replication, there is no single
> > > point of failure. Furthermore, there are hooks to create and deploy
> > > applications that remain available not just in the face of unplanned events
> > > (like system crashes) but even planned events (such as application
> > > upgrades, even many upgrades that involve schema changes). It is a proven
> > > architecture which first went into daily live production in 1999.
> > >
> > > Regards
> > > -- Bhaskar
> > >
> > >
> > > On Mon, Apr 18, 2016 at 12:36 PM, Gavin W. Burris <bug@wharton.upenn.edu> wrote:
> > >
> > > > Hi, Bhaskar.
> > > >
> > > > Ah, OK.  I should have asked, "What are you trying to accomplish?"  Don't
> > > > run everything on one box!  Scale horizontally, with at least two
> > > > user-facing nodes.  You want to engineer in redundancy from square one.
> > > > If you don't, there will be no ability to sanely handle critical
> > > > patching/updates, or deal with scaling up.
> > > >
> > > > With Grid Engine, that would be two master hosts, and at least two compute
> > > > nodes actually running the procs, all with an NFS-shared cell directory;
> > > > the secondary master is called the shadow master in Grid Engine-speak.
> > > > Grid Engine would be a good solution if you need to run some existing
> > > > command-line or batch code.
> > > >
> > > > If this is for the web, strongly consider having a redundant API endpoint
> > > > to run functions.  A good way to do this would be with Docker and Swarm.
> > > > Docker is a completely different approach, but one that is correct for
> > > > scaling web applications.
> > > >
> > > > Cheers.
> > > >
> > > > On Mon 04/18/16 11:49AM EDT, K.S. Bhaskar wrote:
> > > > > Thanks for the suggestions, Gavin, but batching the load won't work in
> > > > > this case. We're trying to run a workload that simulates a large number
> > > > > of concurrent users (as you might find at a large financial or healthcare
> > > > > institution) all of whom expect the system to respond immediately when
> > > > > they ask it to do something. I intend to play with the scheduler.
> > > > >
> > > > > Regards
> > > > > -- Bhaskar
> > > > >
> > > > >
> > > > > On Mon, Apr 18, 2016 at 9:13 AM, Gavin W. Burris <bug@wharton.upenn.edu> wrote:
> > > > >
> > > > > > Good morning, Bhaskar.
> > > > > >
> > > > > > Have you considered using /dev/shm aka tmpfs for shared memory on
> > > > > > Linux?  Maybe stage all required files there and make sure you are
> > > > > > read-only where possible.
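> > > > > >
> > > > > > Staging is just a copy into tmpfs; a hypothetical sketch (the paths
> > > > > > here are purely illustrative):
> > > > > >
> > > > > >     #!/usr/bin/env python3
> > > > > >     # Stage a read-only data file into tmpfs so workers hit RAM, not disk.
> > > > > >     import os
> > > > > >     import shutil
> > > > > >
> > > > > >     src = "/srv/app/reference.dat"   # illustrative source path
> > > > > >     dst = "/dev/shm/reference.dat"   # tmpfs-backed copy
> > > > > >
> > > > > >     shutil.copyfile(src, dst)
> > > > > >     os.chmod(dst, 0o444)             # enforce read-only access
> > > > > >
> > > > > >     with open(dst, "rb") as f:       # workers open the RAM-backed copy
> > > > > >         print(f.read(16))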
> > > > > >
> > > > > > With so many processes, your system is just constantly context
> > > > > > switching.  Assuming you are not oversubscribing RAM (32GB / 32k is
> > > > > > less than 1MB per process), you will want to tune the kernel scheduler.
> > > > > >
> > > > > > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Performance_Tuning_Guide/sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-CPU-Configuration_suggestions.html#sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-Configuration_suggestions-Tuning_scheduling_policy
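> > > > > >
> > > > > > As one starting point (a rough, untested sketch; SCHED_BATCH is just
> > > > > > one of the standard Linux policies), a worker can opt in to batch
> > > > > > scheduling to tell the kernel it is non-interactive:
> > > > > >
> > > > > >     #!/usr/bin/env python3
> > > > > >     # Switch this process to SCHED_BATCH, a hint that it is
> > > > > >     # throughput-oriented rather than latency-sensitive.
> > > > > >     import os
> > > > > >
> > > > > >     # Non-realtime policies require priority 0.
> > > > > >     os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))
> > > > > >     print("policy:", os.sched_getscheduler(0))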
> > > > > >
> > > > > > This very much sounds like an HPC (high-performance computing) problem,
> > > > > > so my initial reaction is: why not use a resource manager tuned for
> > > > > > high throughput?  Take a look at Open Grid Scheduler
> > > > > > (http://gridscheduler.sourceforge.net/), an open source version of Grid
> > > > > > Engine.  This will give you a layer of control, a job queue, where you
> > > > > > could then do a task array.  Maybe you could launch 1000 jobs that
> > > > > > iterate 320 times?  The job queue could then be tuned to not overload
> > > > > > the system and to keep it maximally / optimally utilized - that is,
> > > > > > don't run everything at once, but place it in a queue that runs through
> > > > > > what you need as resources become available.  I would strongly consider
> > > > > > using Grid Engine, especially given your statement that the procs "do a
> > > > > > teeny bit of activity every 10 seconds."
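> > > > > >
> > > > > > A hypothetical array task might look like this (Grid Engine exports
> > > > > > SGE_TASK_ID to each task; the counts just mirror the numbers above):
> > > > > >
> > > > > >     #!/usr/bin/env python3
> > > > > >     # One Grid Engine array task; submit with: qsub -t 1-1000 worker.py
> > > > > >     import os
> > > > > >     import time
> > > > > >
> > > > > >     task_id = int(os.environ.get("SGE_TASK_ID", "1"))  # set by Grid Engine
> > > > > >
> > > > > >     for i in range(320):   # each task iterates 320 times
> > > > > >         # ... one teeny unit of work for simulated user (task_id, i) ...
> > > > > >         time.sleep(10)     # idle between updates, like the real procs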
> > > > > >
> > > > > > Cheers.
> > > > > >
> > > > > > On Sun 04/17/16 11:12AM EDT, K.S. Bhaskar wrote:
> > > > > > > Thanks for the links, Rohit. I'll check them out. The storage is SSD,
> > > > > > > and the processes do minimal IO - I'm just trying to establish the
> > > > > > > ability to have a file open by more than 32K processes, and I'm clearly
> > > > > > > running into a system limit. This is a development machine (16 cores,
> > > > > > > 32GB RAM - the production machine has something like 64 cores and 512GB
> > > > > > > RAM), but I can't get you access to poke around because it is inside a
> > > > > > > corporate network.
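> > > > > > >
> > > > > > > (Since the test is one file held open by 32K+ processes, a minimal,
> > > > > > > untested Python sketch to eyeball the file-side ceilings as well:)
> > > > > > >
> > > > > > >     #!/usr/bin/env python3
> > > > > > >     # System-wide and per-process open-file ceilings.
> > > > > > >     import resource
> > > > > > >
> > > > > > >     with open("/proc/sys/fs/file-max") as f:
> > > > > > >         print("fs.file-max   =", f.read().strip())  # system-wide cap
> > > > > > >     # Per-process cap, aka ulimit -n (soft, hard).
> > > > > > >     print("RLIMIT_NOFILE =", resource.getrlimit(resource.RLIMIT_NOFILE))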
> > > > > > >
> > > > > > > However, as the software is all open source, I can easily help you
> > > > > > > get set up to poke around using your own system, if you want. Please
> > > > > > > let me know.
> > > > > > >
> > > > > > > Regards
> > > > > > > -- Bhaskar
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Apr 17, 2016 at 10:54 AM, Rohit Mehta <ro@paper-mill.com> wrote:
> > > > > > >
> > > > > > > > Some kernel parameters to research (which may not be right for your
> > > > > > > > application):
> > > > > > > >
> > > > > > > > https://www.debian-administration.org/article/656/Installing_Oracle11_and_Oracle12_on_Debian_Wheezy_Squeeze
> > > > > > > > and /etc/security/limits.conf changes
> > > > > > > >
> > > > > > > > http://stackoverflow.com/questions/9361816/maximum-number-of-processes-in-linux
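> > > > > > > >
> > > > > > > > (The limits.conf entries such guides suggest follow this shape; the
> > > > > > > > username and values below are purely illustrative:)
> > > > > > > >
> > > > > > > >     # /etc/security/limits.conf - illustrative values only
> > > > > > > >     bhaskar  soft  nproc   100000
> > > > > > > >     bhaskar  hard  nproc   100000
> > > > > > > >     bhaskar  soft  nofile  100000
> > > > > > > >     bhaskar  hard  nofile  100000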
> > > > > > > >
> > > > > > > > Do these processes do a lot of IO?  Is your storage rotational media
> > > > > > > > or SSD?  Can your application run off ramdisk storage?  Have you tried
> > > > > > > > enabling hyperthreading?
> > > > > > > >
> > > > > > > > Do you have the ability to test application loads on a non-production
> > > > > > > > system?  If so, I'd be interested in helping you poke around.  It might
> > > > > > > > be an education for me.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Apr 17, 2016 at 10:42 AM, Rohit Mehta <ro@paper-mill.com> wrote:
> > > > > > > >
> > > > > > > >> Back many years ago, I installed Oracle on my Debian workstation
> > > > > > > >> for fun, and I remember the guide had a lot of tweaks.  "ulimit" is
> > > > > > > >> one that I can think of, but I don't remember them all.  I'm poking
> > > > > > > >> around the internet to see if I can find the Oracle guide (although
> > > > > > > >> it might not be relevant on newer kernels).
> > > > > > > >>
> > > > > > > >> On Sun, Apr 17, 2016 at 10:27 AM, K.S. Bhaskar <bhaskar@bhaskars.com> wrote:
> > > > > > > >>
> > > > > > > >>> Thanks Steve, but in this case we have a customer need to crank up
> > > > > > > >>> the number of processes on Linux.
> > > > > > > >>>
> > > > > > > >>> Regards
> > > > > > > >>> -- Bhaskar
> > > > > > > >>>
> > > > > > > >>> On Sat, Apr 16, 2016 at 4:09 PM, Steve Litt <slitt@troubleshooters.com> wrote:
> > > > > > > >>>
> > > > > > > >>>> On Fri, 15 Apr 2016 17:40:09 -0400
> > > > > > > >>>> "K.S. Bhaskar" <bhaskar@bhaskars.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>> > I am trying to crank up more than 32K concurrent processes (the
> > > > > > > >>>> > processes themselves hang around and do a teeny bit of activity
> > > > > > > >>>> > every 10 seconds). But the OS (64-bit Debian 8 - Jessie)
> > > > > > > >>>> > stubbornly refuses to crank up beyond 32K-ish processes. pid_max
> > > > > > > >>>> > is set to a very large number (1M), so that's not it. Any
> > > > > > > >>>> > suggestions on what limits to look for would be appreciated.
> > > > > > > >>>> > Thank you very much.
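> > > > > > > >>>> >
> > > > > > > >>>> > (A stripped-down repro is just a fork loop; this sketch is
> > > > > > > >>>> > purely illustrative and untested:)
> > > > > > > >>>> >
> > > > > > > >>>> >     #!/usr/bin/env python3
> > > > > > > >>>> >     # Fork mostly-idle children until something refuses.
> > > > > > > >>>> >     import os
> > > > > > > >>>> >     import sys
> > > > > > > >>>> >     import time
> > > > > > > >>>> >
> > > > > > > >>>> >     N = 40000                # aim past the ~32K wall
> > > > > > > >>>> >     for i in range(N):
> > > > > > > >>>> >         try:
> > > > > > > >>>> >             pid = os.fork()
> > > > > > > >>>> >         except OSError as e:
> > > > > > > >>>> >             sys.exit("fork #%d failed: %s" % (i, e))
> > > > > > > >>>> >         if pid == 0:         # child: teeny activity every 10s
> > > > > > > >>>> >             while True:
> > > > > > > >>>> >                 time.sleep(10)
> > > > > > > >>>> >     print("forked all", N, "children")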
> > > > > > > >>>>
> > > > > > > >>>> This is old information, but back in the day people who wanted
> > > > > > > >>>> lots and lots of processes used one of the BSDs to host that
> > > > > > > >>>> server.
> > > > > > > >>>>
> > > > > > > >>>> SteveT
> > > > > > > >>>>
> > > > > > > >>>> Steve Litt
> > > > > > > >>>> April 2016 featured book: Rapid Learning for the 21st Century
> > > > > > > >>>> http://www.troubleshooters.com/rl21
> >
> >
> > --
> > Gavin W. Burris
> > Senior Project Leader for Research Computing
> > The Wharton School
> > University of Pennsylvania
> > Search our documentation: http://research-it.wharton.upenn.edu/about/
> > Subscribe to the Newsletter: http://whr.tn/ResearchNewsletterSubscribe

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug