K.S. Bhaskar on 18 Apr 2016 13:20:36 -0700



Re: [PLUG] >32K concurrent processes


Gavin, we're a long way from hitting the scalability limits of the hardware of a single node, and the use case is maximizing transaction throughput with serialization (we're a half to one order of magnitude ahead of popular database engines, and intend to keep it that way). When we want to get past the scalability of a single node, and must compromise transaction throughput to get there, we'll consider alternative architectures.

As to the architecture showing its age, it's interesting that in recent years popular database engines have adopted architectural elements that GT.M has had for years, if not decades.

Regards
-- Bhaskar

On Mon, Apr 18, 2016 at 4:03 PM, Gavin W. Burris <bug@wharton.upenn.edu> wrote:
Hi, Bhaskar.

That would definitely yield benefits in some ways.  I can't help feeling the architecture is showing its age, with the problem of not being able to scale past a single node efficiently.  I guess overloading a single host is still advantageous for problems of a certain size.  What is the breaking point at which the latency of inter-node communication becomes acceptable?  What use case is driving the decision to so strongly avoid going multi-node?

Cheers.

On Mon 04/18/16 03:56PM EDT, K.S. Bhaskar wrote:
> Gavin, the clients and database are the same. The database logic is inside
> application processes, or application logic is inside the database - either
> works out to the same thing. That's the trend these days in very high end
> databases, except that this has been GT.M's architecture since way back
> when.
>
> Regards
> -- Bhaskar
>
> On Mon, Apr 18, 2016 at 3:36 PM, Gavin W. Burris <bug@wharton.upenn.edu>
> wrote:
>
> > Just saw your previous post, that the 32k is for the testing clients, not
> > somehow the database.  I'd just spin up more boxes, once you find the
> > optimal number of clients a single one can handle.  Keep us posted.  This
> > is neat stuff.  Cheers.
> >
> > On Mon 04/18/16 03:32PM EDT, Gavin W. Burris wrote:
> > > Hi, Bhaskar.
> > >
> > > This sounds really neat.  Why do you need to simultaneously
> > > serialize ALL transactions?  For instance, my bank balance or my
> > > medical records have absolutely no real-time serial dependencies
> > > on any other account.  Maybe a service fee on my medical record is
> > > a dependency, but just update those daily.  My balance may depend
> > > on a transfer, but just look at the posted timestamp.  If one
> > > needs a bank-wide report, again, just look at transactions to a
> > > specific timestamp.  What is an acceptable granularity?  Sure, you
> > > can get millisecond accuracy this way, but why would you want that
> > > given the downsides?  Is this some kind of high-frequency trading
> > > scheme?  If so, any further communications will have to be under
> > > billable hours for my private consulting services.  :D
> > >
> > > Cheers.
> > >
> > > On Mon 04/18/16 02:44PM EDT, K.S. Bhaskar wrote:
> > > > This is not a distributed environment - it's a single system.
> > > > The reason is transaction serialization. When every transaction
> > > > can potentially depend on the result of the preceding
> > > > transaction, the more you can centralize serialization decision
> > > > making, the faster you can make the decisions required to ensure
> > > > ACID properties at transaction commit time. With GT.M, this
> > > > serialization is done in the shared memory of a single computing
> > > > node. Even with technologies such as RDMA over InfiniBand, IPC
> > > > between processes on different nodes is one to two orders of
> > > > magnitude slower than between processes on a single node. So, as
> > > > long as throughput is not constrained by the amount of CPU, RAM,
> > > > or IO you can put on a single node, centralized serialization
> > > > gives you the best overall throughput. With GT.M, and with the
> > > > types of computer system you can purchase today, the throughput
> > > > you can achieve on a single node is big enough to handle the
> > > > needs of real-time core processing (a core system is the system
> > > > of record for your bank balance) at just about any bank. The
> > > > largest real-time core systems in production anywhere in the
> > > > world today that I know of run on GT.M - these are systems with
> > > > over 30 million accounts. In healthcare, the real-time electronic
> > > > health records for the entire Jordanian Ministry of Health system
> > > > are being rolled out on a single system (⅓ of the electronic
> > > > health records for a country with the area and population of
> > > > Indiana, processed on a single system).
> > > >
> > > > What people think of as a horizontally scalable architecture for
> > > > a transactional system is stateless application servers that can
> > > > be spun up as needed, but which send all the needed state to a
> > > > database under the covers. This architecture scales only as well
> > > > as the database scales on a single node, which is to say not very
> > > > well - in our testing some years ago, we found that because of
> > > > transaction serialization, a popular database scaled better on a
> > > > single node than across multiple nodes in a cluster.
> > > >
> > > > So thanks for all the suggestions, but for now the specific
> > > > information I need is how to configure a Linux system to allow
> > > > more than 32K concurrent processes. Increasing pid_max is a
> > > > necessary change, but clearly not a sufficient one.
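[For reference, a sketch of the other knobs that commonly cap concurrent process count on Linux beyond kernel.pid_max. Which of them actually binds depends on the distribution and kernel, so treat this as a checklist rather than a recipe; the values in the comments are illustrative.]

```shell
# Inspect the limits that commonly cap process count (read-only, no root).
cat /proc/sys/kernel/pid_max      # largest PID the kernel will hand out
cat /proc/sys/kernel/threads-max  # system-wide cap on tasks (processes + threads)
ulimit -u                         # per-user process cap (RLIMIT_NPROC)
cat /proc/sys/vm/max_map_count    # can bind when many processes map shared memory

# To raise them (as root), something along these lines:
#   sysctl -w kernel.threads-max=2000000
# and for the per-user cap, in /etc/security/limits.conf:
#   <user>  soft  nproc  1000000
#   <user>  hard  nproc  1000000
```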
> > > >
> > > > Regards
> > > > -- Bhaskar
> > > >
> > > >
> > > >
> > > > On Mon, Apr 18, 2016 at 12:19 PM, Keith C. Perry <kperry@daotechnologies.com> wrote:
> > > >
> > > > > Bhaskar,
> > > > >
> > > > > What's the deployment infrastructure?  When you say "We're
> > > > > trying to run a workload that simulates a large number of
> > > > > concurrent users (as you might find at a large financial or
> > > > > healthcare institution)", it makes me think that this is more
> > > > > of a distributed environment where you need a large number of
> > > > > clients being serviced by a pool of servers.
> > > > >
> > > > > If that is the case, it sounds more like you would need a
> > > > > listener (server) running that would then fork off or thread
> > > > > child connections to respond to client requests.  This is also
> > > > > something that can be achieved in a local context, with the
> > > > > listener on the localhost IP or using Unix sockets.
> > > > >
> > > > >
> > > > > ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
> > > > > Keith C. Perry, MS E.E.
> > > > > Owner, DAO Technologies LLC
> > > > > (O) +1.215.525.4165 x2033
> > > > > (M) +1.215.432.5167
> > > > > www.daotechnologies.com
> > > > >
> > > > > ------------------------------
> > > > > *From: *"K.S. Bhaskar" <bhaskar@bhaskars.com>
> > > > > *To: *"Philadelphia Linux User's Group Discussion List" <
> > > > > plug@lists.phillylinux.org>
> > > > > *Sent: *Monday, April 18, 2016 11:49:19 AM
> > > > > *Subject: *Re: [PLUG] >32K concurrent processes
> > > > >
> > > > > Thanks for the suggestions, Gavin, but batching the load won't
> > > > > work in this case. We're trying to run a workload that
> > > > > simulates a large number of concurrent users (as you might find
> > > > > at a large financial or healthcare institution), all of whom
> > > > > expect the system to respond immediately when they ask it to do
> > > > > something. I intend to play with the scheduler.
> > > > >
> > > > > Regards
> > > > > -- Bhaskar
> > > > >
> > > > >
> > > > > On Mon, Apr 18, 2016 at 9:13 AM, Gavin W. Burris <bug@wharton.upenn.edu> wrote:
> > > > >
> > > > >> Good morning, Bhaskar.
> > > > >>
> > > > >> Have you considered using /dev/shm, aka tmpfs, for shared
> > > > >> memory on Linux?  Maybe stage all required files there and
> > > > >> make sure you are read-only where possible.
> > > > >>
> > > > >> With so many processes, your system is just constantly
> > > > >> switching between threads.  Assuming you are not
> > > > >> oversubscribing RAM (32GB / 32k is less than 1MB per process),
> > > > >> you will want to tune the kernel scheduler.
> > > > >>
> > > > >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Performance_Tuning_Guide/sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-CPU-Configuration_suggestions.html#sect-Red_Hat_Enterprise_Linux-Performance_Tuning_Guide-Configuration_suggestions-Tuning_scheduling_policy
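[Gavin's tmpfs-staging suggestion, sketched minimally. No root is needed because /dev/shm is world-writable tmpfs on most distributions; "hotfile" and its contents are stand-ins for whatever the workload actually reads.]

```shell
# Stage a file in tmpfs so repeated reads never touch disk.
mkdir -p /dev/shm/stage
rm -f /dev/shm/stage/hotfile               # make the sketch re-runnable
printf 'frequently read data\n' > /dev/shm/stage/hotfile
chmod a-w /dev/shm/stage/hotfile           # read-only where possible
df -h /dev/shm                             # confirm the staging area is tmpfs-backed
```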
> > > > >>
> > > > >> This very much sounds like an HPC (high-performance
> > > > >> computing) problem, so my initial reaction is: why not use a
> > > > >> resource manager tuned for high throughput?  Take a look at
> > > > >> Open Grid Scheduler (http://gridscheduler.sourceforge.net/),
> > > > >> an open source version of Grid Engine.  This will give you a
> > > > >> layer of control, a job queue, where you could then run a task
> > > > >> array.  Maybe you could launch 1000 jobs that iterate 320
> > > > >> times?  The job queue could then be tuned to not overload the
> > > > >> system and to keep it maximally / optimally utilized - i.e.,
> > > > >> don't run everything at once, but place it in a queue that
> > > > >> runs through what you need as resources become available.  I
> > > > >> would strongly consider using Grid Engine, especially given
> > > > >> your statement that the procs "do a teeny bit of activity
> > > > >> every 10 seconds."
> > > > >>
> > > > >> Cheers.
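[A hypothetical Grid Engine array-job script for the 1000 x 320 split Gavin describes. The script name, the `#$` directives, and the `./client` command are illustrative, not taken from the thread; it would be submitted with `qsub simulate.sh`, requiring a working Grid Engine installation.]

```shell
#!/bin/sh
# simulate.sh - hypothetical SGE array job: 1000 tasks x 320 iterations.
# SGE expands "-t 1-1000" into tasks with SGE_TASK_ID set to 1..1000.
#$ -cwd
#$ -t 1-1000
for i in $(seq 1 320); do
    # Derive a globally unique simulated-user id from task id and iteration.
    ./client --id "$(( (SGE_TASK_ID - 1) * 320 + i ))"
done
```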
> > > > >>
> > > > >> On Sun 04/17/16 11:12AM EDT, K.S. Bhaskar wrote:
> > > > >> > Thanks for the links, Rohit. I'll check them out. The
> > > > >> > storage is SSD, and the processes do minimal IO - I'm just
> > > > >> > trying to establish the ability to have a file open by more
> > > > >> > than 32K processes, and I'm clearly running into a system
> > > > >> > limit. This is a development machine (16 cores, 32GB RAM -
> > > > >> > the production machine has something like 64 cores and 512GB
> > > > >> > RAM), but I can't get you access to poke around because it
> > > > >> > is inside a corporate network.
> > > > >> >
> > > > >> > However, as the software is all open source, I can easily
> > > > >> > help you get set up to poke around using your own system, if
> > > > >> > you want. Please let me know.
> > > > >> >
> > > > >> > Regards
> > > > >> > -- Bhaskar
> > > > >> >
> > > > >> >
> > > > >> > On Sun, Apr 17, 2016 at 10:54 AM, Rohit Mehta <ro@paper-mill.com> wrote:
> > > > >> >
> > > > >> > > Some kernel parameters to research (which may not be
> > > > >> > > right for your application):
> > > > >> > >
> > > > >> > > https://www.debian-administration.org/article/656/Installing_Oracle11_and_Oracle12_on_Debian_Wheezy_Squeeze
> > > > >> > > and /etc/security/limits.conf changes:
> > > > >> > > http://stackoverflow.com/questions/9361816/maximum-number-of-processes-in-linux
> > > > >> > >
> > > > >> > > Do these processes do a lot of IO?  Is your storage
> > > > >> > > rotational media or SSD?  Can your application run off
> > > > >> > > ramdisk storage?  Have you tried enabling hyperthreading?
> > > > >> > >
> > > > >> > > Do you have the ability to test application loads on a
> > > > >> > > non-production system?  If so, I'd be interested in
> > > > >> > > helping you poke around.  It might be an education for me.
> > > > >> > >
> > > > >> > >
> > > > >> > > On Sun, Apr 17, 2016 at 10:42 AM, Rohit Mehta <ro@paper-mill.com> wrote:
> > > > >> > >
> > > > >> > >> Back many years ago, I installed Oracle on my Debian
> > > > >> > >> workstation for fun, and I remember the guide had a lot
> > > > >> > >> of tweaks.  "ulimit" is one that I can think of, but I
> > > > >> > >> don't remember them all.  I'm poking around the internet
> > > > >> > >> to see if I can find the Oracle guide (although it might
> > > > >> > >> not be relevant on newer kernels).
> > > > >> > >>
> > > > >> > >> On Sun, Apr 17, 2016 at 10:27 AM, K.S. Bhaskar <bhaskar@bhaskars.com> wrote:
> > > > >> > >>
> > > > >> > >>> Thanks, Steve, but in this case we have a customer need
> > > > >> > >>> to crank up the number of processes on Linux.
> > > > >> > >>>
> > > > >> > >>> Regards
> > > > >> > >>> -- Bhaskar
> > > > >> > >>>
> > > > >> > >>> On Sat, Apr 16, 2016 at 4:09 PM, Steve Litt <slitt@troubleshooters.com> wrote:
> > > > >> > >>>
> > > > >> > >>>> On Fri, 15 Apr 2016 17:40:09 -0400
> > > > >> > >>>> "K.S. Bhaskar" <bhaskar@bhaskars.com> wrote:
> > > > >> > >>>>
> > > > >> > >>>> > I am trying to crank up more than 32K concurrent
> > > > >> > >>>> > processes (the processes themselves hang and do a
> > > > >> > >>>> > teeny bit of activity every 10 seconds). But the OS
> > > > >> > >>>> > (64-bit Debian 8 - Jessie) stubbornly refuses to
> > > > >> > >>>> > crank up beyond 32K-ish processes. pid_max is set to
> > > > >> > >>>> > a very large number (1M), so that's not it. Any
> > > > >> > >>>> > suggestions on what limits to look for appreciated.
> > > > >> > >>>> > Thank you very much.
> > > > >> > >>>>
> > > > >> > >>>> This is old information, but back in the day, people
> > > > >> > >>>> who wanted lots and lots of processes used one of the
> > > > >> > >>>> BSDs to host that server.
> > > > >> > >>>>
> > > > >> > >>>> SteveT
> > > > >> > >>>>
> > > > >> > >>>> Steve Litt
> > > > >> > >>>> April 2016 featured book: Rapid Learning for the 21st Century
> > > > >> > >>>> http://www.troubleshooters.com/rl21
> > > > >> > >>>>
> > > > >> > >>>>


--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania
Search our documentation: http://research-it.wharton.upenn.edu/about/
Subscribe to the Newsletter: http://whr.tn/ResearchNewsletterSubscribe
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
